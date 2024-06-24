Researchers said the exams included not only multiple-choice, fill-in-the-blank and single-answer questions but also open-response questions, such as writing a short essay on a given theme. Each answer sheet was reviewed by at least three tutors, who were not told that the test-takers were AI models until they had finished grading.

Graders said the AI tools appeared better at understanding Chinese written in a contemporary style but struggled with pre-modern, classical Chinese passages. Few of the models were able to use techniques such as quoting adages in their essays.

“On the math test, their free-response answers tend to be disorganized and confusing, and a final answer could be correct even when the working contained errors. They also showed a strong ability to memorize formulas but could not apply them quickly to solve problems,” the graders said.

AI participants also had mediocre results during the preliminary round of the 2024 Alibaba Global Mathematics Competition. Organizers said this month that the average score of the 500-plus AI teams was 18 out of 120, and the highest score among them was only 34, compared with the highest human score of 113.