AI software ChatGPT almost smart enough to pass tough medical exam

The artificial intelligence (AI) software programme ChatGPT scored close to the pass mark for a tough US medical exam, according to a new study published on Thursday.

The Californian start-up OpenAI launched the chatbot last November, and it has been a sensation ever since. It is easy to use and produces texts such as essays, articles or even poems on request.

For the study, published in the journal PLOS Digital Health, researchers from the AnsibleHealth company tested the software’s performance on an exam that medical students in the United States have to take. The exam covers various areas: scientific knowledge, clinical reasoning, bioethics, etc.

Called the United States Medical Licensing Examination (USMLE), this test is divided into three parts: the first one is taken after about two years of study, the second after four years, and the third is required to become a doctor.

ChatGPT scored between 52.4% and 75% correct

ChatGPT was tested on 350 of the 376 questions published on the USMLE website that were part of the June 2022 exam. Questions that used images had to be removed.

The questions were presented in three formats: open-ended questions (e.g. “What would be the diagnosis for this patient given the information presented?”), multiple choice without rationale (e.g. “What is the most appropriate next step in follow-up?”), and multiple choice with rationale (e.g. “What is the most likely reason for the patient’s night-time symptoms? Explain your reasoning.”).

Two examiners scored the responses, and a third adjudicated the discrepancies between them.

The software scored between 52.4% and 75% correct. Typically, the score needed to pass the exam is 60%.

“ChatGPT comes close to the margin of success,” the study concluded.

'Exciting new development in the field of AI'

Some external experts criticised the method used. The researchers could have introduced some degree of anonymisation by mixing human responses with those of the robot, said Nello Cristianini, professor of artificial intelligence at the University of Bath in the UK.

However, he described the work as “part of a series of exciting new developments in the field of artificial intelligence.”

According to Lucia Ortiz de Zarate, a researcher at the Autonomous University of Madrid, this study demonstrates “the potential of AI in the medical field.”

It “can be of great help to doctors when formulating diagnoses and prescribing treatments,” she said.

In late January, another study showed that ChatGPT could pass the exams of a US law school, although it would finish last in the class.