ChatGPT cannot yet outscore most law students on exams, new research suggests, but it can eke out a passing grade.
A quartet of law professors at the University of Minnesota used the popular artificial intelligence chatbot to generate answers to exams in four courses last semester, then graded them blindly alongside actual students' tests.
ChatGPT’s average C+ performance fell below the humans' B+ average, the authors said. If applied across the curriculum, that would still be enough to earn the chatbot a law degree — though it would be placed on academic probation at Minnesota, ranked as the 21st best law school in the country by US News & World Report.
"Alone, ChatGPT would be a pretty mediocre law student," said lead study author Jonathan Choi, who collaborated with professors Kristin Hickman, Amy Monahan and Daniel Schwarcz.
"The bigger potential for the profession here is that a lawyer could use ChatGPT to produce a rough first draft and just make their practice that much more effective," he said.
Choi said he and many colleagues have now banned Internet use during in-class exams to eliminate the possibility of cheating with ChatGPT, though future exams may test students' ability to effectively leverage artificial intelligence programs.
The wildly popular ChatGPT debuted in late November and is free for users. It generates sophisticated, human-like responses based on requests from users and mountains of data, including from legal texts.
Other legal academics have also been experimenting with the program. Suffolk University law dean Andrew Perlman co-authored a scholarly article with the program in December. Two other law professors had ChatGPT answer multiple-choice questions from the bar exam. It did not pass but performed better than expected.
The Minnesota law professors had ChatGPT take exams in torts, employee benefits, taxation, and aspects of constitutional law. The tests included a total of 95 multiple choice questions and 12 essay questions.
The chatbot generally did better on the essays than on the multiple-choice questions, scoring in the 17th percentile of all students on the former and the 7th percentile on the latter. But its essay performance was inconsistent.
“In writing essays, ChatGPT displayed a strong grasp of basic legal rules and had consistently solid organization and composition,” the authors wrote. “However, it struggled to identify relevant issues and often only superficially applied rules to facts as compared to real law students.”
The program scored higher on the multiple-choice questions than it would through pure chance, according to the report, but struggled to correctly answer questions involving math.
ChatGPT’s exam grades ranged from a high of a B in constitutional law to a low of a C- in torts and taxation.