
Statistical Insights Reveal ChatGPT’s Use in Chemistry Exam Cheating

Researchers have shown that certain statistical techniques can identify the use of ChatGPT for cheating on general chemistry multiple-choice exams.
As generative artificial intelligence becomes increasingly integrated into educational settings, concern about its role in cheating has centered largely on essays, open-ended exam questions, and similar narrative tasks. The use of AI tools like ChatGPT to cheat on multiple-choice exams, by contrast, has received far less attention.

A chemist from Florida State University is part of a research collaboration that is reshaping our understanding of this form of cheating. Their findings demonstrate how specific statistical methods can detect the use of ChatGPT for cheating on general chemistry multiple-choice exams. This research was published in the Journal of Chemical Education.

“While many educators and researchers focus on identifying AI-assisted cheating in essays and open-ended responses, using tools such as Turnitin’s AI detection, to our knowledge this is the first effort to propose a method for detecting it in multiple-choice exams,” stated Ken Hanson, an associate professor in the FSU Department of Chemistry and Biochemistry. “By analyzing performance differences between students and ChatGPT on multiple-choice chemistry exams, we identified instances of ChatGPT usage across all exams with almost no false positives.”

This publication is the latest outcome of a seven-year collaboration between Hanson and machine learning engineer Ben Sorenson.

Hanson and Sorenson, friends since third grade, both obtained their undergraduate degrees from St. Cloud State University in Minnesota and continued their friendship into their professional lives. As a faculty member at FSU, Hanson grew interested in measuring how well his students absorbed the knowledge from lectures, courses, and laboratory work.

“I discussed this with Ben, who is excellent with statistics, computer science, and data analysis,” said Hanson, who is associated with a group of FSU faculty dedicated to enhancing student success in foundational STEM courses like general chemistry and college algebra. “He suggested we could utilize statistical tools to assess the effectiveness of my exams, and in 2017, we began analyzing them.”

Their analysis rests on the Rasch model, a statistical framework widely used to design and evaluate exams. The essence of the Rasch model is that a student’s probability of answering any test question correctly hinges on two factors: the question’s difficulty and the student’s ability. Here, ‘ability’ denotes the knowledge and skills a student needs to answer the question. Evaluating exam results this way provides valuable insights, according to the researchers.
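For readers who want the relationship concretely, here is a minimal sketch of the standard dichotomous Rasch model; the logistic form is the textbook version of the model, and the example ability and difficulty values are illustrative assumptions, not numbers from the study.

    import numpy as np

    def rasch_probability(ability, difficulty):
        # Dichotomous Rasch model: the chance of a correct answer depends only
        # on the gap between the respondent's ability and the item's
        # difficulty, both expressed on the same logit scale.
        return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

    # A strong student (ability = 1.5 logits) is very likely to answer an easy
    # item (difficulty = -1.0) correctly, and much less likely on a hard item.
    print(round(rasch_probability(1.5, -1.0), 2))  # 0.92
    print(round(rasch_probability(1.5, 2.0), 2))   # 0.38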

“Even though Ken and I are working from different locations, our collaboration has been incredibly smooth and efficient,” Sorenson remarked. “Our work offers substantial evidence when educators suspect that cheating may be occurring. What surprised us was how easy it was to recognize the patterns associated with artificial intelligence.”

Hanson obtained his Ph.D. in chemistry from the University of Southern California in 2010 and completed a postdoctoral fellowship at the University of North Carolina at Chapel Hill before joining the faculty at FSU’s chemistry department in 2013. His research group, the Hanson Research Group, concentrates on molecular photochemistry and photophysics, which involves studying light (photons) and how it interacts with molecules. A member of the American Chemical Society, Hanson has authored over 100 publications and holds more than a dozen patents.

The researchers gathered exam responses from FSU students over five semesters, fed nearly 1,000 questions to ChatGPT, and analyzed the results. Average scores and other basic statistics alone could not flag ChatGPT-like behavior: ChatGPT answered some questions correctly every time and others incorrectly every time, so its overall scores looked no different from students’ scores.

“The key aspect of ChatGPT is that it can produce content, but that doesn’t mean the content is correct,” noted Hanson. “It’s merely an answer generator. It attempts to present itself as knowledgeable, which might deceive someone unfamiliar with the material.”

By applying fit statistics, the researchers adjusted the ability parameters and reevaluated the results, discovering that the response pattern of ChatGPT was distinctly different from that of the students.

During exams, high-achieving students typically answer both the difficult and the easy questions correctly, average students get most of the easy questions and some of the difficult ones right, and low-achieving students usually answer only the easy questions correctly. In repeated runs through an exam, however, ChatGPT sometimes missed all the easier questions while getting all of the hard ones right. Hanson and Sorenson exploited these behavioral discrepancies to detect ChatGPT usage with nearly 100% accuracy.
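As an illustration of how such a mismatch can be quantified, the sketch below computes the standard person-level Rasch fit statistics (outfit and infit mean squares). It is a generic example under assumed array names and parameter estimates, not the authors’ exact procedure, which is detailed in their paper.

    import numpy as np

    def person_fit(responses, abilities, difficulties):
        # responses: persons x items matrix of 0/1 answers;
        # abilities, difficulties: Rasch parameters on the logit scale.
        expected = 1.0 / (1.0 + np.exp(-(abilities[:, None] - difficulties[None, :])))
        variance = expected * (1.0 - expected)
        z_sq = (responses - expected) ** 2 / variance   # squared standardized residuals
        outfit = z_sq.mean(axis=1)                      # sensitive to surprising answers
        infit = (z_sq * variance).sum(axis=1) / variance.sum(axis=1)
        return outfit, infit

    # Values near 1.0 indicate an expected response pattern. An answer sheet
    # that misses easy items while acing hard ones produces large residuals
    # and an inflated outfit, the kind of anomaly that can flag an
    # AI-generated response set.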

The pair’s method of using Rasch modeling and fit statistics can be applied to any generative AI chatbot; each will exhibit its own distinctive pattern, helping educators identify when these tools have been used to complete multiple-choice exams.