A research team has developed a machine-learning algorithm that can identify up to 94% of AI-generated fake scientific papers, nearly twice the accuracy of typical data-mining methods.
With tools like ChatGPT and other generative AI capable of creating scientific articles that may appear legitimate—especially to those not deeply entrenched in specific research areas—how can we best discern the authentic from the phony?
Ahmed Abdeen Hamed, a visiting research fellow at Binghamton University, State University of New York, has designed an algorithm termed xFakeSci that can effectively spot up to 94% of fake articles, significantly outperforming standard data-mining techniques.
“My primary focus is biomedical informatics, but given my involvement with medical publications, clinical trials, online materials, and social media analysis, I constantly worry about the credibility of the information being disseminated,” said Hamed, who works in the Complex Adaptive Systems and Computational Intelligence Lab of Luis M. Rocha, George J. Klir Professor of Systems Science. “Biomedical articles were especially affected during the global pandemic, as some individuals shared misleading research.”
In a recent study featured in the journal Scientific Reports, Hamed and his co-author Xindong Wu, a professor at Hefei University of Technology in China, generated 50 fictitious articles for each of three prominent medical subjects—Alzheimer’s, cancer, and depression—and compared these with an equal number of genuine articles on those topics.
When requesting AI-generated papers from ChatGPT, Hamed stated, “I aimed to use the exact keywords that I utilized to gather literature from the [National Institutes of Health’s] PubMed database, ensuring a consistent basis for comparison. I suspected there might be identifiable patterns between the fictitious and authentic articles, though I was initially unsure what those patterns might be.”
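To give a sense of what assembling the genuine comparison set might look like in practice, here is a minimal sketch that searches PubMed for article IDs by keyword using Biopython's Entrez interface. The contact address, query strings, and retrieval settings are placeholders, not details taken from the study.

```python
# Minimal sketch of a keyword search against PubMed via NCBI E-utilities.
# Assumes Biopython is installed; query terms and settings are illustrative.
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI asks for a contact address

def fetch_pubmed_ids(keyword, n=50):
    """Search PubMed for a keyword and return up to n article IDs."""
    handle = Entrez.esearch(db="pubmed", term=keyword, retmax=n)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]

for topic in ["Alzheimer's disease", "cancer", "depression"]:
    ids = fetch_pubmed_ids(topic)
    print(topic, len(ids), "articles found")
```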
After some trial and error, he programmed xFakeSci to examine two key characteristics of the writing. The first feature involves the presence of bigrams—pairs of words that frequently occur together, such as “climate change,” “clinical trials,” or “biomedical literature.” The second feature explores how these bigrams relate to other words and concepts within the text.
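The study's own code is not reproduced here, but the two features can be sketched in simplified form. The snippet below counts a document's distinct bigrams and, as a rough stand-in for the second feature, measures how many other words each bigram co-occurs with in the same sentence; the function names and the connectivity proxy are illustrative assumptions, not the authors' xFakeSci implementation.

```python
# Illustrative sketch of the two text features described above; not the
# authors' implementation. Connectivity here is a simple co-occurrence proxy.
import re
from collections import defaultdict

def split_sentences(text):
    """Split text into lowercase sentences on basic end punctuation."""
    return [s.strip().lower() for s in re.split(r"[.!?]", text) if s.strip()]

def bigrams(tokens):
    """Return the set of adjacent word pairs in a token list."""
    return {(a, b) for a, b in zip(tokens, tokens[1:])}

def document_features(text):
    """Return (number of distinct bigrams, average connectivity per bigram).

    Connectivity proxy: how many distinct other words appear in the same
    sentence as each bigram, averaged over all bigrams in the document.
    """
    sentence_tokens = [re.findall(r"[a-z]+", s) for s in split_sentences(text)]
    neighbours = defaultdict(set)
    all_bigrams = set()
    for tokens in sentence_tokens:
        pairs = bigrams(tokens)
        all_bigrams |= pairs
        for pair in pairs:
            neighbours[pair] |= set(tokens) - set(pair)
    if not all_bigrams:
        return 0, 0.0
    avg_connectivity = sum(len(neighbours[b]) for b in all_bigrams) / len(all_bigrams)
    return len(all_bigrams), avg_connectivity

# Toy usage: compute both features for a short snippet of text.
count, connectivity = document_features(
    "Clinical trials assessed amyloid plaques. Cognitive decline correlated with tau levels."
)
print(count, round(connectivity, 2))
```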
“One of the most noticeable findings was that the number of bigrams in the fake articles was quite limited, whereas the real papers were rich with bigrams,” Hamed noted. “Additionally, while the fake articles contained fewer bigrams, those they had were highly interconnected with other content.”
Hamed and Wu speculate that the difference in writing styles stems from the distinct objectives of human researchers compared to AI systems tasked with generating content on specific subjects.
“Because ChatGPT’s knowledge has limits, it tries to persuade by leaning on the most prominent keywords,” Hamed explained. “A scientist’s role isn’t to make persuasive arguments; instead, a genuine research article reports transparently on experimental results and methodologies. ChatGPT tends to focus deeply on a single point, while real science encompasses a broader spectrum.”
To enhance xFakeSci, Hamed plans to widen the topic scope to determine if identifiable word patterns are consistent across other research fields, extending beyond medicine to include engineering, various scientific disciplines, and the humanities. He anticipates that as AI technology becomes more advanced, distinguishing between real and fake will become increasingly challenging.
“We will continually find ourselves in a reactive position unless we create something more comprehensive,” he remarked. “There is significant work ahead to uncover a general pattern or universal algorithm that remains effective, regardless of the generative AI version utilized.”
He pointed out that even with their algorithm successfully identifying 94% of AI-generated papers, this still allows for six out of every 100 fakes to slip through: “We must remain modest about our achievements. We have made a crucial step by increasing awareness.”