Scientific research may soon undergo fundamental changes: artificial intelligence has proven itself an unmatched tool for analyzing enormous volumes of specialized literature. In a recent experiment, AI distinguished fabricated scientific findings from real ones more accurately than human experts. This could ease the work of researchers by letting machines sift through mountains of raw information in search of promising directions.
From the very beginning, developers of generative AI (ChatGPT and others) have focused on the ability of large language models (LLMs) to answer questions by summarizing the vast data on which they were trained. Scientists from University College London (UCL) set themselves a different goal. They asked whether LLMs could synthesize knowledge, that is, extract patterns from the scientific literature and use them to assess new scientific papers. The experiment showed that the models outperformed human experts at judging which reported results were genuine.
"Scientific progress is often based on trial and error, but every careful experiment requires time and resources. Even the most experienced researchers may miss important findings in the literature. Our work explores whether LLMs can identify patterns in large bodies of scientific text and predict the outcomes of experiments," the authors explain. It is not hard to imagine AI's role in peer review extending far beyond simple knowledge retrieval. That could prove a breakthrough across all areas of science, saving researchers time and money.
The experiment was based on a set of scientific papers in neuroscience, but the approach can be extended to any field. The researchers prepared many pairs of abstracts, each consisting of one real paper and one fake containing plausible but incorrect results and conclusions. The pairs were reviewed by 15 general-purpose LLMs and 171 specially recruited human neuroscience experts, all of whom had to separate the real work from the fake.
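The article does not describe how the models were scored on these pairs. One common approach for this kind of two-alternative benchmark is to compare the likelihood a language model assigns to each abstract and count the model as correct when it prefers the real one. Below is a minimal sketch under that assumption; the model choice (GPT-2 via Hugging Face) and helper names are illustrative, not the setup used in the study.

```python
# Sketch: score a causal language model on (real, altered) abstract pairs by
# perplexity. The model "chooses" the abstract it finds less surprising.
# This is an assumed scoring scheme for illustration, not the study's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated mean per-token cross-entropy of the text under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean cross-entropy over tokens
    return torch.exp(loss).item()

def chooses_real(real_abstract: str, altered_abstract: str) -> bool:
    """True if the model assigns lower perplexity to the real abstract."""
    return perplexity(real_abstract) < perplexity(altered_abstract)

def accuracy(pairs) -> float:
    """Fraction of (real, altered) pairs on which the model picks the real one."""
    correct = sum(chooses_real(real, fake) for real, fake in pairs)
    return correct / len(pairs)
```

The same accuracy measure can then be compared directly with the human experts' scores, since the experts face the identical forced choice between the two abstracts.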
Every LLM outperformed the neuroscientists: AI accuracy averaged 81%, while human accuracy averaged 63%. Restricting the comparison to the most qualified human experts raised their accuracy to 66%, still nowhere near the machines. And when an LLM was further trained on neuroscience data, its accuracy rose to 86%. The researchers say the result paves the way for a future in which human experts collaborate with well-calibrated models.
The work also suggests that many "new" discoveries are not new at all, a feature of modern science that the AI readily exposes. With a tool like this, scientists will at least know whether a chosen line of research is worth pursuing or whether its results can simply be looked up online.