Large language models (LLMs) have proven capable of generating research ideas at an expert level. Moreover, according to a new study, reviewers rated these ideas as more original and more exciting than those proposed by human experts. This challenges the assumption that scientific innovation is uniquely human and opens new horizons for AI in the scientific community.

Image source: NickyPe/Pixabay

Advances in large language models have sparked a wave of enthusiasm among researchers. It turns out that AI models such as OpenAI’s ChatGPT and Anthropic’s Claude are capable of independently generating and validating new scientific hypotheses. Until now, creating new knowledge and making scientific discoveries were believed to be the exclusive prerogative of humans, whereas AI was thought only to recombine mechanically the knowledge in its training data. However, having already made inroads into artistic expression, music, and programming, AI has now taken aim at science, showing the ability to generate research ideas that are, on average, more novel than those proposed by scientists.

To test this hypothesis, a study was conducted in the field of natural language processing (NLP). NLP is the branch of AI that deals with communication between humans and machines in natural language. It covers not only basic syntax but also the nuances of language, the understanding of context, and, more recently, even tone of voice and emotional shading in speech. The study involved 100 NLP experts (PhD and postdoctoral researchers from 36 different institutions), who entered into a kind of scientific competition with LLM-based “idea agents.” The goal was to find out whose research ideas would be more original, exciting, and feasible.

To ensure the integrity of the experiment, 49 experts formulated ideas on 7 specific NLP topics, while an LLM agent set up by the researchers generated ideas on the same topics. To motivate the participants to produce quality ideas, the experts were paid $300 for each idea they proposed, and each of the top five human ideas earned an additional $1,000. Once the ideas were collected, an LLM was used to standardize the writing style of every write-up while preserving its original content, to level the playing field and make the comparison as unbiased as possible.

Comparison of the novelty of research ideas under three conditions: human ideas, AI ideas, and human-edited AI ideas. Image source: @ChengleiSi/X

All submitted ideas were then reviewed blind by 79 external experts. The panel submitted 298 reviews in total, so each idea received two to four independent reviews. The results were striking. AI-generated ideas received statistically significantly higher ratings for novelty and excitement than human ideas. AI ideas also scored slightly lower on feasibility and slightly higher on effectiveness than human ideas, although these differences were not statistically significant.
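To illustrate what “statistically significant” means for rating data like this, here is a minimal sketch of a two-sided permutation test on a difference in mean scores. It is only an illustration: the scores are made up, the function name `permutation_test` is hypothetical, and the permutation test is a stand-in chosen because it needs nothing beyond the standard library, not necessarily the test used in the study.

```python
import random


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels and recompute the difference.
        rng.shuffle(pooled)
        group_a = pooled[:len(a)]
        group_b = pooled[len(a):]
        diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical novelty scores on a 1-10 scale; NOT the study's data.
human_scores = [4, 5, 5, 6, 4, 5, 3, 6, 5, 4]
ai_scores = [6, 5, 7, 6, 5, 6, 7, 5, 6, 6]

observed_diff, p = permutation_test(ai_scores, human_scores)
print(f"mean difference = {observed_diff:.2f}, p = {p:.4f}")
```

A small p-value means that a gap this large would rarely arise if the “human” and “AI” labels were assigned at random, which is the sense in which the novelty gap is called significant while the feasibility and effectiveness gaps are not.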

The study also revealed some shortcomings in the AI's performance, such as a lack of diversity in its ideas. Even when clearly instructed not to repeat itself, the AI quickly began doing so. Additionally, the AI was unable to evaluate ideas reliably: its assessments showed low agreement with human judgments. It is important to note that the study also has methodological limitations. In particular, assessing the “originality” of an idea remains subjective, even when done by a group of experts. The authors therefore plan a more comprehensive study in which ideas generated by both AI and humans will be developed into full projects, allowing a deeper look at their impact in real-life scenarios. Nevertheless, the first results of the study are certainly impressive.
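The lack of diversity mentioned above can be made concrete by counting how many generated ideas remain after near-duplicates are filtered out. The following is a rough sketch under stated assumptions: it uses word-overlap (Jaccard) similarity as a simple stand-in for whatever similarity measure the study used, the 0.8 threshold is an arbitrary illustrative choice, the helper names are hypothetical, and the idea strings are invented.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings (0.0 to 1.0)."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def deduplicate(ideas, threshold=0.8):
    """Keep an idea only if it is not too similar to any idea already kept."""
    unique = []
    for idea in ideas:
        if all(jaccard(idea, kept) < threshold for kept in unique):
            unique.append(idea)
    return unique


# Invented example ideas; the second is a near-copy of the first.
ideas = [
    "use retrieval to reduce hallucination in long answers",
    "use retrieval to reduce hallucination in long answers today",
    "prompt models to verify their own reasoning steps",
]

unique_ideas = deduplicate(ideas)
print(f"{len(unique_ideas)} unique ideas out of {len(ideas)} generated")
```

If the share of unique ideas keeps falling as more ideas are generated, simply asking the model for more output yields diminishing returns.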

Comparison of the ratings of research ideas proposed by humans and AI on five key criteria: novelty, excitement, feasibility, effectiveness, and overall score. Image source: @ChengleiSi/X

Today, although AI models are becoming incredibly powerful tools, they still suffer from unreliability and a tendency to “hallucinate,” which is a critical flaw in scientific work, where accuracy and reliability of information are essential. By some estimates, at least 10% of scientific papers are now co-authored by AI. On the other hand, the potential of AI to accelerate progress in some areas of human activity should not be underestimated. A striking example is DeepMind’s GNoME system, which in a few months achieved the equivalent of about 800 years of research in materials science, generating the structures of about 380,000 new inorganic crystals that could revolutionize a variety of fields.

AI is now the fastest-growing technology humanity has ever seen, so it is reasonable to expect that many of its shortcomings will be corrected within the next couple of years. Many AI researchers believe that humanity is approaching the birth of general superintelligence, the point at which general-purpose AI will surpass human expertise in virtually every field. The ability of AI to generate more original and exciting ideas than scientists could lead to a rethinking of the process of scientific discovery and of the role humans play in it.

admin
