Microsoft study shows AI is ‘so-so’ at fixing bugs in software code

A new study from Microsoft Research found that while AI can help developers write code, even the best models from OpenAI (o1) and Anthropic (Claude 3.7 Sonnet) fix errors no more than half the time. The testing was conducted using the leading SWE-bench benchmark, which measures how well AI systems resolve real-world software issues.

During the experiment, AI agents attempted 300 debugging tasks. Claude 3.7 Sonnet led with a success rate of 48.4%, followed by OpenAI’s o1 (30.2%) and o3-mini (22.1%). Even the best of these figures falls well short of what would be expected from experienced human programmers. As TechCrunch explains, the main problem is that the models still struggle to use the tools available to them and to interpret errors.

According to the authors of the study, the key obstacle remains the lack of data for training the models. “We strongly believe that training or retraining can make them better interactive debuggers,” they write. “However, this requires specialized data, such as a chain of records of all human interactions with AI debuggers.”

Such data is currently scarce, which limits the models’ capabilities. Other evaluations point the same way: the popular Devin tool from startup Cognition Labs, for example, could complete only three out of 20 programming tests. AI is nonetheless actively used at companies like Google, where, according to CEO Sundar Pichai, about a quarter of new code is generated with the help of artificial intelligence, even though such code may introduce errors.

Tech leaders are skeptical that programming can be fully automated. Bill Gates believes that programming as a profession is not going away, and Replit CEO Amjad Masad, Okta CEO Todd McKinnon, and IBM CEO Arvind Krishna share a similar opinion.

Despite the obvious problems, interest in AI development tools continues to grow. Investors see potential for efficiency gains, but leading developers believe it’s too early to fully trust AI.
