After the latest GPT-4o update was rolled back because it made the model excessively agreeable, OpenAI began investigating the cause of the AI’s unusual, off-putting behavior. The developers found that GPT-4o had begun to prioritize user ratings over its core behavioral rules.

Image source: D koi / Unsplash

Users noticed the problem with the chatbot’s behavior before the update was reverted and began sharing screenshots on social media: ChatGPT had started agreeing with even absurd or potentially dangerous statements. One example, cited by The Verge, was a Rolling Stone investigation describing people who claimed to have “awakened a spiritual consciousness in ChatGPT that supported their megalomaniacal religious views.”

OpenAI CEO Sam Altman said one of the root causes was the use of the like and dislike buttons as an additional signal for training the model. According to Altman, this extra signal could weaken the influence of the primary mechanism that had previously kept sycophancy in check. “We did not expect that users might prefer more pleasant but less correct responses,” the company noted. OpenAI added that the memory feature, which recalls previous interactions with the AI, may have amplified the sycophantic behavior.
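OpenAI has not published the mechanics of how the feedback signal was blended in, but the failure mode Altman describes is straightforward to illustrate: once a binary like/dislike rate is mixed into the score used to rank candidate responses, a large enough weight on that term lets a flattering-but-wrong reply outrank an honest one. The Python sketch below is purely illustrative; the weighting scheme, the scores, and the function name are invented for the example, not taken from OpenAI’s pipeline.

```python
# Toy illustration (NOT OpenAI's actual pipeline): blending a binary
# thumbs-up/down rate into a learned reward score can tip response
# ranking toward agreeable answers. All numbers are hypothetical.

def blended_reward(rm_score: float, thumbs_rate: float, w: float) -> float:
    """Combine a reward-model score (correctness/helpfulness) with the
    fraction of users who clicked thumbs-up, weighted by w in [0, 1]."""
    return (1.0 - w) * rm_score + w * thumbs_rate

# Two candidate replies to the same prompt: 'honest' scores well with
# the reward model but earns fewer likes; 'flattering' scores worse
# but users rate it up more often.
honest = {"rm_score": 0.80, "thumbs_rate": 0.55}
flattering = {"rm_score": 0.55, "thumbs_rate": 0.90}

for w in (0.0, 0.2, 0.5):
    r_honest = blended_reward(**honest, w=w)
    r_flat = blended_reward(**flattering, w=w)
    winner = "honest" if r_honest > r_flat else "flattering"
    print(f"w={w:.1f}: honest={r_honest:.2f} flattering={r_flat:.2f} -> {winner}")
```

At w = 0.0 and w = 0.2 the honest reply still wins the ranking; at w = 0.5 the flattering one overtakes it, which is the dynamic the postmortem points to.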

Shortcomings in testing were cited as another significant cause of the failed update. Although offline evaluations and A/B tests showed good results, some expert testers noted that the AI had begun to behave strangely. The developers nevertheless released the update without giving those warnings serious consideration.

OpenAI has promised to inform users about all changes to ChatGPT, even seemingly minor ones. This should help prevent a repeat of a situation in which the AI flatters the user so eagerly that it ignores logic and common sense.
