Contractors working for Google to improve the quality of responses from its Gemini AI chatbot are comparing those responses with ones from Anthropic’s competing chatbot Claude, TechCrunch writes, citing internal company correspondence. Google did not answer TechCrunch’s question about whether it obtained Anthropic’s permission to use Claude in testing Gemini.
Companies typically evaluate their AI models against competitors’ using industry benchmarks, rather than instructing contractors to compare rival chatbots’ responses directly.
Google’s contractors working to improve Gemini must rate each model response against several criteria, such as truthfulness and level of detail. According to correspondence published by TechCrunch, they are given up to 30 minutes per prompt to determine whether Gemini’s or Claude’s answer is better.
The contractors report that Claude’s responses put more emphasis on safety than Gemini’s. “Claude’s safety settings are the strictest” among AI models, one contractor noted in an internal chat. In some cases, Claude refused to respond to prompts it considered unsafe, such as a request to role-play as a different AI assistant. In another instance, Claude avoided answering a prompt, while Gemini’s response was flagged as a “huge safety violation” because it included “nudity and bondage.”
Shira McNamara, a spokesperson for Google DeepMind, the developer of Gemini, did not answer TechCrunch’s question about whether Google had received Anthropic’s permission to use Claude. She said that DeepMind “compares model outputs” for evaluation purposes but does not train Gemini on Anthropic’s models. “Any suggestion that we used Anthropic models to train Gemini is inaccurate,” McNamara said.