Google has unveiled technology for watermarking and recognizing text created by generative AI models. The tool, called SynthID Text, does not affect the quality or speed of content generation and will be available completely free of charge to developers and companies.
SynthID Text works like this. When generating text, the model predicts which token (a word or part of a word) will come next based on the probability assigned to each candidate; SynthID Text subtly modulates those probabilities before a token is chosen. Accumulated across the generated text, these adjusted choices form a watermark that helps determine whether the text was created by AI. “The final pattern of word probabilities selected by the model, combined with the modified probability estimates, will be considered a watermark,” the company explains in a blog post.
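To make the idea concrete, here is a simplified, hypothetical sketch of probability-modulation watermarking in Python. It is not Google’s actual algorithm (SynthID’s scheme is more sophisticated); the `green_tokens`, `watermark_probs`, and `detect` functions, the context-hash seeding, and the bias value are all illustrative assumptions. The core idea it demonstrates is the same: nudge the token distribution in a way a detector can later recompute and check.

```python
import hashlib
import math
import random

def green_tokens(context, vocab, fraction=0.5):
    # Illustrative assumption: derive a pseudorandom "favored" subset of
    # the vocabulary from the preceding context, so a detector that sees
    # only the final text can recompute the exact same subset.
    seed = int(hashlib.sha256(" ".join(context).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    k = max(1, int(len(vocab) * fraction))
    return set(rng.sample(sorted(vocab), k))

def watermark_probs(probs, context, bias=2.0):
    # Nudge the model's distribution toward the favored tokens by adding
    # a small bias to their log-probabilities, then renormalize.
    green = green_tokens(context, probs.keys())
    logits = {t: math.log(p) + (bias if t in green else 0.0)
              for t, p in probs.items()}
    z = sum(math.exp(v) for v in logits.values())
    return {t: math.exp(v) / z for t, v in logits.items()}

def detect(tokens, vocab, window=1):
    # Score text by how often each token falls on the favored list
    # recomputed from its context; watermarked text scores well
    # above the ~50% expected by chance.
    hits = 0
    for i in range(window, len(tokens)):
        if tokens[i] in green_tokens(tokens[i - window:i], vocab):
            hits += 1
    return hits / (len(tokens) - window)
```

A quick demonstration of the loop: repeatedly sample the most likely token from the watermarked distribution, then run the detector over the result. This also hints at why, as noted below, paraphrasing weakens detection: rewriting replaces the favored tokens the detector is counting.

```python
vocab = {"a", "b", "c", "d"}
tokens = ["seed"]
for _ in range(50):
    wp = watermark_probs({t: 0.25 for t in vocab}, tokens[-1:])
    tokens.append(max(wp, key=wp.get))
print(detect(tokens, vocab))  # close to 1.0 for watermarked text
```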
Google claims that SynthID Text, which was integrated into Gemini in the spring, does not affect the quality, accuracy, or speed of generation. However, detection may be less reliable for text that has been trimmed, paraphrased, or otherwise rewritten. Highly factual prompts are also harder to watermark: “answers to questions that are too specific and unambiguous provide less opportunity to adjust the distribution of tokens without compromising factual accuracy.”
It’s worth noting that Google isn’t the only company working on AI-generated text watermarking technology. For example, OpenAI was also developing methods for applying watermarks, but delayed their launch due to technical obstacles and commercial considerations.
If the technology is widely adopted, it could turn the tide against inaccurate but increasingly popular “AI detectors” that mistakenly flag student papers and essays as generated by a neural network. Whether that happens, as TechCrunch writes, “the question remains open.” Some governments, however, are already taking action: China has introduced mandatory labeling of AI-generated content, and the US state of California is poised to follow suit.