For all their impressive ability to write essays and solve equations in seconds, large language models (LLMs) such as GPT-4o and Claude remain imperfect. The latest example, which has become a viral meme, shows that these seemingly omniscient AIs cannot correctly count the number of r’s in the English word “strawberry.”
The problem lies in the transformer architecture on which LLMs are built. Transformers break text into tokens, which can be full words, syllables, or individual letters, depending on the model. “LLMs are based on this transformer architecture, which essentially doesn’t read text. When you enter a query, it is converted into an encoding,” explains Matthew Guzdial, an artificial intelligence researcher and assistant professor at the University of Alberta, in an interview with TechCrunch. That is, when the model sees the article “the,” it has a single encoding of the meaning of “the,” but it knows nothing about the three individual letters “t,” “h,” and “e.”
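You can see what tokenization does to a word with a minimal sketch using OpenAI’s open-source tiktoken library (the exact splits and token IDs depend on which tokenizer a given model uses):

```python
import tiktoken

# cl100k_base is the tokenizer used by GPT-4-era OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("strawberry")
pieces = [enc.decode([i]) for i in ids]

print(ids)     # integer token IDs -- this is all the model actually sees
print(pieces)  # sub-word chunks such as 'str' / 'aw' / 'berry', not individual letters
```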
Transformers cannot efficiently process or output raw text. Instead, text is converted into numerical representations, which are then contextualized to help the AI produce a logical response. In other words, the model may know that the tokens “straw” and “berry” make up “strawberry,” but it does not represent the order of the letters within that word and cannot count them. If you ask GPT, “how many times does the letter r appear in the word strawberry,” the bot famously answers “2,” not the correct “3.”
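The contrast with ordinary character-level code is stark: once you operate on characters rather than tokens, the count is a one-liner.

```python
word = "strawberry"
print(word.count("r"))  # 3 -- trivial over characters, opaque over tokens
```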
“It’s difficult to define what exactly should count as a word for a language model, and even if we gathered human experts to agree on an ideal token vocabulary, models would probably still find it useful to break words into even smaller chunks,” explains Sheridan Feucht, a graduate student at Northeastern University (Massachusetts, USA) who studies LLM interpretability. “I think there is no perfect tokenizer because of this fuzziness.” Feucht believes it would be better to let models look at characters directly, without imposing tokenization, but notes that this is simply not computationally feasible for transformers right now.
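A rough back-of-the-envelope sketch of the cost problem Feucht mentions: self-attention does work that grows quadratically with sequence length, and character-level input makes sequences several times longer. The ~4 characters per token used below is a common rule of thumb for English text, not a property of any particular model:

```python
text = "a representative passage " * 40   # ~1,000 characters of English
n_chars = len(text)                       # character-level sequence length
n_tokens = n_chars // 4                   # token-level length, assuming ~4 chars/token

# Self-attention does O(n^2) work in the sequence length n,
# so dropping tokenization multiplies the attention cost by roughly:
print((n_chars / n_tokens) ** 2)          # ~16x for this rough estimate
```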
Things get even more complicated when an LLM handles multiple languages. For example, some tokenization methods assume that a space always precedes a new word, but many languages, including Chinese, Japanese, Thai, Lao, Korean, and Khmer, do not use spaces to separate words. In a 2023 study, Google DeepMind AI researcher Yennie Jun found that some languages need up to 10 times as many tokens as English to convey the same meaning.
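You can observe this disparity directly by tokenizing roughly equivalent sentences in different scripts (again a sketch assuming tiktoken; exact counts vary by tokenizer and model):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

samples = {
    "English":  "Hello, how are you today?",
    "Japanese": "こんにちは、今日はお元気ですか？",
    "Thai":     "สวัสดี วันนี้คุณสบายดีไหม",
}

# Languages without word-separating spaces often need far more tokens
# per unit of meaning than English does.
for lang, text in samples.items():
    print(f"{lang}: {len(enc.encode(text))} tokens for {len(text)} characters")
```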
While memes circulate online about AI models being unable to spell “strawberry” or count its r’s, OpenAI is working on a new AI product codenamed Strawberry that is expected to be far more skilled at reasoning, able to solve The New York Times crossword puzzles, which require creative thinking, as well as highly complex mathematical equations.