Modern AI models demonstrate impressive abilities in natural language processing and text generation. However, according to Yann LeCun, Chief AI Scientist at Meta, they do not yet possess the human-like capabilities of memory, thinking, planning and reasoning; they merely imitate these skills. According to the scientist, overcoming this barrier will take at least 10 years and require a new approach: “world models.”
Earlier this year, OpenAI introduced a new feature for the ChatGPT AI chatbot called “memory,” which allows the AI to “remember” previous interactions with a user. The company has also released a new generation of AI models, o1, which displays the word “thinking” while generating responses, and OpenAI claims these new products are capable of complex reasoning. However, according to LeCun, they only create the illusion of complex cognitive processes: these AI systems still lack a real understanding of the world.
Although such innovations may seem like a significant step towards artificial general intelligence (AGI), LeCun opposes the optimists in this field. In a recent speech at the Hudson Forum, he argued that the confident predictions of Elon Musk and Shane Legg, co-founder of Google DeepMind, may be premature. According to LeCun, human-level AI may be decades away, not years, despite optimistic forecasts of its imminent arrival.
LeCun emphasizes that for machines to truly understand the world around them, they must not only remember information but also possess intuition and common sense, and be able to plan and reason. “Today’s AI systems, despite the claims of the most passionate enthusiasts, are not capable of any of these actions,” LeCun noted.
The reason is simple: large language models (LLMs) work by predicting the next token (usually a few letters or a short word), while modern AI models for images and video predict the next pixel. In other words, LLMs are one-dimensional predictors and image and video models are two-dimensional predictors. These models have become very good at making predictions within their respective dimensions, but they do not truly understand the three-dimensional world that humans inhabit.
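To make the one-dimensional nature of that prediction concrete, here is a minimal, purely illustrative sketch in Python (not code from any of the systems discussed): a toy bigram “model” that generates text by repeatedly predicting the next token from the previous one. Real LLMs replace the counting with a neural network and whole words with subword tokens, but the generation loop has the same shape.

```python
# Minimal sketch: autoregressive next-token prediction, the one-dimensional
# loop described above. The "model" is just bigram counts over a sample text.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# Count which token tends to follow which (a stand-in for a trained model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Return the most likely next token given the previous one."""
    candidates = follows.get(token)
    return candidates.most_common(1)[0][0] if candidates else "the"

# Generate by repeatedly appending the predicted next token.
sequence = ["the"]
for _ in range(5):
    sequence.append(predict_next(sequence[-1]))
print(" ".join(sequence))  # prints: the cat sat on the cat
```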
Because of this, modern AI cannot perform simple tasks that most people can. LeCun compares the capabilities of AI to how people learn: by the age of 10 a child can clean up after himself, and by 17 he can learn to drive a car; both skills are picked up in a matter of hours or days. Meanwhile, even the most advanced AI systems, trained on thousands or millions of hours of data, still cannot reliably perform such simple actions in the physical world. To solve this problem, LeCun proposes developing world models: mental models of how the world behaves, able to perceive the surrounding world and predict how it will change in three-dimensional space.
Such models, he says, represent a new type of AI architecture: you imagine a sequence of actions, and the world model predicts what impact that sequence will have on the world. Part of the appeal of this approach is that world models can handle significantly more data than LLMs. That also makes them computationally intensive, which is why cloud providers are rushing to partner with AI companies.
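As an assumption about the idea rather than LeCun’s actual architecture, the core ingredient can be sketched as a function that predicts the next state of the world from the current state and an action, rolled out over a whole action sequence. The names `world_model` and `rollout` below are illustrative only, and the “state” is deliberately trivial.

```python
# Minimal sketch of a world model as a next-state predictor.
from dataclasses import dataclass

@dataclass
class State:
    # A tiny stand-in for a rich perceptual state: here the "world" is
    # just a position on a line.
    position: int

def world_model(state: State, action: int) -> State:
    """Predict the state that follows `state` if `action` is taken."""
    return State(position=state.position + action)

def rollout(state: State, actions: list[int]) -> list[State]:
    """Predict the impact of a whole action sequence on the world."""
    trajectory = [state]
    for action in actions:
        state = world_model(state, action)
        trajectory.append(state)
    return trajectory

print(rollout(State(position=0), [1, 1, -1]))
# [State(position=0), State(position=1), State(position=2), State(position=1)]
```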
World models are a big idea that several research labs are now chasing, and the term is quickly becoming a new buzzword for attracting venture capital. A group of established AI researchers, including Fei-Fei Li and Justin Johnson, recently raised $230 million for their startup World Labs. Li, often called the “godmother of AI,” and her team are likewise confident that world models will lead to significantly smarter AI systems. OpenAI also calls its yet-to-be-released video generator Sora a world model, but has not disclosed details.
LeCun introduced the idea of using world models to create human-level AI in his 2022 paper on objective-driven AI, although he notes that the concept itself dates back more than 60 years. In brief, the world model is given a basic representation of the environment (for example, video of a messy room) and a memory. From this, the model predicts the state of the surrounding world. It is then given specific goals, including a desired state (for example, a clean room), and guardrails are set so the goal is achieved without harming anyone (for example, “do not hurt the person while cleaning the room”). The world model then finds the optimal sequence of actions to accomplish the task.
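The loop just described can be sketched, under heavy simplification, as a search over candidate action sequences scored against the desired state while respecting a guardrail. This is a toy illustration, not the paper’s method: a real system would use learned representations and gradient-based or sampled optimization rather than brute-force search, and the functions `world_model`, `harms_person` and `plan` are hypothetical names for this sketch.

```python
# Minimal sketch of planning with a world model, goals and a guardrail.
from itertools import product

def world_model(state: int, action: int) -> int:
    """Predict the next 'room messiness' level after an action."""
    return max(0, state + action)  # actions: -1 tidy up, 0 wait, +1 make a mess

def harms_person(state: int, action: int) -> bool:
    """Guardrail: placeholder check that an action would hurt someone."""
    return False  # a real guardrail would inspect the predicted consequences

def plan(initial_state: int, goal_state: int, horizon: int = 4) -> tuple[int, ...]:
    """Search action sequences; return the one predicted to best reach the
    goal without violating the guardrail."""
    best, best_cost = (), float("inf")
    for actions in product([-1, 0, 1], repeat=horizon):
        state, violated = initial_state, False
        for a in actions:
            if harms_person(state, a):
                violated = True
                break
            state = world_model(state, a)
        if not violated:
            cost = abs(state - goal_state)  # distance to the desired state
            if cost < best_cost:
                best, best_cost = actions, cost
    return best

# Messy room (level 3) -> clean room (level 0)
print(plan(initial_state=3, goal_state=0))  # (-1, -1, -1, -1): keep tidying
```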
World models are a promising concept, but according to LeCun, significant progress has not yet been made in their implementation. There are many extremely difficult problems that need to be solved to move forward from the current state of AI, and in his opinion, everything is much more complicated than it seems at first glance.