The Google DeepMind team has unveiled Genie 2, the second version of its foundation model capable of generating new interactive digital environments, or game worlds, on the fly.
To recap, the original Genie, released in February, could generate 2D virtual worlds from synthesized images. Genie 2 can do the same in 3D, starting from text prompts.
The user describes the desired world, selects a suitable rendering, and steps into the new environment. At each step, a human or an AI agent performs an action (moving the mouse, pressing a key), and Genie 2 simulates its consequences.
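Genie 2 has no public API, so purely as an illustration of this action-conditioned loop, here is a minimal Python sketch; the `WorldModel` and `Action` names are hypothetical stand-ins, not a real interface:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for Genie 2 -- DeepMind has not released
# a public API, so every name here is illustrative only.
@dataclass
class Action:
    key: Optional[str] = None  # e.g. "W" to move forward
    mouse_dx: float = 0.0      # horizontal mouse movement

class WorldModel:
    """Toy action-conditioned world model: prompt in, frames out."""

    def __init__(self, prompt: str):
        self.prompt = prompt
        self.frame = 0  # stand-in for the rendered image

    def step(self, action: Action) -> int:
        # A real model would predict the next frame while keeping
        # off-screen state consistent; here we just advance a counter.
        self.frame += 1
        return self.frame

world = WorldModel(prompt="a foggy medieval harbor at dusk")
for _ in range(10):                       # ~10 simulated steps
    frame = world.step(Action(key="W"))   # hold "W" to walk forward
    print("rendered frame", frame)
```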
According to Google DeepMind, Genie 2 can generate consistent interactive worlds for up to about a minute, although most of the examples shown (see video below) last 10-20 seconds.
Compared to the first version, Genie 2:
- remembers parts of the world that have left the field of view;
- creates environments from different perspectives (first person, third person, isometric camera, and so on);
- creates complex three-dimensional scenes;
- simulates a variety of object interactions, such as popping balloons, opening doors, or shooting explosive barrels;
- animates different types of characters;
- models NPCs and interactions with them;
- simulates water, smoke, gravity, lighting, and reflection effects;
- generates an interactive environment from a real photograph.
According to Google DeepMind, Genie 2 demonstrates the potential of foundation world models to create a variety of three-dimensional environments and to speed up the training and evaluation of AI agents (such as SIMA).
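To make that concrete, a generated world could be exposed to an agent through a standard reinforcement-learning interface. Below is a minimal sketch assuming the Gymnasium API; the `GeneratedWorldEnv` wrapper and everything inside it are hypothetical, since no public Genie 2 interface exists:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class GeneratedWorldEnv(gym.Env):
    """Gym-style wrapper around a generated world.

    Hypothetical sketch: the internals stand in for whatever
    interface a model like Genie 2 would expose.
    """

    def __init__(self, prompt: str):
        self.prompt = prompt
        # A small discrete action set: move, turn, jump, interact, etc.
        self.action_space = spaces.Discrete(8)
        # RGB frames as observations.
        self.observation_space = spaces.Box(0, 255, (128, 128, 3), np.uint8)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # A real wrapper would prompt the world model here.
        obs = np.zeros((128, 128, 3), dtype=np.uint8)
        return obs, {}

    def step(self, action):
        # A real wrapper would feed `action` to the world model and
        # read back the predicted next frame plus a task reward.
        obs = np.zeros((128, 128, 3), dtype=np.uint8)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}
```

Wrapped this way, each freshly generated world becomes just another training environment, which is what makes the approach attractive for agents like SIMA.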
Google DeepMind notes that the research is still at an early stage and that both agent capabilities and environment generation need significant improvement, but it already sees Genie 2 as a way to address the structural problem of safely training AI agents.