Google’s DeepMind lab has unveiled two new AI models that will help robots “perform a wider range of real-world tasks than ever before.” Gemini Robotics is a vision-language-action model that can understand new situations without prior training. And Gemini Robotics-ER is described by the company as an advanced model that can “understand our complex and dynamic world” and control the robot’s movements.
Image source: Google DeepMind
Gemini Robotics is built on Gemini 2.0, the latest version of Google’s flagship AI model. According to Carolina Parada, head of robotics at Google DeepMind, Gemini Robotics “takes Gemini’s multimodal understanding of the world and brings it into the real world by adding physical actions as a new modality.”
The new model is particularly strong in three key areas that Google DeepMind says are necessary to create truly useful robots: versatility, interactivity, and dexterity. In addition to being able to generalize to new scenarios, Gemini Robotics is better at interacting with people and their environments. The model is able to perform very precise physical tasks, such as folding a piece of paper or opening a bottle.
“While we’ve made progress in each of these areas individually in the past, we’re now delivering [dramatically] increasing performance in all three areas with a single model,” Parada said. “This allows us to create robots that are more capable, more responsive, and more resilient to changes in their environment.”
The Gemini Robotics-ER model is designed specifically for roboticists, who can connect it to the existing low-level controllers that drive a robot’s movements. Parada illustrated this with the example of packing a lunch box: there are objects on the table, and the robot needs to figure out where everything is, how to open the lunch box, how to pick up the objects, and where to put them. That is the chain of reasoning Gemini Robotics-ER follows.
The developers have paid serious attention to safety. Google DeepMind researcher Vikas Sindhwani explained how the lab uses a “layered approach” in which Gemini Robotics-ER models “learn to assess whether it is safe to perform a potential action in a given scenario.”
Google DeepMind has also developed a number of benchmarks and frameworks to help advance safety research in the AI field. Most notably, last year the lab introduced the “Robot Constitution,” a set of rules inspired by Isaac Asimov’s “Three Laws of Robotics” from his 1942 short story “Runaround.”
Google DeepMind is currently working with Apptronik to develop the “next generation of humanoid robots.” The lab has also made its Gemini Robotics-ER model available to “trusted testers,” including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools.
“We’re completely focused on creating intelligence that can understand the physical world and act in that physical world,” Parada said. “We’re very excited to use that in multiple incarnations and in multiple applications for us.”
Recall that in September 2024, researchers from Google DeepMind demonstrated a learning method that allows a robot to perform actions requiring fine dexterity, such as tying shoelaces, hanging shirts, and even repairing other robots.