Two weeks ago, the US company Figure AI ended its partnership with OpenAI, and yesterday it demonstrated humanoid robots that understand natural-language commands, processed by its new Helix vision-language-action (VLA) model.
Image source: Figure AI
A VLA model combines a machine-vision system with a large language model, allowing robots to be taught operations through a mix of visual input and language commands. In practice, a trained robot can manipulate objects it has never seen before, simply on request. After receiving a spoken command in natural language, the robot visually analyzes its surroundings and then carries out the assigned task based on that analysis.
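Figure has not published Helix's internals, but the general VLA pattern can be sketched as a control loop that feeds camera frames and the text command into a model and decodes motor actions. The sketch below is illustrative only: the class and function names are assumptions, and the model is a random stub standing in for a real vision-language-action network.

```python
# Minimal sketch of a generic VLA control loop (not Figure's actual code).
import numpy as np

class StubVLAModel:
    """Placeholder for a vision-language-action network."""
    def __init__(self, action_dim: int = 7):  # e.g. arm joints + gripper
        self.action_dim = action_dim

    def predict_action(self, image: np.ndarray, command: str) -> np.ndarray:
        # A real model would fuse image features with the tokenized
        # command and decode a continuous action; here we return noise.
        return np.random.uniform(-1.0, 1.0, self.action_dim)

def control_loop(model, get_camera_frame, send_to_actuators, command, steps=100):
    """Closed loop: perceive the scene, condition on language, act."""
    for _ in range(steps):
        frame = get_camera_frame()                      # visual analysis
        action = model.predict_action(frame, command)   # language-conditioned
        send_to_actuators(action)                       # one control step

if __name__ == "__main__":
    model = StubVLAModel()
    camera = lambda: np.zeros((224, 224, 3), dtype=np.uint8)  # dummy frame
    control_loop(model, camera, lambda a: None, "pick up the apple", steps=5)
```

The key property this loop illustrates is that the same trained model handles any phrasing of the command and any scene it sees, rather than following a hand-written script for each task.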
Helix also lets Figure robots work in pairs, coordinating with each other on household tasks; the idea is that two robots helping each other get more done than one. Figure demonstrated its 02 robots in a typical home interior, which is traditionally considered a very difficult environment for robots. It is far easier to build machines for the controlled, predictable conditions of an industrial setting, which is why the arrival of capable domestic humanoid robots on the market has been seen as a more distant prospect.
Teaching robots to perform household tasks requires either a major investment in software development or thousands of training experiments. Manual programming is impractical here: the home environment presents too many variables for hand-written rules to cover, so the only viable path to robots that help around the house is to have them learn the tasks themselves.
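In practice, "learning the tasks themselves" typically means imitation learning: recording human teleoperation of the robot and training a policy to reproduce the demonstrated actions. A minimal behavior-cloning sketch follows, assuming demonstrations are stored as (observation, action) pairs; the dimensions and data here are invented for illustration, and a real system would train a deep network on camera images rather than a linear model.

```python
# Minimal behavior-cloning sketch: fit a policy to teleoperated
# demonstrations so the robot reproduces the demonstrated actions.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical demonstration data: flattened observations -> actions.
obs_dim, action_dim, n_demos = 32, 7, 1000
observations = rng.normal(size=(n_demos, obs_dim))
true_policy = rng.normal(size=(obs_dim, action_dim))  # unknown demonstrator
actions = observations @ true_policy + 0.01 * rng.normal(size=(n_demos, action_dim))

# Behavior cloning reduces to supervised regression on (obs, action) pairs.
weights, *_ = np.linalg.lstsq(observations, actions, rcond=None)

# The learned policy maps a new observation to a predicted action.
new_obs = rng.normal(size=obs_dim)
predicted_action = new_obs @ weights
print(predicted_action.shape)  # (7,) -- one command per actuator
```

The appeal of this approach is exactly what the paragraph above describes: instead of enumerating every situation in code, engineers collect demonstrations and let the model generalize across the home's many variables.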