Researchers at the Massachusetts Institute of Technology (MIT) have developed their own method for teaching robots new skills. Instead of the narrow, task-specific datasets usually used to train robots, they used large and diverse datasets, mimicking the way large language models (LLMs) are trained.
According to the MIT researchers, imitation learning, in which a robot learns by copying a person performing a particular task, can fail when the environment changes even slightly. For example, a trained robot may struggle if it is placed in a setting with different lighting or different objects.
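To make the idea of “learning from demonstrations” concrete, here is a minimal behavior-cloning sketch. It is purely illustrative and is not the MIT team’s code; the network layout, dimensions, and the random tensors standing in for recorded demonstrations are all assumptions.

```python
import torch
import torch.nn as nn

# Illustrative behavior cloning: a policy network maps an observation
# (e.g., camera features plus joint angles) to an action, and is trained
# to reproduce the actions seen in human demonstrations.
class Policy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

policy = Policy(obs_dim=64, act_dim=7)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# demo_obs / demo_act stand in for recorded demonstration data.
demo_obs = torch.randn(128, 64)
demo_act = torch.randn(128, 7)

for _ in range(100):
    loss = nn.functional.mse_loss(policy(demo_obs), demo_act)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A policy trained this way only sees the conditions present in the demonstrations, which is why a small change in lighting or objects can degrade its performance.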
In their work, the researchers turned to LLMs such as GPT-4 to improve on this kind of imitation learning. “In the field of language models, all the data is just sentences. In robotics, given all the heterogeneity of the data, if you want to do pretraining in a similar way, you need a different architecture,” said Lirui Wang, one of the authors of the study.
The researchers developed a new architecture called Heterogeneous Pretrained Transformers (HPT), which pools data from different sensors and different environments. A transformer then combines this heterogeneous data into trainable models. The end user only needs to specify the robot’s design, its configuration, and the skill it needs to learn. A rough sketch of this kind of architecture is shown below.
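The following sketch illustrates the general idea of combining heterogeneous inputs through a shared transformer; it assumes a split into per-sensor encoders, a shared transformer core, and a task-specific output layer. The module names, sizes, and input modalities are illustrative assumptions, not the published HPT code.

```python
import torch
import torch.nn as nn

D = 128  # shared token dimension (illustrative)

class HeterogeneousPolicy(nn.Module):
    """Sketch: per-sensor encoders feed a shared transformer, which feeds an action head."""

    def __init__(self, sensor_dims: dict[str, int], act_dim: int):
        super().__init__()
        # One small encoder per sensor modality of this robot configuration.
        self.stems = nn.ModuleDict(
            {name: nn.Linear(dim, D) for name, dim in sensor_dims.items()}
        )
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)  # shared across robots
        self.head = nn.Linear(D, act_dim)  # robot/task-specific output

    def forward(self, inputs: dict[str, torch.Tensor]) -> torch.Tensor:
        # Each modality becomes one token; the transformer fuses them.
        tokens = torch.stack(
            [self.stems[name](x) for name, x in inputs.items()], dim=1
        )
        fused = self.trunk(tokens).mean(dim=1)
        return self.head(fused)

# Example: a robot described by 512-dim camera features and a 14-dim joint
# state, producing 7-dimensional actions.
policy = HeterogeneousPolicy({"camera": 512, "joints": 14}, act_dim=7)
action = policy({"camera": torch.randn(1, 512), "joints": torch.randn(1, 14)})
print(action.shape)  # torch.Size([1, 7])
```

In this sketch, only the per-sensor encoders and the output layer depend on a particular robot, which is the kind of setup that lets a shared core be pretrained on data from many different robots and environments.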
“Our dream is to create a universal robot brain that you can download and use in your robot without any training. We are still in the early stages, but we are going to continue to work hard and hope that scaling will lead to a breakthrough in robotics, as it did with large language models,” said one of the authors of the study, David Held.