OpenAI has unveiled a “research preview” of an AI agent that can independently carry out tasks on the internet at the user’s request. For example, you can ask it to find airline tickets or shop for goods. The virtual assistant, called Operator, can visit web pages and interact with them by typing, clicking, and scrolling.

Image source: OpenAI

This AI agent is based on the Computer-Using Agent model, which combines the visual perception capabilities of the GPT-4o model with “advanced reasoning through reinforcement learning”, allowing the AI to interact with graphical interfaces. As The Verge writes, Operator analyzes the code of web pages and interacts with content through a virtual mouse and keyboard, which lets it work without integrating with a site’s application programming interface (API).

Notably, the AI agent can correct its own mistakes and, if it runs into difficulties, hands control back to the user. It also needs the person’s permission before entering confidential data, such as logins and passwords, or before sending emails. OpenAI also emphasizes that Operator is designed to “reject malicious requests and block prohibited content.”
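The behavior described above (retry on failure, hand control back to the user, ask permission before sensitive steps) can be illustrated with a minimal Python sketch. This is not OpenAI's implementation: the `Action` type, the `run_agent` function, the `SENSITIVE_KINDS` set, and the `execute` and `confirm` callbacks are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical action kinds that require the user's permission (assumption).
SENSITIVE_KINDS = {"enter_credentials", "send_email"}


@dataclass
class Action:
    kind: str         # e.g. "click", "type", "scroll", "send_email"
    detail: str = ""  # free-form description of the target


def run_agent(
    actions: Iterable[Action],
    execute: Callable[[Action], bool],
    confirm: Callable[[Action], bool],
    max_attempts: int = 2,
) -> List[str]:
    """Run actions in order and return a log of what happened.

    execute(action) returns True on success; confirm(action) returns True
    if the user grants permission. Both are supplied by the caller.
    """
    log: List[str] = []
    for action in actions:
        # Sensitive steps are skipped unless the user explicitly approves.
        if action.kind in SENSITIVE_KINDS and not confirm(action):
            log.append(f"declined: {action.kind}")
            continue
        # Retry a failing step a few times (a crude stand-in for self-correction).
        for _ in range(max_attempts):
            if execute(action):
                log.append(f"done: {action.kind}")
                break
        else:
            # Every attempt failed: stop and hand control back to the user.
            log.append(f"handoff: {action.kind}")
            return log
    return log


if __name__ == "__main__":
    plan = [Action("click", "search button"), Action("send_email", "itinerary")]
    # Every step succeeds, but the user refuses the email.
    print(run_agent(plan, execute=lambda a: True, confirm=lambda a: False))
    # prints ['done: click', 'declined: send_email']
```

In this sketch the permission check runs before the step is attempted, so a declined action never reaches `execute`, and a step that keeps failing ends the run instead of being skipped silently.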

However, the company warns that the tool does not yet work perfectly. For example, it struggles with more complex interfaces, such as those for creating slide shows or managing a calendar.

The new AI agent is currently available only in the US to subscribers of the $200-per-month ChatGPT Pro plan, but OpenAI plans to expand access to other plans, including Plus, Team, and Enterprise. The company also intends to integrate the new agent’s capabilities directly into ChatGPT to make it more convenient.
