Google has presented a research prototype of an AI agent called Project Mariner, capable of performing actions on the Internet on a person's behalf. The project is being developed by Google DeepMind. The Gemini-powered agent takes control of the Chrome browser, moves the cursor on the screen, clicks buttons and fills out forms, allowing it to use and navigate websites much as a human would.

Image source: Google

The company said that Project Mariner is currently being tested with a group of pre-selected users. In a conversation with TechCrunch, Google Labs director Jaclyn Konzelmann said that Project Mariner represents a fundamentally new approach to the user interface: instead of interacting with websites directly, users delegate those tasks to a generative AI system. Such a shift could affect millions of businesses, from web publishing to retail, that have traditionally relied on Google as the starting point for driving users to their websites, she said.

Once Project Mariner is installed and configured as a Chrome extension, the browser gains a dedicated chat window in which the AI agent can be assigned various tasks. For example, it can be asked to fill a shopping cart at a grocery store based on a given list. The agent then independently navigates to the specified store's page (Safeway was used in the demo), searches for the required products and adds them to the cart. Journalists note that the system is not as fast as one might like: roughly five seconds pass between cursor movements. Sometimes the agent interrupts the task and returns to the chat window to ask for clarification, for example about the weight or quantity of an item.

Google's AI agent cannot place an order: it is deliberately not given the ability to fill in credit card numbers or other payment information. Project Mariner also does not accept cookies or agree to terms-of-use agreements on behalf of users. Google emphasizes that these restrictions are intentional and are meant to keep users in control.

While working, the AI agent takes screenshots of the browser window, which users must consent to before using it. These images are sent to the Gemini cloud service for processing, which then sends instructions back to the user's device for navigating the web page. Project Mariner can be used to search for flights and hotels, shop for household goods, find recipes, and handle other tasks that currently require navigating websites yourself.
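The screenshot-to-instruction cycle described above can be pictured as a simple observe-plan-act loop. The sketch below is a minimal illustration of that pattern only; the function names, the scripted "model," and the page state are hypothetical stand-ins, not Google's actual API or Gemini's behavior.

```python
# Hypothetical sketch of a screenshot-driven agent loop.
# All names here (capture_screenshot, plan_next_action, etc.) are
# illustrative assumptions, not part of any real Project Mariner API.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # "click", "type", or "done"
    target: str = ""
    text: str = ""

def capture_screenshot(page_state: dict) -> bytes:
    # Stand-in for grabbing the active Chrome tab as an image.
    return repr(page_state).encode()

def plan_next_action(screenshot: bytes, goal: str, step: int) -> Action:
    # Stand-in for the cloud round-trip: in the real system the screenshot
    # would be sent to Gemini, which returns the next UI instruction.
    script = [
        Action("click", target="search_box"),
        Action("type", target="search_box", text=goal),
        Action("click", target="add_to_cart"),
        Action("done"),
    ]
    return script[min(step, len(script) - 1)]

def run_agent(goal: str, max_steps: int = 10) -> list:
    page_state = {"url": "https://example-store.test"}
    history = []
    for step in range(max_steps):
        shot = capture_screenshot(page_state)
        action = plan_next_action(shot, goal, step)
        history.append(action)
        if action.kind == "done":
            break
        # Executing the action on the live page would happen here.
    return history

actions = run_agent("olive oil")
```

The delay journalists observed between cursor movements would correspond to the cloud round-trip inside each loop iteration in a structure like this.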

One of Project Mariner's main limitations is that it works only in the active Chrome tab. In other words, the web page the agent is operating on must remain open on screen, and users end up watching the bot's every move. According to Google DeepMind Chief Technology Officer Koray Kavukcuoglu, this is deliberate, so that users always know exactly what the agent is doing.

"Since [Gemini] is now performing actions on behalf of the user, it is important to do this step by step. This is an additive capability. You as an individual can use websites, and now your agent can do everything you do on a website," Kavukcuoglu said in an interview with TechCrunch.

On the one hand, users will still see the site's pages, which benefits site owners. On the other, Project Mariner reduces direct user interaction with site features and may eventually eliminate the need to visit websites at all.

"Project Mariner represents a fundamentally new paradigm shift in UX that we are seeing right now. We need to figure out how to set this up correctly, both to change how users interact with the Internet and to find ways for publishers to build their own AI-agent-based experiences for users in the future," Konzelmann added.

In addition to Project Mariner, Google has introduced several other AI agents for specialized tasks. One is Deep Research, a tool for in-depth research on the Internet. Another is Jules, an AI agent designed to help developers write code: it integrates into GitHub workflows, analyzes the current state of a project, and can commit changes directly to repositories. Jules is currently in testing and will become available in 2025.

Google DeepMind is also developing an AI agent to help users play video games. For this, the company is partnering with game developer Supercell to test Gemini's ability to interpret game worlds, using Clash of Clans as an example. A launch date for a prototype of this agent has not yet been announced, but Google says the work will help create AI agents that can navigate both real and virtual worlds.
