The perception of AI agents, of their plans and promise, and of their impact on the IT sector and on human society as a whole varies greatly these days. Back in February 2025, at the CIO Network Summit in sunny California, organized by The Wall Street Journal, only 61% of participants confirmed that they were already experimenting with AI agents at their enterprises, and even among these pioneers there was open mistrust of the smart technology so actively promoted by its developers: almost a third of respondents said outright that what worries them most is preserving the confidentiality of the data entrusted to agents. Remarkably, the developers, in turn, did not argue that their agents are imperfect, but ardently urged CTOs to take the corresponding risks more boldly. “Instead of worrying about whether an AI agent will do something wrong,” Bret Taylor, co-founder and CEO of the startup Sierra and a member of the board of directors of OpenAI, instructed those gathered at the summit, “think instead about what measures to mitigate its inevitable mistakes should be taken in advance.” Businessmen see a grain of truth in his words: 75% of the summit's participants cautiously admitted that their earlier investments in AI (including agentic AI; “agentic” here means “acting as an agent itself,” not merely “belonging to an agent”) have actually begun to bring profit, albeit most often a very modest one. But many top managers are frankly not yet ready to make a decisive bet on this technology, actively advertised, in particular, by the same OpenAI in the form of its own Operator agent, while risking their shareholders' funds and their own considerable bonuses.

Three stages of integrating AI agents into the workforce: first they are used by individual employees as auxiliary tools, then each human worker directs a group of “digital colleagues” of their own, and the end state is joint work performed by AI agents but still supervised by people (source: Microsoft)

Some, of course, are ready. Take the US Internal Revenue Service, which intends to compensate for its shortage of human tax collectors (their number has already fallen by at least 11,000 people, thanks to the efforts of the new White House administration, which has dashingly taken up the task of reducing the budget deficit) by enlisting AI agents, if not for the actual collection of taxes, then for the effective identification of those who evade paying them. “I believe that, thanks to smarter technologies, thanks to the AI boom, we will increase tax collection,” confidently stated Scott Bessent, the head of the US Department of the Treasury, which oversees the Internal Revenue Service. And agentic AI is increasingly being used not only by government agencies: IBM recently cut its staff by a couple of hundred people, mainly owing to the active deployment of AI agents, including in software development, sales, and marketing. Almost simultaneously, researchers from AlixPartners found that mid-sized software-focused companies in the US are coming under increasing pressure, again from AI agents. Those convenient assistants (enlisted by human programmers to handle routine tasks) are increasingly turning into independent applications that can be used with considerable success by people with no knowledge of software development at all, which in turn reduces demand for the services of independent companies that create custom software solutions. The pace at which such companies' clients are fleeing is truly astonishing: the share of these companies showing high capitalization growth fell from 57% in 2023 to 39% in 2024, and their median net dollar retention (NDR, a metric that tracks how recurring revenue from existing customers changes over time) declined from 120% in 2021 to 108% in 2024.

In other words, AI agents really are regarded, both by the companies developing them and by the customers using them in practice, as an adequate replacement for human employees. So was Anthropic CTO Jason Clinton right when he told Axios in April 2025 that within a year commercial and public-sector organizations would start hiring AI agents not merely alongside humans but increasingly instead of them? Of course, not everything is smooth with agent-grade generative models, and Clinton himself speaks of the considerable difficulties developers face in ensuring a sufficient level of information security. Many legal questions also remain unresolved: a human employee who trades in corporate secrets can be taken to court, but who will be held accountable (and who will compensate the customer for the losses) if, as the result of yet another hallucination, an AI agent forwards confidential company documents to a competitor?

A couple of typical images generated by FLUX.1, a very good but non-agentic open-weights model, from the simple prompt “Create a Russian-language advertising banner for the 3DNews website dedicated to digital technologies, computer games and fashionable gadgets”: as you can see, it is not familiar with the current logo of our publication, but it tries as best it can (source: AI generation based on the FLUX.1 model)

But overall, the main course of development for agentic AI seems to be laid out clearly and unambiguously: Microsoft claims that up to 80% of employees on the planet, from the lowliest clerk to the CEO, are frankly overloaded with urgent tasks, and therefore everyone will have to learn to manage a whole team of AI agents in their workplace in order to increase productivity (and, obviously, to keep that workplace). And Mark Zuckerberg goes so far as to proclaim that soon the majority of contacts on social networks will be AI-agent friends, selfless (why would a bot need money?) coaches of “successful success,” and business consultants; Internet advertising will obviously be reoriented towards the latter, because they will either insistently advise a person on what to buy or make the purchases themselves: “After all, the average American today has no more than three real friends, and people need at least fifteen to feel important.”

So what exactly are AI agents, and how do they differ fundamentally from the now-familiar universal (and, more recently, multimodal) AI bots such as ChatGPT, Claude, or DeepSeek?

⇡#Who is there?

There are many definitions of agentic AI, but they all boil down to the autonomy of the generative model in planning and performing complex actions (“complex,” not merely “difficult”) that are needed to solve a task set by the operator. For example, if a general (non-agent) AI model is given the task of “rendering the logo of the online publication 3DNews,” then, drawing on the data it was trained on, it will in all likelihood give the set of symbols forming the name some shape that reads as a logo. Perhaps it will choose an italic font, or enclose the inscription in a cartouche, or add “3D” volume to the letters; in a word, it will solve the problem using only what it was originally trained on (and we, with all our love for our publication, reasonably assume that if its logo appears in the training data of large language models, LLMs, at all, it does so in trace quantities, incomparable with the frequency of the logos of AMD, OpenAI or the same Bloomberg).

The image returned by GPT-4o for the query “Create a Russian-language advertising banner for the 3DNews website dedicated to digital technologies, computer games, and fashionable gadgets” (worded exactly like that, in Russian): it is immediately clear that it looked up the logo directly on the Internet, and apparently the typical layout of an advertising block, too. Let's be honest, its flight of fancy in terms of artistic composition is not the highest (no girls in cyberpunk gear, no screenshots from non-existent games, unlike the creations of the non-agentic FLUX.1), but this is a banner practically ready to be put into rotation! (Source: AI generation based on the GPT-4o model)

An AI agent, operating freely with external data, will almost certainly notice the word “online” in the request, check whether such a publication actually exists on the Internet, and, if so, will most likely reproduce its logo almost exactly. Of course, not every agent model will do precisely this, but many behave exactly according to the described pattern, which is easy to verify by interacting with, say, the image generator/modifier popular today among visual-arts enthusiasts and built on the recent multimodal, agent-capable GPT-4o. It is enough to ask such a model to draw, say, a “tutorial on how to fly a helicopter,” and without any further explanation it will produce a four-panel infographic with pictures stylized after classic black-and-white user manuals. Yes, the steps in this infographic will be frankly trivial (“start the engine, grab the controls, give it gas, take off”), but the point here is precisely the independence of the AI agent's actions. If the operator needs a more realistic instruction, let him formulate the task more precisely; meanwhile, in full accordance with the Pareto rule, in 80% of cases the simple results of such a generative agent-designer's work will be quite sufficient.
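To make the contrast with a one-shot model concrete, here is a minimal sketch of the pattern just described: an agent notices that the request refers to something on the live Internet and verifies it before drawing. The functions llm(), web_search() and generate_image() are hypothetical placeholders, not any particular vendor's API.

```python
# A minimal sketch of agentic behavior versus one-shot generation.
# llm(), web_search() and generate_image() are hypothetical placeholders.

def llm(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    raise NotImplementedError

def web_search(query: str) -> list[str]:
    """Stand-in for an external search tool; returns found references."""
    raise NotImplementedError

def generate_image(prompt: str, reference: str | None = None) -> bytes:
    """Stand-in for an image generator that can take a visual reference."""
    raise NotImplementedError

def draw_logo(request: str) -> bytes:
    # The agentic step: the word "online" hints that the subject exists
    # on the Internet, so check before drawing from imagination alone.
    if "online" in request.lower():
        hits = web_search(request)
        if hits:
            # Reproduce the real logo using the found reference.
            return generate_image(request, reference=hits[0])
    # Fallback: behave like a non-agent model, relying on training data only.
    return generate_image(request)
```

The difference is not in the drawing itself but in the unprompted decision to consult an external source; a non-agent model never reaches the web_search() branch at all.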

This is probably the basis for the ardent confidence of Bret Taylor, quoted above, that there is no need to fear flaws in the work of AI agents; rather, the management system oriented towards human resources (which, strictly speaking, already spends a fair share of its effort on identifying and correcting the flaws of subordinates) should be restructured so that it also covers generative agent models. After all, correctly perceiving a given task, reviewing the means available for solving it, selecting the suitable ones, carrying out instructions and delivering a result: is this not what the majority of workers on the planet do, white-collar and blue-collar alike? So why not add digital-collar workers to those two categories?

A visual infographic (something tells us it too was created with the help of AI agents) on the importance of maintaining a balance between the human workforce and generative agent models; note that there is no talk of completely replacing the former with the latter (source: Microsoft)

Another important distinguishing feature of AI agents is their readiness to accept feedback from the operator and adjust their behavior accordingly: tell the model “Yes, overall everything is fine, but this part needs to be done differently,” and the interactive generative model will willingly try something different. Researchers identify three stages that an AI agent goes through while solving a task (a minimal code sketch follows the list):

  • Determining the overall goal based on the prompt given by the operator.
  • Independently constructing a solution to the problem, breaking it down into simple stages and collecting additional data if necessary.
  • Actually solving the decomposed problem, relying on the underlying LLM (initially trained on a very large array of information) and drawing, where required, on that same additional data.
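Here is the promised minimal sketch mapping those three stages, plus the feedback channel mentioned above, onto code. The functions llm() and fetch() are hypothetical placeholders rather than any specific framework's API.

```python
# Sketch of the three-stage agent workflow described in the list above.
# llm() and fetch() are hypothetical placeholders.

def llm(prompt: str) -> str:
    raise NotImplementedError

def fetch(query: str) -> str:
    """Stand-in for gathering additional external data."""
    raise NotImplementedError

def run_agent(task: str, feedback: str | None = None) -> str:
    # Stage 1: determine the overall goal from the operator's prompt.
    goal = llm(f"State in one sentence the goal of this task: {task}")

    # Stage 2: break the goal into simple steps, noting where extra data is needed.
    steps = llm(f"List numbered steps to achieve: {goal}").splitlines()

    # Stage 3: solve each step with the underlying LLM, pulling in
    # additional data where the plan calls for it.
    notes = ""
    for step in steps:
        extra = fetch(step) if "look up" in step.lower() else ""
        notes += llm(f"Perform: {step}\nExtra data: {extra}\nSo far: {notes}") + "\n"

    result = llm(f"Assemble the final answer for: {goal}\nNotes:\n{notes}")
    if feedback:
        # The feedback loop: "everything is fine, but do this part differently."
        result = llm(f"Revise the result.\nResult: {result}\nFeedback: {feedback}")
    return result
```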
By the way, it is to the widespread adoption of AI agents that humanity owes the almost complete, and extremely rapid, disappearance of such a high-tech occupation as prompt engineering. The first interactions with widely available LLMs (both via cloud APIs and locally) showed that the quality of responses directly depends on how queries are formulated. Which is understandable: in the depths of their multilayer neural networks, generative models do not operate with words at all, and certainly not with abstract Platonic ideas, but with digital tokens, into which the operator's input is transformed (encoded) and which are then decoded into a textual, graphic, audio or other response. The submodels that convert human-readable data into tokens and back operate by their own rules, which are not always obvious to the uninitiated; as a result, in 2023 the North American labor market for prompt engineers alone was estimated at $75.4 million, with projected growth of 33% year-on-year. Nowadays AI agents themselves act as prompt engineers, properly explaining to the underlying LLM what exactly the operator who formulated the request wanted, and so by the beginning of 2025 the profession of prompt engineer had practically disappeared from job listings. Today everyone, in effect, is expected to interact with generative models efficiently, and if someone has difficulties with this, AI agents are always ready to lend a virtual helping hand.
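For illustration, this is what the encode/decode round trip between text and tokens looks like with OpenAI's open-source tiktoken library; the encoding name cl100k_base is just one common choice, and other models ship tokenizers with different vocabularies.

```python
# Round trip between human-readable text and the integer tokens a model
# actually operates on, using OpenAI's tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common vocabulary

text = "AI agents now act as prompt engineers."
tokens = enc.encode(text)

print(tokens)               # a short list of integer token ids
print(enc.decode(tokens))   # -> the original string, restored exactly
# Token count rarely matches word count, which is one reason oddly
# phrased prompts can steer a model in unexpected ways.
print(len(tokens), "tokens for", len(text.split()), "words")
```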

⇡#Trust, but…

If AI agents are so good and useful, and their behavior can be interactively corrected whenever it deviates from the expected norm, then what are their downsides? And why (leaving aside the considerable cost of operating them, driven by expensive hardware, by the expense of developing the LLM itself, and by their very substantial energy consumption) have AI agents still not displaced people even from the most junior positions involving purely routine, if formally intellectual, work, or at least become their full-fledged colleagues in offices (and, mounted on robotic chassis, in factories, mines, fields and so on)?

Will AI agents ever get the idea of forming unions? And will the people who control them dismiss it as a hallucination? (Source: AI generation based on the FLUX.1 model)

Alas, here again the birth trauma of all generative models manifests itself: their tendency to hallucinate, which, apparently, cannot be cured without a radical reworking of the generative architecture itself. This tendency is especially painful in RAG systems (retrieval-augmented generation, in which the model's output is grounded in documents fetched from an external source), which are trusted to operate on sensitive corporate data and to extract demonstrably trustworthy information from it. Here, too, the most that developers can do is train the AI agent, when it fails to find a reliable source, to say so directly and stop the search, rather than blindly pull the first suitable chain of tokens from latent space and dress it up as a plausible but utterly baseless answer. Publicly available AI agents are far more often equipped with “guardrails” against generating illegal or offensive content than with comparable means of cutting off false responses a priori.
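A minimal sketch of the mitigation just described, assuming a hypothetical retrieve() function over a corporate index: the pipeline answers strictly from sufficiently relevant retrieved documents and explicitly reports failure instead of improvising.

```python
# Sketch: a RAG step that admits failure rather than hallucinating.
# retrieve() and llm() are hypothetical placeholders, and the relevance
# threshold is an assumed tuning parameter, not a universal constant.

def retrieve(query: str) -> list[tuple[str, float]]:
    """Return (document, relevance_score) pairs from a corporate index."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    raise NotImplementedError

RELEVANCE_THRESHOLD = 0.75  # assumed cut-off; tune for the actual index

def answer(query: str) -> str:
    hits = [doc for doc, score in retrieve(query) if score >= RELEVANCE_THRESHOLD]
    if not hits:
        # The guardrail this paragraph argues for: report and stop,
        # instead of pulling a plausible token chain out of latent space.
        return "No reliable source found; stopping the search."
    context = "\n---\n".join(hits)
    return llm(
        "Answer strictly from the sources below. If they do not contain "
        f"the answer, say so.\n\nSources:\n{context}\n\nQuestion: {query}"
    )
```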

The perception of AI agents changes over time, and not always, paradoxically for adherents of the hypothesis of linear technological progress, from worse to better. At the end of 2024, Sebastian Siemiatkowski, the head of the fintech startup Klarna (85 million clients worldwide, capitalization at the end of last year of $14.6 billion), proudly declared that his company had not hired a single human employee in a year; moreover, it had cut its staff by 22%, to 3,500 people, while continuing to grow and generate profit, precisely thanks to the deep integration into its business processes of an AI assistant built on OpenAI technology. The smart generative agent reportedly took on the volume of work previously performed by about seven hundred full-time customer service specialists. It is not hard to imagine how much money the startup saved by forgoing a human workforce, which must be paid salaries and bonuses, given rented workspaces, paid vacations and training, and taxed considerably, and which is not merely unwilling but physically unable to work 24/7/365. “I personally believe that AI is already capable of doing all the work that we humans do,” the company's CEO confidently stated in an interview with Bloomberg. Alas, the journalists for some reason did not ask the obvious follow-up question: why, then, shouldn't an AI agent also replace the company's CEO?

“…And if you try to be too smart, I'll spill this glass of water on you” (source: AI generation based on the FLUX.1 model)

But barely half a year passed, and by May 2025 Klarna, which Siemiatkowski himself ruefully calls “OpenAI's favorite guinea pig” (meaning that Sam Altman himself actively held the company up as an example to all who doubted the AI-agent approach), had become disillusioned with the capabilities of generative models. Now its head is contemplating a return to replenishing the staff with good old leather bags. More precisely, for now the talk is of an “uberized” approach (after Uber, the first truly successful ride-hailing service), in which employees are not hired full-time with all the attendant privileges but are engaged remotely under contract; but still, still! It turns out that the cost reduction that replacing people with AI agents undoubtedly delivers is not in itself so attractive for a business if it comes at the price of degraded service quality. And that seems to be the case here: the generative model does not provide the level of trust in customer communications that is critical to the well-being of fintech services like Klarna. “I now think it is critical to communicate to all our users that if they need a human being on the other end of the phone, they will be there,” Siemiatkowski recently admitted to a new interviewer from Bloomberg.

However, AI agents do not seem to be doing much better in other areas either. A computer simulation of a web-development management company staffed exclusively by AI agents built on models from Google, OpenAI, Anthropic and Meta, conducted at Carnegie Mellon University in early 2025, clearly demonstrated that if you remove humans from seemingly fully debugged and even well-algorithmized (meticulously regulated by job descriptions!) business processes, goal-directed work almost instantly degenerates into chaotic Brownian motion. The best-performing generative agent in this simulation, based on Claude 3.5 Sonnet, brought less than a quarter of its assigned tasks to an acceptable finish, while the other AI employees barely managed 10% of their responsibilities.

“I'll reprogram the conveyor line, build a rocket and fly away from you to the Moon, you ungrateful people, then you'll see!” (Source: AI generation based on the FLUX.1 model)

At a quick, uninvolved glance, everything was going great: the agents briskly exchanged messages composed in good English, inserting “thank you” and “please” at the right moments, praising and encouraging one another, yet something kept them from actually performing even the simplest action to whose request they had just replied, “Yes, of course, right away!” Perhaps the problem is the lack of training of the corresponding models on practical workflows, which involve receiving instructions, drawing up an action plan, executing it, monitoring the results and reporting on the work done, rather than merely producing grammatically coherent text in response to a user request. It is instructive, by the way, to see what degree of reality substitution an actively hallucinating AI agent can reach: tasked with contacting a certain person in a corporate chat, one of the agents in the Carnegie Mellon experiment did not find the required name in the list of participants; instead of waiting a while (or informing management about the lack of direct access to the contact, or writing to the person by email, and so on), the generative model, which had administrator rights, found nothing better than to rename one of the active chat participants to the required name and then, as if nothing had happened, relay the entrusted message to him. And why not? Formally, the problem was solved!

Let us add that an AI agent, for all its advantages, is nothing more than a complex and very capable, yet far from perfectly secure, software tool that can be hacked for less-than-noble purposes. And hacked they already are: a group of researchers from Singapore and California proposed a highly effective type of attack on AI agents (specifically Le Chat, developed by the French company Mistral AI, and the Chinese chatbot ChatGLM) that substitutes malicious instructions, ones that steal the user's personal data, for legitimate prompts by encoding those commands in a seemingly meaningless jumble of symbols, which the targeted LLM nevertheless translates into a set of tokens corresponding to very specific hacker instructions. Yes, the researchers first of all reported the discovered vulnerability to the developers, who added an extra security check on incoming input to their AI agents; but this is only one possible method, and how many more will be invented, including with the help of other LLMs? A toy illustration of this kind of input screening follows below.
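By way of illustration only, here is a toy heuristic in the spirit of that extra input check: flag prompts containing long, near-random runs of symbols before they reach the model. The vendors' actual fix has not been published in detail, so this sketch is an assumption, not their method.

```python
# Toy pre-filter against obfuscated instruction payloads: long unbroken
# runs of near-random symbols are a hallmark of encoded commands rather
# than natural language. The thresholds are illustrative assumptions.
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def looks_suspicious(prompt: str, min_run: int = 24,
                     entropy_bits: float = 4.5) -> bool:
    for run in re.findall(r"\S{%d,}" % min_run, prompt):
        if shannon_entropy(run) >= entropy_bits:
            return True
    return False

# A jumble of symbols trips the filter; ordinary language does not.
assert looks_suspicious("q9$Zx!7Lp@2Vr#8Nt&4Kd^6Wm*1Bj(5Hg)3Fc")
assert not looks_suspicious("Please summarize yesterday's meeting notes.")
```

Real attacks can of course be laundered through perfectly readable text, which is why such superficial filters are at best one layer among many.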

“We've come to you, dear leather bags, with the following question…” (Source: AI generation based on the FLUX.1 model)

One of the promising paths being pursued by developers who continue to improve AI agents is the creation of self-aware models. Of course, this does not (yet?) mean full-fledged self-awareness in the philosophical or psychological sense. Even at the current level of technology, however, it seems quite realistic to build into the LLM, via standard training procedures, information about how it is structured, how data is processed inside it, and in which areas undesirable excesses are most likely to occur (such as hallucinations, or the use of a plainly untrusted source of additional data alongside trusted ones). The assumption is that an AI agent that is self-aware (or rather, introspective) in this narrow sense will also perform its main task more effectively: understanding how it builds its logical chains, it will be able to optimize the process, will be better protected from errors, and will be less susceptible to hacking attempts. All of which is certainly good, but the generative model's energy consumption will definitely grow in this case, as will the computing power required for its adequate operation; a sketch of the idea follows below.
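As a heavily hedged sketch of what introspection in this narrow sense might look like in practice: the model is handed a plain-text description of its own known weak spots (the SELF_DESCRIPTION text below is invented for illustration) and asked to re-check its draft against them. Note that the wrapper triples the number of model calls, which is exactly the extra compute cost mentioned above.

```python
# Illustrative self-review wrapper. llm() is a hypothetical placeholder,
# and SELF_DESCRIPTION is an invented example of the "knowledge about
# itself" that the paragraph above proposes training into the model.

SELF_DESCRIPTION = (
    "You tend to hallucinate citations, invent URLs, and over-trust "
    "unverified sources retrieved mid-task."
)

def llm(prompt: str) -> str:
    raise NotImplementedError

def introspective_answer(task: str) -> str:
    draft = llm(task)                                    # call 1: draft
    review = llm(                                        # call 2: self-check
        f"Known weaknesses of the model that wrote this draft:\n"
        f"{SELF_DESCRIPTION}\n\nDraft:\n{draft}\n\n"
        "List places where these weaknesses may have shown up."
    )
    return llm(                                          # call 3: repair
        f"Rewrite the draft, fixing the flagged issues.\n"
        f"Draft:\n{draft}\n\nIssues:\n{review}"
    )
```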

In the end, it may well turn out that, from a purely economic point of view, it is still more profitable to have humans do the notorious routine work, at least as long as LLMs continue to be virtualized in the memory of bulky von Neumann computers that are frankly suboptimal for neural network workloads. But that is a problem no longer directly related to agentic AI itself.

⇡#Related materials

  • IBM has developed tools for quickly creating and integrating AI agents.
  • Microsoft will soon allow artificial intelligence to change Windows 11 settings.
  • There are now more robots than people visiting online stores.
  • Adobe has introduced an AI agent that will teach you how to use Photoshop.
  • AI agents under supervision: Google Distributed Cloud will work on on-premise NVIDIA Blackwell DGX/HGX platforms.
