An algorithm can be defined as a specific method of solving a fairly strictly defined applied problem, fixed once and for all (in particular, in program code). The enormous popularity of artificial intelligence today is due in large part to the fact that it can solve very vaguely defined problems, such as “identify a person in a crowd filmed by an outdoor surveillance camera with far from the best optics, using a studio portrait as the reference,” “draw a funny cat,” or “explain what’s wrong with this piece of Python code.” At the same time, the principles by which AI systems are built follow clearly defined patterns, known as methods (sometimes also called algorithms) of machine learning (ML). Today everyone has heard of generative artificial intelligence, a way of organizing machine learning that is implemented in such popular models and services as ChatGPT, Midjourney, Kling, and others. Why did all the other machine learning algorithms end up, in the eyes of the general public, in the shadow of the generative ones, and will this situation change in the near future?
⇡#ML? AI?
Strictly speaking, machine learning is a subfield of artificial intelligence in the broad sense, which also includes, for example, an area still far from practical implementation: “strong” AI, capable of formulating problems for itself and finding ways to solve them. The ML approach does not imply any analysis on the part of the computer system, let alone awareness (whatever that might mean when applied to a computer emulation of a neural network) of the operations it performs on the data. Machine learning is nothing more than the automated extraction of patterns from a large array of data according to certain rules. These rules, in turn, are determined by the goals that the developers of a particular ML model set for themselves when preparing its training dataset.
One of the most widely accepted ways of dividing the problems solved with ML is dichotomous, that is, strictly into two groups: either classification of the processed objects or entities (telling them apart by certain features), or generation of digital representations of such objects (visual ones in particular) from certain prompts. Accordingly, the first broad category of models is called discriminative, the second generative. If that were all there was to it, there would be little reason to dig any deeper. It is intuitively clear that, say, a model that sorts photos of crocodiles and alligators into different folders is discriminative, whereas one that can produce, from a terse prompt, photorealistic images of a crocodile and an alligator that a zoologist can clearly tell apart is generative. The goals of the two types of systems are self-evidently different: discriminative models are applied to data obtained from outside in order to classify it; generative ones, on the contrary, produce new information (the same pictures, videos, audio) according to a given template.
In any case, ML implies initial training, which comes down to a computer system (a digital emulation of a more or less complex neural network structure) sifting through an array of data in order to identify certain patterns in it. What matters is that the patterns are identified implicitly, that is, without being expressed explicitly in a form that lends itself to an orderly, logical (algorithmic) description. Interestingly, in the living brain implicit learning, as neuroscientists have confirmed experimentally, relies on processes different from explicit (logical) thinking and proceeds independently of it. Roughly speaking, a foreign language can be taught explicitly, by giving the student words to memorize along with their translations and explaining the subtleties of foreign syntax, or implicitly, by “complete immersion” in a foreign-language environment. Both methods will eventually produce a practical result once the student has accumulated enough data; the quality and speed of mastering the language in each case depend on the individual characteristics of the particular brain.
Since AI in its current state is in any case incapable of explicit reasoning, the strength of its generative models (it would of course be more accurate to speak of generative machine learning, but the term GenAI is already established) lies precisely in the implicit way they realize all those wonderful abilities that the general public has so sincerely admired for almost two years now. Namely: creating still and moving pictures, composing music, holding a meaningful conversation in natural language (meaningful from a human point of view, that is, rich in information and emotion), and so on. Note that until relatively recently, developers had to spend considerable effort labeling the source datasets, that is, supplying comprehensive text descriptions for, say, the images on which a model was trained to turn prompt words into pictures. Today, however, there has essentially been a transition from classical supervised learning, in which a person had to index the training data fed to the model by hand, to self-supervised learning, which generates labels for unstructured data implicitly. It is thanks to self-supervised learning that the most advanced generative models, starting with GPT 3.5 (which became the basis for the sensational ChatGPT in the fall of 2022), have opened up previously inaccessible horizons for humanity, as the considerable excitement that has not subsided to this day confirms.
In fact, the shortage of resources (man-hours) needed to label training datasets by hand held back the development of generative models for a long time. With discriminative models used to classify heterogeneous entities (cat or dog, motorcycle or car, and so on), everything is somewhat simpler: it is enough for the operator conducting the training to mark the choice made by the ML system as correct or incorrect, and that feedback is used to recalibrate the weights at the inputs of the model’s perceptrons. Generative AI, by contrast, is capable of creating (more precisely, generating, starting from implicitly “captured” patterns) quite complex entities on its own. A synthetic voice, for example, whose timbre is almost (for now, almost) indistinguishable from that of a particular person. Or a visual image of the same person, static or moving. Or a text written in a given manner on a given topic. Clearly, the output of GenAI models is not free of hallucinations; such is the nature of implicit “knowledge.” But the benefits of self-supervised ML systems are so significant that consciously accepting the possibility of hallucinations seems, in many cases, a perfectly reasonable price to pay.
⇡#Alone, alone, alone
We have already described how supervised learning for a discriminative model is implemented in practice in one of our previous articles on artificial intelligence. The input of the system (roughly speaking, a multilayer perceptron) is fed an array of pre-labeled data: say, cards with handwritten digits, each accompanied by the same digit in machine-readable form. The system passes the pixelated handwritten image through its perceptrons and, in accordance with the weights currently set at their inputs, produces an initial result: a “guess” about which digit it has been shown. The result is then compared with the machine-readable value of the digit on the same card (in this case automatically, although the cards were originally labeled by hand by an operator, which is important). If the system got it wrong, the weights at the inputs of certain perceptrons are adjusted slightly through backpropagation, after which the procedure is repeated. And so on, until the particular implementation of the ML model learns to identify all the handwritten digits in its training array with acceptable accuracy. After that, it can be shown digits written in a different hand and a different style, and with fairly high probability it will recognize them correctly as well.
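The guess, compare, adjust loop described above can be sketched in a few lines. This is a deliberately reduced illustration, not the multilayer perceptron from the text: it trains a single sigmoid neuron, and the 3x3 bitmaps `ONE` and `ZERO`, the noise function, and the learning rate are all hypothetical choices made only for the example.

```python
import math
import random

# Hypothetical 3x3 "handwritten" bitmaps, flattened row by row.
ONE = [0, 1, 0,
       0, 1, 0,
       0, 1, 0]
ZERO = [1, 1, 1,
        1, 0, 1,
        1, 1, 1]

def noisy(bitmap, rng, flips=1):
    """Return a copy of the bitmap with `flips` randomly chosen pixels inverted."""
    out = list(bitmap)
    for i in rng.sample(range(len(out)), flips):
        out[i] = 1 - out[i]
    return out

def predict(weights, bias, pixels):
    """Sigmoid neuron: estimated probability that the card shows the digit 1."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=200, lr=0.5):
    """Repeat the guess -> compare with label -> adjust weights cycle."""
    weights = [0.0] * 9
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in samples:
            error = predict(weights, bias, pixels) - label  # guess minus label
            # Nudge every weight a little against the gradient of the log loss.
            weights = [w - lr * error * p for w, p in zip(weights, pixels)]
            bias -= lr * error
    return weights, bias

rng = random.Random(0)
# Labeled training array: label 1 for the digit 1, label 0 for the digit 0.
training_set = ([(noisy(ONE, rng), 1) for _ in range(20)]
                + [(noisy(ZERO, rng), 0) for _ in range(20)])
weights, bias = train(training_set)
```

Calling `predict(weights, bias, ONE)` on the clean bitmap, which the model never saw during training, should give a value close to 1, and `predict(weights, bias, ZERO)` a value close to 0. A real multilayer network differs in that the error is propagated back through several layers, but the cycle is the same.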
This procedure, simple as it sounds, is in fact fraught with a considerable number of problems, in particular underfitting and overfitting. In our example, the handwriting of the person who produced the training array may turn out to be so idiosyncratic that the system, having learned to identify his digits perfectly, will have considerable difficulty recognizing anyone else’s. But in general, supervised discriminative learning is a reliable ML classic: the long-established anti-spam filters for email, for example, are built on precisely such models, which ideally also undergo continuous additional training every time a user clicks the “This is spam” icon in the email client. Besides assigning the presented entity to one of several clearly defined categories (“crocodile or alligator,” “three, seven, or ace”), which is usually called classification, a supervised discriminative model can also produce values from a continuous range: say, estimate the density of passenger flow (persons per minute) at the entrance to a metro station depending on the time of day, the date, the weather, and so on. In that case we are talking about a regression problem. Dedicated algorithms are used to build ML models that specialize in classification and regression, and these models are used in a great variety of practical fields, computer vision systems among them.
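To make the difference between classification and regression concrete, here is a minimal regression sketch for the metro example. It assumes the simplest possible model, a straight line fitted by ordinary least squares to a single input (the hour); the observations in `hours` and `flow` are invented for illustration, and a real model would take many more features into account.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled observations: hour of the morning -> passengers per minute.
hours = [6, 7, 8, 9]
flow = [10, 30, 50, 70]

slope, intercept = fit_line(hours, flow)

def predict_flow(hour):
    """Regression output: a value from a continuous range, not a category."""
    return slope * hour + intercept
```

Unlike a classifier, `predict_flow` can return any number: for an hour that was never observed, such as `predict_flow(8.5)`, it interpolates between the known points.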
However, as we have already noted, supervised learning has a significant disadvantage: it requires either the direct presence of an operator next to the system being trained (to tell it in each specific case whether it performed the classification or regression correctly), or preliminary labeling of the training data array. Is it possible to train an ML model without a teacher? Yes, of course; and this procedure, unsupervised learning, is also implemented by a whole range of algorithms. Unsupervised learning tasks fall into two broad groups: clustering and dimensionality reduction. Clustering involves assigning objects to certain classes, but, unlike classification in supervised learning, neither the number of these classes nor the specifics of each of them are given in advance. ML with clustering is especially in demand in trade and marketing, since it makes it possible, for example, to stratify customers by preferences and behavior patterns with good accuracy, and to do so implicitly, without heavy investment in preliminary market research by traditional means. Dimensionality reduction is related to the data compression problems that are classic for the computer industry, as well as to the principal component analysis known from mathematics: here, too, the volume of input data the system has to process is reduced without compromising the result. An important application of unsupervised models that perform dimensionality reduction is the preprocessing of redundant datasets to speed up other machine learning algorithms.
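As an illustration of clustering, below is a minimal sketch of k-means, one of the common clustering algorithms (the text does not name a specific one, so the choice is an assumption). The customer data and the deterministic choice of starting centroids are hypothetical and serve only to keep the example reproducible. Note that, unlike many methods, k-means still needs the number of clusters `k` to be supplied.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    # Deterministic start: k points spread evenly through the list (an assumption;
    # practical implementations usually pick starting centroids at random).
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels

# Hypothetical unlabeled customers: (visits per month, average basket value).
customers = [(1, 20), (2, 25), (1, 30), (10, 200), (12, 220), (11, 210)]
centroids, labels = kmeans(customers, k=2)
```

No one told the algorithm which customers are “occasional” and which are “regular”; the two groups in `labels` emerge from the distances between the points alone.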
Let’s return now to self-supervised learning (SSL), which is often described as a fairly recent, hybrid approach to ML: using unsupervised learning to work on problems that were previously solved exclusively through supervised learning. In essence, self-supervised learning means that the ML system itself forms a labeled dataset in order to generate the feedback (supervisory) signals on which the model is then trained. In other words, by analyzing an array of unlabeled data, an SSL model identifies on its own the features (labels) by which the data can be ordered, and then uses them to solve classification and regression problems.
Probably the clearest example of using SSL to train ML models that deal with text is selective masking of words in sentences (the models behind the much-discussed ChatGPT are trained on a closely related task, predicting the next word). The initial data array is simply a collection of texts in digital form; the main thing is that they were written by people and not by other ML systems (otherwise the likelihood of hallucinations in the output of a model trained on such an array increases significantly). The model feeds itself sentences with selectively omitted words and, passing them through a multilayer perceptron network, forms a “guess” about which word should stand in place of each gap. It then compares the source text with the generated one and, if they do not match, applies standard backpropagation to correct the weights, after which everything is repeated. In the same way a model can be self-trained, for example, to draw in the style of a certain artist (in this case fragments of his original paintings in the training array are selectively blanked out) or to compose music of a chosen genre (in this case individual measures and their sequences are masked).
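The key point of the masking scheme is that the labels come from the text itself, with no human annotator involved. A minimal sketch of just that data-preparation step is shown below; the `[MASK]` token, the masking ratio, and the function name are hypothetical, and the network that would then be trained on these pairs is omitted.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token

def make_masked_example(sentence, mask_ratio=0.15, seed=0):
    """Turn raw text into a (masked input, labels) training pair."""
    rng = random.Random(seed)
    words = sentence.split()
    n_masked = max(1, round(len(words) * mask_ratio))
    positions = sorted(rng.sample(range(len(words)), n_masked))
    masked = list(words)
    labels = {}
    for pos in positions:
        labels[pos] = words[pos]  # the "correct answer" is taken from the source text
        masked[pos] = MASK
    return masked, labels

sentence = "the quick brown fox jumps over the lazy dog"
masked, labels = make_masked_example(sentence, mask_ratio=0.3)

# Putting the hidden words back must reproduce the original sentence exactly.
restored = [labels.get(i, word) for i, word in enumerate(masked)]
```

During training, the model would see only `masked`, produce a guess for every `[MASK]` position, and have its weights corrected whenever the guess differs from the word stored in `labels`.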
⇡#Well, don’t get torn apart
The similarity between SSL and unsupervised learning is obvious: in both cases unlabeled data is used to train the models, so the search for internal patterns and connections is carried out implicitly, without relying on externally specified (let alone operator-verified) classifications. But the differences are no less clear: above all, SSL has predictive power, albeit burdened by the possibility of hallucinations. For example, one of the widespread applications of unsupervised ML models is issuing recommendations to customers of online stores in the spirit of “This item is often bought together with…”, since such a system can quickly identify significant correlations between pairs of items that at first glance seem unrelated in a large array of data on completed purchases. An SSL model makes interactive work with each individual customer possible: if across the whole sample product B is bought significantly often together with product A, but this particular user has ignored the system’s hint again and again, it is much wiser not to keep pushing the same suggestion, irritating the customer with its intrusiveness, but to offer some other item paired with A, one with a lower correlation score. Perhaps that option will work?
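The recommendation logic described above can be sketched as follows. This is plain counting rather than a trained model, so it illustrates only the decision rule (suggest the most frequent companion item, and fall back to the next one if the customer keeps ignoring it); the baskets and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each ordered pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(product, counts, ignored=()):
    """Suggest the most frequent companion of `product` that the customer has not ignored."""
    candidates = [
        (n, other) for (p, other), n in counts.items()
        if p == product and other not in ignored
    ]
    if not candidates:
        return None
    # Highest count first; ties are broken alphabetically to keep the result deterministic.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[0][1]

# Hypothetical purchase history.
baskets = [
    ["tent", "sleeping bag"],
    ["tent", "sleeping bag", "lantern"],
    ["tent", "lantern"],
    ["tent", "sleeping bag"],
    ["lantern", "batteries"],
]
counts = co_purchase_counts(baskets)

first_offer = recommend("tent", counts)                            # most frequent pair
second_offer = recommend("tent", counts, ignored={"sleeping bag"})  # fallback option
```

Here the store first offers the sleeping bag (bought with the tent three times) and, once that hint has been ignored, switches to the lantern (bought with it twice).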
Thus, SSL models resemble supervised ones in that they likewise rely on a reference standard for the training dataset (the English term is ground truth), only one that is not specified by a human operator but derived implicitly from the unlabeled input data. A self-supervised model is optimized through backpropagation of errors according to the same principles of gradient descent in a multidimensional space as a supervised one. This makes it possible to use SSL for classification and regression problems. And since a self-supervised model searches for patterns in the training array implicitly, the categories it “captures” may differ noticeably from those that people labeling the same array would use, or may not correspond to them at all. This is in fact one of the most important reasons why the “logic” of SSL in general and of generative AI in particular is so hard to fathom: formally, the system gropes for some patterns in the source data and is guided by them in its further actions, but it has no way of expressing them in a form that humans can grasp. At least not in the basic SSL implementation; attaching “explanatory modules” to it is a separate and extremely exciting direction in the development of ML.
An example of a computer vision task for which supervised learning is prohibitively resource-intensive is instance segmentation, which determines exactly which pixels in an image belong to a particular instance of an object. For example, in a frame from a high-resolution video camera where a person stands in front of a car or another person, many applications need to establish not just the rough outlines of these objects (that task, object detection, is handled quite well by simpler models), but which of them each individual point in the picture belongs to. One can imagine the amount of labor required to label even a single Full HD frame by hand, pixel by pixel, yet forming reliably effective patterns in a supervised model takes hundreds, if not hundreds of thousands, of such frames. SSL solves such problems much more efficiently, precisely because there is no need to involve human operators.
There are also semi-supervised models, which during training rely partly on data labeled by humans and partly on unlabeled data. They are often used where relying entirely on self-supervised learning is unreasonable: for example, in modern speech recognition systems. The human-labeled array (audio recordings transcribed by hand) for such systems amounts to tens, at most hundreds, of hours; on this basis the model is trained to convert voice into text with understandable limitations in vocabulary, in the speakers’ manner of pronunciation, and so on. Then another, much larger and unannotated array of hundreds or even thousands of hours is added, and training continues in self-supervised mode. The result is a system that can quite reliably transcribe the speech of a wide variety of people on a variety of topics, with an acceptably low level of errors, which are, alas, inevitable.
A natural development of SSL was self-predictive learning (SPL), otherwise called autoassociative self-supervised learning. It works roughly along the lines of the saying attributed to Georges Cuvier, the founder of paleontology and inventor of the comparative anatomical method: “Give me one bone, and I will reconstruct the animal from it.” The SPL method makes it possible to train an ML model so that, given fragments of a certain object, it will model the missing parts (and therefore the whole object) with a sufficient degree of reliability. SPL is used very widely in generative models, in particular those used for outpainting, that is, extending images beyond the boundaries of their original canvas. Variational autoencoders (VAE), which are responsible for “translating” the image produced in latent space by modern text-to-image models into a graphic format humans can perceive, also fall into the SPL category. So do autoregressive models, “predictors of the future based on the past,” which underlie such well-known large language models as GPT, LLaMa and Claude.
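The idea of an autoregressive “predictor of the future based on the past” can be shown with a toy model that predicts each word from the single word before it. This bigram counter is a drastic simplification chosen for illustration (large language models condition on long contexts and use neural networks, not frequency tables), and the corpus is invented.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """For every word, count which words follow it in the training text."""
    words = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(model, word):
    """Most frequent follower of `word`; ties are broken alphabetically."""
    if word not in model:
        return None
    return min(model[word].items(), key=lambda kv: (-kv[1], kv[0]))[0]

def generate(model, start, length):
    """Autoregression: each predicted word becomes the input for the next prediction."""
    out = [start]
    while len(out) < length:
        nxt = predict_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out

# Hypothetical training corpus.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
generated = generate(model, "the", 5)  # ['the', 'cat', 'sat', 'on', 'the']
```

As with masking, the training signal is taken from the text itself: every word serves as the “label” for the word that precedes it.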
In short, discriminative and generative ML models go hand in hand today, and, strictly speaking, the overwhelming majority of the most significant implementations of “generative AI” are precisely such combined, hybrid systems. Models that implement reinforcement learning (RL) are sometimes placed in a class of their own: their peculiarity is that during training they work not with a set of pre-prepared data, labeled or not, but directly with a certain environment. Formally, this is fully analogous to supervised learning, only the role of the teacher is played not by a person pressing the “correct” and “incorrect” buttons, but by the environment itself, which the agent (in this case, an ML system trained by reinforcement) acts upon, receiving feedback in return. For instance, an RL system is the best fit for creating an artificial gamer, who receives negative reinforcement if he makes a wrong move in the game and his avatar suffers, and positive reinforcement if he does everything right.
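A minimal sketch of this agent and environment loop is given below, using tabular Q-learning, one common RL algorithm (the text does not name a specific one, so this is an assumption). The “game” is a hypothetical corridor of five cells with a pit at one end and a goal at the other; all constants are illustrative.

```python
import random

N_STATES = 5                 # cells 0..4 of a corridor
PIT, GOAL, START = 0, 4, 2   # falling into the pit is punished, reaching the goal rewarded
ACTIONS = (-1, +1)           # action 0: step left, action 1: step right

def step(state, action):
    """The environment: returns (next state, reward, episode finished?)."""
    nxt = state + action
    if nxt == GOAL:
        return nxt, 1.0, True    # positive reinforcement
    if nxt == PIT:
        return nxt, -1.0, True   # negative reinforcement
    return nxt, 0.0, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action] of expected rewards by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)                       # explore: random move
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit: best known move
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])  # move estimate toward target
            state = nxt
    return q

q = train()
```

After training, `q[START][1]` (step right, toward the goal) ends up higher than `q[START][0]` (step left, toward the pit): the agent has learned which move is better without ever being shown a labeled example.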
Which area of ML is the most promising for creating “real” (that is, strong) AI in the future? Experts cite both RL and SPL as the most promising methods, but the main emphasis is on explainable machine learning models, those whose “way of thinking” will not remain a mystery to the flesh-and-blood specialists who create, train, and operate them. In addition, a number of challenges will have to be dealt with, including the shortage of data for training new super-large language models, the contamination of that data with “secondary” material (that is, material generated by existing generative AI), and possible architectural difficulties: there is a real risk that a virtual implementation of “strong” machine learning models in the memory of von Neumann computers will prove excessively expensive and will require a rapid transition to neuromorphic hardware with specialized architectures. One way or another, this area of high technology will clearly remain a priority for the foreseeable future, which means it makes sense to expect relatively quick results and new achievements in machine learning.