The true strength of machines built on von Neumann principles is their broad ability to emulate a wide variety of processes, including computing systems built on radically different architectures. And the denser von Neumann computers become, i.e., the more of their basic elements (transistors, since semiconductor technologies for producing them are the most mature today) fit per unit area, the more profitable it becomes to use this technology to emulate computing devices of an entirely different nature. So, although artificial neurons – perceptrons, which we have discussed more than once – can be implemented physically without particular difficulty, including on a semiconductor material base, it turns out to be far more effective (in terms of total engineering effort and money spent) to form multilayer neural networks in the memory of von Neumann servers or even consumer-grade PCs.

At least, this reasoning remained valid until the rapid and widespread surge of interest in generative AI in the fall of 2022 – and the topic remains very hot to this day.

Changes in electricity consumption by region of the world in 2022–2023 and the forecast for 2026, TWh (source: IEA)

"Hot" not only in the figurative sense: according to the International Energy Agency (IEA), by 2026 global energy consumption by data centers will double compared to 2022, reaching 2% of the total output of all power plants in the world. Of course, not all of those countless terawatt-hours will be spent powering AI servers that animate photorealistic images of cats in funny hats at the request of Internet users, but the contribution of running existing (and training new) AI models of various kinds to the rapid growth of civilization's energy budget is enormous. For example, Microsoft has already signed a 20-year power supply contract with one of Constellation Energy's nuclear power plants in Pennsylvania, and AWS is hiring a chief engineer with nuclear-industry experience to develop small modular reactors and establish ties with traditional nuclear power plants – all precisely because AI data centers urgently need the cheapest and most readily available energy possible.

⇡#Less noise, more power

According to The Economist, training the GPT-4 model alone (which today already has many direct competitors) consumed more than 50 GWh – approximately 0.02% of the annual electricity budget of the state of California and 50 times more than the training of GPT-3 required. Forbes experts fear that the overly rapid development of AI is pushing the planet toward an energy crisis – and their concern is understandable: if a classic 19-inch server rack in a typical data center draws about 7 kW, the same rack filled with equipment oriented toward AI tasks will require from 30 to 100 kW. Something clearly needs to be done about this, and the option of "all humanity simply stops generating cats in funny hats at once" does not seem realistic, at least in the medium term. Yes, the hype around AI – especially on the stock market – is clearly declining, but a drop in speculators' activity around a hot topic may well indicate a wider and deeper penetration of the new technology into a variety of industries, including those key to national and global economies.

Da Vinci's machine gun – two design options on one sheet (source: Wikimedia Commons)

Here both developers and users of artificial intelligence models face an obvious dilemma. On the one hand, to demonstrate to skeptics not merely a positive economic effect but the fundamental superiority gained by introducing AI into various types of economic activity, these new technologies must be applied as widely as possible. On the other hand, the monstrous energy costs of operating (and, separately, training) AI models call the economic feasibility of such a transition into question. The situation is roughly the same as with the potential availability of automatic weapons back in the Renaissance: there is little doubt that Leonardo da Vinci, had he come up with the unitary cartridge, could have moved from the 33-barreled rapid-fire battery he designed (at the sketch level) to a machine gun – similar, say, to the rather simple Gatling scheme. He could have, that is, had the economic prerequisites existed. But in the pre-industrial era the mass production of cartridges and bullets matching given dimensions exactly (and the slightest dimensional discrepancy within a single caliber inevitably jams rapid-fire self-loading weapons) was either outright impossible or so labor-intensive that a single machine-gun belt filled with cartridges handmade by workshop craftsmen would probably have cost more than equipping a whole squad of crossbowmen. A pointless waste!

The excessive energy appetites of modern AI systems can also be seen as evidence of a mismatch between the idea of neuromorphic, i.e., "brain-like," computing and the hardware basis used for its practical implementation. Semiconductor computing devices are in general characterized by extremely low energy efficiency: it is enough to compare the share of any data center's energy budget spent on running the servers themselves with the share spent on removing heat from them. Meanwhile a human – say, a professional artist who draws those same notorious cats in funny hats by hand – expends hardly more than ten watts on mental activity (we are not even counting muscle movements, which are more economical still), i.e., more than three orders of magnitude less than a rack full of server graphics accelerators in a data center.

And the point here is not at all the speed of processing simple tasks: multiplying, say, three-digit numbers in one's head is not easy even for a mathematics professor, while for the simplest calculator it makes no difference whether the numbers have three digits or thirty-three – as long as the screen has enough positions to display the result. The advantage of the biological brain is the extreme parallelism of its operation, thanks to which it can solve problems far exceeding integer multiplication in complexity – and it is this parallelism that hardware engaged in neuromorphic computing should properly imitate. So what prevents us from moving from the energy-inefficient emulation of neural networks on von Neumann PCs to implementing the same networks in hardware form, as they say?

The Taichi optical microprocessor uses photonic circuits both to receive information/control signals and to process them (source: Tsinghua University)

In principle, there are no physical obstacles to increasing the energy efficiency of the hardware basis of neuromorphic computing. We have already discussed the idea of using photonics (instead of, or together with, microelectronics) to save energy in machine learning (ML) tasks – and this direction is being pursued quite consistently around the world. In particular, researchers from Tsinghua University in Beijing, together with the Chinese National Research Center for Information Science and Technology, presented Taichi in the spring of 2024: a microchip built on photonic principles for organizing optical neural networks. By combining interference and diffraction approaches to optical signal processing, a single multi-chip Taichi assembly can run a neural network with almost 14 million operating parameters. Its demonstrated accuracy in recognizing more than 1.5 thousand handwritten characters from fifty different alphabets (taken from the Omniglot database, well known in ML circles) reaches almost 92%. Besides discriminative AI tasks, Taichi can also handle generative ones, such as composing music in the style of Bach or landscapes in the spirit of Van Gogh. Compared to the Nvidia H100, this photonic computer is, according to its developers, more than a thousand times more energy efficient – which would seem to be exactly the path neuromorphic computing hardware should follow in the foreseeable future.

Alas, in practical terms things are far from smooth with Taichi – and for a number of reasons such technologies can for now only aspire to the laurels of a replacement for semiconductor microprocessors, including for complex AI computations. The chip itself solves classic ML problems of the matrix-vector multiplication category efficiently and quickly, while occupying a very modest area – quite comparable to the dimensions of the silicon accelerator at the heart of that same H100. However, in order to feed the necessary data to Taichi's input as light pulses with characteristics suitable for processing, and then to read the computation results at the output and convert them into electrical signals for further operations, a fair amount of rather bulky and energy-hungry machinery is required – lasers and high-frequency filters, for example.
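To see why matrix-vector multiplication is the operation worth accelerating, here is a minimal NumPy sketch (our own illustration, not Taichi's software stack): every fully connected layer of a neural network is one matrix-vector product plus a cheap element-wise nonlinearity, so an optical MVM core takes over exactly the `W @ x` lines below.

```python
import numpy as np

# Minimal sketch: neural-network inference is dominated by
# matrix-vector multiplications (MVM). A photonic MVM core would
# accelerate the W @ x products; everything else is cheap.

rng = np.random.default_rng(0)

def dense_layer(W, b, x):
    """One fully connected layer: y = relu(W @ x + b)."""
    return np.maximum(W @ x + b, 0.0)  # W @ x is the MVM

# Toy network: 784 inputs (e.g., a 28x28 glyph) -> 256 hidden -> 50 classes
W1, b1 = rng.normal(size=(256, 784)) * 0.01, np.zeros(256)
W2, b2 = rng.normal(size=(50, 256)) * 0.01, np.zeros(50)

x = rng.random(784)                       # flattened input image
logits = W2 @ dense_layer(W1, b1, x) + b2
print("predicted class:", int(np.argmax(logits)))
```

Even in this toy, the two matrix-vector products account for roughly 213 thousand multiply-accumulate operations per sample, while the nonlinearity costs only a few hundred – which is why offloading MVM to an optical core pays off.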

One of the most important limitations of silicon photonics is the impossibility of lithographing a generator of coherent optical radiation (a laser) on a silicon substrate; for this reason micro- and nanolasers have to be made separately and then integrated onto a wafer with the optical circuits (using the flip-chip method, for example; in the photo, the laser modules are the large dark-gray ribbed blocks), which neither simplifies the manufacture of such chips nor makes it cheaper (source: IEEE Spectrum)

As a result, an entire installation built around a single such chip barely fits on a desk. Of course, the researchers are working to reduce the size and energy consumption of their system – but for now photonics as a hardware basis for neuromorphic computing is clearly not in first place among potential replacements for von Neumann semiconductor computers. Now, if it becomes possible to move from hybrid electro-optical circuits to purely photonic ones, the conversation will be different; but for now that is a matter of the rather distant future.

⇡#What’s under the neuro-hood

Before moving on to more practical implementations of neuromorphic computers, let us briefly dive into the design features of natural, biological neural networks. Any cell, nerve cells included, is separated from its environment by a membrane. Thanks to this boundary and the regulated passages (channels) in it, the physicochemical properties of the cell's interior and exterior can differ quite noticeably. In particular, the concentration and composition of ions on the inner and outer surfaces of the membrane of a resting cell are different (the total charge of all ions inside a normal resting cell is, of course, zero – it is just that ions of different kinds are retained at the membrane in different amounts, both inside and outside). From the standpoint of physics, this automatically produces a stable potential difference across the boundary between the cell's internal volume and its environment – about 70–90 mV in absolute value, with a negative sign.
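The article gives only the resulting value; the standard textbook relation behind it (added here for reference) is the Nernst equation, which links the equilibrium potential maintained by a single ion species to its concentrations on the two sides of the membrane:

$$E_{\text{ion}} = \frac{RT}{zF}\,\ln\frac{[\text{ion}]_{\text{out}}}{[\text{ion}]_{\text{in}}}$$

where $R$ is the gas constant, $T$ the absolute temperature, $z$ the ion's charge number and $F$ the Faraday constant. For potassium ions at typical mammalian concentrations this evaluates to roughly −90 mV, consistent with the resting-potential range quoted above.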

The simplest nerve impulse (a region with a negative charge on the outside of the membrane and a positive one on the inside) propagating along the surface of an axon (whose outside is initially positively charged) due to a change in the transmembrane potential (source: Wikimedia Commons)

So, under the influence of various stimuli – for a nerve cell, most often chemical signals arriving through a synapse – the membrane potential can change. The change, usually manifested as a short-term local replacement of the positive external charge with a negative one, is initiated at a particular point of the membrane and spreads across its surface. "Positive" and "negative" are relative terms here: what is compared is the potential of the outer and inner sides of the membrane in each specific region. This propagation of excitation along the membrane surface, otherwise an excitation wave, is called an action potential (or simply a spike – the English term has been borrowed directly). It is one of the key concepts of neurophysiology, since this is how information signals – nerve impulses – are transmitted through biological tissue.

From the standpoint of neuromorphic systems, what matters is that an action potential is generated in a nerve cell once a certain level of depolarization of the neuronal membrane is reached, i.e., once the membrane potential rises to a certain threshold value. The "all or nothing" law, known in neurophysiology since the 19th century, states that the cell membrane of an excitable tissue either does not respond to a stimulus at all or responds with the maximum strength available to it at the moment – this is, of course, the ideal, model case; with real cells things are often much more complicated. One way or another, the "all or nothing" law proved to be a most convenient mathematical abstraction, and it formed the basis of a whole class of artificial neural networks, now for obvious reasons called spiking neural networks (SNN). Spiking perceptrons differ from the perceptrons of the "ordinary" artificial neural networks we considered earlier (called simply artificial neural networks, ANN, in the English-language literature) both in the type of input signals and in how the operating function is implemented in the "body" of the artificial neuron.

Schematic representations of the operating principles of a perceptron in a "regular" artificial neural network (ANN) and in a spiking neural network (SNN) (source: PubMed)

Recall that the general principle of a perceptron's operation is to produce a certain signal at its single output depending on the signals arriving at its numerous inputs. This dependence is determined by the activation function, whose role in ANN perceptrons can be played by the Heaviside "step," the sigmoid, the hyperbolic tangent, ReLU and others. The input arguments of these functions – signals modified by weights – are treated as continuous quantities, and in general a fully connected deep artificial neural network (i.e., one with more than two internal, hidden, layers) deals precisely with continuous (in a particular case, fixed) input data streams. This is extremely useful for a variety of ANN applications, including pattern recognition. A picture – even a single frame of a video stream – is stationary; each of its pixels fed to the input of, say, a convolutional neural network generates a signal with certain constant parameters (brightness, color composition) – and processing it with a neural network that operates on continuous input quantities turns out to be extremely simple.
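For concreteness, here are the activation functions named above in their standard textbook definitions (a minimal sketch, not tied to any particular framework):

```python
import numpy as np

# Standard definitions of the activation functions mentioned in the
# text; x is the weighted sum of a perceptron's inputs.

def heaviside(x):   # "step": 0 below the threshold, 1 at or above it
    return np.where(x >= 0.0, 1.0, 0.0)

def sigmoid(x):     # smooth squashing into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):        # smooth squashing into (-1, 1)
    return np.tanh(x)

def relu(x):        # passes positive values, zeroes out the rest
    return np.maximum(x, 0.0)

x = np.linspace(-3.0, 3.0, 7)
for f in (heaviside, sigmoid, tanh, relu):
    print(f.__name__, np.round(f(x), 2))
```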

But biological nervous tissue works in a fundamentally different way! Action potentials propagate through it as single impulses separated by certain time intervals. This affects how the neural network operates as a whole: impulses that arrive too frequently one after another, say, will simply not be detected (and the neurons will accordingly form no reaction to them), because the receiver – the patch of dendrite membrane under the corresponding synaptic cleft – will not have time to return its electrical potential to the original value and thus trigger the next operation in response to a new signal that arrives too soon.
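The article describes this behavior only qualitatively; a minimal way to sketch it in code is the leaky integrate-and-fire (LIF) neuron – a standard simplification of our choosing, not named in the text – which integrates input pulses, fires on crossing a threshold ("all or nothing") and then ignores input during a refractory window:

```python
# Leaky integrate-and-fire (LIF) neuron: membrane potential v leaks
# toward rest, jumps on each input spike, and an output spike is
# emitted only when v crosses the threshold; afterwards the neuron
# stays silent for a refractory period, losing any pulses that
# arrive too soon - exactly the effect described above.

DT = 1.0              # time step, ms
TAU = 10.0            # membrane leak time constant, ms
V_REST, V_THRESH = 0.0, 1.0
REFRACTORY = 5        # steps of enforced silence after a spike

def lif_run(input_spikes, weight=0.4):
    v, cooldown, out = V_REST, 0, []
    for s in input_spikes:
        if cooldown > 0:                      # refractory: input ignored
            cooldown -= 1
            out.append(0)
            continue
        v += (V_REST - v) * DT / TAU          # leak toward rest
        v += weight * s                       # integrate the input pulse
        if v >= V_THRESH:                     # "all or nothing"
            out.append(1)
            v, cooldown = V_REST, REFRACTORY  # reset + refractory window
        else:
            out.append(0)
    return out

# Three pulses in a row drive the neuron over threshold; the pulses
# that land inside the refractory window leave no trace at all.
print(lif_run([1, 1, 1, 1, 1, 0, 0, 1, 1, 1]))
```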

The idea of reproducing this behavior of nervous tissue by artificial means leads logically to the concept of a spiking neural network, the first mathematically grounded approach to which – the Hodgkin-Huxley (HH) spiking neuron model – was proposed by Alan Hodgkin and Andrew Huxley back in 1952. Perceptrons in such a network process not continuous input quantities but sequences of single pulses, using systems of ordinary differential equations of varying complexity. The HH model itself (built by Hodgkin and Huxley from observations of how potassium and sodium ion concentrations change across the membrane of the squid giant axon as an action potential propagates) still remains one of the most accurate in terms of fidelity to the biological prototype. Today, however, it is almost never used because of its extremely high computational complexity – far more practical implementations of SNNs have since been developed.
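The article does not reproduce the model's equations; for reference, the core of the HH formalism in its standard textbook form balances the membrane's capacitive current against the sodium, potassium and leak currents:

$$C_m \frac{dV}{dt} = I_{\text{ext}} - \bar{g}_{\text{Na}}\, m^3 h\,(V - E_{\text{Na}}) - \bar{g}_{\text{K}}\, n^4\,(V - E_{\text{K}}) - \bar{g}_{L}\,(V - E_{L}),$$

where each gating variable $x \in \{m, h, n\}$ obeys first-order kinetics with voltage-dependent rate functions:

$$\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\,x.$$

Four coupled nonlinear ODEs have to be integrated per neuron per time step – precisely the computational burden that pushed practitioners toward simpler spiking models.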

Various SNN models in the coordinates "implementation cost – biological plausibility" (source: PubMed)

⇡#Semiconductors don’t give up

It makes sense to speak of the "practicality" of such implementations precisely when these models are emulated digitally on von Neumann computers, where the computational resources spent executing each of them can be measured exactly. It would seem that, on a suitable hardware basis, spiking artificial neural networks should run with much higher efficiency – if not quite matching the biological brain, then at least noticeably surpassing their emulation on conventional x86 or ARM servers with semiconductor processors.

Let us note, by the way, that creating a truly neuromorphic computer does not necessarily require abandoning semiconductor microelectronics. It is quite possible to create "brain-like" structures by photolithography on a silicon basis; the trouble is that a significant limitation here is the fundamental lack of spatial connectivity between individual perceptrons. Biological neurons in the brain can have thousands of synaptic connections with their neighbors (sometimes physically separated by quite considerable distances), whereas planar semiconductor structures formed on the surface of silicon must exchange information via metal interconnect buses, which are lithographed sequentially in ever higher layers of the integrated circuit. The more layers, the higher the likelihood of errors and inaccuracies during chip manufacturing – and the harder it is to design contacts of optimal length between individual perceptrons (after all, signal propagation delay matters too!). As a result, semiconductor artificial neurons made by classical means can boast only dozens, at best hundreds, of synaptic connections with their neighbors, which is frankly not enough for truly universal neuromorphism. And that is before taking into account the monstrous heat dissipation of semiconductor devices during computation…

A group of Indian researchers pose with engineering prototypes of their neuromorphic BTBT computers (source: Indian Institute of Technology, Bombay)

Engineers and microelectronics researchers are, of course, not conceding final defeat in this area – and regularly propose new ways to develop semiconductor neuromorphics. Thus, in 2022 a group from the Indian Institute of Technology in Bombay (Mumbai) proposed an ingenious and very economical (in terms of energy consumption) way to build a neuromorphic – namely, spiking – perceptron network on semiconductors using band-to-band tunneling (BTBT). Interestingly, in classical semiconductor devices this quantum effect is considered extremely undesirable, since it is one of the causes of parasitic leakage between drain and source (more precisely, from the valence band of the source to the conduction band of the drain) in a switched-off transistor.

The Indian researchers' idea is to assemble (in one plane for now, as a proof of concept) a rectangular matrix of capacitors deliberately made susceptible to the BTBT effect, with an input bus supplying charge to the capacitor at each node. Once the charge accumulates to a certain value, band-to-band tunneling occurs – and, importantly, all of that charge flows onward along the output bus according to the artificial neural network's circuit, to the next perceptron layer, for example. The result is precisely a pulsed data flow – even if the data entering the first layer of capacitors (as electric current flowing into the circuit) was continuous – and the capacitor that sent the impulse on its further journey is "zeroed," returning to a state of readiness for a new filling.
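A toy model of that accumulate-tunnel-reset cycle (our abstraction of the description above, not the IIT Bombay circuit itself) shows how a continuous input current becomes a pulse train whose rate encodes the input's magnitude:

```python
# Toy model of the accumulate/tunnel/reset node described above:
# charge builds up on a capacitor from a continuous input current;
# on reaching the tunneling threshold the whole charge is passed on
# as a single pulse and the node resets. A steady input thus becomes
# a pulsed data flow whose rate encodes the input's magnitude.

THRESHOLD = 1.0   # charge level at which band-to-band tunneling fires

def node_response(current, steps=20, dt=0.1):
    charge, pulses = 0.0, []
    for _ in range(steps):
        charge += current * dt      # continuous charging of the node
        if charge >= THRESHOLD:     # tunneling: dump the charge onward
            pulses.append(1)
            charge = 0.0            # node resets, ready to refill
        else:
            pulses.append(0)
    return pulses

for i in (0.5, 1.0, 2.0):           # stronger input -> denser pulse train
    print(f"I={i}: {node_response(i)}")
```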

The proposed scheme is attractive from several standpoints, above all the energy standpoint. Since a quantum effect is involved, its manifestation requires relatively weak currents flowing through quite miniature semiconductor devices. This at the same time addresses the compactification of a neuromorphic computer: it will from the outset simply have to be produced photolithographically, albeit on fairly mature process nodes, since semiconductor capacitors are subject to much stricter limits on permissible dimensions than transistors. Thus, the engineering prototype of the described SNN chip was manufactured at a contract fab on a "45-nm" process.

Just to grasp the scale of the tasks facing the creators of neuromorphic computers: the diagram shows the connectome (map of interconnections) of neurons in the densest part of the Drosophila nervous system. And that is just a fruit fly! (source: Nature Physics)

Attempts to implement a spiking neural network on classical semiconductor devices (in which the BTBT effect is regarded as nothing but parasitic) have been made for a long time, but the resulting devices never reach wide production – in no small part because of their considerable energy consumption. The Indian development, according to the researchers, improves the efficiency of neuromorphic computing in the most fundamental way: the peak energy of an artificial action potential propagating through such a capacitor-based SNN turns out to be five thousand times lower than in comparable transistor-based semiconductor systems (made on similar process nodes), and the idle power consumption of a single node is an order of magnitude lower. To demonstrate the practical applicability of the idea, the group from the Indian Institute of Technology organized a small BTBT neural network (20 artificial neurons encoding the signal and another 36 processing it), inspired by the structure of the auditory cortex, and, by their account, achieved reliable recognition of words spoken by the experimenters even on such a meager hardware base.

Neuromorphic computing – and neuromorphic computers – have their fair share of challenges today, of course, and perhaps the most important is the limited scope of applicability. The already-mentioned problem of image recognition, for instance, is solved much better (that is, more accurately and faster) by the more familiar deep ANNs operating on continuous input quantities, and even a dramatic gain in energy efficiency does not become a decisive argument in favor of SNNs. Direct imitation of biological nervous structures is another matter: here spiking neural networks truly have no equal (the Indian example with a model of a patch of auditory cortex confirms this directly). And in this direction – on the path to turning the wildest science-fiction dreams of artificial intelligence, of the notorious strong AI, into reality – they definitely have a great future.

⇡#Related materials

  • Scientists have created a new quantum memory element – a superconducting microwave memcapacitor
  • SpiNNcloud introduced the first commercial “neuromorphic supercomputer” SpiNNaker2 based on Arm
  • Intel introduced the neuromorphic computer Hala Point on 1152 Loihi 2 chips with brain-like architecture
  • A super-efficient AI chip has been created in South Korea, combining classical and neuromorphic approaches
  • Neuromorphic AI supercomputer DeepSouth will appear in Australia to imitate the human brain
