20 Feb 2024
Q & A
5 minute read

In search of AI algorithms that mimic the brain

IBM researcher Dmitry Krotov is a theorist on the hunt for artificial neural networks that can crunch data as efficiently as the brain.

AI is advancing so rapidly it can be easy to forget how mysterious deep neural networks, the building blocks of modern AI, still are. Dima Krotov was finishing his PhD in theoretical physics just as deep learning was taking off. Intrigued by its potential, he dropped quantum field theory to focus on understanding neural nets and their amazing computational capabilities.

He joined deep-learning pioneer John Hopfield as a postdoc at Princeton’s Institute for Advanced Study (IAS), the academic home of Einstein and Oppenheimer. Hopfield was a physicist at Bell Labs who in the early 1980s had described a new type of neural net inspired by associative memory in the brain: the Hopfield network. Together, they expanded on the work, which Krotov continued upon arriving at IBM Research in 2018.

A fringe idea for decades, Hopfield networks are now having a moment. They were recently the focus of a daylong workshop at NeurIPS 2023, attended by some of the biggest names in AI. Foundation models built on transformers are still all the rage, but their limitations have also become clear. They have a short attention span, can hallucinate, and their decision-making process is opaque. They also require a lot of computation to train and run. Krotov and others see Hopfield networks as a potential solution to at least some of these problems.

Deep neural networks come in two flavors. Feedforward networks, which include transformers, process information in one direction. Recurrent neural networks (RNNs) process information iteratively, getting closer with each pass to the desired answer, like real neurons in the brain. Hopfield networks are a simple type of RNN with the potential to store large amounts of information. They also provide a window into the retrieval process, making them potentially more interpretable than feedforward networks.
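
To make the difference concrete, here is a minimal Hopfield-network sketch in Python, written for this article rather than taken from Krotov's work. A few binary patterns are stored with a Hebbian rule, and a corrupted pattern is recovered by feeding the network's output back into itself until it stops changing:

```python
import numpy as np

# Minimal sketch of a classical (1980s-style) Hopfield network, for
# illustration only. Binary patterns are stored in a symmetric weight matrix
# with a Hebbian rule; recall repeatedly feeds the state back through the
# network until it settles on a fixed point.

def store(patterns):
    n = patterns.shape[1]
    weights = patterns.T @ patterns / n   # Hebbian outer-product learning
    np.fill_diagonal(weights, 0)          # no self-connections
    return weights

def recall(weights, state, max_iters=100):
    for _ in range(max_iters):
        new_state = np.sign(weights @ state)  # feedback pass over all neurons
        new_state[new_state == 0] = 1
        if np.array_equal(new_state, state):  # reached a stable memory
            return new_state
        state = new_state
    return state

rng = np.random.default_rng(0)
memories = rng.choice([-1, 1], size=(3, 64))  # 3 binary patterns, 64 neurons
weights = store(memories)

probe = memories[0].copy()
probe[:10] *= -1                              # corrupt 10 of the 64 bits
print(np.array_equal(recall(weights, probe), memories[0]))  # True if recall succeeds
```

The feedback loop is the point: each pass reuses the network's own previous output, which is what lets a partial cue pull up a complete stored memory.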

Their similarity to real neurons has inspired Krotov in his quest to improve AI. But they may also show us something new about the brain. We caught up with Krotov in Cambridge to talk about Hopfield networks and what they can tell us about the future of AI and intelligence itself.

Why are Hopfield networks special?

They are the simplest mathematical models with built-in feedback loops. By contrast, 90% of AI models are feedforward networks, which means they process information in one direction only. If you prompt a large language model (LLM) with a question, each word that it generates must be compared to the prompt and the prior words it generated. This span of text is known as the context window. As the context window gets longer, computational complexity rapidly increases. Once the next word is predicted, the process must begin again. This explains why training and running transformer-based models can be so slow and computationally intensive.
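
For a back-of-the-envelope sense of that scaling (an illustration for this article, not a claim from the interview), self-attention compares every token in the context with every other token, so the pairwise work grows with the square of the context length:

```python
# Back-of-the-envelope illustration of why long context windows get expensive:
# self-attention scores every token in the context against every other token,
# so the score matrix alone has n * n entries for a context of n tokens.

def attention_score_count(context_length: int) -> int:
    return context_length * context_length

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_score_count(n):,} pairwise scores")
# 10x more context means roughly 100x more comparisons per attention layer.
```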

The brain, by contrast, uses recurrent feedback loops to summarize and store past information in memory. Hardly any pathways in the brain are strictly feedforward. Our eyes flicker over an image and process it in pieces. A full memory reveals itself gradually, after that first associative flicker. Since Hopfield networks work in a similar way, we think they could be a promising alternative to today’s feedforward networks.

What are their limitations?

The Hopfield networks of the 1980s had limited memory storage. The number of memories they can store and retrieve scales linearly with the number of input neurons, making them impractical for modern AI applications. But John and I realized in 2016 that we could expand their memory by introducing more interactions among neurons. We called this enhanced network "Dense Associative Memory" because it packs more memories into the same space. Traditional Hopfield networks were limited to two-neuron interactions, but in their modern incarnation, three or more neurons can interact at a single point. The more interactions, the greater the memory storage capacity.
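
For readers who want the mathematics, the 2016 construction can be summarized, in schematic notation rather than the paper's exact formulation, by an energy function of this form:

```latex
% Schematic form of the Dense Associative Memory energy function.
% \sigma_i are the N binary neuron states, \xi^\mu are the K stored memories,
% and F is a rapidly growing interaction ("separation") function.
E(\sigma) = -\sum_{\mu=1}^{K} F\!\left(\sum_{i=1}^{N} \xi_i^{\mu} \sigma_i\right)
% F(x) = x^2 recovers the classical pairwise Hopfield network;
% F(x) = x^n with n > 2 introduces n-neuron interactions.
```

The faster F grows, the more sharply the stored memories are separated in the energy landscape and the more of them the network can hold.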

A year later, Mete Demircigil and colleagues showed you could increase memory exponentially by adjusting these interactions. Though memories become densely packed inside the state space, they retain their associative properties. Presented with a degraded memory, Dense Associative Memory can correct the errors. It works with both binary and continuous variables, making it suitable for modern AI applications.
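
As a toy illustration of that exponential regime, here is a simplified retrieval loop for binary patterns with F(x) = exp(x), written for this article rather than taken from the papers:

```python
import numpy as np

# Toy sketch of Dense Associative Memory retrieval with an exponential
# interaction function, F(x) = exp(x), for binary patterns. Written for
# illustration; the update rule is simplified from the published versions.

def dam_recall(memories, state, beta=1.0, max_iters=50):
    # memories: (K, N) array of +/-1 patterns; state: (N,) +/-1 vector.
    for _ in range(max_iters):
        new_state = state.copy()
        for i in range(len(state)):
            s_plus, s_minus = state.copy(), state.copy()
            s_plus[i], s_minus[i] = 1, -1
            # Which setting of neuron i lowers the energy more?
            drive = np.sum(np.exp(beta * memories @ s_plus)
                           - np.exp(beta * memories @ s_minus))
            new_state[i] = 1 if drive >= 0 else -1
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state

rng = np.random.default_rng(1)
memories = rng.choice([-1, 1], size=(20, 40))  # well beyond the ~0.14 * N
probe = memories[5].copy()                     # limit of the classical network
probe[:8] *= -1                                # corrupt 8 of the 40 bits
print(np.array_equal(dam_recall(memories, probe), memories[5]))  # True if recall succeeds
```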

What have we learned about Hopfield networks recently?

Hopfield networks flew under the radar until the pandemic. Then, a group led by Sepp Hochreiter, coinventor of the LSTM model, showed that the attention operation in transformers could be derived from Dense Associative Memory by carefully picking the many-neuron interactions. Before this, people thought attention was a global convolution operation that tracked long-range correlations in the data. Hochreiter argued that attention is really a memory system. When we type a query into an LLM, the text is effectively loaded into short-term memory (a Dense Associative Memory). The attention operation retrieves data, and the transformer acts on it.
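
One compact way to see the connection, sketched here in generic notation rather than the paper's exact equations: a single retrieval step from a continuous Dense Associative Memory, which softmax-weights the stored patterns by their similarity to the query, performs the same computation as scaled dot-product attention when the keys and values are the stored memories.

```python
import numpy as np

# Sketch of the memory-retrieval view of attention, in generic notation
# (not the exact equations of the Hochreiter group's paper).

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dam_retrieve(memories, query, beta):
    # One retrieval step from a continuous Dense Associative Memory:
    # weight each stored pattern by its similarity to the query,
    # then return the weighted average of the patterns.
    return softmax(beta * memories @ query) @ memories

def attention_single_query(q, keys, values, d):
    # Standard scaled dot-product attention for a single query vector.
    return softmax(keys @ q / np.sqrt(d)) @ values

rng = np.random.default_rng(2)
d = 16
stored = rng.standard_normal((10, d))
query = rng.standard_normal(d)

# With keys = values = the stored memories and beta = 1/sqrt(d),
# the two computations coincide.
a = dam_retrieve(stored, query, beta=1 / np.sqrt(d))
b = attention_single_query(query, stored, stored, d)
print(np.allclose(a, b))  # True
```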

But Dense Associative Memory has important differences. Unlike the transformer, its memory vectors remain constant at runtime. Transformers also have operations in addition to attention, which include feedforward multilayer perceptrons, layer normalization, and skip connections. In our 2023 NeurIPS paper, Energy Transformer, we incorporated these operations into a single architecture that describes the entire transformer block as an associative memory.

Do dense associative memories exist in the brain?

They may – we don’t know for sure. Dense Associative Memory is a mathematical theory of computation that can be used to model both artificial and biological neural networks. To build one mathematically, three or more neurons must connect at a hypothetical synapse. But most neuroscientists believe real neurons connect only in pairs: a pre-synaptic neuron linked to a post-synaptic neuron via the synaptic cleft, where chemical signals, or neurotransmitters, are exchanged.

There are two theories for building Dense Associative Memory in biological neurons. In one, the brain has hidden neurons to account for these many-neuron interactions. In the other, the brain’s astrocyte cells connect several neurons at a time, effectively creating a multi-neuron synapse. Astrocytes make up a significant percentage of all brain cells. They could potentially explain how our mathematical model of Dense Associative Memory might be implemented in the brain.

We also know that Dense Associative Memory can be reduced to a transformer mathematically. Could the computation done by transformers be implemented using neurons and astrocytes? We created a mathematical model in our PNAS paper last year that suggests they could. These ideas are still just models, but we hope they inspire neuroscientists to test them on living cells.

Where do you see AI heading?

Memory is essential to human cognition but plays a minimal role in modern AI. Researchers are currently trying to augment transformers with additional memory. I expect many clever ideas to emerge on this front in the next year.

Energy-based models are appealing because you can design their computation by shaping the energy landscape instead of forcing the neural network to produce the desired answer. They are still a fringe idea but have intriguing potential.
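
As a minimal illustration of what shaping an energy landscape means, consider this toy example, written for this article rather than drawn from Krotov's models: the desired answers are placed at the minima of an energy function, and the state simply rolls downhill toward one of them.

```python
import numpy as np

# Minimal illustration of the energy-based idea (a toy example): place the
# desired answers at the minima of an energy function, then let the state
# slide downhill until it settles.

targets = np.array([[1.0, 1.0], [-1.0, -1.0]])  # two answers we want as minima

def energy(x, beta=4.0):
    # Log-sum-exp energy whose minima sit near the target points.
    return -np.log(np.sum(np.exp(-beta * np.sum((targets - x) ** 2, axis=1)))) / beta

def grad(x, eps=1e-4):
    # Numerical gradient, to keep the sketch dependency-free.
    g = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (energy(x + step) - energy(x - step)) / (2 * eps)
    return g

x = np.array([0.6, 0.9])        # start from a noisy version of the first answer
for _ in range(200):
    x = x - 0.05 * grad(x)      # descend the energy landscape
print(np.round(x, 2))           # settles near [1., 1.]
```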

Finally, the brain has evolved many forms of specialized computation. Fruit flies, for example, have a strong sense of smell. We modeled the network that makes this happen in their brain and applied it to natural language processing. In our 2021 ICLR paper, we showed how a mathematical model of the fruit fly olfactory system could efficiently “learn” word embeddings from raw text. Fruit flies don’t speak language, of course, but the way their brains “compute” smell could help us design better AI models that could be applied to other tasks. Nature is full of examples of specialized intelligence that could serve as models for future AI architectures.