30 Mar 2022
Research
5 minute read

Open-sourcing analog AI simulation

The IBM Research AI Hardware Center is working on devices, accelerators, system architectures, and associated software stacks to scale AI workflows as efficiently as possible. Now, you can access the new inference feature in our Analog AI Composer to further explore that paradigm.

So much of what we do every day is powered by AI. From speech-to-text messaging to customer-service chatbots, AI is automating many aspects of our lives. But these processes consume a lot of energy: training one large natural-language-processing model today can have roughly the same carbon footprint as running five cars over their lifetimes.

We want to reduce that footprint at the IBM Research AI Hardware Center in Albany, New York. One method we're actively exploring is analog in-memory computing, or analog AI, which significantly reduces the von Neumann bottleneck (see Note 1) and allows highly parallel computations. In this novel approach, deep neural networks (DNNs) are mapped to crossbar arrays of non-volatile memory (NVM) elements. These elements act as artificial synapses: their conductances encode the weights of the neural network and enable computations directly in memory. This removes the need to pass data back and forth between CPU and memory, resulting in highly energy-efficient chips. We're on a path toward a hundredfold performance improvement compared to today's state-of-the-art accelerators.

NVM crossbar arrays and analog circuits, however, have inherent non-idealities, such as device and circuit noise, that can lead to imprecise and noisy computation. These effects need to be properly quantified and mitigated to ensure high accuracy.
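To make this concrete, here is a minimal sketch (not part of AIHWKIT, and with purely illustrative noise magnitudes) of how weight-programming and read noise in a crossbar perturb a single matrix-vector product, the core operation of in-memory computing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ideal layer weights, as trained in software.
weights = rng.standard_normal((256, 128))
x = rng.standard_normal(128)

# In an analog crossbar, each weight is stored as a device conductance.
# Programming noise and read noise perturb the effective weight values;
# the 5% / 2% magnitudes below are purely illustrative.
programming_noise = 0.05 * np.abs(weights) * rng.standard_normal(weights.shape)
read_noise = 0.02 * rng.standard_normal(weights.shape)
effective_weights = weights + programming_noise + read_noise

# The crossbar performs the matrix-vector product in place, in one step.
y_ideal = weights @ x
y_analog = effective_weights @ x

rel_error = np.linalg.norm(y_analog - y_ideal) / np.linalg.norm(y_ideal)
print(f"relative output error from analog non-idealities: {rel_error:.3%}")
```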

[Figure: conventional memory architecture vs. computational (in-memory) architecture.]

We're building a vibrant ecosystem and platform around analog AI, and as part of that effort we open-sourced part of our analog AI simulation framework, which we call the IBM Analog Hardware Acceleration Kit, or AIHWKIT [1]. AIHWKIT is a first-of-its-kind, open-source toolkit that simulates analog NVM crossbar arrays and lets us estimate the impact analog devices' non-idealities have on the accuracy of any DNN. We chose to integrate AIHWKIT with PyTorch, a popular machine-learning framework; this lets researchers easily understand, evaluate, and experiment with emerging analog AI accelerators.
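As an illustration of how the toolkit plugs into PyTorch, the sketch below trains a single analog layer in simulation. It follows the basic usage pattern in the AIHWKIT documentation; module paths and class names may differ across releases.

```python
import torch
from torch import nn

# AIHWKIT imports (paths as documented; may change between releases).
from aihwkit.nn import AnalogLinear
from aihwkit.optim import AnalogSGD
from aihwkit.simulator.configs import SingleRPUConfig
from aihwkit.simulator.configs.devices import ConstantStepDevice

# A linear layer whose weights live on a simulated analog crossbar tile.
rpu_config = SingleRPUConfig(device=ConstantStepDevice())
model = AnalogLinear(4, 2, bias=True, rpu_config=rpu_config)

# AnalogSGD routes weight updates through the simulated analog devices.
optimizer = AnalogSGD(model.parameters(), lr=0.1)
optimizer.regroup_param_groups(model)

x = torch.rand(16, 4)
y = torch.rand(16, 2)

for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```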

[Figure: Phase Change Memory (PCM) devices arranged in a crossbar configuration.]

The AIHWKIT now supports both inference and training, and provides unique simulation capabilities. It includes our latest algorithmic innovations, such as hardware-aware training [2], mixed-precision training [3], and advanced analog training optimizers (like Tiki-taka [4]). It offers preset device configurations fitted to published IBM device data, including Resistive Random Access Memory (ReRAM), Phase Change Memory (PCM), Electrochemical RAM (ECRAM), and capacitors. We have also optimized key operations, such as analog pulsed updates for training, with dedicated CUDA kernels so that the simulations scale to larger DNNs.
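For inference experiments, the toolkit can inject a device noise model and conductance drift into evaluation. The sketch below is a hedged example assuming the documented InferenceRPUConfig and PCMLikeNoiseModel classes; the g_max value and drift times are illustrative, and method names may vary between releases.

```python
import torch

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig
from aihwkit.inference import PCMLikeNoiseModel

# Analog tile configured with a PCM-like programming-noise and drift model
# (the g_max value here is illustrative, not a recommended setting).
rpu_config = InferenceRPUConfig(noise_model=PCMLikeNoiseModel(g_max=25.0))

layer = AnalogLinear(128, 10, bias=True, rpu_config=rpu_config)
layer.eval()

x = torch.rand(32, 128)

# Drift the analog weights to different times after programming and
# re-evaluate; sweeping t_inference is a simple way to quantify how
# accuracy degrades as the devices drift.
for t_inference in (0.0, 3600.0, 86400.0):
    layer.drift_analog_weights(t_inference=t_inference)
    out = layer(x)
    print(t_inference, out.norm().item())
```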

The AIHWKIT allows the larger research community to use IBM's algorithms and large-scale device data, and to extend the toolkit with new analog devices or new algorithms. The AIHWKIT comprises 9,829 lines of Python code, 21,775 lines of C++, and 9,802 lines of C++/CUDA code. We also added a large body of unit tests, examples, notebooks, and comprehensive documentation.

AIHWKIT can also be used online through our web-based Cloud Composer. The AI hardware composer provides a set of templates that just about anyone, regardless of coding experience, can use to learn the concepts of analog AI, configure experiments, and launch training experiments in the cloud. We recently introduced new functionality for inference experiments in simulation and are actively working to connect the composer to real phase-change memory (PCM)-based analog AI chips. Here's a quick video on how to get started with the composer:

[Video: Using the Analog AI Web Composer]

The AIHWKIT has fostered partnerships between IBM Research and academic institutions in the US, Europe, and Asia. Eventually we’re aiming to bring additional capabilities to AIHWKIT, including algorithmic innovations from IBM Research around hardware-aware training, mixed-precision training, and advanced analog training optimizers using parallel rank-update in analog. We hope to give the research community the ability to extend the toolkit with new devices, analog presets, and algorithms — among other advancements.

Notes

  1. Traditional computer processors and memory are connected by a bus, which facilitates data transfer back and forth between the two. These data transfers through the bus require time and energy, negatively impacting performance. This is known as the von Neumann bottleneck. As the use of data-intensive AI tasks increases, we need to find innovative ways to get around this bottleneck and make these processes more efficient.

References

  1. Rasch, M. J., Moreda, D., Gokmen, T., Le Gallo, M., Carta, F., Goldberg, C., El Maghraoui, K., Sebastian, A., and Narayanan, V. A Flexible and Fast PyTorch Toolkit for Simulating Training and Inference on Analog Crossbar Arrays. AICAS 2021.

  2. Joshi, V., Le Gallo, M., Haefeli, S. et al. Accurate deep neural network inference using computational phase-change memory. Nat Commun 11, 2473 (2020).

  3. Nandakumar, S. R., Le Gallo, M., Piveteau, C., Joshi, V., Mariani, G., Boybat, I., Karunaratne, G., Khaddam-Aljameh, R., Egger, U., Petropoulos, A., Antonakopoulos, T., Rajendran, B., Sebastian, A., and Eleftheriou, E. Mixed-Precision Deep Learning Based on Computational Memory. Frontiers in Neuroscience, Volume 14 (2020).

  4. Gokmen, T. and Haensch, W. Algorithm for Training Neural Networks on Resistive Device Arrays. Frontiers in Neuroscience, Volume 14, 26 February 2020.