High Precision Neural Network Inference using Analog-In-Memory Computing
Peter Vrancx, Debjyoti Bhattacharjee, Arindam Mallik
With the massive growth in the size of neural networks, the energy consumption of neural network inference has risen accordingly. From a computation perspective, quantization techniques are widely used to lower computational complexity. From a hardware perspective, analog in-memory computing (AiMC) based solutions have grown popular in recent years, offering extremely high energy efficiency on the order of 1000 TOPS/W.
AiMC compute cells, which store the neural network weights, natively support only very low precision (often binary or ternary). To map higher-precision networks, each high-precision weight must therefore be split across multiple compute cells. Furthermore, the accumulation of the multiplication results through ADCs suffers from non-ideal behavior. These factors make it imperative to train the neural networks with awareness of the non-ideal hardware behavior in order to retain the original inference accuracy.
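As an illustration of the multi-cell mapping, the sketch below decomposes a signed integer weight into ternary digits with power-of-two significance, one digit per compute cell. The slicing scheme and the helper name `ternary_slices` are illustrative assumptions, not a prescribed mapping for the hardware in question.

```python
def ternary_slices(w: int, n_slices: int) -> list:
    """Decompose a signed integer weight into ternary digits {-1, 0, +1},
    one per compute cell, with power-of-two significance:
        w = sum_k slices[k] * 2**k
    (Assumed power-of-two slicing; other radices or signed-digit codes exist.)
    """
    sign = 1 if w >= 0 else -1
    mag = abs(w)
    assert mag < 2 ** n_slices, "weight exceeds the slice budget"
    # Take the binary expansion of |w| and attach the sign to each set bit.
    return [sign * ((mag >> k) & 1) for k in range(n_slices)]

# Example: a 4-bit weight of -5 maps to four ternary cells.
w = -5
cells = ternary_slices(w, n_slices=4)                 # [-1, 0, -1, 0]
assert sum(c * 2 ** k for k, c in enumerate(cells)) == w
```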
The goal of this project is to develop strategies for mapping state-of-the-art quantized networks (such as CNNs) onto ternary-precision AiMC hardware and to evaluate the baseline inference accuracy on such hardware. Specifically, the thesis will focus on the following aspects:
1. Develop a mapping technique for transferring the weights of an existing quantized neural network onto AiMC hardware.
2. Report the baseline accuracy after multi-bit weight transfer for different bit-widths.
3. Retrain the quantized network with noise information to recover accuracy (see the sketch after this list).
4. Compare against the accuracy of other AiMC-based results from the literature.
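Item 3 typically amounts to hardware-aware fine-tuning, where the non-idealities are modelled as noise injected into the forward pass. The PyTorch sketch below shows one common way to do this with a noisy linear layer; the class name NoisyLinear, the Gaussian noise model, and the 5% noise level are assumptions for illustration, and a real flow would calibrate the noise model from hardware measurements.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Linear):
    """Linear layer that perturbs its weights with additive Gaussian noise
    during training, as a stand-in for AiMC cell/ADC non-idealities."""

    def __init__(self, in_features, out_features, noise_std=0.05, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        self.noise_std = noise_std  # assumed relative noise level

    def forward(self, x):
        if self.training and self.noise_std > 0:
            # Scale the noise to the per-tensor weight magnitude.
            scale = self.weight.detach().abs().max()
            noise = torch.randn_like(self.weight) * self.noise_std * scale
            return nn.functional.linear(x, self.weight + noise, self.bias)
        return super().forward(x)

# Usage: swap nn.Linear layers for NoisyLinear and fine-tune as usual.
layer = NoisyLinear(128, 64, noise_std=0.05)
out = layer(torch.randn(8, 128))
```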
Skills:
Mandatory: Python, experience with PyTorch or another deep neural network framework, fundamentals of number representation (2's complement, 32-bit floating point, etc.).
Optional: Familiarity with quantization techniques for neural networks, multi-GPU training.