Machine learning – a successful spinoff of AI
Artificial intelligence (AI) is a term that has kindled the imagination of computer scientists for the last six decades. It is the name they gave to the dream of building applications that resemble or even surpass human intelligence. The field started in the late 1950s and has since gone through several cycles of optimism and doom. Each time, ambition and enthusiasm rose to great heights, only to eventually clash with the complexity of the task and the limits of the available computational power.
Recent years have seen a new upturn. We've seen computers beat champions at chess and even Go, said to be the most complex game played by humans. We've seen computers converse, debate, and drive cars. So this time around, AI seems to be succeeding. The reason? The available computational power has risen exponentially to where we are today. And the ambition of general AI has been curbed sufficiently to match that power: computers now learn to recognize patterns in huge, seemingly random datasets. It's called machine learning (ML) and it is no mean feat.
Machine learning as we know it is either supervised or unsupervised.
Supervised ML is the technique that has been reaping the most successes so far. This is how it works: an ML system is presented with a large dataset and a task, say an array of pixels, a map of weather data, or a history of body parameters, in which it has to recognize a face, a storm, or a disease. In the learning phase, the system is confronted with input, makes predictions, and then gets feedback through the labels that were pre-attached to the input: correct or not. If its prediction is wrong, the ML system tunes its parameters (also called weights) and makes a new prediction. This repeats over and over, until the parameters are fine-tuned to make correct predictions most of the time. After the learning phase, the system is ready to mine huge data streams on the lookout for meaningful patterns, a process we call inference.
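The tune-and-retry loop described above can be sketched in a few lines of Python. This is a deliberately minimal, hypothetical example: a single neuron with two weights and a bias learns a toy labeling rule (is x + y greater than 1?) from pre-labeled points. Real ML systems apply the same principle to millions of weights, but the feedback loop is the same.

```python
import random

def predict(weights, bias, x):
    # Weighted sum of the inputs, thresholded to a 0/1 prediction
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(data, epochs=50, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):                  # over and over again...
        for x, label in data:
            error = label - predict(weights, bias, x)
            if error:                        # wrong prediction: tune the weights
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

# Toy pre-labeled dataset: random 2-D points, label 1 if x + y > 1
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
data = [((x, y), 1 if x + y > 1 else 0) for x, y in points]

w, b = train(data)
correct = sum(predict(w, b, x) == y for x, y in data)
print(f"accuracy after training: {correct / len(data):.2f}")
```

After the learning phase, `predict` with the frozen weights is exactly the inference step: the same multiply-add, applied to new data.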
What makes these systems so successful is that they require relatively little human effort and preprocessing to fine-tune. With feedback from pre-labeled input, they learn the parameters needed to recognize, for example, faces, forming a filter that previously had to be tediously hand-coded. But ML systems as they are implemented today are power hungry, especially in the learning phase, where they have to churn through the same data over and over. And even in the inference phase, where the data pass only once through the system, potentially millions of weights have to be taken into account and billions of calculations have to be made. That puts even inference out of reach of where it could be most useful: the fingers and toes of the IoT, where the data are sensed and gathered. To nevertheless make inference possible at the edge, imec's researchers are working on hardware solutions that drastically reduce the energy usage of inference, down to a level that fits into autonomous, wireless sensors.
Unsupervised ML, in contrast, requires no human intervention or pre-labeled training data. It's the holy grail of ML, and it would allow applications that get customized for specific uses and for individual people on the spot, not with pre-learned parameters. The techniques and algorithms used are loosely inspired by how the human brain learns and functions. But even more than with supervised ML, energy consumption is an issue. Not so much in the cloud, but certainly where customization is most useful: at the edge, on the sensors. As an added challenge, learning and inference cannot be separated. Customization is learning, and it has to happen on the sensors. So even more than supervised ML, this calls for specialized hardware, a research topic that is also in imec's pipeline. An interesting use case that the researchers are working on, with promising results so far, is wearable health technology, where ideally each sensor should be able to customize itself to the person who is wearing it.
The requirements for smart inference at the edge
The dominant hardware platforms involved in supervised machine learning today are equipped with top-of-the-line GPUs, consuming up to 200 watts. Some systems use FPGAs, which are on average a tad more power efficient, but which also have a correspondingly lower performance. Best in the performance/energy tradeoff are a number of ASICs, processors specifically built for deep learning. But even these still use between 50 and 100W.
No wonder then that both learning and inference are now done centrally, in the cloud. It's simply not feasible to run a chip dissipating 100W in a mobile phone, let alone in IoT sensors that have to keep running for long periods on much smaller energy budgets.
However, the IoT sensors are where most of the future data will be captured: wearable health sensors, vehicle-mounted radars, building-integrated stress sensors… In most cases, technical or energy constraints make it impossible to stream all that data to the cloud where the AI resides. In addition, there are also use cases where patterns should be recognized instantaneously, such as with radars that need to detect people or vehicles in the path of a self-driving vehicle. There, the time delay of a round-trip to the cloud is simply prohibitive.
So there is a great need to bring machine learning to the edge of the IoT. For supervised learning, that doesn’t have to include the learning phase; the parameters can still be learned in the cloud. But surely inference, the smart pattern recognition, should be brought to the sensors.
But what are the energy budgets available at those nodes? Automotive applications, for example, can deploy chips that use a maximum of 10 to 30W. But in the mobile space, you'd have to do inference within 1W. And for IoT sensors, on or in the body for example, the available budget may even be below 10mW, going towards 1mW.
Inference hardware using minimal energy
Many AI inference systems fetch data and weights from memory, over and over again, in successive layers. In each layer, they perform multiplications and additions, so-called convolutions, and store the output. The most important priority in designing any low-energy AI chip is therefore minimizing both the amount of data that needs to be moved and the distance it has to travel.
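As an illustration of those multiply-and-add operations, here is a naive one-dimensional convolution in Python, written to expose the data movement the text describes: every output value re-fetches the same kernel of weights plus a fresh window of inputs, multiplies, and accumulates. The signal and kernel values are purely illustrative.

```python
def conv1d(inputs, weights):
    """Slide a kernel of weights over the input signal; each output
    value is a sum of element-wise products (a multiply-accumulate)."""
    k = len(weights)
    out = []
    for i in range(len(inputs) - k + 1):
        acc = 0.0
        for j in range(k):
            # Each step costs one weight fetch and one input fetch
            acc += inputs[i + j] * weights[j]
        out.append(acc)
    return out

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.25, 0.25]
print(conv1d(signal, kernel))
```

In a deep network, this inner loop runs billions of times over millions of weights, which is why the memory traffic, not the arithmetic itself, dominates the energy bill.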
By default, AI systems work with 32-bit floating-point arithmetic. One way to minimize the amount of data is to lower that precision, e.g. to 8-bit arithmetic. It has been proven that for inference, this can be done with hardly any loss of accuracy. Many 8-bit implementations have therefore been made, but they don't yet bring inference into the energy range of edge computing. A more extreme measure is to bring the precision down to 1 bit, resulting in a so-called binary CNN (convolutional neural network). Unsurprisingly, there is an added accuracy loss compared to 8-bit implementations, but enough accuracy remains to make it useful for many practical applications.
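A toy sketch makes the precision ladder concrete: the same dot product computed at full precision, at 8-bit integer precision, and with weights binarized to +1/-1 as in a binary CNN. The weights, inputs, and fixed quantization scale here are illustrative assumptions, not values from any real network.

```python
def quantize8(values, scale):
    # Map floats to signed 8-bit integers: divide by scale, round, clamp
    return [max(-128, min(127, round(v / scale))) for v in values]

def binarize(values):
    # 1-bit precision: keep only the sign of each weight
    return [1 if v >= 0 else -1 for v in values]

weights = [0.8, -0.3, 0.5, -0.9]   # illustrative learned weights
inputs  = [1.0,  2.0, -1.0, 0.5]   # illustrative activations

full = sum(w * x for w, x in zip(weights, inputs))

scale = 0.01                       # assumed fixed scale, for illustration
q8 = quantize8(weights, scale)
approx8 = scale * sum(w * x for w, x in zip(q8, inputs))

b1 = binarize(weights)
approx1 = sum(w * x for w, x in zip(b1, inputs))

print(full, approx8, approx1)
```

The 8-bit result tracks the full-precision one closely, while the binary result keeps only the rough shape of the response; for many classification tasks that coarse signal, aggregated over many weights, is still enough.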
A second measure is to create an architecture that lowers the energy needed to fetch and store millions of weights and input values over and over again. One of the most advantageous solutions is to store the learned weights in memory and keep them there, doing inference using a form of analog in-memory computation. The heart of such an AI processor is thus a set of memory arrays that permanently store the values of the learned weights using analog non-volatile devices, e.g. resistive RAM (ReRAM) technology. Each array represents one layer of the neural network, and within the array, the learned weights are encoded in the conductances of the individual devices. So how do we then multiply and add these weights with the input values? By applying the input values as voltages on the word lines of the ReRAM arrays. Each cell's current will then be the product of its weight and the input value (Ohm's law), and each bit line's current will be the sum of the cell currents along that line (Kirchhoff's current law). That way, we can effectively implement convolutions without having to fetch and move the weights over and over again.
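The physics of this in-memory multiply-accumulate can be mimicked with an idealized simulation: weights stored as conductances G, inputs applied as word line voltages V, each cell contributing a current I = G * V (Ohm's law), and the currents summed along each output line (Kirchhoff's current law). The conductance and voltage values below are illustrative, not device data, and the model ignores real-world non-idealities such as wire resistance and device variability.

```python
def crossbar_mac(conductances, voltages):
    """Idealized crossbar: `conductances` is a rows-by-columns matrix of
    stored weights (as G values); `voltages` holds one input per row.
    Returns the summed current per output column, i.e. one dot product
    per column, computed without moving any weight out of the array."""
    cols = len(conductances[0])
    currents = [0.0] * cols
    for g_row, v in zip(conductances, voltages):
        for j in range(cols):
            currents[j] += g_row[j] * v   # Ohm's law per cell, Kirchhoff sum
    return currents

G = [[0.2, 0.5],
     [0.1, 0.3],
     [0.4, 0.1]]        # one column of conductances per output neuron
V = [1.0, 0.5, 2.0]     # input activations encoded as word line voltages

print(crossbar_mac(G, V))
```

In hardware, the inner loops disappear entirely: all cells conduct simultaneously, so an entire layer's worth of multiply-accumulates happens in one analog read operation.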
There are a number of challenges to this approach. The variability of the memory devices, for example, limits the precision with which the weights can be encoded. This is especially an issue for 8-bit precision, but much less so for binary solutions, for which STT-MRAM is well suited. There is also the added complexity of integrating an analog memory in a digital system, requiring e.g. digital-to-analog conversions (and vice versa). But the bandwidth gained by not having to move data around far outweighs this added complexity.
A pipeline of AI solutions to make sensors smarter
Designing innovative hardware is imec’s lifeblood. And innovative hardware is what is needed to make the IoT smarter and more individual. So our scientists are working on a pipeline of promising solutions that will be demoed in the coming months and years.
Hardware with non-volatile analog memories will allow implementing neural network convolutions that use minimal energy, down to the order of milliwatts. It will bring inference to the edge of the IoT, first with binary precision and later perhaps with higher precision. This will allow smart pattern matching, mining wisdom from huge amounts of sensed data and making the IoT a lot smarter.
The next frontier is hardware for unsupervised machine learning: hardware that allows for sensors without pre-learned parameters, sensors that can adapt on the fly to individual people and situations. Wearable health sensors, for example, that really and intimately know their wearers. These will make the IoT not only smarter, but also capable of a more individual experience.
Diederik Verkest is a Distinguished Member of Technical Staff responsible for imec's INSITE and Machine Learning programs. After earning a Ph.D. in microelectronics engineering from KU Leuven, Diederik joined imec in 1994, where he has been responsible for, among other things, hardware/software co-design. In 2009, he started imec's INSITE program, focusing on the co-optimization of design and process technology for advanced technology nodes. The program offers the fabless design community insights into advanced process technologies and provides a platform for foundries and fabless companies to discuss directions for next-generation technologies.
31 August 2018