Since the introduction of tensor processors a few years ago, a new wave of AI-specific silicon architectures has emerged, including hybrid CPUs. ARM and Intel are now introducing products designed specifically for machine learning and other forms of artificial intelligence.
In the mid-2000s, British-Canadian researcher Geoffrey Hinton, the great-great-grandson of logician George Boole, made one of the most important discoveries in computer science: how to efficiently train new layers of neural networks.[1] Hinton’s work paved the way for machine learning (ML), the foundation of most artificial intelligence applications in use today.
Neural networks require large amounts of two essential resources: computing power and data.
While theories and experiments on using neural networks for AI have existed since the 1950s, neither the processing power nor the massive datasets needed for real applications arrived until this century.
Today, our smartphones have millions of times more processing power than the computers NASA used to send the first astronauts to the moon. Meanwhile, the internet collects billions of data points every second of every day, in the form of images, text, videos, clicks, tweets, and so on.
Tensor processors provide new capabilities for ML and AI
As traditional software gives way to new AI algorithms, the requirements for processing hardware are shifting. Machine learning demands on-the-fly evaluation of complex mathematical models, something the general-purpose processing cores from companies like Intel and ARM were never explicitly designed for.
The success of ML and the demand for AI across many fields have set off a race to build the next-generation AI chip.
Initially, this void was filled by NVIDIA, which drew on its video game hardware expertise to adapt graphics-processing algorithms to the new requirements of AI. Between 2016 and 2018, NVIDIA became the go-to player in the chip market for everything from machine learning to crypto mining, and its stock price increased tenfold.
Even Microsoft, which long avoided making its own chips, is now investing heavily in custom silicon. Intel has also joined the race and is working with Facebook to test-drive its first AI-specific silicon.[2]
In 2016, Google announced a new processor architecture for “deep-learning inference” called the Tensor Processing Unit (TPU).[3] From the beginning, Google’s TPUs were tasked with improving the accuracy of mapping applications such as Google Maps and Street View.
Google announced the second and third generations of TPUs in May 2017 and May 2018, respectively. The second-generation design increased memory bandwidth to 600 GB/s and performance to 45 teraflops, and the third generation doubled the previous generation’s performance.
In July 2018, Google announced the Edge TPU, a purpose-built ASIC designed to run ML models for edge computing.[4]
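To make the edge workflow concrete, here is a minimal inference sketch, assuming a Coral-style Edge TPU device with the tflite_runtime package installed; the model filename is hypothetical, and a real model would first be compiled for the accelerator with Google’s edgetpu_compiler:

```python
# Minimal Edge TPU inference sketch (hypothetical model file).
import numpy as np
import tflite_runtime.interpreter as tflite

# The Edge TPU delegate offloads supported ops to the ASIC;
# the shared-library name below is the Linux one.
interpreter = tflite.Interpreter(
    model_path="mobilenet_v2_edgetpu.tflite",  # placeholder name
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

# Feed one dummy frame; real code would load and preprocess an image.
frame = np.zeros(input_detail["shape"], dtype=input_detail["dtype"])
interpreter.set_tensor(input_detail["index"], frame)
interpreter.invoke()

scores = interpreter.get_tensor(output_detail["index"])[0]
print("top class:", int(scores.argmax()))
```

The same script runs on the CPU if the delegate line is removed, which makes it easy to compare latency with and without the accelerator.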
Today, NVIDIA and other AI chip vendors combine tensor-processing units with their own technology to produce SoCs capable of handling applications such as autonomous driving and facial recognition. NVIDIA also sells the Jetson, an ML powerhouse for non-mobile devices that need strong deep neural network performance within a fixed power budget.[5]
Recently, NVIDIA announced Grace,[6] named for the U.S. computer-programming pioneer Grace Hopper, a new chip design scheduled to arrive in mammoth supercomputers in 2023. The new architecture, built largely on ARM technology, will make it possible to run complex AI computing tasks that today’s chip designs cannot handle, bringing computers a step closer to artificial general intelligence.
ARM has developed new neural network architectures
In the past few years, ARM, known for its popular Cortex architecture, has developed a new generation of neural processing units (NPUs): the Ethos NPU series.[7] The Ethos series is designed to work in conjunction with Cortex cores, a combination that improves performance and power efficiency compared with conventional NPUs and enables cost-effective yet high-performance edge ML products.
ARM’s advantage lies in low-power applications, specifically the Internet of Things market: its technology is designed to be integrated into the low-power SoCs used in millions of connected devices.
The Ethos processing cores run machine-learning models trained in the cloud and perform inference at the edge for immediate results. While the cores are not designed for complex ML training, their inference performance can match some of their larger competitors in edge-computing tasks at a fraction of the cost and power usage.
For example, according to ARM, the 512-GOPS implementation of the Ethos-U65 running at 1 GHz can recognize objects in less than 3 ms with the popular MobileNet_v2 deep neural network.[8]
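As a rough illustration of that workflow, the sketch below performs the post-training int8 quantization that Ethos-U targets, using the standard TensorFlow Lite converter; the stand-in model, the random calibration data, and the file names are placeholders, and the final microNPU compilation step with Arm’s Vela tool is shown only as a comment:

```python
# Sketch: preparing a model for an Ethos-U class NPU (placeholder model/data).
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)  # stand-in network

def representative_data():
    # Calibration samples drive the int8 quantization ranges; in practice
    # these would come from the real input distribution, not random noise.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Ethos-U accelerates fully int8-quantized TensorFlow Lite models.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("mobilenet_v2_int8.tflite", "wb") as f:
    f.write(converter.convert())

# The quantized model is then compiled offline for the microNPU with
# Arm's Vela compiler, e.g.:
#   vela mobilenet_v2_int8.tflite --accelerator-config=ethos-u65-512
```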
NXP Semiconductors is currently using the Ethos-U65 microNPU in concert with the Cortex-M core and on-chip SRAM already present in NXP’s i.MX family.[9]
In addition to the Ethos U-series, ARM also designs the Ethos N-series,[10] targeting mainstream consumer products such as smartphones, in-vehicle infotainment, and digital TVs. The N-series is now being used in connected vehicles to deliver smart features such as access control, driver-alertness monitoring, and speech recognition, all of which help make driving safer.
In smartphones, the Ethos N-series expands the processing capabilities of the on-board CPU to support features such as augmented reality, virtual reality, and on-device ML.
Quantum computing will power the next generation of AI solutions
“Physicists have been talking about the power of quantum computing for over 30 years, but the questions have always been: Will it ever do something useful and is it worth investing in?” said John Martinis, chief scientist of quantum hardware at Google.[11]
For more than three years, Volkswagen has been working with Google to apply quantum computing to various applications, including shortening the time it takes to train neural networks, one of the critical technologies for self-driving cars.[12]
“There’s a lot of high-performance compute requirements coming our way,” said Martin Hofmann, CIO at Volkswagen. “The question is, are there better ways of doing it?”
According to IBM, “access to today’s limited quantum computers has already provided benefits to researchers worldwide, offering an unprecedented look at the inner workings of the laws that govern how nature works, as well as a new lens through which to approach problems in chemistry, simulation, optimization, artificial intelligence, and other fields.” [13]
While it is still too soon to realize the power of quantum computing in most applications, the experimental results so far are impressive. Google’s “Sycamore,”[14] a new 54-qubit processor, took 200 seconds to complete a quantum supremacy experiment whose output the world’s fastest supercomputer would need roughly 10,000 years to reproduce.
“On Google Cloud servers, we estimate that performing the same task for m = 20 with 0.1% fidelity using the Schrödinger–Feynman algorithm would cost 50 trillion core-hours and consume one petawatt hour of energy.”[15]
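As a rough sanity check (our arithmetic, not the paper’s), the two quoted figures are mutually consistent: dividing the energy estimate by the core-hour estimate gives the implied power draw per core,

```latex
\frac{1~\mathrm{PWh}}{5 \times 10^{13}~\text{core-hours}}
  = \frac{10^{15}~\mathrm{Wh}}{5 \times 10^{13}~\text{core-hours}}
  = 20~\mathrm{W}~\text{per core},
```

a plausible figure once the rest of the server and datacenter overhead are amortized across the cores.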
Towards artificial general intelligence
The implementation of these technologies has done more than make AI and ML possible. It has also created the belief that we are close to achieving the Holy Grail of AI research, artificial general intelligence (AGI): machines that can think for themselves and perform intellectual tasks the way humans do, and much more.
Powerful processors such as NVIDIA’s Grace and Google’s Sycamore, paired with new algorithms and massive amounts of new data, are moving the world toward a new era of AI.
[1] http://www.cs.toronto.edu/~hinton/absps/cbpweb.pdf
[2] https://finance.yahoo.com/news/intel-working-facebook-ai-chip-013746099.html
[3] https://cloud.google.com/blog/products/gcp/google-supercharges-machine-learning-tasks-with-custom-chip
[4] https://coral.ai/docs/edgetpu/benchmarks/
[5] https://www.forbes.com/sites/patrickmoorhead/2017/03/15/nvidia-introduces-jetson-tx2-for-edge-machine-learning-with-high-quality-customers/
[6] https://nvidianews.nvidia.com/news/nvidia-announces-cpu-for-giant-ai-and-high-performance-computing-workloads
[7] https://www.arm.com/solutions/artificial-intelligence
[8] https://developer.arm.com/ip-products/processors/machine-learning/arm-ethos-u/ethos-u65
[9] https://www.nxp.com/company/blog/why-the-arm-ethos-u65-micronpu-is-a-big-deal-and-how-it-came-to-be-this-way:BL-ARM-ETHOS-U65-MICRONPU
[10] https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-n57
[11] https://ai.googleblog.com/2019/10/quantum-supremacy-using-programmable.html
[12] https://blogs.wsj.com/cio/2017/11/07/vw-expands-its-quantum-computing-research-with-google/
[13] https://www.ibm.com/blogs/research/2021/04/quantum-accelerate-discoveries/
[14] "Sycamore processor - Wikipedia." https://en.wikipedia.org/wiki/Sycamore_processor.
[15] Arute, F., Arya, K., Babbush, R. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019). https://doi.org/10.1038/s41586-019-1666-5