In March 2015, an online forum in China posted an unremarkable job advertisement: “Chinese Academy of Sciences-Huawei joint program, 20 interns for smart chips.”
That, apparently, was the modest and quiet beginning of a project that culminated this year with Yu Chengdong, CEO of Huawei’s consumer business group, standing on a stage at the IFA expo in Berlin with the world’s first AI-powered mobile processor, the Kirin 970.
In fact, the story is a bit more involved than that. Huawei has had a partnership with CAS for several years now, but the AI technology behind this particular chip didn’t begin in their joint lab, strictly speaking. Instead, it comes from an AI unicorn that was incubated by the lab, Cambricon.
Cambricon was founded by two of the lab’s researchers, who also happen to be brothers: Chen Yunji and Chen Tianshi. Yunji, the older by two years, focused his research on processor chips, while Tianshi concentrated on AI.
In March of last year, Cambricon introduced its 1A chip, the first commercial deep-learning processor, designed to be integrated into all kinds of systems-on-a-chip (SoCs). According to the company, the chip can process 16 billion virtual neurons per second, for processing power two orders of magnitude greater than a general-purpose chip at only a tenth of the power consumption. In an interview, Chen Tianshi said the chip would enter the market within a year and a half. His timing seems to have been dead on.
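Taken together, those two ratios compound: 100 times the throughput at a tenth of the power works out to roughly a thousandfold gain in performance per watt. A quick back-of-the-envelope check (the baseline figures below are arbitrary placeholders; only the ratios come from the company's claim):

```python
# Sanity-check Cambricon's claim: ~100x the throughput of a
# general-purpose chip at ~1/10 the power. The baseline values
# are arbitrary units, not real measurements.
baseline_throughput = 1.0
baseline_power = 1.0

chip_throughput = baseline_throughput * 100   # "two orders of magnitude"
chip_power = baseline_power * 0.1             # "a tenth of the power"

efficiency_gain = (chip_throughput / chip_power) / (baseline_throughput / baseline_power)
print(efficiency_gain)  # roughly a 1000x gain in performance per watt
```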
In a later talk presenting the company’s research achievements, Chen Tianshi said: “In the PC era, the graphics-rendering abilities of CPUs weren’t good enough, so we created GPUs. Signal processing wasn’t good enough, so we had DSPs. Now we need specialist processors for AI, and this is the field that Cambricon leads.”
Cambricon’s business model is comparable to ARM’s, relying on licensing its designs. The company was already profitable last year, a fact that no doubt helped carry its recent Series A funding round to $100 million, backed by Alibaba, Lenovo, and other major investors.
The Kirin 970 marks the first commercial integration of a Cambricon processor, in what Huawei is calling an NPU (neural network processing unit).
Cambricon’s chips are designed specifically for deep-learning neural networks, optimizing the vector and matrix operations that dominate machine-learning workloads and thus making them ideal for voice- and image-recognition applications.
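The kind of arithmetic being optimized is easy to illustrate: a single fully connected neural-network layer is just a matrix-vector product followed by a nonlinearity. A minimal NumPy sketch, with illustrative layer sizes (not tied to any Cambricon design):

```python
import numpy as np

# One fully connected layer: output = activation(W @ x + b).
# The matrix-vector product W @ x is exactly the operation class
# an NPU accelerates; the sizes here are illustrative only.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 784))   # weights: 256 neurons, 784 inputs
b = np.zeros(256)                     # biases
x = rng.standard_normal(784)          # e.g. a flattened 28x28 image

y = np.maximum(W @ x + b, 0)          # ReLU activation
print(y.shape)  # -> (256,)
```

A deep network is many such layers stacked, so nearly all of its run time is spent in these dense multiply-accumulate operations, which is why a dedicated unit pays off.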
Take image editing software, for example. Cambricon’s AI chip, together with image editing software introduced by Weibo, China’s popular microblogging service, can restyle photos, much like Prisma. The difference is that, rather than applying a style over an entire image as Prisma does, the new deep learning software distinguishes features within an image and edits them selectively. And with the Kirin 970 chip, that can be done instantly, and locally, without any need to connect to the cloud.
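The selective editing described above reduces to masking: segment the image into regions, then blend a stylized version back in only where the mask is set. A toy sketch, with plain arrays standing in for what would really be the outputs of a segmentation network and a style-transfer model (both hypothetical here):

```python
import numpy as np

def selective_restyle(image, styled, mask):
    """Blend a stylized image into the original only where mask == 1.

    image, styled: float arrays of shape (H, W, 3).
    mask: float array of shape (H, W); 1.0 inside the detected feature
    (e.g. a face), 0.0 elsewhere. In a real pipeline the mask would come
    from a segmentation network and `styled` from a style-transfer model.
    """
    m = mask[..., None]               # broadcast mask across color channels
    return image * (1 - m) + styled * m

# Toy 4x4 image: restyle only the top-left 2x2 region.
image = np.zeros((4, 4, 3))
styled = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0

out = selective_restyle(image, styled, mask)
print(out[0, 0], out[3, 3])  # styled pixel vs. untouched pixel
```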
Huawei touts the strength of the NPU, claiming “image recognition speeds can reach up to 2,000 images a minute.” Anyone familiar with the image recognition in Google Photos, which can take minutes or even hours to register, will appreciate what a leap forward that is.
Before the arrival of this latest processor, Huawei had spent ten years developing SoCs, producing a series of chips beginning with the early K3V2. The Kirin 970, however, shows the company has now caught up to the leading edge of the field, with 5.5 billion transistors built on a 10 nm process. For comparison, Qualcomm’s Snapdragon 835, released less than a year ago, carries 3.3 billion transistors. And of course, the Kirin 970 supports UFS 2.1 storage and LPDDR4X memory.
Huawei says its new Mate 10 phone will be outfitted with the Kirin 970, to be released October 16.
But while chugging along at the fast clip of Moore’s law, mobile processors have begun to push up against a bottleneck. Transistors can only be so small, after all. Hence, the competition has shifted direction towards AI and the design of more specialized processors, and Huawei has put its bets on Cambricon.
Last year, Apple hired AI researcher Ruslan Salakhutdinov, and this year it released its Core ML machine-learning framework, along with an independently developed AI chip called the Neural Engine. Likewise, last year Google formally revealed its own AI processor, the TPU, and at this year’s I/O conference CEO Sundar Pichai introduced the second-generation product, a board with four TPU chips and a theoretical peak of 180 teraflops. Nvidia, for its part, announced the Tesla P100, an AI chip representing some two billion dollars of investment, along with the Nvidia DGX-1 deep-learning supercomputer.
Of course, Qualcomm, as the longstanding leader in mobile processors, has designs of its own, having bought machine-learning company Scyfer, while Samsung has invested in AI chip designer Graphcore.
But for all of these giants, the products that are mature are intended for the server side, while the mobile offerings remain in development. The Kirin 970 is the first production-ready AI-enabled mobile processor.
There’s no need to belabor the significance of that. Such a chip may well be able to resolve many of the frustrations and inadequacies of AI applications on smartphones that have so troubled Huawei and other manufacturers. Chen Yunji perhaps expressed it best: Cambricon’s 1A can solve two types of problems. One is enhancing the performance of AI operations on a system by orders of magnitude beyond what traditional CPUs can achieve. The other is bringing intelligence out of the cloud, and making it locally available without an internet connection. “Especially for the latter, a lot of user data won’t need to be uploaded, and that ensures information security.”