Literacy for Robots

Du Chen

In the film Arrival, the alien heptapods communicate with humans via their own system of writing, which the linguist and protagonist Dr. Louise Banks must learn in order to communicate with them. In the process, she gains the ability to perceive time non-linearly and see the future.

The assumption here is the controversial Sapir-Whorf hypothesis, which does two things:

1) It gives social science students a place in a sci-fi film;

2) It argues that human cognition is influenced by the language we use, and that therefore different languages will affect how we conceive of and view any given subject.

Sapir-Whorf may still be a matter of some debate for linguists, but in the field of AI research it has already achieved a level of recognition beyond anything it has ever enjoyed in its home discipline.

Microsoft’s Harry Shum, after twenty years at the company, was this year appointed to lead its AI and Research group. His research has mainly been in computer vision, a field in which he has become a world-leading expert, but in recent years he has turned his attention to another field that a Dr. Banks would certainly appreciate: natural language processing (NLP).

The reason is that, for all the progress that has been made recently in deep learning, the dream of general AI remains remote. In Shum’s estimation, even the best AI at present can’t compare to a four-year-old child in intelligence.

Systems like AlphaGo demand huge amounts of sample data to train on. And while DeepMind’s systems have excelled at other games besides Go, such as Atari titles and StarCraft, they remain far short of a general AI. What people really expect of a general AI is something that can think and act in a human way. It wouldn’t even have to be all that good at Go: what people actually want is something that can reliably handle most everyday problems, not something that surpasses humans only in narrowly defined situations.

How is it that an infant learns, after just a few repetitions, that its mother will come tend to it when it cries? Learning quickly to solve problems and drawing inferences from just a handful of cases: these are things humans do easily, but as yet they are beyond the abilities of any AI.

Shum tends to think that our extraordinary general intelligence as a species is related to our linguistic ability. And if that is so, then realizing general AI will require first mastering NLP.


A pyramid diagram from Hackernoon divides AI into stages (numbered, arguably, a bit backwards). The bottom level, machine learning, began to come together in the ’80s and ’90s, and in the last few years has seen broad application in products like Apple’s Siri, Microsoft’s Cortana, and Toutiao’s news recommendation algorithms. The second level is machine intelligence: more complex machine-learning algorithms that let machines tackle harder problems, as with AlphaGo.

The top level is machine consciousness, in which a machine is capable of learning on its own and acquiring new knowledge without needing researchers to spoonfeed it data. That still isn’t necessarily general AI, but it would be a machine with a rudimentary level of awareness.

The importance of natural language to AI isn’t merely in creating something that could write a poem (or replace a reporter). For millennia, humans have been accumulating information in the form of text, preserving it in histories, anthologies, dissertations, Twitter, Facebook, Weibo, and websites.

There have been plenty of books, papers, and essays on how humanity came to spread across the Earth; Yuval Noah Harari’s Sapiens, for example, attributes it to our ability to tell ourselves useful fictions (money, ideologies, community, and so on). But as Shum sees it, writing and language are among the greatest inventions in human history. It is with and through language that we are able to organize, to create companies, religions, and even nations. Language has amazing power. And if we want AI to understand humans, then undoubtedly the most important thing is to give it a means of efficiently absorbing and using human knowledge.

We hope for AI to become a companion and partner for humans, and since we are a species that is both rational and emotive, we naturally expect AI to be both as well. In the movie Her, Samantha and a myriad of other OS1 AIs are able to make people fall in love with them, yet the funny thing is that Samantha doesn’t have a visual interface. Her interactions with her human user Theodore are entirely through dialogue. She can hear the feeling in Theodore’s voice, or even know he’s hurt when he doesn’t speak at all.

Drawing on human knowledge and understanding human feelings are both tall orders for NLP, but that is also why Shum believes it may be the key to general AI.

Shum is responsible for Microsoft’s AI efforts, and the company is currently throwing much of its research strength behind mastering NLP. Microsoft has released Cognitive Services, a package of AI APIs covering vision, speech, knowledge, and search; one part of that package is the Language Understanding Intelligent Service, or LUIS.

What LUIS is meant to achieve, Shum says, is: “You say something, and it can parse what you say into verbs and nouns.” From there, a machine can pick up the contextual background of a conversation over one or more exchanges, along with the user’s sentiment and intent, so that eventually it can converse in a familiar style. LUIS’s technology can be used to develop chatbots with more human conversational skills.
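To make that concrete, here is a toy sketch of the kind of structured result an intent-parsing service produces: an utterance goes in, and a top intent plus extracted entities come out. The intent names, regex patterns, and output shape below are invented for illustration and are not LUIS’s actual model or schema.

```python
import re

# Hypothetical intent patterns -- invented for this example,
# not LUIS's real machine-learned model.
INTENTS = {
    "BookFlight": re.compile(r"\bfly\b|\bflight\b|\bbook\b.*\bto\b"),
    "CheckWeather": re.compile(r"\bweather\b|\brain\b|\bforecast\b"),
}
CITY = re.compile(r"\bto (\w+)\b")

def parse(utterance: str) -> dict:
    """Return a LUIS-style result: the top intent plus any entities found."""
    text = utterance.lower()
    intent = next(
        (name for name, pat in INTENTS.items() if pat.search(text)),
        "None",  # fallback when no pattern matches
    )
    entities = [{"type": "destination", "value": m} for m in CITY.findall(text)]
    return {"query": utterance, "topIntent": intent, "entities": entities}

result = parse("Book a flight to Seattle")
# result["topIntent"] is "BookFlight"; result["entities"] holds "seattle"
```

A real service replaces the hand-written patterns with a trained statistical model, but the contract is the same: free-form text in, machine-readable intent and entities out.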


Microsoft’s XiaoIce chatbot in China has been something of a success already, but in spite of its ability to chat, sing songs, and even write short poems, it still isn’t capable of fully human interaction. And yet Microsoft has been very pleased with it, because it has reached a level few chatbots can: an average of 23 conversation-turns per session (CPS), far surpassing Siri and Alexa.
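CPS is a simple metric: the average number of conversational turns across sessions. A minimal computation over made-up session logs (the numbers below are chosen for illustration, not real XiaoIce data):

```python
# Turn counts for a handful of hypothetical chat sessions
# (illustrative numbers, not actual XiaoIce logs).
session_turns = [18, 31, 22, 25, 19]

def cps(turns_per_session: list) -> float:
    """Conversation-turns per session: the mean turn count."""
    return sum(turns_per_session) / len(turns_per_session)

print(cps(session_turns))  # prints 23.0
```

The higher the CPS, the longer users stay engaged in a single conversation, which is why it serves as a rough proxy for how human-like a chatbot feels.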

As more and more chatbots come online, Microsoft hopes to be able to train a genuinely conversational AI, and Shum reiterates that this is the approach Microsoft believes will, someday, lead to a true general AI.

In Arrival, once Dr. Banks learns the aliens’ language, she is able to perceive future events, which allows her to play a part in averting disaster for both humanity and aliens.

Our fate likely doesn’t hang on learning a new language, and doing so is unlikely to give us superpowers. But language itself is already a kind of superpower. And whether the AI we create in the future can understand us, sympathize with us, and learn alongside us will depend on its ability to comprehend natural human language.