Microsoft Computational Network Toolkit offers most efficient distributed deep learning computational performance

Posted by Xuedong Huang, Chief Speech Scientist


Xuedong Huang (Photo by Scott Eklund/Red Box Pictures)

For more than 20 years, Microsoft has invested in advanced speech recognition research and development. It’s great to see the return on that investment in products and services such as Windows Cortana, Skype Translator, and Project Oxford Speech APIs. Microsoft researchers pioneered using deep neural networks for speech recognition, and earlier this year, our speech researchers shared our deep learning tools with the speech research community when we introduced the Computational Network Toolkit (CNTK) under an open source license at the ICASSP Conference in April 2015.

CNTK is a unified computational network framework that describes deep neural networks as a series of computational steps via a directed graph. In that graph, leaf nodes represent input values or network parameters, while the other nodes represent matrix operations applied to their inputs. CNTK provides algorithms to carry out both forward computation and gradient calculation. The most popular computation node types are predefined, and users can easily extend node types under the open source license. With the combination of CNTK and Microsoft’s upcoming Azure GPU Lab, we have a modern, distributed GPU platform that the community can use to advance AI research.
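
To make the graph idea concrete, here is a minimal Python sketch, not CNTK’s actual API, of a computational network: leaf nodes hold an input value or a network parameter, an interior node applies a matrix operation to its inputs, and the graph supports both forward computation and gradient calculation. The Node and Times names are hypothetical stand-ins for the node types a real toolkit predefines.

    import numpy as np

    class Node:
        """Leaf node: holds an input value or a network parameter, plus its gradient."""
        def __init__(self, value):
            self.value = value
            self.grad = None

    class Times(Node):
        """Interior node: matrix product of a parameter node and an input node."""
        def __init__(self, w, x):
            super().__init__(None)
            self.w, self.x = w, x

        def forward(self):
            # Forward computation: evaluate the matrix operation on the inputs.
            self.value = self.w.value @ self.x.value
            return self.value

        def backward(self, upstream):
            # Gradient calculation: dL/dW = upstream x^T and dL/dx = W^T upstream.
            self.w.grad = upstream @ self.x.value.T
            self.x.grad = self.w.value.T @ upstream

    x = Node(np.array([[1.0], [2.0]]))   # leaf: input value (one feature column)
    W = Node(np.array([[0.5, -1.0]]))    # leaf: network parameter (1 x 2 weight matrix)
    y = Times(W, x)                      # interior node: matrix operation
    print(y.forward())                   # forward pass -> [[-1.5]]
    y.backward(np.array([[1.0]]))        # backward pass with a unit upstream gradient
    print(W.grad, x.grad)                # -> [[1. 2.]] and [[0.5], [-1.]]

In a full toolkit the same pattern repeats across many predefined node types (convolutions, nonlinearities, recurrences), and the backward sweep walks the entire directed graph rather than a single node.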

Since the debut of CNTK in April, we’ve significantly improved machine learning efficiency with Azure GPU Lab. The combination of CNTK and Azure GPU Lab allows us to build and train deep neural nets for Cortana speech recognition up to 10 times faster than our previous deep learning system. Our Microsoft colleagues also have used CNTK to run other tasks, such as ImageNet classification and a deep structured semantic model. We've seen firsthand the kind of performance CNTK can deliver, and we think it could make an even greater impact within the broader machine learning and AI community. It’s our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code.

For mission-critical AI research, we believe efficiency and performance should be among the most important design criteria. There are a number of deep learning toolkits available, from Torch, Theano, and Caffe to the recently open-sourced toolkits from Google and IBM. We compared CNTK with four popular toolkits, focusing on the raw computational efficiency of each toolkit using simulated data with an effective mini-batch size of 8,192 in order to fully utilize all GPUs. With a fully connected 4-layer neural network (see our benchmark scripts), the number of frames each toolkit can process per second is illustrated in the chart. We include two configurations on a single Linux machine with 1 and 4 GPUs (Nvidia K40), respectively. We also report our 8-GPU CNTK speed on Azure GPU Lab with two identical Linux machines (2 x 4 GPUs), as used in the baseline benchmark. In distributed deep learning (4 GPUs or 8 GPUs), CNTK compares favorably in computational efficiency against all the toolkits we tested, and it can easily scale beyond 8 GPUs across multiple machines with superior distributed system performance.
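
As a rough illustration of what the reported metric means, the sketch below times forward and backward passes of a fully connected 4-layer network on simulated data with a mini-batch of 8,192 and divides the number of processed samples by the elapsed time to get frames per second. The layer sizes, iteration count, and NumPy-on-CPU setup are illustrative assumptions, not the published benchmark scripts, which run on GPUs.

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    minibatch = 8192                                    # effective mini-batch size from the text
    dims = [512, 2048, 2048, 2048, 2048]                # assumed sizes for 4 fully connected layers
    weights = [rng.standard_normal((m, n)).astype(np.float32) * 0.01
               for m, n in zip(dims[:-1], dims[1:])]

    def forward_backward(x):
        # Forward pass through four sigmoid layers, keeping activations for backprop.
        acts = [x]
        for w in weights:
            acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ w))))
        # Backward pass with a dummy error signal standing in for a real loss gradient.
        delta = np.ones_like(acts[-1]) / dims[-1]
        for i in reversed(range(len(weights))):
            delta = delta * acts[i + 1] * (1.0 - acts[i + 1])   # through the sigmoid
            grad_w = acts[i].T @ delta                          # dL/dW for this layer
            delta = delta @ weights[i].T                        # propagate to the layer below
        return grad_w

    frames = rng.standard_normal((minibatch, dims[0])).astype(np.float32)
    iterations = 3
    start = time.perf_counter()
    for _ in range(iterations):
        forward_backward(frames)
    elapsed = time.perf_counter() - start
    print(f"{iterations * minibatch / elapsed:,.0f} frames per second (single CPU, NumPy)")

Real benchmarks of this kind also have to account for warm-up, data transfer, and synchronization across GPUs and machines, which is exactly where distributed performance differences between toolkits show up.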

We understand there are many design trade-offs between computational performance and flexibility, and each toolkit has its unique strengths. TensorFlow offers a user-friendly Python interface; Theano is distinguished by its symbolic computation; Torch uses the Lua programming language; Caffe is popular with computer vision researchers because of its efficient performance; and CNTK on Azure GPU Lab offers the most efficient distributed computational performance.

Until now, our focus with CNTK has primarily been within the speech research community. As a result, its superior distributed computational performance capabilities aren’t well known within the broader AI community. We hope to change that with our workshop on CNTK this Friday at the Neural Information Processing Systems (NIPS) Conference. Dong Yu and I are looking forward to sharing CNTK’s capabilities with the broader AI community. We’re also looking forward to sharing future developments as we work together to deliver computing systems that can see, hear, speak, understand, and even begin to reason.

At Microsoft Research Asia, artificial intelligence is informing, and informed by, the human experience

Posted by Allison Linn

Artificial intelligence and the human experience
Hsiao-Wuen Hon, corporate vice president in charge of Microsoft Research Asia, demonstrates XiaoIce at the 21st Century Computing Conference in Beijing.

When most people use automated speech recognition technology today, it's because they have a task that needs to get done: A person to call, directions to get, a quick text to send.

In China, millions of people are using this type of natural language processing in a much more human way: To carry on a casual conversation with a Microsoft technology called XiaoIce.

Hsiao-Wuen Hon, corporate vice president in charge of Microsoft Research Asia, sees XiaoIce as an example of the vast possibility that artificial intelligence holds — not to replace human tasks and experiences, but rather to augment them. This use of advanced technology to create very human experiences is just beginning, he noted.

"We've just barely scratched the surface," he said.

Hon recently joined some of the world's leading computer scientists at the 21st Century Computing Conference in Beijing, an annual meeting of researchers and computer science students, to discuss some of these emerging trends.

Microsoft Research Asia has been hosting the conference since 1999, as a way to give young and promising computer scientists in Asia Pacific the opportunity to meet and talk with some of the world's most renowned computer scientists. This year alone, the conference included two Turing Award winners.

Artificial intelligence was one of the hottest topics at the conference, because of the recent, major advances in these technologies that can see, hear, speak and even understand. It's also one of the primary focuses of Microsoft's research lab in China, where more than 230 researchers are doing cutting-edge research in areas including natural user interfaces and next-generation multimedia.

The lab's researchers have contributed key elements of many products that consumers use today, including the real-time translation tool Skype Translator and products such as Windows, Office, Bing, Xbox, Kinect and Windows Phone, in collaboration with other Microsoft research labs and groups.

Peter Lee, the Microsoft corporate vice president whose responsibilities include overseeing Microsoft Research Asia, said efforts like Skype Translator have been part of Microsoft Research's strategy of aligning research around a specific goal it wants to accomplish.

"How can we eliminate the language barrier for all people on the Internet, by using Skype? That's a goal and it's very, very exciting," he said.

Big bets and concrete accomplishments

The Asia lab's researchers also are looking much further into the future, at tools that may seem outlandish today but that we could take for granted in years to come.

Lee noted that it is both important and alluring to do research that goes directly into products consumers use immediately, such as Skype Translator. But, he said, a research lab also needs to be thinking about bigger, bolder bets that may not pay off immediately but could change the world in the future.


Peter Lee, corporate vice president for Microsoft Research, discusses the future of computing at the 21st Century Computing Conference in Beijing.

He noted that many of the world's most important technological innovations, including the personal computer and the transistor, came out of corporate research labs. One challenge is that people don't always immediately recognize how important an innovation is, and only see its importance a few years down the line.

"Sometimes I wonder if Microsoft Research has already invented the idea that will change the world, but we just don't know it yet," he told journalists at the 21st Century Computing Conference.

A more personalized search engine

That's why researchers at the Asia lab are looking at both near- and long-term implications of the work they are doing.

Take XiaoIce, for example. The fact that so many Chinese people like to talk to her – about their day at work, the weather or current events – makes Hon think that, in the future, speech recognition technology could be used to make all sorts of tools more personalized and more human.

Think of a search engine that worked more like XiaoIce. Instead of just typing a phrase into a box, Hon envisions a world in which the search engine is more like a personal assistant you have a conversation with.

"The search engine could change into more like your chatting buddy," Hon said.

The buddy would do research for you and offer opinions, and it would be human-like in other ways as well, he said. For example, the chatbot could remember your search chat for the next few days and send you additional information as it becomes available.

And, just like most people have many human contacts with different areas of expertise, you might have multiple chatbots who are experts in various topics, like medicine, cooking or hiking. You also might choose to take the chatbot's advice, or ignore it and seek out another chatbot for additional perspectives.

An invisible revolution

The advances in speech recognition and natural language processing are part of what Harry Shum, Microsoft's executive vice president in charge of technology and research, calls the "invisible revolution" in technological progress.

Until now, many of the biggest leaps in technology have come from faster computers and better gadgets. But in the coming years, Shum says, many of those new discoveries will instead come from tools that you use but don't necessarily see. Those include cloud computing systems that can analyze vast amounts of data in just a few minutes and productivity tools that use machine learning, in which systems get smarter as they amass more data, to better understand and anticipate people's needs.

In China, Hon said he's already seeing that revolution take place with tools such as face recognition technology.

Researchers and other Microsoft employees in China recently partnered with a leading non-governmental organization to use face recognition technology to help locate missing children. The Photo Missing Children project uses face recognition technology from a suite of tools called Project Oxford to help the organization recognize and find children.

Hon expects to see many more of these types of tools in years to come, and he said those will come more quickly because of recent, major advances in artificial intelligence and the related fields of machine learning and big data analysis.

Thanks to these tools, researchers can now analyze larger amounts of data much more quickly, and then use that data to train systems to do complex tasks. That will help everyone from company executives who want to better understand sales trends to teenagers who want to accurately dictate texts in loud rooms.

"Why are artificial intelligence and machine learning and big data so exciting?" Hon asked. "Because it touches everything we do."

Allison Linn is a senior writer at Microsoft Research. Follow Allison on Twitter.