Baidu, the 'Chinese Google,' Is Teaching AI to Spot Malware

The Chinese search giant is feeding known malware into a neural net to help its software recognize viruses the same way it recognizes faces.
Andrew Ng. Photo: Ariel Zambelich/WIRED

Andrew Ng picks up his iPhone and opens an app called FaceYou.

Ng, the chief scientist at Chinese Internet giant Baidu, is eating lunch at his desk inside the company's Silicon Valley research lab, and naturally, the conversation revolves around artificial intelligence. Ng, who moonlights as a professor of computer science at Stanford, helped launch the Google Brain project at that other search giant down the road, and now, he's exploring similar AI research at Baidu.

FaceYou is a way of demonstrating some of the company's latest work with what's called deep learning. Released just before Halloween, the app taps into a live video image of your face, fitting your mug with a kind of virtual mask.

It can make you look like Barack Obama or Bill Clinton or JFK or, uh, a geisha—not to mention all sorts of classic Halloween ghouls. These masks move as your face moves, fitting your jaw, nose, and eyes just right. The app can do this, Ng says, because it has "learned" to identify over 70 different facial features and shape the masks accordingly.

It learns via a neural network—a network of machines that approximate the web of neurons in the human brain. In essence, Baidu feeds this neural net with thousands of images of human faces, and over time, it gets a sense of what a face looks like. At the big Internet giants, this kind of deep neural network is all the rage. Last week, Facebook showed how such neural networks can be used not only to recognize photos, but also, on some level, to understand natural language. And this week, during a briefing with reporters at its Mountain View headquarters, Google explained how neural networks can recognize spoken words and translate from one language to another.

All of this is well documented. But deep learning is quickly pushing into other areas as well. Ng also says that Baidu is now using deep neural nets to help drive the company's security software (Baidu sells Symantec-like anti-virus software in China). He declined to discuss the specifics. But in effect, the company is teaching a neural net to identify new malware by feeding it scads of known malware. Just as a neural net can learn to recognize a face, it can learn to identify a virus.

"You input the state of a system, and it tries to detect whether or not there's a threat, if someone is trying to do something that isn't supposed to be done," Ng says. "One specific example is anti-virus...You examine a [file] and try to determine if it's malicious."

Catching Viruses Before They Get Loose

Baidu isn't the only one trying to spot malicious code with artificial intelligence. This week, an Israeli company called Deep Instinct opened its doors, saying that it has spent the last two years building a security tool that can learn to identify malware in a similar way. "First, we tested our infrastructure with images, audios, and text," says Deep Instinct chief technology officer Eli David. "Then we applied it to cybersecurity." Meanwhile, other operations, including Microsoft and a company called Invincea, have published papers describing how this approach can work.

The technique is intriguing because it would allow tools to identify a particular piece of malware before it has been identified in the wild. Traditionally, anti-virus programs operate by tapping into a vast database of known malware—malware that has been explicitly identified by researchers. A neural net could identify a new piece of malware just because it looks like other malware—just because it resembles tens of thousands of viruses that have been identified in the past. "You can identify malware even if it's never been seen before," Ng says.
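To make the contrast concrete, here is a toy sketch in Python. It is not Baidu's or Deep Instinct's method (neither company has shared specifics), and it swaps the neural net for a far simpler nearest-centroid classifier over byte histograms. The sample "files," the feature choice, and every function name are assumptions invented for illustration. The point is only that an exact-match signature database misses a slightly altered sample, while a model that has learned what known malware looks like can still flag it.

```python
import hashlib
from collections import Counter

# Hypothetical byte strings standing in for real files (made up for this sketch).
KNOWN_MALWARE = [
    b"\x90\x90\xeb\xfe payload-A \xcc\xcc",
    b"\x90\x90\xeb\xfe payload-B \xcc\xcc",
]
KNOWN_BENIGN = [
    b"hello world, plain text",
    b"another ordinary document",
]


def signature_scan(data, signatures):
    """Traditional approach: flag a file only if its exact hash is in the database."""
    return hashlib.sha256(data).hexdigest() in signatures


def features(data):
    """Normalized byte histogram: the fraction of the file made up of each byte value."""
    counts = Counter(data)
    return [counts.get(b, 0) / len(data) for b in range(256)]


def centroid(samples):
    """Average feature vector of a set of samples (the 'learning' step in this toy)."""
    vectors = [features(s) for s in samples]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learned_scan(data, malware_centroid, benign_centroid):
    """Flag a file if it looks more like known malware than like known benign files."""
    f = features(data)
    return squared_distance(f, malware_centroid) < squared_distance(f, benign_centroid)


SIGNATURES = {hashlib.sha256(m).hexdigest() for m in KNOWN_MALWARE}
MAL_C = centroid(KNOWN_MALWARE)
BEN_C = centroid(KNOWN_BENIGN)

# A variant that appears in neither list: one byte differs from payload-A.
new_variant = b"\x90\x90\xeb\xfe payload-C \xcc\xcc"

print("signature match:", signature_scan(new_variant, SIGNATURES))
print("learned model flags it:", learned_scan(new_variant, MAL_C, BEN_C))
```

The signature check fails on the new variant because a single changed byte produces a completely different hash. The toy model still flags it because the file's overall byte profile sits close to the malware it has already seen. A production system would need far richer features and a far larger training set, and, as the skeptics quoted in this story point out, would have to hold up against adversaries who actively try to evade it.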

That said, many security experts question the value of such security software in general. "This falls into the category of 'show me in production.' We see bold claims like this all the time in the industry—that some new scientific technique creates a breakthrough in defense. Most of the time, it doesn’t really work out that way," says Rich Mogull, a security analyst and consultant with a company called Securiosis.

"In other words, it looks promising on paper, and maybe they even have great demos, but we really can’t say anything positive or negative until we see it in a real production environment and measure the results. Security is about stopping an adversary, not a technology, and where the two meet is much messier in reality than theory."

Tiny Intelligences

Whatever the ultimate value of this new breed of security software, the new tool from Deep Instinct points to a second trend in the world of deep learning. The company trains its model on vast neural nets inside the data center, but once the model is trained, it can run on smartphones and other small machines. According to David, the company puts a tiny agent on user phones, and this agent can identify malware without calling back to the data center.

Typically, this is not how a deep learning service works. It operates in two stages, training and execution, but both stages happen in the data center, tapping into a vast network of machines. (This is why Google Now doesn't work when you're not connected to the Internet.) But researchers are now working to hone the execution stage so that it can run on phones, even without an Internet connection.
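The split between the two stages can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: a tiny logistic-regression model stands in for a deep neural net, the four training samples are invented, and the names train, export_model, and predict_on_device are hypothetical rather than any company's API. What it shows is that training needs the data and the compute, while execution needs only the small blob of learned numbers.

```python
import json
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Training stage: imagined here as the part that runs in the data center."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            prediction = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = prediction - y
            # Plain stochastic gradient descent on the logistic loss.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return {"weights": weights, "bias": bias}


def export_model(model):
    """Serialize only the learned parameters; the training data stays behind."""
    return json.dumps(model)


def predict_on_device(model_blob, x):
    """Execution stage: needs the exported blob and the input, no network call."""
    model = json.loads(model_blob)
    z = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return sigmoid(z) >= 0.5


# Invented two-feature samples: label 1 = suspicious, 0 = harmless.
TRAIN_X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
TRAIN_Y = [1, 1, 0, 0]

blob = export_model(train(TRAIN_X, TRAIN_Y))

print("model size in bytes:", len(blob))
print("flagged:", predict_on_device(blob, [0.85, 0.85]))
print("flagged:", predict_on_device(blob, [0.15, 0.15]))
```

Here the exported blob is a few dozen bytes. A real deep learning model carries millions of parameters, which is why squeezing the execution stage onto a phone is the hard part.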

Google, for instance, can now do instant language translation on a phone. This lets you point your phone at a sign that's in a foreign language and instantly view it in English. And, in fact, Baidu's FaceYou app executes entirely on the phone. The rub is that it can lag at times. Getting these complex AI models onto such small devices isn't easy.

In any event, the seemingly frivolous FaceYou points the way to a rather wide range of less frivolous applications. Top Google engineer Jeff Dean says that deep learning is now used in dozens of Google applications, and this week, during that briefing with reporters, Google researcher Greg Corrado said that deep learning code shows up in over 1,200 project software libraries inside the company—which means that many projects are at least kicking the tires on this increasingly important technology.

Google recently revealed that deep neural nets now underpin its Internet search engine. In the past, Ng has said that, at Baidu, neural nets help target ads. Facebook is exploring systems that allow blind Facebookers to understand what's in the photos that turn up in their News Feed. AI is no longer a niche pursuit. It's just part of how we compute—or, indeed, how we live.