Netflix Abuses Amazon With Monkeys. Now You Can Too

To ensure that its massive video-streaming service can withstand the rigors of life on the public internet, Netflix spends an awful lot of time attacking the thing with a monkey. And now, you can sic the same monkey on your own internet services.
ndj5/Flickr

On Monday, the company open sourced its "Chaos Monkey," software that randomly turns off virtual machines running beneath its streaming service, a way of simulating the small outages the service will inevitably face day after day. This means that anyone can use the tool or even modify its source code.

This is just one of many software "monkeys" Netflix has built to test its online service, and eventually, it will open source the entire Simian Army.

The Netflix video-streaming service runs in part on Amazon Web Services, the massively popular set of "cloud services" that provides instant access to computing infrastructure over the net. Chaos Monkey is designed to scurry around AWS and start turning off virtual machines. "We have found that the best defense against major unexpected failures is to fail often," Netflix says in a blog post announcing the open sourcing of the tool. "By frequently causing failures, we force our services to be built in a way that is more resilient."
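To make the idea concrete, here is a minimal sketch of the kind of random shutdown described above. It is a pure simulation, not Netflix's code: it never talks to AWS, and the `Instance` class, the `unleash_monkey` function, the grouping of machines, and the `probability` parameter are all hypothetical names and assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    """A stand-in for one virtual machine (hypothetical, not an AWS type)."""
    instance_id: str
    group: str
    running: bool = True


def unleash_monkey(instances, probability, rng):
    """For each group, with the given probability, stop one random running member.

    Assumption: at most one machine per group is stopped per run, so a
    redundant group should be able to absorb the loss.
    """
    groups = {}
    for inst in instances:
        if inst.running:
            groups.setdefault(inst.group, []).append(inst)

    victims = []
    for name in sorted(groups):
        if rng.random() < probability:
            victim = rng.choice(groups[name])
            victim.running = False  # a real tool would call the cloud provider here
            victims.append(victim)
    return victims


# A toy fleet: three "api" machines and two "web" machines.
fleet = [
    Instance(f"i-{n:04d}", group)
    for n, group in enumerate(["api", "api", "api", "web", "web"])
]
rng = random.Random(42)  # seeded so repeated runs pick the same victims
stopped = unleash_monkey(fleet, probability=1.0, rng=rng)
for victim in stopped:
    print(f"stopped {victim.instance_id} in group {victim.group}")
```

With `probability=1.0`, every group loses exactly one machine per run. A lower value would make the failures occasional, which is closer to the "small outages" the tool is meant to simulate.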

There are certain advantages to running a web service from a cloud service such as AWS, but as with any piece of computing infrastructure down here on earth, there are bound to be failures. Netflix suffered a high-profile outage last month when Amazon experienced problems with a data center in Virginia -- problems that took down several other big-name sites, including Instagram and Pinterest.

Chaos Monkey couldn't prepare Netflix for that outage. After a storm cut the power to Amazon's data center, the company's backup generators failed to kick in, and a bug in its load balancers failed to spread the traffic across other computing facilities. But the simian software could help prepare you for other failures.

After Netflix first discussed Chaos Monkey early last year, Jeff Atwood -- the cofounder of the popular developer Q&A service Stack Exchange -- praised the idea, saying that his company solved its outage problem only after it embraced the real-life Chaos Monkey that was hitting its infrastructure.

"Sometimes, you don't get a choice. The Chaos Monkey chooses you...Every few days, one of our servers -- no telling which one -- would randomly wink off the network," he said in a blog post last year. "Every week that went by, we made our system a tiny bit more redundant, because we had to. Despite the ongoing pain, it became clear that Chaos Monkey was actually doing us a big favor by forcing us to become extremely resilient."

Netflix also uses a tool called Janitor Monkey, which shuts down other system resources that aren't being used. And then there's Security Monkey, which looks for service-configuration and, yes, security flaws. These will be open sourced at some point in the future.
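The clean-up job attributed to Janitor Monkey can be sketched in the same spirit. Nothing below reflects how Netflix actually decides a resource is unused. The `Resource` class, the `sweep` function, the seven-day idle threshold, and the dates are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Resource:
    """A stand-in for any cloud resource (hypothetical, not an AWS type)."""
    resource_id: str
    last_used: datetime
    active: bool = True


def sweep(resources, now, max_idle):
    """Shut down every active resource that has been idle longer than max_idle."""
    cleaned = []
    for res in resources:
        if res.active and now - res.last_used > max_idle:
            res.active = False  # a real tool would release the resource here
            cleaned.append(res.resource_id)
    return cleaned


# An arbitrary reference time and a toy inventory.
now = datetime(2020, 1, 31, 12, 0)
inventory = [
    Resource("vol-a", now - timedelta(days=30)),
    Resource("vol-b", now - timedelta(hours=2)),
    Resource("snap-c", now - timedelta(days=8)),
]
cleaned = sweep(inventory, now, max_idle=timedelta(days=7))
print(cleaned)
```

Here the two resources untouched for more than a week are shut down, and the one used two hours ago is left alone.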