Jul 31, 2013 6:30 AM

Ex-Facebookers Feed Zuck's Code Into New Data Revolution

Nikita Shamgunov and Eric Frenkiel, cofounders of San Francisco startup MemSQL.

Nikita Shamgunov and Eric Frenkiel first met during Facebook bootcamp. Like every other new engineer hired by the company, they spent a good eight weeks in the hacker equivalent of Parris Island, fixing bug after bug after bug inside the world's largest social network, coming to terms with the swashbuckling Facebook culture, and mentally winding their way through the sweeping software systems that juggle data inside the house that Zuck built.

"It's very much like bootcamp in that most graduate -- but some don't," Frenkiel says. "Even if you pass a series of interviews, they still want to see you perform for eight weeks -- for 10 weeks -- before you actually join a team."

Shamgunov and Frenkiel ended up on separate engineering teams, but they remained friends, often sharing the morning commute from San Francisco to Facebook's Silicon Valley headquarters, and in 2011, they left the tech giant to found their own company. It's called MemSQL, and it echoes Facebook in more ways than one.

Frenkiel (the company's CEO) and Shamgunov (its CTO) installed their own engineering bootcamp inside the San Francisco startup -- "we even hired an ex-Marine officer," says Frenkiel, referring to executive vice president Carl Wright -- and with their new team of engineers, they created a software system that would mimic the Facebook machine, letting the rest of the world harness massive amounts of data in ways that are now routine with Zuckerberg and company.

MemSQL offers what's called an "in-memory database." Much like a Facebook creation known as Scuba, it spreads information across the memory systems inside dozens of computer servers, bypassing the (much slower) hard disks that traditionally house the world's information. The end result is a system that lets you retrieve and analyze data at unusually high speeds.

At Facebook, Scuba provides a means of instantly diagnosing problems with the enormous network of hardware and software that drives the company's web service, but MemSQL expands on this mission. It can help analyze the ins and outs of practically anything, from email marketing campaigns to trading activity on a stock exchange. "Zynga uses us," Frenkiel says, "and so does Morgan Stanley."

MemSQL is at the forefront of a much larger effort to move the world's digital data off the hard disk and into memory -- a trend that will ultimately let us juggle "Big Data" not only with greater speed but with greater accuracy. Inside the data centers that underpin its popular web services, Yahoo is shifting towards an in-memory tool known as Spark, which does all sorts of data analysis, and MemSQL is just one of several in-memory databases, which can handle both data analysis and the high-speed data transactions that are an integral part of so many websites. In other words, they can drive things like online user accounts and product purchases and maybe even bank payments.

These databases -- which also include names such as NuoDB and VoltDB and SAP's Hana -- are just beginning to find their way in the world. But some are already juggling live data inside businesses large and small. In an echo of Facebook and Scuba, Shutterstock -- a New York outfit that offers up an online library of photos, graphics, and videos -- uses MemSQL to analyze, and then improve, the operation of the computer servers and other hardware tools that drive its web service.

"It has performed crazy well, and it has allowed us to do some things we couldn't do before," says Chris Fischer, the vice president of operations at Shutterstock. "It lets us gauge the health of our [server] ecosystem. We want to know if something changes, and we want to know in real time."

Memory Serves. Again

The idea of a database that runs in computer memory is hardly new. TimesTen, an in-memory database offered by software giant Oracle, dates back to the mid-1990s. But as Frenkiel explains, MemSQL represents a new breed of in-memory database -- an in-memory database specifically designed to operate across a large number of machines.

"TimesTen is legacy," he says, sitting inside one of the fish-bowl conference rooms that line the outside of the startup's offices in San Francisco's SOMA neighborhood. "We now have an in-memory renaissance." This is driven in part, he says, by the decline in memory prices over the past few years. Standard computer servers now include as much as a terabyte of memory (aka one thousand gigabytes), if not more.

The new in-memory databases can handle more data more quickly, but they also give you the power to treat all that data as a whole. This is called "consistency," and basically, it means that someone looking at the data from one place sees the same thing as someone looking at it from another. If you don't have consistency, you can't analyze your data with complete accuracy -- and you certainly can't handle something as delicate as online bank transactions. "People care about whether their bank accounts remember their money," says Barry Morris, founder and CEO of NuoDB.

Over the past several years, we've seen the rise of what the pundits call "NoSQL" databases, including MongoDB and Cassandra. This is a loose term, but generally, it refers to a new breed of web-centric database designed to scale across many machines. They let us store more data, but they aren't quite nimble enough to slice and dice it the way we can with a traditional database built for a single machine.

They can't use the familiar SQL language that businesses have long used to query their data, and typically, they can't maintain consistency across large datasets. With traditional SQL databases, for instance, you can use a command to instantly "JOIN" two separate datasets, so that you can then analyze them collectively. That's not something you can typically do with NoSQL.
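To make the JOIN idea concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for any SQL engine; it is not MemSQL, and the `users` and `purchases` tables and their contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for any SQL engine.
conn = sqlite3.connect(":memory:")

# Two separate (hypothetical) datasets: who the users are, and what they bought.
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO purchases VALUES (1, 20.0), (1, 5.0), (2, 12.5);
""")

# JOIN matches each purchase to its user, so both tables can be
# analyzed collectively in a single query.
rows = conn.execute("""
    SELECT u.name, SUM(p.amount) AS total
    FROM users AS u
    JOIN purchases AS p ON p.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
""").fetchall()

print(rows)  # [('alice', 25.0), ('bob', 12.5)]
```

The single query answers a question that spans both datasets (total spending per user), which is the kind of analysis the paragraph above says is typically hard to do in a NoSQL store.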

Frenkiel and his MemSQL co-founder Shamgunov bill their creation as an antidote to the limitations of the NoSQL brigade. Using a term first coined by a well-known industry analyst, Frenkiel calls their product a "NewSQL" database. Like the NoSQL databases, he says, it scales across many machines, but unlike those older creations, it lets you query your data with tried-and-true SQL, and it provides the consistency you need for transactional applications -- or at least some of them. "You could use us to run a website," Shamgunov says.

Mike Stonebraker -- best described as the high priest of the database world -- delivers much the same message when discussing his own in-memory database, VoltDB. "The difference between NoSQL and us is transactions," he explains. "If you want to move $100 from one bank account to another, you need transactions." The NewSQL market, Stonebraker says, is on "a rocketship."
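The bank-transfer example can be sketched in a few lines. This again uses Python's built-in `sqlite3` module as a stand-in rather than VoltDB or MemSQL, and the `accounts` table, the `transfer` helper, and the starting balances are all hypothetical. The point is only to show what a transaction guarantees: either both halves of the transfer happen, or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes an overdraft an error (an assumption for this sketch).
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("checking", 150), ("savings", 0)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would overdraw src, so the earlier credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_first = transfer(conn, "checking", "savings", 100)   # succeeds
ok_second = transfer(conn, "checking", "savings", 100)  # fails, nothing changes
print(ok_first, ok_second, balances(conn))
# True False {'checking': 50, 'savings': 100}
```

In the second call the credit to `savings` is applied first and then undone when the debit fails. Without a transaction, that $100 would have appeared in one account without ever leaving the other.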

Mike Stonebraker.

Photo: Wikimedia Commons

But the story is more complicated than that. At the moment, databases such as MemSQL and VoltDB are suited to some tasks but not others (by the companies' own admission). In some respects, they're limited by the amount of memory your systems have (though data can spill onto disks). And according to Andy Gross, the principal architect at Basho, an outfit that offers a NoSQL database called Riak, the NoSQL crowd is slowly improving its software so that it too can provide the sort of consistency you get from VoltDB and MemSQL.

The point, Gross says, is that the entire database world is evolving. So many databases -- whether they're tagged NoSQL or NewSQL -- are inching towards a new reality where we can store data across an enormous number of machines but still change and analyze it as if it were stored on a single system.

Google has already reached this nirvana with a creation it calls Spanner. This mind-boggling software platform spans the globe -- literally -- but thanks to some ingenious engineering involving GPS devices and atomic clocks, it can treat that world of data as if it's in one place.

Barry Morris, the CEO and founder of NuoDB, says that his company has already brought Spanner-like technology to the masses. "Part of the magic of NuoDB," he says, "is that we can do it without atomic clocks." Others, such as Stonebraker, are skeptical, saying that unlike Spanner, NuoDB is still burdened by a lag time that would preclude rapid-fire database transactions. But at the very least, it's a step in the Google direction. And many others are moving the same way.
