Arik Hesseldahl

Hadoop Start-Up Cloudera Teams Up With Storage Player NetApp

If a company has a batch of data of any reasonable size and wants to do anything useful with it, chances are that at one point or another it’s going to wind up using some version of Hadoop.

Hadoop, whose mascot is a cute cartoon elephant, is open source software based in part on MapReduce, a technique initially developed at Google that makes big jobs involving the processing of large data sets manageable. And while anyone can download the open source software for free and put it to use, the number of start-up companies trying to build a business around helping other companies use Hadoop effectively is multiplying. A team of Hadoop engineers recently spun out of Yahoo as a start-up called Hortonworks, and another Hadoop outfit, MapR, landed $20 million in venture capital funding in August.

To me, Cloudera is the best known of the Hadoop start-ups. Backed by $36 million in investments from Accel Partners, Greylock Partners, Meritech Capital Partners and In-Q-Tel, it probably has the biggest head start among the Hadoop companies. Its customers include eBay, Groupon and AOL.

Cloudera is also the company behind the Hadoop World conference, which begins tomorrow in New York, so the eyes of the Hadoop, er, universe will be on what goes on here.

The first bit of news is that Cloudera is teaming up with the storage concern NetApp, which is announcing a turnkey product called the NetApp Open Solution for Hadoop. (One of these days people will dispense with using the word “solution” in this way. Alas, not yet!) Basically, the idea is to make Hadoop and Cloudera’s subscription support service easy to deploy on NetApp storage hardware. NetApp will also become a Cloudera reseller.

One problem companies deploying Hadoop often run into is the need for more storage, says Jeff O’Neal, senior director for data center solutions at NetApp. In a conventional Hadoop cluster, each server contributes both processing power and local disks, so adding storage means adding servers. “When you deploy Hadoop in the traditional way, the ratio between computing power and storage is locked, and here we’re opening that up.”

Why pick Cloudera, when NetApp could have just as easily slapped on a freebie Hadoop installation and sold it alongside its own hardware? Speed. Cloudera, O’Neal says, can help customers get their Hadoop installations up and running faster than they otherwise would. “We can take weeks or even months out of the cycle of getting the infrastructure up and running,” O’Neal says.

The deal will also expose Cloudera to some new high-rolling customers in markets where NetApp is strong, says Kirk Dunn, Cloudera’s COO. For one thing, NetApp does a lot of business with federal government customers in defense and intelligence, and their data needs aren’t getting smaller. “The workloads are big. The velocity of data coming at both the compute and storage racks are significant,” Dunn says. So is the size of the data. Consider, for example, how the military and intelligence community are creating more satellite imagery than ever before, and that all of that data has to be sorted and analyzed efficiently. Outside of government, banks and other financial institutions want to sift through the growing stream of information on people and companies to determine risk.

The amount of data that companies are generating is huge. Five or six years ago, the average large corporation had maybe 360 terabytes of data lying around, Dunn says. Cloudera has some customers that are generating about that much new data nearly every day, he says, and it’s not slowing down. “The problems only get more vexing as time goes on. They sure aren’t getting any simpler,” he says. After years of helping those companies and governments store all that data, NetApp is uniquely positioned, Dunn says, to go back to those organizations and sell them on the idea of mining that data for useful information. “For NetApp, this is as basic as motherhood and apple pie.”