Arik Hesseldahl

Flash Madness Part 3: Pure Storage Comes Out of Stealth

This has been the summer of flash memory. So far we’ve seen the initial public offering of Fusion-io, which uses flash chips to get data in servers closer to the processor and thus speed things up.

Next we saw Violin Memory — which makes flash-based storage arrays that are intended to make enterprise applications run faster — land $40 million in venture capital funding.

Now we see a third player entering the “flash madness” narrative. Pure Storage is coming out of stealth today, announcing its plans to sell flash-based storage arrays. It is also announcing that it has landed a $30 million C-round led by Redpoint Ventures, with Samsung Venture Investment joining. (Yes, that would be the venture capital arm of the South Korean electronics giant that happens to be the world’s biggest manufacturer of flash memory.) Greylock Partners and Sutter Hill Ventures also participated. The latest round brings Pure’s total funding raised to date to $55 million.

So what is Pure Storage all about? I met up with CEO Scott Dietzen last week and got the download.

The fundamental problem with enterprise storage is that hard drives just can’t keep up with everything else that’s gotten faster in the data center. Flash memory is fundamentally faster, it uses less energy and it takes up less space. We all know this.

The problem with flash is that it has always tended to be more expensive than hard drives. Today, you can buy a one-terabyte hard drive for $100 or less. But just try getting that same capacity in flash memory and see if the price isn’t, well, a lot higher.

The same principles apply in the data center. CIOs would love to convert to flash-based systems, as long as they’re reliable and affordable and work with the applications and other hardware they already have.

Pure Storage is essentially promising to deliver just that, Dietzen says. The company’s first product is an all-flash storage array that it says is 10 times faster and 10 times smaller than hard-disk-based systems. It’s called the Pure Storage FlashArray, and it is aimed at mainstream enterprises and designed to be easy to deploy.

Pure’s founders are John Colgrove — one of the founding engineers at Veritas, now part of Symantec — and John Hayes, a founding engineer at Bix, which was ultimately swallowed up by Yahoo. Dietzen hails from Yahoo as well, by way of its acquisition of Zimbra, where he was CTO.

An early key hire was Michael Cornwell, who was lead technologist for flash at Sun Microsystems (now part of Oracle). Cornwell also worked at Apple, where he was Manager of Storage Engineering for the iPod division, and oversaw that product’s transition to — you guessed it — flash memory. Remember the first iPod nano? That was his baby.

Another key name: Greylock venture partner Frank Slootman, the former CEO of Data Domain, is on Pure’s board.

So what’s so special about a storage array built on flash memory? “Disks get slower every year,” Dietzen says. “Intel says processors have gotten 175 times faster over the last 15 years.” Disks just keep getting more data packed onto them, which doesn’t really make them any faster. The mechanical arm inside the disk that grabs data from the platter really can’t go much faster. “Disks today are comparably slower than tape was 15 years ago,” he says.

This creates a problem. Storage needs are going up, but hard drives are slowing data centers down, preventing them from reaching their full potential. Hard drives remain appealing only because of cost, about $5 per gigabyte. Enterprise-grade flash, on the other hand, tends to cost $40 to $100 per gigabyte, and because flash has historically been less reliable, you have to buy double what you really need.

Pure’s play is to get over the cost hurdle. Dietzen says the company can get the cost down to $5 per gigabyte and less.

How does it do that? By reducing the amount of data you actually store. What happens in enterprise environments is that various bits of data get copied and recopied, over and over. Imagine a big filing cabinet with 50 copies of each document scattered around in different folders, when you really only need one. Suddenly the size of that file cabinet need not be so big. The same applies in data storage: Why bother having 10 copies of the same block of data, when one or two will do?

Using a technique known as deduplication, a system can eliminate all those unneeded copies and thus streamline the whole operation. Deduplication, combined with compression, was the primary principle behind Slootman’s Data Domain, which is now part of EMC.
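The filing-cabinet idea can be sketched in a few lines of Python. This is a minimal illustration of block-level deduplication combined with compression, not Pure’s or Data Domain’s actual implementation. The `DedupStore` class, the 4,096-byte block size and the choice of SHA-256 and zlib are all assumptions made for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed for illustration; not a figure from Pure


class DedupStore:
    """Toy block store (hypothetical): each unique block is kept only once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # fingerprint -> compressed bytes of one unique block
        self.files = {}   # file name -> ordered list of block fingerprints
        self.sizes = {}   # file name -> length of the data as written

    def write(self, name, data):
        refs = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has not been seen before.
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = zlib.compress(block)
            refs.append(fingerprint)
        self.files[name] = refs
        self.sizes[name] = len(data)

    def read(self, name):
        # Follow the references and rebuild the original bytes.
        return b"".join(
            zlib.decompress(self.blocks[fp]) for fp in self.files[name]
        )

    def logical_bytes(self):
        """Total bytes the users believe they have stored."""
        return sum(self.sizes.values())

    def physical_bytes(self):
        """Total bytes actually held after dedup and compression."""
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore()
    document = b"quarterly report " * 1000
    for i in range(10):  # ten copies of the same document
        store.write(f"copy-{i}", document)
    print(store.logical_bytes(), "bytes written")
    print(store.physical_bytes(), "bytes actually stored")
    print(len(store.blocks), "unique blocks kept")
```

Every file is still readable in full through `read`, because each one keeps its own list of references; only the underlying blocks are shared.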

But deduplication is expensive on hard drives and doesn’t really make sense there, Dietzen says. The mechanical arm in a hard drive already has to hunt for wherever its next needed block of data is stored. Deduplication replaces duplicate blocks with reference signs pointing to a single stored copy, which sends the arm to even more places. The end result is that the disk has to do more seeking, not less. Flash memory chips don’t have that problem. “We make that process fast, because there’s no performance hit to the deduping process,” he says.

On top of that, Pure has created some algorithms that make the process a lot more granular than on hard-disk-based systems, by working with smaller disk-sector sizes. How small? He wouldn’t say exactly.

Unlike other storage companies — like, say, EMC — Pure’s array, Dietzen says, is built from the ground up for running flash. “The disk-centric companies are slotting flash into places where disks used to be, but they’re not changing the software to take advantage of the flash, to protect the flash from uneven wear and other things.”

A few early companies have tried the hardware, among them the law firm Fenwick & West, whose CIO, Matt Kesner, is quoted in Pure’s press release as saying that the data stored for various workloads was reduced by 50 to 90 percent.

One key thing that’s going on in the data center these days is virtualization, which means running several virtual computers on a single physical computer. When you run a lot of virtual machines, you have a lot of data that, like the paper in that big file cabinet, is essentially the same. Dietzen says that Pure’s flash array is able to eliminate a lot of that data. “Even if those virtual machines are a mix of Windows and Linux, there are a lot of commonalities between them,” he says. It’s not uncommon to see the data footprint for virtual machines reduced by a ratio of 15 or 20 to one.

And that has caused some interesting reactions among early customers trying out the array. “Some people try it and are shocked when they put 15 terabytes on it and see there’s only one terabyte and think we’ve lost a lot of their data,” Dietzen says. “It’s a little scary at first, but then they run all their workloads and see all the data is there.”
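Reduction ratios like these can be tied back to the cost figures quoted earlier with some back-of-the-envelope arithmetic. The sketch below assumes that the effective cost per gigabyte is simply the raw flash cost divided by the data-reduction ratio, which ignores controllers, software and everything else that goes into a real array. The function names and the `overprovision` parameter are hypothetical, and nothing here reflects how Pure actually arrives at its pricing.

```python
def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio, overprovision=1.0):
    """Cost per gigabyte of user data after data reduction.

    overprovision=2.0 models having to buy double the flash you need.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb * overprovision / reduction_ratio


def ratio_needed(raw_cost_per_gb, target_cost_per_gb, overprovision=1.0):
    """Reduction ratio required to bring raw flash down to a target cost."""
    if target_cost_per_gb <= 0:
        raise ValueError("target_cost_per_gb must be positive")
    return raw_cost_per_gb * overprovision / target_cost_per_gb


if __name__ == "__main__":
    # Figures quoted in the article: flash at $40 to $100 per gigabyte,
    # hard drives at about $5 per gigabyte.
    for raw in (40, 100):
        print(f"${raw}/GB flash needs {ratio_needed(raw, 5):.0f}:1 to reach $5/GB")
    # Same target when twice the raw capacity has to be purchased.
    print(f"$40/GB bought double needs {ratio_needed(40, 5, 2.0):.0f}:1")
    # What a 15:1 reduction does to $40/GB flash.
    print(f"$40/GB at 15:1 costs ${effective_cost_per_gb(40, 15):.2f}/GB")
```

Under these assumptions, flash at the low end of the quoted price range would need roughly an 8-to-1 reduction to match hard drives on cost, and flash at the high end would need about 20-to-1.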
