Arik Hesseldahl

IBM’s Project Sparta Is Now Named PureData

So remember last month how I told you about Project Sparta? That would be the IBM project aimed at simplifying how companies attack their big-data problems.

Well, it has a name now, and as I suspected, it’s part of the growing Pure line. It has been dubbed the PureData System, and it comes in three flavors: One optimized for transactions, one for operations and one for big-data analytics. IBM announced the system late Monday, in connection with an event in Singapore.

In the announcement, Big Blue included one of those big-picture observations about the state of data and the unceasing struggle to get a handle on it all. According to IBM’s reckoning, 2.5 exabytes of data is created every day. (You know what a gigabyte is; after that come terabytes, then petabytes, then exabytes, each 1,000 times larger than the one before.) And the amount is growing so fast that 90 percent of the data that now exists has been created in the last two years. In other words, the amount of data that companies, governments and people are creating is growing like crazy, and even that doesn’t begin to get the point across.

I talked with Arvind Krishna, general manager, IBM Information Management, and he compared the different flavors to a Web site selling stuff: One version of the system can handle all the sales; another handles the analytics one might use to figure out what combinations of products people buy together at different times of the year, or to watch for credit card fraud.

With these new systems, IBM is promising that it can handle problems like these in minutes instead of hours.

And the systems are relatively easy to deploy, requiring fewer than 10 days to spin up, versus the six months that used to be required for this sort of thing. Once the machine is up, the database starts running with a single click on a console.

This quest for simplifying big problems is very much in vogue now. Oracle talked about it a great deal last week, in connection with its Exa line of engineered systems.

Here’s an example of the kinds of problems companies are grappling with: The Premier healthcare alliance is a collection of 2,700 hospitals in the U.S. and some 90,000 other health care facilities that collaborate to try to boost the overall quality of care they deliver. The group has selected IBM’s PureData System to analyze the largest collection of clinical, financial and medical outcomes data to find useful patterns.

The database includes information on one out of every four hospital patient discharges in the U.S., 2.5 million clinical transactions every day gathered in real time, and $43 billion worth of annual purchasing data. It doesn’t take much imagination to assume that all this produces a lot of data.

If you can analyze data like that for patterns, you can make a real difference in the day-to-day efficiency of medical professionals, and probably save money along the way, with the result being better outcomes for patients.

Other potential customers, Krishna told me, are companies in the payment processing industry, which are constantly on the lookout for credit card fraud. “In the 200 or 300 milliseconds you have while the payment is going through, you can do a quick fraud assessment,” he said. “That’s an example of what we call a mixed workload.”

These are just the latest examples of IBM’s intention to boost its big-data business. It has acquired more than 30 companies in recent years to enhance its capabilities around analyzing data for useful business intelligence. Just last month, it announced plans to acquire U.K.-based Butterfly Software, and in April it acquired Vivisimo and Varicent. Additionally, IBM has pledged to spend $100 million of its research and development budget over the next five years to tackle big-data problems.

The PureData System is the latest result of a four-year, $2 billion research and development effort that has so far yielded two other products: PureFlex and PureApplication. The PureFlex product combines computing, data storage, systems management and networking components in a single integrated product that’s preconfigured for the customer and is intended to be easy to deploy in a data center. PureApplication is a machine designed for database and Web transactions.
