Arik Hesseldahl

Seven Questions About Big Data and Analytics for IBM’s Steven Mills

The one thing you probably know about IBM is that it’s big. It does big projects with other big companies, and with governments. One concept it has been talking about lately is “big data,” or analytics. The idea is that you can find money-saving value in the intelligent analysis of patterns that might minimize or eliminate wasted effort, or detect common problems early, or anticipate a need before it becomes acute.

Last Friday I rode a bus up to IBM Research in Yorktown Heights, New York, site of the IBM Jeopardy Challenge from earlier this year, to sit in on a series of presentations on the benefits of analytics. But I also got a chance to sit down with Steve Mills, who is IBM’s Senior Vice President and Group Executive, IBM Software and Systems. He has some 100,000 employees reporting to him and oversees the operations that contribute about $40 billion of IBM’s revenue. If you were looking for someone on the front lines of big IT projects intended to corral and extract value from mountains of data, he’s it.

We talked about opportunities IBM is finding around analytics in fields as varied as marketing, health care and pumping oil in the North Sea.

AllThingsD: So Steve, IBM is talking a lot these days about big data and analytics. Naturally, companies are always on guard against the temptation to invest in buzzwords; they have to bring some skepticism to the discussion and wonder if making a big investment is right for them. What kind of benefits are companies getting out of it, and what kind of questions should they be asking?

Steve Mills: Certainly larger enterprises have been buying IT and listening to IT companies for decades. They are not drawn in by the latest buzzwords; they are looking for value. The 1990s were the last time you saw people making technology investments because they thought they had to, not because they knew exactly where they were going. It was an era when people thought they had to be Amazon or be ‘Amazoned.’ You had to ride the wave and be there. It was a period of overbuying. And it was a period of great promise, but not one of great value. That’s not to say there was no value. The Web improved data connectivity standards, and data structures became more finite with HTML and XML; some lasting value emerged from the feeding frenzy of the late 1990s. The aftermath of that was much more caution. Companies are much more careful now. With this whole notion of big data and data analytics, companies are asking themselves what their big analytic problems are, and wondering where they could get more value from having more data. They get that the cost of bringing that data in has come down. They can store it, organize it, analyze it and hunt for patterns within it. These are things that were always of interest, but the costs have come down to a point where they can do them. The economics have improved, but they still want to go through the exercise of determining what kind of value they get from it. What might look like a great use of technology might not deliver much return.

Well, that’s the most important question, isn’t it? The value looks great on paper, but then there are practical limitations. I’m thinking of someone who said to me the other day that a certain product is great and can deliver a lot of value, but the salespeople it’s aimed at are often a little lazy and don’t put all the information they can into it, which limits the value somewhat. There are human elements that can limit the value. Do you run into that much?

There’s no such thing as a big data program without a pilot. The more pilots you run, the more real deployments you do. Will every pilot result in an immediate deployment? No. Does that mean you don’t see any value? Not really; it’s just that as you run it and begin to see the complexity, you might back away, other priorities might come in, projects get shelved. I think the discussion of big data is better when it’s around an applied, operational definition rather than a theoretical one. You look at areas where businesses are exploring this and looking for value. We see a lot of consumer packaged goods companies looking at what’s known as sentiment analysis. They want to collect things out of the blogosphere and tweets and whatever information there is about what people are saying about the products they produce. Historically they might have run focus groups, and they probably still will, but you see them reaching out for large amounts of unstructured data, and a lot of it is garbage. They’re hunting for the jewels, and it comes back to the issue that the technology is there and it’s affordable. If you talk to any consumer packaged goods company, they’re definitely looking at this and beginning to make some attempts to determine if they can get any incremental value from analytics.

So where does IBM fit into all of this?

As you saw last year, we acquired Unica and Coremetrics. Those technologies are geared toward bringing together large amounts of information around marketing and on-the-Web activity: the effectiveness of marketing campaigns in the case of Unica, and the effectiveness of Web presence in the case of Coremetrics. You use these technologies to sift through that information and look for trends that steer you toward what’s working. There are a lot of big data scenarios around health care and the use of the Watson technologies. The idea is to deal with a big corpus of information and bring it together in a way that delivers value against a particular purpose. The whole medical effort is to create a physician’s assistant, not to replace doctors but to give them more at-the-fingertips insight based on the symptoms the doctor is observing. If you have access to all the outcome data that’s available, it’s possible to hone in on the problem much more quickly.

I feel like we’ve heard this discussion around bringing IT to health care for years, and yet there’s still so much resistance and inertia. What’s going to change that now?

That resistance will be broken by providers with control over their physicians. So if Kaiser Permanente starts to dictate to its physicians what prescribed approaches they have to take, then they’ll be required to participate. The Mayo Clinic does this today. It’s a very tightly controlled environment. They have an enormous amount of data and they use it quite effectively. It’s part of what’s required to be a physician at the Mayo. And the Mayo doesn’t share its data; it thinks it has a strategic advantage in the way it uses data. Other institutions may be more open to sharing. But in the end, if a doctor wants to get paid, they’ll use the tools that are prescribed. I think there’s enough impetus for change. Will every independent practitioner use these tools? No, not in the near term. Will major institutions adopt them? There’s no question that they will.

Obviously, this takes place against the backdrop of the larger debates around health care reform and the President’s plans, and some Congressional opposition to it. Where does this all fit in relation to that?

Our view is that the change will be driven at the state and local level. Local private health care entities will play a role. The major health care payer companies, the WellPoints, the Blue Cross Blue Shield network, Cigna, Aetna, all have a vested interest in bringing more efficiency to health care. It will move from the bottom up rather than from the federal level down.

What kind of results are you seeing? Is there a standard yardstick you use to think about return on investment? Or do you have enough data on that yet?

You have to break it down by area. The New York State revenue department is seeing some pretty impressive results in using analytics to collect delinquent taxes. I don’t know how many thousands of percent return that amounts to on the initial investment. The level of investment is modest compared with the money recovered. The project paid for itself overnight. Similar projects at Medicaid meant to detect fraud and abuse typically pay for themselves within a matter of months. There are areas that have that immediate kind of payback. Often it’s where you know you’re paying for something or losing money, and you’re able to stop the bleeding. We’ve done some very sophisticated projects related to energy and energy transmission that pay for themselves in six months. It’s a little-gear-driving-the-big-gear situation. The capital and operational investments at power companies are so high that when a little IT is applied toward optimizing preventive maintenance, the investment is paid for in weeks. The problem is usually so obvious, and it’s usually just a matter of correlating and analyzing the data you have available to realize a better approach. The small gear of IT investment spins very fast relative to the big gear of high opex and capex investment. In those cases the payback is really fast. Other areas are less clear, which is why you do pilot projects that start small.

What do you generally recommend when companies do these pilot projects? Is there a classic approach that applies widely?

You have to have some intuitive sense of where you want to go. When we started with the New York State Tax initiative, they had already been doing their own investigative work on fraudulent tax returns, so they knew what was going on: people weren’t paying correctly, for whatever reason. They had some sense that analytics was going to help them, but that didn’t mean simply throwing money at the problem was going to work. We had to come up with a selected set of use cases and figure out how best to use the technology for maximum result. You don’t want to hunt for nickels and dimes. You want to go where the money is. You also don’t want to investigate a lot of false positives.

This kind of work would, I think, apply so widely to so many areas of human endeavor that I wonder if there’s one project you’ve done that sticks out as unusual?

We did a big project with Statoil out of Norway, trying to get more pumping days out of the year from their North Sea oil platforms, given the extreme conditions and the equipment they have to maintain. They had a problem with excessive downtime. We worked with them on their instrumentation, then on collecting and formatting the data and applying time-index data to it, because that’s critical. That was a two-year project, and we had people out there working on the platforms. There were 62 use cases, none of which existed before we started the project. There was a lot of effort from IBM people working with Statoil, but the payback was enormous: given the relative value of a barrel of oil, you don’t need that large a percentage improvement for the investment to pay off. It was a two-year pilot that showed you could do these things, and that it was cost-effective to do them.

(Image courtesy IBM’s Flickr feed.)
