Better Algorithms Beat More Data — And Here’s Why

Image via vichie81

Recently, Omar Tawakol from BlueKai wrote a fascinating article positing that more data beats better algorithms. He argued that more data trumps a better algorithm, but better still is having an algorithm that augments your data with linkages and connections, in the end creating a more robust data asset.

At Rocket Fuel, we’re big believers in the power of algorithms. This is because data, no matter how rich or augmented, is still a mostly static representation of customer interest and intent. To use data in the traditional way for Web advertising, choosing whom to show ads on the basis of the specific data segments they may be in represents one very simple choice of algorithm. But there are many others that can be strategically applied to take advantage of specific opportunities in the market, like a sudden burst of relevant ad inventory or a sudden increase in competition for consumers in a particular data segment. The algorithms can react to the changing usefulness of data, such as data that indicates interest in a specific time-sensitive event that is now past. They can also take advantage of ephemeral data not tied to individual behavior in any long-term way, such as the time of day or the context in which the person is browsing.

So while the world of data is rich, and algorithms can extend those data assets even further, the use of that data can be even more interesting and challenging, requiring extremely clever algorithms that result in significant, measurable improvements in campaign performance. Very few of these performance improvements are attributable solely to the use of more data.

For the sake of illustration, imagine you want to marry someone who will help you produce tall, healthy children. You are sequentially presented with suitors whom you have to either marry, or reject forever. Let’s say you start with only being able to look at the suitor’s height, and your simple algorithm is to “marry the first person who is over six feet tall.” How can we improve on these results? Using the “more data” strategy, we could also look at how strong they are, and set a threshold for that. Alternatively, we could use the same data but improve the algorithm: “Measure the height of the first third of the people I see, and marry the next person who is taller than all of them.” This algorithm improvement has a good chance of delivering a better result than just using more data with a simple algorithm.

Choosing opportunities to show online advertising to consumers is very much like that example, except that we’re picking millions of “suitors” each day for each advertiser, out of tens of billions of opportunities. As with the marriage challenge, we find it is most valuable to make improvements to the algorithms to help us make real-time decisions that grow increasingly optimal with each campaign.

There’s yet another dimension not covered in Omar’s article: the speed of the algorithms and data access, and the capacity of the infrastructure on which they run. The provider you work with needs to be able to make more decisions, faster, than any other players in this space. Doing that calls for a huge investment in hardware and software improvements at all layers of the stack. These investments are in some ways orthogonal to Omar’s original question: they simultaneously help optimize the performance of the algorithms, and they ensure the ability to store and process massive amounts of data.

In short, if I were told I had to either give up all the third-party data I might use, or give up my use of algorithms, I would give up the data in a heartbeat. There is plenty of relevant data captured through the passive activity of consumers interacting with Web advertising — more than enough to drive great performance for the vast majority of clients.

Mark Torrance is CTO of Rocket Fuel, which provides artificial-intelligence advertising solutions.

Must-Reads from other Websites

Panos Mourdoukoutas

Why Apple Should Buy China’s Xiaomi

Paul Graham

What I Didn’t Say

Benjamin Bratton

We Need to Talk About TED

Mat Honan

I, Glasshole: My Year With Google Glass

Chris Ware

All Together Now

Corey S. Powell and Laurie Gwen Shapiro

The Sculpture on the Moon

About Voices

Along with original content and posts from across the Dow Jones network, this section of AllThingsD includes Must-Reads From Other Websites — pieces we’ve read, discussions we’ve followed, stuff we like. Six posts from external sites are included here each weekday, but we only run the headlines. We link to the original sites for the rest. These posts are explicitly labeled, so it’s clear that the content comes from other websites, and for clarity’s sake, all outside posts run against a pink background.

We also solicit original full-length posts and accept some unsolicited submissions.

Read more »