More Data or Better Algorithms? Not So Fast.



It’s always a pleasure to follow a stimulating debate between great thinkers within our industry. Recently, Rocket Fuel’s CTO Mark Torrance wrote “Better Algorithms Beat More Data — And Here’s Why” in direct response to BlueKai CEO Omar Tawakol’s piece, “More Data Beats Better Algorithms — Or Does It?” For his part, Mr. Torrance takes Mr. Tawakol to task for asserting that more data trumps algorithms, but that’s not quite Mr. Tawakol’s point. In fact, the bulk of his article explores the importance of having an algorithm that connects disparate data points, giving them enhanced meaning and usefulness through better context.

Still, I think we can all safely agree that both data and algorithms are absolutely necessary to complete any analytical project. But, regardless, the real point of any successful analytics project is to help an organization achieve a specific business goal. In that light, I hope we can also agree that marketing success is actually driven by four primary considerations: business acumen, data, algorithms and operations. Let’s take a look at each.

Business Acumen

By getting caught up in the data and the math, we can easily forget that analytics projects live and die on business knowledge. Analytics projects, therefore, must always begin with a clear business goal. What does this campaign seek to accomplish? What activities or actions does the marketer wish to encourage or measure? What does the organization already know about key prospects? And what pitfalls will marketers need to anticipate and avoid?

The answers to such questions will influence the other three analytics drivers. For example, digital media optimization models require a “dependent” variable (data), which is often expressed in terms of converters and non-converters. Naturally, the stated business goal will drive which users are deemed “converters.” If some arbitrary mismatch exists between goals and definition, the campaign may very well fail.

Data and Algorithms

I linked these two drivers together to emphasize the point that data and algorithms must be used in tandem. In truth, data scientists spend much of their day employing and refining algorithms. Yet I can’t quite accept Mr. Torrance’s example concerning how best to select a marriage partner if the goal is to produce tall and healthy children. The simple algorithm, he says, might be to marry the first suitor who’s over six feet tall. You could add more data, such as a threshold for strength, to get better results, he says. But for best results, a better algorithm is what’s needed. He writes:

“Measure the height of the first third of the people I see, and marry the next person who is taller than all of them. This algorithm improvement has a good chance of delivering a better result than just using more data with a simple algorithm.”

I am not so sure I agree. Without a doubt, a perfect algorithm that considers height alone will select one of the tallest suitors in the community. But is that sufficient to achieve the stated goal of tall and healthy children? What if that marriage partner happens to have a transmittable genetic condition, or is horribly grumpy? Wouldn’t knowing more about the partner (i.e. more data types) lead to a healthier life for all involved?
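For readers who want to see the two height-only rules side by side, here is a minimal Python sketch. It is an illustration under stated assumptions, not Mr. Torrance’s implementation: the function names, the 72-inch threshold (six feet), the fallback of settling for the last suitor, and the normally distributed heights used in the simulation are all hypothetical choices made for this example.

```python
import random


def first_over_threshold(heights, threshold=72.0):
    """Simple rule: pick the first suitor taller than a fixed threshold (inches)."""
    for h in heights:
        if h > threshold:
            return h
    # Assumption: if nobody clears the threshold, settle for the last suitor.
    return heights[-1]


def observe_then_choose(heights):
    """Quoted rule: measure the first third, then pick the next suitor
    who is taller than everyone measured so far."""
    cutoff = len(heights) // 3
    best_seen = max(heights[:cutoff], default=float("-inf"))
    for h in heights[cutoff:]:
        if h > best_seen:
            return h
    # Assumption: if nobody beats the observed group, settle for the last suitor.
    return heights[-1]


def simulate(trials=2000, suitors=30, seed=0):
    """Average height chosen by each rule over random, non-empty suitor lists.
    The height distribution (mean 69 in, std 3 in) is made up for illustration."""
    rng = random.Random(seed)
    totals = {"threshold": 0.0, "observe": 0.0}
    for _ in range(trials):
        heights = [rng.gauss(69.0, 3.0) for _ in range(suitors)]
        totals["threshold"] += first_over_threshold(heights)
        totals["observe"] += observe_then_choose(heights)
    return {name: total / trials for name, total in totals.items()}
```

Calling `simulate()` returns the average height each rule selects under these assumed conditions. Note that both rules see only height, which is exactly the limitation raised above: neither can say anything about health or temperament.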

Of course, we can’t assume that the more data you have, the better off you’ll be. Useful information that you didn’t have before (orthogonal data, in analytics speak) trumps unlimited data for the simple reason that not all data are created equal. Here’s an example of how that’s the case:

Let’s say you’re building a model that will help you find likely prospects for a new luxury sedan. Now let’s say your analytics model begins with one known input: household income. Given a choice between additional data and a better algorithm, which should you choose? Additional information, say purchase intent, will tell you something you didn’t know before: whether a user is interested in purchasing high-end vehicles. But given a choice between a better algorithm and adding new data such as individual assets under management, the better algorithm may be the best approach. Purchase intent represents new information and insight directly relevant to the business goal. But another measure of affluence? Not so much.
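One rough way to check whether a candidate input is “orthogonal” to what a model already has is to look at its correlation with the existing input. The sketch below is illustrative only: the numbers are invented toy values rather than real data, the variable names are hypothetical, and plain Pearson correlation is an assumed stand-in for a fuller redundancy analysis.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy, made-up values for eight hypothetical households (in $ thousands).
household_income = [40, 55, 70, 90, 120, 150, 200, 250]
assets_under_mgmt = [30, 60, 75, 110, 150, 170, 260, 300]
# 1 = showed interest in high-end vehicles, 0 = did not (also made up).
purchase_intent = [1, 0, 0, 1, 0, 1, 0, 1]

# A value near 1 suggests the candidate mostly repeats what income already says.
redundancy_assets = pearson(household_income, assets_under_mgmt)
# A value near 0 suggests the candidate carries information income does not.
redundancy_intent = pearson(household_income, purchase_intent)
```

In this toy setup, assets track income closely while purchase intent does not, which mirrors the argument above: the second affluence measure adds little, and the intent signal adds something new.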


Operations

All the best data and algorithms will be for naught if analytics isn’t fully embedded and widely distributed within appropriate marketing systems, so that the analytics can be directly leveraged whenever and wherever it’s most needed. At eXelate, we recognize that platform flexibility is a critical component for realizing digital media success via cross-channel marketing execution.

One last point I’d like to make has to do with the value of contextual and behavioral data. In his article, Mr. Torrance writes, “At Rocket Fuel, we’re big believers in the power of algorithms. This is because data, no matter how rich or augmented, is still a mostly static representation of customer interest and intent.” As far as I can tell, Rocket Fuel therefore sees all data as contextual data, limited to the time of the marketing event.

While it’s of course possible to ignore the time component of data, doing so throws away vital information that an appropriate algorithm could easily digest. In a very simple example, treating someone as an “auto intender” or not is far less powerful than tracking a user’s behavior over time to better understand how their intents and interests evolve. That is, unlike contextual data, behavioral data is by its very nature directly linked to longer-term patterns of user behavior. As a result, behavioral data are far better at driving long-term value across a variety of campaigns, especially branding efforts that address personal aspirations.

In a 2011 article entitled “Consumers Are People Too…,” I argued that when asking which data are better, behavioral or contextual, the only right answer is both. And therefore, I reject any absolutes, because marketers clearly will benefit from a variety of data types.

In conclusion, if you’re asked to pick between more data or a new and better algorithm, which should you choose? The proper response is that business acumen, varied information, great algorithms and operations are all critical to the success of an audience model.

As SVP, Analytics, Kevin is responsible for leading the vision and execution of eXelate’s analytics and data optimization activities. Kevin has a BA in Russian Language and Eastern European Studies from the University of Illinois at Urbana-Champaign, an MA in Medieval History from The Ohio State University and an MA in Applied Statistics from the City University of New York-Hunter College.
