Big Data and the Soles of Your Shoes

Image copyright BONNINSTUDIO

There has been significant ink killed in the name of Big Data over the last few years. Much of that slaughter is justified; Big Data represents a massive generational shift in the way we think about data and its impact on every aspect of our personal and professional lives.

But I don’t think people really get it. Big Data is too big and too abstract for most people to truly grasp. To many, it feels like just another label for the relentless technology arms race we’ve been engulfed in for 30 years.

Rather than provide my grandiose vision of Big Data for the ages, let’s think of it in terms of a pair of shoes. Not all the shoes, just one simple pair of shoes. That’s right. A single pair of shoes = Big Data.

How can one simple pair of black, size 10.5 Nike shoes represent the truth behind Big Data? Let me explain.

A Complex Journey

These specific shoes (currently adorning my very average-sized feet) arrived via standard consumer freight service. I bought them from a large online retailer after doing a little bit of Web browsing. Nothing too strenuous. Couple of reviews here, maybe a video there. I like them. They’re pretty comfortable shoes.

Now look at your shoes. Really look at them. Think about the journey that pair of shoes has taken to get to your feet.

In the beginning, your shoes start out not as anything physical, but as a series of electronic documents between the shoe manufacturer (let’s use Nike for my own personal reference) and the suppliers of the raw shoe materials (rubber, canvas, leather, etc.). The exchange of data begins the moment you click “Buy”; your online purchase generates documents that contain details about your shoe preferences (size, style, color, etc.), invoices, pricing and billing information, and other pieces of data required to complete your electronic purchase.

Nike then processes your order, which generates more data: Order, customer, inventory and logistics information is exchanged between the Web and back-office systems. But at this stage, your shoes don’t resemble anything close to what you’re currently looking at on your feet. Physically, they’re bits and pieces of raw materials in a textile supplier warehouse. Electronically, they’re bits and pieces of data that are processed by every single machine at the textile manufacturer. Going from raw material to something that can be used in a shoe can generate literally tens of thousands of points of information, from simple things like the current temperature of a fabrication machine to more complex information like the fabric weave or the design of the shoe itself. Every second, these textile-processing machines are sending out status information as they sort, organize and package up the various pieces of material that will eventually become your shoes.

So now your shoes are starting to resemble, well, shoes. But their journey isn’t done yet. Not even close.

Once the textiles are boxed up, they need to get to Nike for assembly. How do they get there? Simple freight (UPS, FedEx, USPS). The textile manufacturer has an internal transportation and logistics system, where shipping information (e.g., Nike’s address) is entered. Depending on how far away the textile plant is from Nike’s manufacturing plant, each package will be electronically scanned by the carrier hundreds of times. Each scan event, from the package’s departure from the textile manufacturer, to its arrival at a big warehouse somewhere in Indiana, to its delivery at Nike’s warehouse, is recorded to allow the logistics company to track each package through its system. This information is then blended together with that of all the other packages in the system to enable better optimization of the transportation network.

Okay, so your shoes — in textile form — are now at Nike. But they’re still a box of fabrics, rubbers, leathers and other boring materials. They aren’t yet the cozy size 10.5s you’ve ordered. What now? Once the package arrives, even more electronic documents are passed back and forth between the textile manufacturer and Nike, where your box of not-quite-shoes is recorded and unpacked.

So here we are at the assembly stage of the process, where your shoes transform from a bunch of materials into a single pair of shoes. As was the case in the previous stages, assembly involves a massive amount of data transfer from machine to machine. Your shoes generally go through four different processes — cutting, closing, lasting and finishing — before they become wearable. At each stage down the line, thousands of pieces of information are generated. You ordered your Nikes in your favorite colors, red and white? That’s reflected down the line. You wanted them to include that brand new cushioned sole to alleviate your shin splints? Each machine must know this so that what you ordered is what you get.

Finally, your shoes are the right size, shape and color, and most importantly, they’re actual shoes. But they’re not yet at your doorstep. Let’s resurrect that freight service yet again to get them from the Nike manufacturing plant to your feet. Again, the details about the package are sent through Nike’s internal transportation and logistics system. From there, UPS/FedEx/insert-favorite-shipping-company-here picks up the package and scans it into its system. Each scan is critical, as it tells you, the customer, exactly where your package is. Information on location and time is captured during each scan and then, like a stone hitting a pool of water, these small scans create massive ripples of data across the transportation and logistics systems. These systems correlate all of this location/time data against weather data, network statuses, disruptions, outages, customs, etc., all in an effort to give you the most accurate delivery time.

But let’s remember that not all of the millions of packages delivered daily arrive without some sort of obstacle. Say a major storm breaks out at the airport near the manufacturer. Or maybe there’s traffic congestion (Carmageddon?), or a delivery truck breaks down unexpectedly. These things happen daily, and every delay generates even more information, as the systems that are managing the package try to find the best alternative route. Imagine it: Millions of packages, every day, generating massive amounts of information every second, all to fulfill the retailer’s promise.

So far, we’ve been ignoring the financial transactions among you, the customer, the textile manufacturer and the shoemaker. Banks acting on behalf of all three parties are working with additional data behind the scenes to make sure you pay what you’re supposed to and that the whole process goes smoothly.

This is all for a single, relatively small box of shoes. A shoe may require components and materials from many, many suppliers. The amount of data generated and stored keeps mounting. Not only is information being generated that’s tied to the physical material and inventory, but each application and system is also generating machine information, recording things like the health of the system. Every time a system does anything, that action is recorded in log files. The amount of data generated from the manufacturing and delivery of a single $80 pair of shoes is staggering.

Forget Big Data. Think About Small Data.

Big Data always comes across as “Big” first and “Data” second. What I urge you to do is think about the “small data.” This type of data is what happens every moment of every day. The humble pair of shoes represents small data. It’s a pair of shoes. It doesn’t pretend to be a space shuttle. But that pair of shoes has generated a massive quantity of data in its journey to you.

Small data represents the constant dripping faucet of information you generate every day. From ordering food at a restaurant to visiting a Web page to buying a pair of shoes, this faucet never stops. The amount of small data out there trumps the amount of Big Data.

Matt Quinn is CTO of TIBCO, a Palo Alto-based Big Data company, where he leads the company’s technology vision, including spearheading a series of successful high-profile acquisitions and providing overall leadership and coordination of TIBCO’s product plans and technology direction.

