Big Data and the Soles of Your Shoes

But I don’t think people really get it. Big Data is too big and too abstract for most people to truly grasp. To many, it feels like just another label for the relentless technology arms race we’ve been engulfed in for 30 years.
Rather than provide my grandiose vision of Big Data for the ages, let’s think of it in terms of a pair of shoes. Not all the shoes, just one simple pair of shoes. That’s right. A single pair of shoes = Big Data.
How can one simple pair of black, size 10.5 Nike shoes represent the truth behind Big Data? Let me explain.
A Complex Journey
These specific shoes (currently adorning my very average-sized feet) arrived via standard consumer freight service. I bought them from a large online retailer after doing a little bit of Web browsing. Nothing too strenuous. Couple of reviews here, maybe a video there. I like them. They’re pretty comfortable shoes.
Now look at your shoes. Really look at them. Think about the journey that pair of shoes has taken to get to your feet.
In the beginning, your shoes start out not as anything physical, but as a series of electronic documents between the shoe manufacturer (let’s use Nike for my own personal reference) and the suppliers of the raw shoe materials (rubber, canvas, leather, etc.). The exchange of data begins the moment you click “Buy”; your online purchase generates documents that contain details about your shoe preferences (size, style, color, etc.), invoices, pricing and billing information, and other pieces of data required to complete your electronic purchase.
Nike then processes your order, which generates more data: Order, customer, inventory and logistics information is exchanged between the Web and back-office systems. But at this stage, your shoes don’t resemble anything close to what you’re currently looking at on your feet. Physically they’re bits and pieces of raw materials in a textile supplier warehouse. Electronically, they’re bits and pieces of data that are processed by every single machine at the textile manufacturer. Going from raw material to something that can be used in a shoe can generate literally tens of thousands of points of information, from simple things like the current temperature of a fabrication machine to more complex information like the fabric weave or the design of the shoe itself. Every second, these textile-processing machines are sending out status information as they sort, organize and package up the various pieces of material that will eventually become your shoes.
So now your shoes are starting to resemble, well, shoes. But their journey isn’t done yet. Not even close.
Once the textiles are boxed up, they need to get to Nike for assembly. How do they get there? Simple freight (UPS, FedEx, USPS). The textile manufacturer has an internal transportation and logistics system, where shipping information (e.g., Nike’s address) is entered. Depending on how far away the textile plant is from Nike’s manufacturing plant, each package will be electronically scanned by UPS/FedEx hundreds of times. Each scan event, from the departure of the package from the textile manufacturer, through its arrival at a big warehouse somewhere in Indiana, to its delivery at Nike’s warehouse, is recorded to allow the logistics company to track each package through its system. This information is then blended together with the data from all other packages in the system to enable better optimization of the transportation network.
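For readers who think in code, the scan events described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ScanEvent` and `TrackingLog` names, the package ID and the timestamps are hypothetical, and this is not how any real carrier's system is built.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ScanEvent:
    """One scan of one package: which package, where, and when (hypothetical model)."""
    package_id: str
    location: str
    scanned_at: datetime


class TrackingLog:
    """Keeps every scan event, grouped by package (illustrative only)."""

    def __init__(self) -> None:
        self._events: Dict[str, List[ScanEvent]] = {}

    def record(self, event: ScanEvent) -> None:
        # Each scan adds one more small piece of data to the package's history.
        self._events.setdefault(event.package_id, []).append(event)

    def history(self, package_id: str) -> List[ScanEvent]:
        # Scans may arrive out of order, so sort them by timestamp.
        return sorted(self._events.get(package_id, []), key=lambda e: e.scanned_at)

    def last_known_location(self, package_id: str) -> Optional[str]:
        events = self.history(package_id)
        return events[-1].location if events else None


# Made-up sample data: three scans, recorded out of order.
log = TrackingLog()
log.record(ScanEvent("PKG-1", "Indiana warehouse", datetime(2024, 1, 2, 14, 30)))
log.record(ScanEvent("PKG-1", "textile plant", datetime(2024, 1, 1, 8, 0)))
log.record(ScanEvent("PKG-1", "Nike warehouse", datetime(2024, 1, 3, 9, 15)))

print(log.last_known_location("PKG-1"))
```

Even this toy version stores three records for one box; multiply that by hundreds of scans and millions of packages and the scale of the "small data" becomes easier to see.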
Okay, so your shoes — in textile form — are now at Nike. But they’re still a box of fabrics, rubbers, leathers and other boring materials. They aren’t yet the cozy size 10.5s you’ve ordered. What now? Once the package arrives, even more electronic documents are passed back and forth between the textile manufacturer and Nike, where your box of not-quite-shoes is recorded and unpacked.
So here we are at the assembly stage of the process, where your shoes transform from a bunch of materials into a single pair of shoes. As was the case in the previous stages, assembly involves a massive amount of data transfer from machine to machine. Your shoes generally go through four different processes — cutting, closing, lasting and finishing — before they become wearable. At each stage down the line, thousands of pieces of information are generated. You ordered your Nikes in your favorite colors, red and white? That’s reflected down the line. You wanted them to include that brand new cushioned sole to alleviate your shin splints? Each machine must know this so that what you ordered is what you get.
Finally, your shoes are the right size, shape and color and, most importantly, they’re actual shoes. But they’re not yet at your doorstep. Let’s resurrect that freight service yet again to get them from the Nike manufacturing plant to your feet. Again, the details about the package are sent through Nike’s internal transportation and logistics system. From there, UPS/FedEx/insert-favorite-shipping-company-here picks up the package and scans it into its system. Each scan is critical, as it tells you, the customer, exactly where your package is. Information on location and time is captured during each scan and then, like a stone hitting a pool of water, these small scans create massive ripples of data across the transportation and logistics systems. These systems correlate all of this location/time data against weather data, network statuses, disruptions, outages, customs, etc., all in an effort to give you the most accurate delivery time.
But let’s remember that not all of the millions of packages delivered daily arrive without some sort of obstacle. Say a major storm breaks out at the airport near the manufacturer. Or maybe there’s traffic congestion (Carmageddon?), or a delivery truck breaks down unexpectedly. These things happen daily, and every delay generates even more information, as the systems that are managing the package try to find the optimal route. Imagine it: Millions of packages, every day, generating massive amounts of information every second, all to fulfill the retailer’s promise.
We’ve been ignoring the financial transactions involved between you, the customer, the textile manufacturer and the shoemaker. Banks acting on behalf of all three parties are working with additional data behind the scenes to make sure you pay what you’re supposed to and that the whole process goes smoothly.
This is all for a single, relatively small box of shoes. A shoe may require components and materials from many, many suppliers. The amount of data generated and stored keeps mounting. Not only is information being generated that’s tied to the physical material and inventory, but each application and system is also generating machine information, recording things like the health of the system. Every time a system does anything, that action is recorded in log files. The amount of data generated from the manufacturing and delivery of a single $80 pair of shoes is staggering.
Forget Big Data. Think About Small Data.
Big Data always comes across as “Big” first and “Data” second. What I urge you to do is think about the “small data.” This type of data is what happens every moment of every day. The humble pair of shoes represents small data. It’s a pair of shoes. It doesn’t pretend to be a space shuttle. But that pair of shoes has generated a massive quantity of data in its journey to you.
Small data represents the constant dripping faucet of information you generate every day. From ordering food at a restaurant to visiting a Web page to buying a pair of shoes, this faucet never stops. The amount of small data out there trumps the amount of Big Data.
Matt Quinn is CTO of TIBCO, a Palo Alto-based Big Data company, where he leads the company’s technology vision, including spearheading a series of successful high-profile acquisitions and providing overall leadership and coordination of TIBCO’s product plans and technology direction.