Oil, Soil, and the need for Toil

Last week I was invited by LexisNexis to participate in a discussion forum on the data industry.

We were asked to ponder an “exam question” throughout the session: “Data: Oil or Soil?”. (A quick Google search will show that we are far from the first people to ask the question, but I hadn’t come across it before.) I’m not sure there’s a right answer; oil and soil mean too many things to different people: any response probably says as much about the author’s views as anything else.

But people’s views are interesting, or you wouldn’t be reading this blog. I think in terms of how the two metaphors were originally put forward (oil by Clive Humby; soil by David McCandless), data-as-oil is a better characterization than data-as-soil. I also think the oil metaphor casts more useful light on the state of the industry. Data – for the purposes of this discourse at least – is part of the supply chain for businesses, much like any other commodity, and the structure of the data industry can resemble the structure of other extractive industries.

It’s not a wholly clear-cut case, though. And I sympathize with the motivations behind the soil metaphor; I’d like to live in a world where data was more like soil, and maybe we can to some extent.

Common to both metaphors is the insight that data is not interesting or useful in and of itself; you need to work at it. Is data Oil or is data Soil? Either way, making data useful involves significant Toil. And while I think Oil describes the world better for the most part, it’s hard to refine your own crude. With Soil you can dig in your own backyard.

Data as Oil

“Data is the new oil” – apparently first said in 2006 by Clive Humby of DunnHumby (or so Quora would have us believe). He was presenting to marketers at an industry conference: the full text of his original remark doesn’t seem to be available, but a contemporary response said:

“Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.”

Assuming that’s a fair summary, Clive Humby’s key insight was that data is a commodity; a commodity that is useless in its raw form to most people, and a commodity around which an industry was coalescing. An obvious corollary is that we might expect lessons for this nascent industry in the structure of the oil industry.

Diluting the message

Of course, with such a quotable tagline, all nuance was rapidly lost, and the headline phrase lived on to acquire its own semiotics: as the Harvard Business Review put it in 2014,

Every 14 minutes, somewhere in the world, an ad exec strides on stage with the same breathless declaration: ‘Data is the new oil!’

And what was meant by this? Marketers smelt money:

“Data in the 21st Century is like Oil in the 18th Century: an immensely, untapped valuable asset. Like oil, for those who see Data’s fundamental value and learn to extract and use it there will be huge rewards.”

The message was clear: if you find yourself atop an oilfield, you can sit back and watch the money roll in; a fount of free, unearned value. It’s a message that doesn’t so much resonate as throb – the sort of message that cries out to be shaped to tap strategic budgets and build up a CMO’s empire.

Data as Soil

But the oil industry is not well-loved these days, and nor is the idea of free, unearned value, accessible only to the few.

In 2010, David McCandless demonstrated his own marketing prowess by coining a replacement tagline, “Data is the new Soil” in a TED talk:

“[…]  it feels like a fertile, creative medium. […]  we irrigate it with networks and connectivity, and it’s been worked and tilled by unpaid workers and governments. […] it’s a really fertile medium, and it feels like visualizations, infographics, data visualizations, they feel like flowers blooming from this medium.”

Oil might be black gold: oil is also dirty, polluting, mostly in the hands of kleptocrats, and running out. Data is clean, infinitely reusable, and holds the future promise of democratized value to all.

Oil vs Soil

If data is oil – it’s a generic commodity, which needs significant processing and enormous capital investment to make useful. It’s ultimately used by almost everyone, powering the entire economy. Everyone’s aware of the price of petrol and the importance of oil to the economy – but for the vast majority of people the sourcing and processing is opaque. They just buy and consume the output.

If data is soil – it still needs husbanding and care to extract useful output. Enormous agribusinesses might exist, but you can still farm in your own backyard if you want. Nobody buys soil (except garden hobbyists), but everyone buys packaged vegetables in the supermarket. And most people never think twice about the importance of soil to their lives.

There’s a bit of truth in both metaphors.

David McCandless was talking about how he works with data. He’s a data journalist: journalists live in a world of information, trying to extract meaning. Not just journalists – that’s what business analysts do, or scientists. And if you’re in that world, it’s true – you look at the information and data around you, you prod, you ask questions, and out of nowhere sometimes a story, a visualization, a narrative will emerge and grow as if of its own accord. Data can feel like fertile soil.

But usually data doesn’t easily give up its stories; it’s hard to get those narratives to grow, even if you have the skills. Often the data isn’t actually there; it has to be sought out and acquired, if not collected from scratch.

And most people aren’t data journalists, business analysts, or scientists; their job doesn’t involve professional curiosity; they have no need, let alone wherewithal, to put significant effort into getting answers out of data.

So for most people – for most businesses – data doesn’t act like soil. Insights don’t spring up out of nowhere, and you don’t have clods of rich loamy data lying at your feet. Data – insights – are something you have to buy in, something that is processed for you, something that is a cost of doing business.

For these people, data is oil. Increasingly, you know you need insights and analysis to compete, but you don’t have the data yourself, and you don’t have the capabilities or resources to process it effectively. Plugging yourself into the data economy is becoming a cost of doing business.

But with data you can do some backyard refining (or gardening). You need skills, but you don’t need millions of dollars of investment. And increasingly, the skills you need are less specialist; you can get a long way with just the desire, the opportunity, and a bit of bone-headedness. Data genuinely can be democratic in a way that oil can’t.

Speaking as a former scientist, and someone who spends a lot of his working life with data, I’ve a great deal of sympathy for David McCandless’ take. I think he was too optimistic in 2010 and I think he would be too optimistic if he said the same thing today. But 2016 is a better time to be working with data than 2010, and 2022 will be better still. Data analysis and visualization tools are becoming more powerful in the hands of experts; and more importantly, much more accessible to the hands of non-experts.

I don’t think we’ll ever get to a stage where most people, or most businesses, are engaged in their own data agriculture. But as long as you’re prepared to get your hands dirty, it’s becoming easier to tend your own backyard.

4 thoughts on “Oil, Soil, and the need for Toil”

  1. What about the Gold Rush? You can mine for gold, but most people are just panning in a promising stream. I’m not sure the serendipity of finding a nugget really does justice to the “toil” though… Back to the drawing board…


  2. Very well written!

    Not being a data specialist, I’ve got two comments:

    a) Does it matter that much which simile we choose? One man’s soil might be another’s oil – whichever term helps selling in a particular situation is the right one.
    b) Data differs from (s)oil in being virtually infinite and produced at a higher pace than it is consumed. Also, analytics tend to differ a lot depending on their domain, so both comparisons diverge very quickly. In short, I think either of the soil/oil terms works in a news headline, but loses value right after that.


    1. Roman, I suspect you’re right that “whichever term helps selling in a particular situation is the right one.” Perhaps the question should really be “in which situations is ‘oil’ a better sales pitch than ‘soil’, and vice versa?”

      I don’t know how true it is that analytics differ across domains though. In detail, sure, but you could make a reasonable case that there’s a fairly limited vocabulary of analytics processes & outputs across huge swathes of common domains. If for no other reason than that there are economies of scale in process tooling, and in bandwidth to explain. The audience for your analytics is going to respond better if you’re using terms & outputs that they are generally familiar with, so people tend to go with what they know. It’s in that sense I think you can make comparisons to an industrial process.

