“Drowning in Data”

Data Center (Photo credit: bandarji)

I’ve just read a fascinating article in The Times of 22nd October which starts out by saying, “The world is ‘drowning in data’ and computing companies are running out of space to store it…”

Some of the interesting numbers that came out of the article:

  • By 2016, the number of devices connected to the internet will be 3x the global population (so, well over 20 billion devices) – that’s up from 9 billion today, itself an eightfold increase in seven years, with the number expected to reach a staggering 50 billion by 2020.

  • Global IP traffic in 2016 will reach about 120 exabytes / month. That’s 120 million terabytes (or, if you prefer, 120 billion gigabytes) of data every month – and almost 10% is expected to be mobile data.
  • And, if you think YouTube has too much video on nowadays, by 2016 estimates are that 20,000 hours of video will cross the internet every second!
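
To make the traffic figure above easier to picture, here is a small sketch of the unit arithmetic. It assumes decimal (SI) units, a 30-day month and a round 10% mobile share; those are my simplifications for illustration, not figures taken from the article.

```python
# Decimal (SI) byte units; assumed here rather than binary (2**60) units.
EB = 10**18  # exabyte
TB = 10**12  # terabyte
GB = 10**9   # gigabyte

monthly_traffic_bytes = 120 * EB  # "about 120 exabytes / month"
mobile_percent = 10               # "almost 10%", rounded up for illustration

terabytes = monthly_traffic_bytes // TB  # 120 million TB
gigabytes = monthly_traffic_bytes // GB  # 120 billion GB

# Mobile share of the monthly total, in exabytes.
mobile_eb = (monthly_traffic_bytes * mobile_percent // 100) / EB

# Average rate, assuming a 30-day month (an assumption for the sketch).
seconds_per_month = 30 * 24 * 3600
tb_per_second = monthly_traffic_bytes / seconds_per_month / TB

print(f"{terabytes:,} TB per month")
print(f"{gigabytes:,} GB per month")
print(f"{mobile_eb:.0f} EB per month of mobile data")
print(f"{tb_per_second:.1f} TB per second on average")
```

On those assumptions, the average works out to roughly 46 terabytes crossing the internet every second.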

Already this year we’ve seen the explosion of tablets and smartphones – not just in numbers, but in data traffic, too, with the average tablet expected to handle some 4 gigabytes of data every month, up 8x from last year, and the average smartphone to be handling around 2.5 gigabytes of data a month, about 16x more than last year.

At this pace of growth, both devices look set to overtake laptops for data traffic in the next year or so, as laptops are ‘only’ handling around 7 gigabytes a month, little more than 3x up on last year.
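
As a rough check on that claim, the sketch below simply applies each device's quoted year-on-year multiplier for one more year. Assuming the same multiplier repeats is my simplification (growth rates like these rarely hold), so treat the output as an illustration of the trend rather than a forecast.

```python
# (current GB per month, growth multiple versus last year), as quoted above.
devices = {
    "tablet": (4.0, 8),
    "smartphone": (2.5, 16),
    "laptop": (7.0, 3),
}

def project(current_gb, factor, years=1):
    """Naive projection: assume the same yearly multiplier simply repeats."""
    return current_gb * factor ** years

# Implied figures for last year, and a naive projection for next year.
last_year = {name: gb / factor for name, (gb, factor) in devices.items()}
next_year = {name: project(gb, factor) for name, (gb, factor) in devices.items()}

for name in devices:
    print(f"{name:<10} last year {last_year[name]:6.2f} GB, "
          f"now {devices[name][0]:5.1f} GB, "
          f"next year {next_year[name]:5.1f} GB")
```

Even if the real multipliers turn out to be a fraction of these, tablets and smartphones need far less than a repeat of this year's growth to pass the laptop figure.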

So, we’re creating vast amounts of information but what are we doing with it all? Seemingly, it’s going into enormous storage pools: another recent article, in Microscope (19th October), pointed to a significant skills gap when it came to the ability of companies to handle this level of information – ‘Big Data’ as it’s referred to.

Although the article points to research showing that almost two-thirds of UK businesses understood the competitive advantages of being able to utilise this data (nearly twice the number of firms in 2010), less than a quarter believe they have the ability to analyse all the unstructured data streaming in.

So, not only do we have a growing issue with storing all this exponentially increasing data traffic, but we’re largely unable to do anything with it.

It’s going to be fascinating to see the business models that spring up to manage this in the next couple of years.

6 responses to “Drowning in Data”

  1. I’ve been building web things since the days of Mosaic and HTML 2, so I have a few views on these things. The trends are not all good. There are two threads worth pointing out. (This also isn’t a response based directly on the Times article; I haven’t read it!)

    1. Improvement in technologies and equipment. Not universal, some purported advances are moves backward, but, if you choose right, things are getting better.

    2. A flood of utterly useless content. This is not limited to the Internet but easy publishing on the Internet has made it worse. In a world where the majority of peer reviewed content in some scientific fields is plain wrong, it’s hardly surprising that a lot of Internet content is noise or worse.

    Given that, it’s possible to take charge of your own data firehose and go forward. In my view those who coast along on the standard ways of using time run a high risk of drowning in data. The advertising paid folks seem to have a vested interest in making your life worse. They prevent you doing what you really want and try to force you backwards. (To them it makes sense, for you?)

    I recently saw a good example. I’ve used twitter via a news feed for some time. They have now cut off the news feeds (as far as I can tell, no announcement got to me!). Presumably they want people to read adverts to pay for the service. I’ve stopped using it regularly. The step backward is too great.

    So, in my view, a larger portion of the traffic and data is entirely dispensable. In fact more than that, the world would be better without it. (By the way, IPv6 will sort out the problems with number of devices. Not necessarily a good thing, it will encourage and enable much madness!!)

    As a practical personal solution I recommend getting to grips with the technology. Not some shrink wrapped consumer oriented abomination that gives you indirect access, get closer to the metal and closer to the wire.

    There are some great technologies out there which allow you to erect your own filters, get to the main vein of real information and shape your world more as you want it. If you let others take charge they may well warp your world (brain plasticity is very real) and waste your time.

    • Thanks, Mike. As you say, far too much useless content out there, adding to the exabytes of potentially useful data. Mind you, it’s interesting to see how companies are utilising some of the streams of info, including that not directly useful, to gauge sentiment and make decisions on that basis (a number of equity trading systems look at things in this way as markets are driven as much by sentiment as by other things).

      In all generally upward trajectories (apart from rockets, perhaps) there are periods where we go down/back for a while, although maintaining our overall direction over time, so hopefully some of the backwards steps will prove to be just this…

      • That system for using twitter on the market is interesting. I remember spending an hour or two looking at it and seeing what would be needed to emulate. It would have taken a few days at most to set up a similar system. (Twitter feed stream (could be done without the full firehose), word similarity lists (there’s at least one freely available huge corpus out there derived from whole Internet trawls), and the same simple search strings they were applying to the tweet-stream.)

        My impression was the quality would be highly dependent on the amount of thought going into tweets. My subjective impression is that, based on whole stream, and a scale -10 to 10, much of that quality is in the range -5 to 1. GIGO. To make it work you’ve got to grade the tweet stream. (I could have taken a few hours to set that up, but feared what Twitter would do with the licensing. The apprehension was spot on. They’ve realised that VC funding is not forever. Maybe in a year they’ll have settled down and it might be worth giving it a shot.)

        End of day, same problem with most Internet data. The human generated stuff often comes out of brains in deep hibernation (or worse), so grading is needed. Most of the grading I’ve seen simply stinks. (Just look at the drek coming out the search engines fr’instance.)

        If you ever wanted to get into this stuff some of it’s not that hard. When you break it down the intellectual challenge of understanding Amazon-Web-Services, Azure, Hadoop etc. isn’t that bad and it’s genuinely easy to set up, and dead cheap to experiment with. I’ve seen major analysis done with about USD 100 of compute time!

        My guess is that those who learn to filter out the low grade content, will be well placed.
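
For anyone curious what the pieces listed in this comment might look like, here is a toy sketch: a stream of tweets, a search term, word lists and a crude grading step. The word lists, the sample tweets, the "acme" search term and the minimum-length grading rule are all invented for illustration; this is not the system being discussed, and a real setup would use a proper corpus and a live feed.

```python
import re

# Hypothetical word lists; a real system would draw on a large corpus.
POSITIVE = {"gain", "up", "strong", "good", "beat"}
NEGATIVE = {"loss", "down", "weak", "bad", "miss"}

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score_stream(tweets, search_terms, min_words=4):
    """Return (mean sentiment, tweets used) for tweets that match a
    search term and pass a crude quality grade (a minimum word count)."""
    total, used = 0, 0
    for tweet in tweets:
        tokens = tokenize(tweet)
        if not any(term in tokens for term in search_terms):
            continue  # not about the topic we are searching for
        if len(tokens) < min_words:
            continue  # grading step: drop very short, low-effort tweets
        total += sum(t in POSITIVE for t in tokens)
        total -= sum(t in NEGATIVE for t in tokens)
        used += 1
    return (total / used if used else 0.0), used

# Invented sample stream standing in for a real tweet feed.
sample_tweets = [
    "ACME earnings beat forecasts, strong quarter",
    "acme down",
    "Weak guidance from ACME, shares down again",
    "Lovely weather today in London",
    "ACME results look good to me",
]

mean_sentiment, used = score_stream(sample_tweets, {"acme"})
print(f"mean sentiment {mean_sentiment:+.2f} from {used} graded tweets")
```

The grading rule here is deliberately simplistic; as the comment argues, how well the stream is graded is likely to matter more than the scoring itself.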

      • On the Twitter streams side, I guess it depends what you’re looking for. If just measuring sentiment then the level of thought behind a stream doesn’t really matter as sentiment is top-of-mind stuff, anyway. If one’s looking for more specific info – customer feedback for a company, etc., then this does play a role. Either way, as you say, there’s good money to be made by those who can understand what to filter and when.

  2. Good and true article Guy. It’s impossible to digest all the information we are bombarded with 24/7. Same goes for new social media networks, new marketing strategies and so forth.

    This one really makes me laugh: “1,000 ways to increase your sales with Twitter”. Imagine only the time it would take to check out all 1,000 ways. Whoever goes for that one has to have an IQ lower than their shoe number;-)

    Get invitations to join social media networks on a daily basis. But where is the time going to come from – I only have time for Linkedin, Facebook, Twitter and Google+.

    Getting back to the huge amount of information we are bombarded with, most of it is actually “copy & pasteish” and very little is original or innovative. Catch is you have to go through the rubbish in order to find the diamonds.

    • Thanks, Catarina. Some excellent points about the deluge we’re getting from social media sites, and the like. As you say, where will people find the time to manage the plethora of sites we’re invited to join?

      Not all the data is useful – much is absolute junk (like so many Twitter streams, for example) – but there’s no question that there are many exabytes of data that could be really useful if only it was analysed properly. More and more companies are using Twitter streams to gauge public sentiment for not just adverse comments about their business, but for stock trades, etc. And then there’s the location-based stuff, the transaction histories showing what customers buy and when, and so on.

      As I say, it’s going to be interesting to see what comes along…
