It’s Not About The Size Of Your Data

It's how you use it


Ever since Facebook, Twitter and Google became multi-billion dollar companies, it has often been assumed that the amount of data they hold and process is the key to their success. This has led other companies to believe that the more data they hold, the more likely they are to succeed.

Although the data used by these three huge companies does have a significant impact on their overall success, it is not about the size of their data. It is how they use it.

They collect petabytes of data every day, but most of it is not used, at least not at the time it is collected. The key metrics they use for each user are deliberately limited in scope, targeting specific elements at specific times. For instance, when they target an advert at somebody they only need certain information, such as age, demographic and recent purchases. That is not a huge amount of information to power their primary revenue driver.

Even so, companies like to boast about how much data they have, as they seem to think that the more data they hold, the more informed they are. This is not the case, and what companies end up doing is wasting money and resources hoarding data that is of no use to them. One of the primary drivers behind this is the belief that the data will become useful and relevant in the future.

To some extent this is true, but any data more than around six months old has little current business value, as behaviours change and people change positions, alter their circumstances or even die in that time. Holding this information is an exercise in futility when it provides no real benefit to the company.

Even when we look at how technology is developing, Big Data is not necessary for it to function. The IoT, for instance, will require only a limited amount of information to work at its best. For a connected thermostat to function properly, it needs to know the temperature now, what the temperature needs to be, and where it needs to be that temperature. On its own this information is trivial, but when stored and analysed across millions of devices it becomes possible to see average temperatures, how well a house is insulated and how often boilers need to be turned on.
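
The thermostat example can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the `Reading` fields, the half-degree tolerance and the sample values are hypothetical and do not describe any real device. It simply contrasts the operational decision, which needs one current reading, with the analysis, which needs the stored history.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    """Hypothetical thermostat reading: where, temperature now, temperature wanted."""
    room: str
    current_c: float
    target_c: float


def heating_on(reading: Reading, tolerance_c: float = 0.5) -> bool:
    """Operational decision: uses only the latest reading, no history."""
    return reading.current_c < reading.target_c - tolerance_c


def average_temperature(history: list[Reading]) -> float:
    """Analytical view: only possible once readings have been stored."""
    return mean(r.current_c for r in history)


def heating_demand_share(history: list[Reading]) -> float:
    """Fraction of stored readings in which the heating would have been on."""
    return sum(heating_on(r) for r in history) / len(history)


# Illustrative sample data, not real measurements.
history = [
    Reading("living room", 18.0, 21.0),
    Reading("living room", 20.0, 21.0),
    Reading("living room", 21.0, 21.0),
    Reading("living room", 21.0, 21.0),
]
```

Calling `heating_on(history[-1])` is all the device needs in order to act, while `average_temperature(history)` and `heating_demand_share(history)` only become meaningful once readings have been kept and aggregated.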

This stored information therefore plays almost no part in the actual day-to-day operations of companies, but it allows them to improve those operations or begin new ones in the future. Even doing this requires deep learning and data mining techniques that take considerable skill and technology to implement effectively, something that most companies do not possess and will not possess before the data they hold becomes unreliable.

This shows companies one thing very clearly: it is worth using the data you need, not simply storing the data you don’t.

