Building Your Big Data Foundations

There are some key points to consider when implementing a new data program


When a company starts out on its data journey, it usually faces the same broad issues as every other company. Regardless of what kind of data it collects, what its business goals are, or which industry it works in, each company follows broadly the same route to its goal.

The problem is that many don't know what this route looks like, because it seems to change constantly as new technologies and techniques come to market. However, the truth is that there are four foundations every company needs to consider when creating its data program.

Mission Statement

Before any company sets off on a data journey, it is important to set out a mission statement. This doesn't even need to be explicitly about data use, but it does need to be clearly communicated and give some indication of how data will play a part.

A key example of this is Google, whose mission statement is 'to organize the world’s information and make it universally accessible and useful.' This doesn't say much about exactly what the company does with its data, how it’s stored, or even where it comes from. However, it makes clear that mass data collection, storage, and utilization were part of the founding statement, and data use at the company has always focused on this principle.

Even if the statement is something basic like 'We aim to sell more products across the world to more demographics', it shows that the data collected should cover buying habits, different demographics, and international markets rather than being limited to a single country. This gives collection a basis and gives the company a clear indication of how to deal with the data.

Collection and Storage

When you know roughly what you are going to be collecting and why you are collecting it, the next key stage is how you are going to do so.

Collecting data is not as simple as starting to collect everything. That approach is not sustainable, and it ultimately wastes time and money. Think about how quickly Facebook scaled: in Q3 of 2008 it had 100 million users, and by Q3 of 2012 it had over 1 billion. Growth like this is unlikely, but if it happened to your business, would you have the infrastructure in place to scale effectively?

It is also incredibly important to have effective storage in place before collection begins in earnest. You want to know what to store, how to store it, which formats to use, and what integration issues might arise. If you get these wrong and end up with hundreds of thousands of unusable records, the entire process will grind to a halt. Avoid these issues by planning your collection and storage well in advance.
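
One practical way to act on this planning is to check every record against an agreed format at the point of collection, so bad records are caught immediately rather than discovered later in bulk. The sketch below is a minimal illustration, assuming a hypothetical record schema (customer_id, country, amount, timestamp) invented for this example; it is not a prescribed format. It splits an incoming batch into accepted records and rejected ones, keeping the reason for each rejection so format problems can be fixed at the source.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical schema for illustration only: field names and types are
# assumptions, not a format defined by any particular data program.
REQUIRED_FIELDS = {
    "customer_id": str,
    "country": str,
    "amount": float,   # note: an int such as 5 would be rejected here
    "timestamp": str,  # expected in ISO 8601 form, e.g. "2015-09-01T12:00:00"
}


@dataclass
class IngestResult:
    accepted: list
    rejected: list  # (record, reason) pairs kept for later inspection


def validate(record: dict):
    """Return None if the record matches the schema, else a reason string."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            return f"missing field: {field}"
        if not isinstance(record[field], expected_type):
            return f"wrong type for {field}: {type(record[field]).__name__}"
    try:
        datetime.fromisoformat(record["timestamp"])
    except ValueError:
        return "timestamp is not ISO 8601"
    return None


def ingest(records):
    """Split incoming records into accepted and rejected before storage."""
    result = IngestResult(accepted=[], rejected=[])
    for record in records:
        reason = validate(record)
        if reason is None:
            result.accepted.append(record)
        else:
            result.rejected.append((record, reason))
    return result


if __name__ == "__main__":
    batch = [
        {"customer_id": "c1", "country": "CA", "amount": 19.99,
         "timestamp": "2015-09-01T12:00:00"},
        {"customer_id": "c2", "country": "US", "amount": "19.99",
         "timestamp": "2015-09-01T12:05:00"},
        {"customer_id": "c3", "amount": 5.0, "timestamp": "yesterday"},
    ]
    result = ingest(batch)
    print(len(result.accepted), "accepted")
    for record, reason in result.rejected:
        print(record["customer_id"], "->", reason)
```

Run on the sample batch, this prints `1 accepted`, then `c2 -> wrong type for amount: str` and `c3 -> missing field: country`. Keeping rejected records alongside their reasons, rather than silently dropping them, is one design choice among several; a real program might instead route them to a separate quarantine store.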

Security and regulation

This is arguably the most important element of any data program. Collecting and storing data is valuable, but if your company is hacked and customer data is stolen, your data program may well be over.

Customers will no longer trust you to hold their data, and you could be hit with huge fines as a result. For a small or mid-size company, these can be crippling, and even the largest companies can be significantly affected by this kind of breach. Target's CEO and CIO were forced to resign following its data breach, and Ashley Madison is facing billions of dollars in damages following its own.

Security is not just about protecting data from anyone outside the company who might access it illegally; it is also about using that data responsibly once you have it.

New regulations mean that even something as simple as using data improperly can have a huge impact on a data program. If you spam somebody in Canada, for instance, your company could be hit with a fine of up to $10 million; if you accidentally share personal information about somebody, you are open to lawsuits; and if you target the wrong people at the wrong time, you could easily lose hundreds of customers.

Analysis

Analysis is the ultimate goal of any data program: the ability to use algorithms and new technologies to draw conclusions from data is, in essence, the reason the program was created in the first place. However, it is also the hardest and most time-consuming part, with significant struggles both to find the right people and then to keep them happy.

This element needs considerable preparation given the current skills gap in the market. These skill sets are certainly not easy to find: a survey from PricewaterhouseCoopers revealed that 44% of companies felt they didn't have enough talent to capitalize on big data. So if you don't have this talent already, how are you going to get it?

There are a number of ways. You could train an existing member of staff, out-pay competitors for the best data talent, or offer something unique that will draw people with the skill set you want to your company. The most important thing is to decide well in advance how you want this process to work.
