The big data buzz over the past two years has created a thirst for technical skills amongst thousands of companies. The success of early adopters such as Google and Facebook, whose primary income drivers derive from big data, has caused business leaders to sit up and take notice.
The demand has pushed up the potential salaries of those currently working in the area, whilst also creating a gap in the supply of data skills required across the wider business community.
So, with companies now looking to implement big data, how can they bridge the skills gap without compromising the quality of their analysis?
Building a team
When I speak with people in the industry, the question I often pose is: what skills are needed to really succeed in data science? In addition to technical knowledge of Hadoop, stacks, SQL and other technologies, most say business understanding.
The idea behind this is that a data scientist should be more than just a data user; they should be a true catalyst for business change, with the business knowledge to implement new ideas across the corporation.
Although this may be the ideal, finding somebody like this is almost impossible. Everyone has strengths and weaknesses, and data scientists are no different. They may be brilliant at data mining and analysis but weaker at communicating findings.
Companies should therefore be looking to create a data science team. By identifying what you need to use the data for and who the likely recipients will be, you can bring in people with the necessary individual skills to create functional teams. I know of companies that employ journalists within their data teams to help communicate findings, and business consultants to streamline integrations. Data science teams need to be viewed like factories: to get the end product, you need several different parts working together to assemble it.
There is a crossover between many current roles that exist in organisations and the skills needed to become an effective data scientist. The prime example of this would be web developers.
Although CSS and HTML may seem like relatively basic coding languages, in reality the crossover between these and the manipulation of data is strong. The creation of stacks is essentially code manipulation, something web developers already do within their current roles. Because of this, with some additional training they could technically start a data science program.
By plucking these people from their current roles and training them through external companies such as Cloudera, a company is likely to gain staff with not only the technological understanding but also wider business knowledge.
Sometimes companies overinvest in aspects of their business that could, in reality, be outsourced. The same is true of data management and analysis.
Of course, outsourcing is not always possible, due to confidentiality and legal issues surrounding some data. However, the majority of data work can simply be outsourced to a company that already has the experts. Letting another company do the legwork for your data makes perfect sense. Having experts working on your data who aren't on your payroll also means that you do not need to try to find a qualified candidate.
As a prime example of the difficulties around this, Kirk Borne, one of the early pioneers of modern big data, says that the roles have reversed over the past 10 years. A decade ago there would be one job for one hundred relevant graduates; today there are one hundred jobs for one graduate. Avoiding the time and money spent on recruiting in such a competitive market allows those resources to be reinvested in implementing the findings from the outsourced data.
The skills gap is something that everybody in the industry is well aware of, and until the number of graduates matches the number of jobs, it will remain an issue. The gap might grow or shrink, but at the moment companies need to find ways to avoid falling into it.