We often talk about data science on the Channels and go into detail about how to use it effectively. However, an in-depth look at how it is used is of little help to those who simply want to understand what data science actually is and how they can use it.
Data science is essentially the science of studying data in order to arrive at conclusions based on the data studied.
It is a broad term that covers a number of different disciplines and incorporates many different languages and processes in order to find patterns and trends that can then be used to form a hypothesis.
Data science requires in-depth knowledge of computing, statistics and mathematics in order to use systems that extract knowledge from data. It commonly requires knowing how to both build and implement algorithms that data can be pushed through. These algorithms can then be deployed onto systems so that the process runs automatically.
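As a rough illustration of what "pushing data through an algorithm" to find a trend can look like, here is a minimal sketch in Python. It fits a straight line to a handful of points using ordinary least squares, one of the simplest trend-finding techniques. The function name fit_trend and the monthly sales figures are made up for this example and do not come from any real system or data set.

```python
from statistics import mean


def fit_trend(xs, ys):
    """Fit a least-squares line to the points and return (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean, and how x and y vary together
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: six months of sales, invented for illustration
months = [1, 2, 3, 4, 5, 6]
sales = [10, 12, 14, 16, 18, 20]

slope, intercept = fit_trend(months, sales)
print(f"Trend: {slope:.2f} units per month, starting near {intercept:.2f}")
```

Once a function like this exists, it can be run automatically whenever new data arrives. Real data science work typically uses far larger data sets and more sophisticated models, but the basic shape of data in, pattern out is the same.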
Common systems used in data science include Hadoop and tools built on top of it, such as Pig and HBase. Hadoop is the most common, due to both its effectiveness and the fact that it is open-source software that costs nothing to use.
The history of data science goes back to the 1960s, when the study of data became more common. It was then known as datalogy, but the modern term data science is commonly credited to DJ Patil, the current Chief Data Scientist at the US Government, and Jeff Hammerbacher. They used it in the 2000s to describe the work they were doing at the time.
Data science is closely tied to Big Data, the practice of collecting large amounts of data on which data science can be conducted. Although you could technically perform data science on small amounts of data without using any technology, it is generally accepted that it involves large data sets run through complex algorithms and powerful data processing systems.
Partly due to this, data science is often considered to be an expensive division. Added to this is the expense of hiring effective data scientists, who are generally well compensated due to the scarcity of their skills throughout the general workforce.
The use of data science is not specific to any one industry or sector. It is used across almost every area of life, from insurance companies to charities and everything in between. A data science programme can be started on comparatively small amounts of data, but it is generally thought of as requiring large amounts of data and computing power.