Big Data At Uber

We investigate how the taxi giant utilizes its data


Since its launch in 2009, Uber has been one of the most exciting and fastest-growing companies in the world. It started off in San Francisco, initially serving only that city, but expanded quickly and today offers its service in around 720 cities across the world, with hundreds of thousands of drivers serving its customers. Not bad for a company that’s only eight years old.

Part of this meteoric rise is down to the service Uber offers, and the company has been able to provide that service thanks to the way it harnesses the huge amount of data it collects every day. The volume is monumental: every meter covered by an Uber driver, with or without a passenger, is tracked and recorded to help Uber optimize its service.

Uber relies on three main sources for its data collection: Kafka, an SOA database, and Schemaless, its custom in-house data store. It also uses more standard systems, such as Hadoop and Spark, alongside other in-house platforms such as Streamific, which is built on Apache Helix and Akka.

But how exactly does the company utilize all the data it collects and analyses?

Surge Pricing

The idea of surge pricing is to increase the price of a journey in areas where there are more passengers looking for a ride than there are drivers available to serve them. The potential to collect a larger fare attracts more drivers to the areas where they are needed most, which means happier drivers and happier customers: the drivers earn more money and passengers get a ride when they need it.
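
Uber has not published the exact formula it uses, but the basic mechanism can be sketched in Python. The function below is a hypothetical illustration, not Uber’s algorithm: it assumes the multiplier grows linearly with the ratio of open ride requests to available drivers in an area, is rounded to the nearest 0.1, and is capped at an arbitrary 3.0x.

```python
def surge_multiplier(open_requests: int, available_drivers: int,
                     cap: float = 3.0, step: float = 0.1) -> float:
    """Hypothetical surge multiplier based on the demand/supply ratio.

    Assumptions (not Uber's published method): no surge while drivers
    cover demand, linear growth with the ratio above that, rounded to
    `step` and capped at `cap`.
    """
    if open_requests <= 0:
        return 1.0
    if available_drivers <= 0:
        return cap  # demand but no drivers at all: maximum surge
    ratio = open_requests / available_drivers
    if ratio <= 1.0:
        return 1.0
    # Round to a readable value such as 1.4x rather than 1.3847x.
    multiplier = round(round(ratio / step) * step, 2)
    return min(multiplier, cap)


base_fare = 12.50  # hypothetical fare before surge
print(surge_multiplier(14, 10))              # 14 requests, 10 drivers
print(base_fare * surge_multiplier(14, 10))  # surged fare
```

The cap is one way to stop fares rising without limit when almost no drivers are nearby; the value used here is purely illustrative.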

Data is used for this in both a predictive and a reactive way. Thanks to historical data, Uber can predict where demand is likely to be high and roughly how high it will be. It is fairly clear, for example, that there will be a huge demand for Ubers after a sports match, but because stadiums are often away from city centres, only a few drivers are likely to be in the area on a regular basis. By predicting when drivers will be needed, Uber can draw a larger number of them to the area and provide a better service for its customers.
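
A very simple form of the predictive side is a seasonal average: look at how many requests a zone received at the same weekday and hour in previous weeks. The sketch below is a naive baseline, not Uber’s forecasting model, which the article does not describe. The zone names, the `(zone, weekday, hour, requests)` record layout, the illustrative counts, and the rate of two trips per driver per hour are all assumptions.

```python
import math
from collections import defaultdict
from statistics import mean


def build_forecast(history):
    """Average past request counts per (zone, weekday, hour).

    history: iterable of (zone, weekday, hour, requests) tuples, where
    weekday follows Python's convention (Monday=0 ... Sunday=6).
    """
    buckets = defaultdict(list)
    for zone, weekday, hour, requests in history:
        buckets[(zone, weekday, hour)].append(requests)
    return {key: mean(counts) for key, counts in buckets.items()}


def drivers_needed(forecast, zone, weekday, hour, trips_per_driver_hour=2.0):
    """Drivers required to serve the forecast demand (hypothetical rate)."""
    expected = forecast.get((zone, weekday, hour), 0)
    return math.ceil(expected / trips_per_driver_hour)


# Illustrative counts: three past Saturdays (weekday=5) at 17:00 near a stadium.
history = [
    ("stadium", 5, 17, 120),
    ("stadium", 5, 17, 150),
    ("stadium", 5, 17, 135),
    ("stadium", 2, 17, 10),  # an ordinary Wednesday for comparison
]
forecast = build_forecast(history)
print(drivers_needed(forecast, "stadium", 5, 17))
```

A forecast like this tells the platform how many drivers to attract before the final whistle, rather than waiting for requests to pile up.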

The system can also react to changing circumstances, thanks to basic GPS data. If public transport breaks down, for example, a large number of people may suddenly need an Uber in an area that would not normally have enough drivers to serve them. Surge pricing then draws drivers from surrounding areas to meet the sudden and unexpected increase in demand.
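
On the reactive side, live GPS positions can be bucketed into small areas and compared. This sketch assumes a hypothetical feed of `(kind, lat, lon)` events and a simple square latitude/longitude grid chosen for brevity; it is not a description of Uber’s actual spatial indexing.

```python
import math
from collections import Counter

CELL_DEG = 0.01  # hypothetical cell size: about 1.1 km of latitude


def cell_of(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to an integer grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def supply_and_demand(events):
    """Count ride requests and idle drivers per grid cell.

    events: iterable of (kind, lat, lon), where kind is "request"
    or "idle_driver" (a hypothetical event format).
    """
    demand, supply = Counter(), Counter()
    for kind, lat, lon in events:
        cell = cell_of(lat, lon)
        if kind == "request":
            demand[cell] += 1
        elif kind == "idle_driver":
            supply[cell] += 1
    return demand, supply


def shortage_cells(demand, supply):
    """Cells where open requests outnumber idle drivers."""
    return {cell for cell in demand if demand[cell] > supply[cell]}


events = [
    ("request", 37.7755, -122.4155),
    ("request", 37.7758, -122.4152),
    ("request", 37.7751, -122.4159),
    ("idle_driver", 37.7753, -122.4151),
    ("idle_driver", 37.7952, -122.3935),  # a driver elsewhere in the city
]
demand, supply = supply_and_demand(events)
print(shortage_cells(demand, supply))
```

The cells returned by `shortage_cells` are the candidates for surge pricing. A real system would likely also smooth the counts over time so that a single burst of requests does not trigger a surge on its own.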

Another key element of surge pricing is that it doesn’t simply affect the one area where demand is highest. It also uses location data to maintain a decent service in surrounding areas, which would naturally see a drain of Ubers towards the surge zone. By offering slight surge pricing in these areas too, Uber limits the impact on service there, which makes for a fairer and more practical delivery of services despite a large spike in demand.
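
The spill-over effect can be illustrated with a single smoothing pass over neighbouring zones. In this hypothetical sketch, each zone’s multiplier is raised to at least 1 plus a fixed fraction (`spill`, assumed here to be 0.5) of its hottest neighbour’s surge premium; the zone names and adjacency map are invented for illustration.

```python
def smooth_surge(zone_surge, neighbours, spill=0.5):
    """Give zones next to a surge area a smaller surge of their own.

    zone_surge: dict mapping zone -> multiplier (1.0 means no surge)
    neighbours: dict mapping zone -> list of adjacent zones
    spill:      hypothetical share of a neighbour's premium passed on
    """
    smoothed = {}
    for zone, own in zone_surge.items():
        # Largest surge premium (multiplier above 1.0) among adjacent zones.
        premium = max(
            (zone_surge.get(n, 1.0) - 1.0 for n in neighbours.get(zone, [])),
            default=0.0,
        )
        # Never lower a zone's own multiplier; only raise it if needed.
        smoothed[zone] = round(max(own, 1.0 + spill * premium), 2)
    return smoothed


surge = {"stadium": 2.0, "north": 1.0, "south": 1.0, "far": 1.0}
adjacent = {"stadium": ["north", "south"], "north": ["stadium"],
            "south": ["stadium"], "far": []}
print(smooth_surge(surge, adjacent))
```

Because the pass reads the original multipliers rather than the smoothed ones, the surge spreads only one ring outwards and does not cascade across the whole city.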

Driver Quality

Given the decentralized nature of Uber’s business model, interviewing and thoroughly vetting drivers in advance is difficult, and it is something Uber has historically struggled with. In 2015, for instance, the San Francisco and Los Angeles district attorneys found that Uber had failed to detect the criminal records of 25 drivers it had hired. With around 160,000 active drivers in the US alone at the end of 2014, and hundreds of thousands worldwide, it is clearly a challenge to vet every driver thoroughly before they start work.

Although some vetting procedures are in place, Uber also uses review data to make sure its drivers are performing well: any driver whose rating falls below 4.6 out of 5 is in danger of being deactivated from the platform. According to documentation released in 2014, only around 2-3% of Uber drivers have a rating under 4.6, but the threat of falling beneath this threshold gives drivers a clear incentive to treat customers well.
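
The threshold check itself is straightforward. The sketch below keeps a rolling average over a driver’s most recent ratings and flags anyone below the 4.6 threshold cited above; the window of 500 ratings is an assumption for illustration, not a confirmed Uber parameter.

```python
from collections import deque

DEACTIVATION_THRESHOLD = 4.6  # threshold cited in the article
WINDOW = 500                  # hypothetical number of recent ratings kept


class DriverRating:
    """Rolling average of a driver's most recent star ratings."""

    def __init__(self, window=WINDOW):
        # A bounded deque silently drops the oldest rating when full.
        self.recent = deque(maxlen=window)

    def add(self, stars):
        if not 1 <= stars <= 5:
            raise ValueError("ratings are on a 1-5 scale")
        self.recent.append(stars)

    @property
    def average(self):
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)

    def at_risk(self, threshold=DEACTIVATION_THRESHOLD):
        avg = self.average
        return avg is not None and avg < threshold


driver = DriverRating(window=3)  # tiny window to keep the example short
for stars in (5, 5, 4):
    driver.add(stars)
print(driver.average, driver.at_risk())
driver.add(4)  # the oldest 5-star rating drops out of the window
print(driver.average, driver.at_risk())
```

The same structure would work for the passenger ratings described next, with a different threshold.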

However, it is not only drivers who are rated. Drivers also rate each passenger, and that rating helps the next driver decide whether to accept the fare. The same system therefore keeps drivers informed about the kind of people they are transporting, helping to protect both them and their property.


Navigation

Part of Uber’s appeal is not only that customers can hail a ride using nothing but a smartphone, but also that once they are in the car they can expect to take the fastest available route to their destination. This is thanks to Uber’s use of traffic data to predict the best routes based on congestion and traffic flow, which helps make every journey as quick and easy as possible compared with a traditional taxi.
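
Choosing the fastest route from live traffic data is, at its core, a shortest-path problem. The sketch below uses Dijkstra’s algorithm on a small road graph whose edge costs are free-flow travel times multiplied by a congestion factor. The graph, the factors, and the cost model are hypothetical, and the article does not say which routing algorithm Uber uses.

```python
import heapq


def fastest_route(graph, source, target):
    """Dijkstra's algorithm over traffic-adjusted travel times.

    graph: dict mapping node -> list of (neighbour, free_flow_minutes,
           congestion_factor), where a factor of 1.0 means a clear road.
    Returns (path, minutes), or (None, inf) if target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        minutes, node = heapq.heappop(queue)
        if node == target:
            break
        if minutes > best[node]:
            continue  # stale entry: a faster way to node was already found
        for neighbour, free_flow, congestion in graph.get(node, []):
            candidate = minutes + free_flow * congestion
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


roads = {
    "A": [("B", 5, 3.0), ("C", 8, 1.0)],  # A->B is heavily congested
    "B": [("D", 5, 1.0)],
    "C": [("D", 7, 1.0)],
}
print(fastest_route(roads, "A", "D"))
```

On clear roads the route through B would take 10 minutes, but the congestion factor of 3.0 on A to B makes the route through C the faster choice.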

In London, for instance, traditional black cab drivers must pass The Knowledge, a test that takes the average candidate two to four years to pass and requires them to know every possible route in an area as well as around 30,000 points of interest. It is an impressive and challenging test, but in practice it makes little difference to journey times in real time: however much a driver has memorized, the traffic flow at any given moment simply cannot be predicted from memory.

Uber’s navigation can quickly redirect drivers to avoid traffic delays, making sure passengers reach their destinations as quickly as possible while also keeping its cars from adding to the congestion.
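
Rerouting can be framed as re-checking the remaining journey whenever travel times change and switching only when the saving is worthwhile. In this sketch, the candidate routes, the per-road travel times, and the two-minute minimum saving are all hypothetical; the threshold simply stops the route flip-flopping over trivial differences.

```python
def route_minutes(route, edge_minutes):
    """Total current travel time along a route given as a list of nodes."""
    return sum(edge_minutes[(a, b)] for a, b in zip(route, route[1:]))


def choose_route(current, alternatives, edge_minutes, min_saving=2.0):
    """Keep the current route unless an alternative is faster by at least
    `min_saving` minutes (a hypothetical threshold)."""
    best = min(alternatives, key=lambda r: route_minutes(r, edge_minutes),
               default=current)
    saving = route_minutes(current, edge_minutes) - route_minutes(best, edge_minutes)
    return best if saving >= min_saving else current


# Travel times (minutes) after an incident slows the road from B to D.
times = {("A", "B"): 4, ("B", "D"): 20, ("A", "C"): 6, ("C", "D"): 7}
print(choose_route(["A", "B", "D"], [["A", "C", "D"]], times))
```

Here the incident pushes the current route to 24 minutes against 13 for the alternative, so the driver is redirected; with no alternatives the current route is kept.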

City Planning

Uber has a very mixed relationship with the cities in which it operates, and especially with the traditional taxi companies it is disrupting. Cities including Austin, San Francisco, New York, Munich, and London have all filed some kind of suit against the company, for reasons ranging from tax avoidance to licensing and employment law.

However, a relatively recent development from the company, Movement, uses the data Uber collects to help cities plan traffic flows. Given the infrastructure cities would need to build this capability themselves, it saves them a huge amount of money in development costs while also providing perhaps the most accurate traffic flow data available.
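
The article does not describe how Movement computes its figures, but the general idea of turning individual trips into planning data can be illustrated as an aggregation: group trips by origin zone, destination zone, and hour of day, and report a typical travel time for each group. In this hypothetical sketch, the record layout, the use of the median, and the minimum of five trips per group (to avoid publishing sparse, potentially identifying cells) are all assumptions.

```python
from collections import defaultdict
from statistics import median

MIN_TRIPS = 5  # hypothetical: suppress groups with too few trips


def travel_time_table(trips, min_trips=MIN_TRIPS):
    """Median travel time per (origin_zone, dest_zone, hour_of_day).

    trips: iterable of (origin_zone, dest_zone, hour_of_day, minutes).
    Groups with fewer than `min_trips` trips are left out.
    """
    samples = defaultdict(list)
    for origin, dest, hour, minutes in trips:
        samples[(origin, dest, hour)].append(minutes)
    return {
        key: median(values)
        for key, values in samples.items()
        if len(values) >= min_trips
    }


trips = [("downtown", "airport", 8, m) for m in (30, 35, 32, 40, 31)]
trips += [("suburb", "airport", 3, 22), ("suburb", "airport", 3, 25)]
print(travel_time_table(trips))
```

The median is used rather than the mean so that a handful of unusually slow trips does not distort the typical figure a planner sees.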

Many have claimed that this is a charitable or philanthropic move by Uber, but in reality it also has a huge upside for the company. In addition to the positive coverage, Uber relies on the very infrastructure that its data helps cities improve, so the better that infrastructure becomes, the better its own service will ultimately be.

