It is now widely acknowledged that the best way to utilize the wealth of data at organizations’ disposal is by democratizing data, allowing all employees access to the data in such a format that they can use it effectively in their day-to-day decision making.
Speaking at the Big Data Innovation Summit last September, Riley Newman, Head of Data Science at Airbnb, discussed how the room-letting platform has achieved this and the benefits of doing so.
Newman is something of an oddity among data scientists in that he doesn’t believe that companies should necessarily be ‘data-driven’. Rather, he believes it is more sensible for decisions to be ‘data-informed’. He notes that ‘data is often a useful piece of information, but it's rarely a complete picture. So a company should be data informed, with the understanding that there may be more to the story.’
When Newman joined Airbnb in 2010, the big data phenomenon had yet to take off in earnest. He was also just the company’s seventh employee - an unusual hire for a startup, which would usually bring in a data scientist later in its growth cycle. His goal was to ensure that Airbnb made data-informed decisions as it grew.
In his early days, he says, Airbnb kept its data in Hive across 3,500 tables, of which only 5 were actually any good, so staff were forced to rely on tribal knowledge to make sense of them. This resulted in multiple parallel efforts to solve what were essentially the same problems at the same time, with each department looking at a different source of truth, causing a great deal of confusion and wasted effort.
The problems with such an approach became clear around 2011, when Airbnb exploded internationally. The ‘Jack of all trades’ approach of the data science team - just three people at the time - was no longer fit for purpose, and Newman looked for a new way of doing things. To fix the problem, Airbnb established something called ‘Core Data’, providing one canonical source of truth to all staff to ensure they were no longer working at cross purposes.
For Core Data to work, it was imperative that all data be 100% reliable and robust, always up to date, and always consistent. The new system also had to be highly intuitive, with tables linking through to one another, so that it required no explanation or training and few technical skills to make sense of, meaning everyone could garner insights regardless of experience or background.
The system needed to be simple and scalable to allow for expansion. They therefore used a star schema, so as not to limit the size of the warehouse as the company grew. They created one key to hook into other tables so the whole warehouse could work together, with two tables - one for ‘facts’ and one for ‘dimensions’. Booking information and the like goes into facts, while the dimensions table looks deeper, adding a few more bits of information. They also designed metadata for every table, showing who is in charge of it, who is responsible for keeping it clean, what each of its variables in Hive means, and so forth, to be sure of its usability.
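To make the star schema idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Hive. It is not Airbnb's actual schema: the table names (fact_bookings, dim_listings, table_metadata), the columns, the owner names and the sample rows are all hypothetical, chosen only to show a fact table joined to a dimension table through one shared key, with a metadata table recording who owns each table.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory star schema. All names and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces keys when asked
    conn.executescript("""
        -- Dimension table: descriptive attributes of each listing.
        CREATE TABLE dim_listings (
            listing_id INTEGER PRIMARY KEY,
            city       TEXT NOT NULL,
            room_type  TEXT NOT NULL
        );
        -- Fact table: one row per booking, linked by the shared key.
        CREATE TABLE fact_bookings (
            booking_id INTEGER PRIMARY KEY,
            listing_id INTEGER NOT NULL REFERENCES dim_listings(listing_id),
            nights     INTEGER NOT NULL
        );
        -- Metadata: who owns each table and what it contains.
        CREATE TABLE table_metadata (
            table_name  TEXT PRIMARY KEY,
            owner       TEXT NOT NULL,
            description TEXT NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_listings VALUES (?, ?, ?)",
        [(1, "Paris", "Entire home"), (2, "Tokyo", "Private room")],
    )
    conn.executemany(
        "INSERT INTO fact_bookings VALUES (?, ?, ?)",
        [(100, 1, 3), (101, 1, 2), (102, 2, 4)],
    )
    conn.executemany(
        "INSERT INTO table_metadata VALUES (?, ?, ?)",
        [
            ("fact_bookings", "data-engineering", "One row per confirmed booking"),
            ("dim_listings", "data-engineering", "One row per listing"),
        ],
    )
    conn.commit()
    return conn


def nights_by_city(conn):
    """Join facts to dimensions on the shared key and total nights per city."""
    rows = conn.execute("""
        SELECT d.city, SUM(f.nights)
        FROM fact_bookings AS f
        JOIN dim_listings AS d ON d.listing_id = f.listing_id
        GROUP BY d.city
        ORDER BY d.city
    """).fetchall()
    return dict(rows)


def table_owner(conn, table_name):
    """Look up the owner recorded in the metadata table, or None if absent."""
    row = conn.execute(
        "SELECT owner FROM table_metadata WHERE table_name = ?", (table_name,)
    ).fetchone()
    return row[0] if row else None
```

With the sample rows above, nights_by_city(build_warehouse()) returns {"Paris": 5, "Tokyo": 4}. Because every question goes through the same key and the same two tables, two departments asking it would get the same answer, which is the point of a single source of truth.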
Democratizing data in this way brings tremendous benefits for employees outside the data science team, as it gives them the information they need to back up their hunches when they need it. Because they no longer have to go through the data science team, they can get the answers they are looking for far more quickly than they otherwise would. Perhaps most important, though, is the benefit for the data science team itself. As Newman mentions in his presentation, it frees the team to focus on more technical problems. This makes their work more challenging and creates a better culture within the team. Since Airbnb implemented ‘Core Data’, retention rates have gone up, and the team is doing more interesting and impactful work, which in turn benefits how the whole company uses data.