The Pyramid Of Data Needs (And Why It Matters For Your Career)

Machine Learning Data Science Manager at Uber, Hugh Williams


Every company has a pyramid of data needs, and your role as a data scientist/analyst will fall somewhere along this spectrum. Understanding this framework is key to clearly articulating your current skills and responsibilities, and where you want your career to go.

Before we get too deep into the subject, let me give credit where it’s due. This entire concept is based on Maslow’s Hierarchy of Needs, and applying it to data science is not new. In fact, Monica Rogati gave an exceptional description and visualization of data needs about six months ago. So why rehash the subject, especially given that my knowledge of the subject and my writing skills pale in comparison to Monica’s?

Well, Monica concisely framed the discussion in terms of advising startups and companies, but my motivation is to write about how this pyramid impacts the careers of individual data scientists. I keep having the same conversation about mapping someone’s interests to their role, and I keep coming back to this visualization. It is super helpful for conveying skills and job responsibilities in a generalized but still meaningful way. So rather than wasting more marker ink on a whiteboard, I’m following Rachel Thomas’ principle of writing it up in a blog post.

So with that, a few points to get across…

(Please) call it a pyramid, not a hierarchy

Yeah, it sounds clunky, but trust me, it’s for a good reason. When I drafted my own visualization (as part of an effort to better define the roles at my current company), there were three noticeable differences from Monica’s:

  1. My overly aggressive color scheme (apologies for creating what has been described as “a bastardized version of the pride flag”).
  2. It’s upside down (relative to Monica’s).
  3. I use the word 'pyramid' instead of 'hierarchy.'

The latter two choices are deliberate. By flipping the pyramid and replacing “hierarchy,” I am trying to eliminate the perception that working in the narrower parts of the pyramid is better, cooler, or more impactful for a business. Too often people chase the fanciest data science technique (e.g., deep learning, multi-armed bandits) rather than focusing on the building blocks of a solid foundation. Heck, we rarely even stop to question whether something like deep learning is needed. Using wording and visuals that avoid implying superiority is crucial to de-hyping the narrower parts of the pyramid.

Every company is different

It’s a pyramid/triangle/non-rectangle for a reason. Not every company needs advanced research or machine learning algorithms. It depends on the business they’re in.

Let’s use an example. Take a brick-and-mortar store that has maybe 50 transactions a day. Logging every transaction is still critical to knowing which inventory is and isn’t selling, but this doesn’t require any sophisticated data management. The transactions might get logged in Excel, and maybe the store does some business intelligence-like work to gain insights. The point is that it doesn’t need anything more advanced for its operations than the first building block of the data pyramid.

On the flip side, a company that gives out loans, even a small local credit union, will need to build multiple layers in order to ensure the loans are being repaid. In fact, the local credit union will likely want to move as close to modeling as possible, minimizing the risk of its loans by predicting each borrower’s likelihood of default.

As companies’ needs differ, so do their staffing strategies. Some prefer specialists (using different people for each layer of the pyramid), whereas others prefer generalists (asking one or two people to own large portions of the project). The specialist model gets stronger results at each stage but requires heavy communication overhead to coordinate the work across many people. The small overlap in responsibilities breeds clear, smooth ownership, but it also makes it extremely difficult to speak the same language throughout the entire pyramid. Meanwhile, the generalist model lets people build data science products (e.g., dashboards, models) more quickly, but it can often settle on local maxima. Honestly, a high-quality generalist model is also really hard to hire for, since it’s so rare for someone to have such a diverse background.

I’ve seen both strategies work firsthand, but I welcome your thoughts on the pros and cons of each.

Use the pyramid to frame conversations on skills and responsibilities

The pyramid is super helpful for framing what your job responsibilities are and how they map to your technical skills and interests. I use this framework whenever I interview with a company to understand whether the role is aligned with my interests. Companies have vastly different definitions of what a data scientist is, so before you jump into a new internal or external role, talk with your manager-to-be about what the distribution of work will look like. A company or department’s maturity is correlated with its place on the pyramid, so don’t be surprised if, upon joining a startup or brand-new team, you find it necessary to instrument a lot of the logging yourself. Asking tough questions upfront ensures that both you and your company know what you’re signing up for.

No one role is 'better' than another

The final point I want readers to take away from this article is that no single role is inherently better or more important than another, and that individuals in a given role don’t have to learn everything. There are of course data scientists who are 'full-stack' and can build in nearly all parts of the pyramid; likewise, there are super-specialists with extremely deep knowledge of one part of the pyramid. However, striving to be either of those is not necessarily what makes a great data scientist.

The best folks I’ve worked with are those who can pinpoint exactly where their team’s gaps are and work with the team to fill them. This ability to find the biggest opportunities typically isn’t a skill taught in graduate school or online courses, nor is it what shows up in your biannual feedback. So no matter what your role, I strongly encourage you to always think about which part of the data pyramid will have the largest impact for your team or company. Once you figure that out, I trust you’ll wrangle the necessary people or skills to find a solution that works.

To hear more from Hugh Williams and how Uber is improving customer care with NLP and deep learning, attend our Machine Learning Summit May 9-10 in San Francisco.

