Are You a Good or Great Data Science Professional?

Plus: Are organizations asking too much of their chief data officers? Read the latest edition of The Cluster, the DATAx newsletter.

25 Feb

Are organizations asking too much of their chief data officers (CDOs)? We’re betting many CDOs would say yes (privately). A Harvard Business Review piece identified seven key kinds of CDO jobs, “each distinct enough that it would be difficult or impossible for one person to perform all of them well”: chief data and analytics officer, data entrepreneur, data developer, data defender, data architect, data governor, and data ethicist.

To us, the sexiest-sounding are entrepreneur and ethicist. Selling data generated by the organization’s business processes (the entrepreneur) is tough to do, though, “despite the popularity of data monetization in the abstract … Companies that sell tangible products find it difficult to switch their orientation to selling data. And both consumers and businesses are increasingly concerned about the ownership and use of data that they help to generate.” Also, how many people with skills in data science are natural salespersons?

Ethicist sounds nice and trendy, but it might be more suited to someone who has experience in data privacy (and someone with a law degree). The point of the article? “Organizations need to choose which CDO job or jobs make the most sense for them.” One CDO can’t do it all.

Aiming for Excellence

Reddit’s r/datascience channel had an informative conversation on what differentiates an average data science professional from a good or great one. Some of the best responses about the requirements for excellence: “awareness of model assumptions and limitations,” “high domain knowledge,” “business context, and understanding how to be practical,” “a focus on bringing value to your employer and justifying your position,” “ability to sell the idea of data-driven innovation,” and “asking good questions to the right people often, in order to accelerate learning.” On that last one, a Reddit user put it this way: “All good data scientists spend a little time being average on their way to becoming good.” We’ll add this, courtesy of Medium: “knows how to write code computationally efficient and is lurking around to find high-impact business projects.”

Diana Ma is the Los Angeles Lakers’ data scientist, and Forbes interviewed her. I would have liked to hear more about what she actually does in her job, but the interview’s focus is on being a woman in a field full of men. She did have some great advice for young girls interested in STEM: “I want you to know that there’s absolutely no correlation between the score on [a] math test and your ability to be successful in STEM. I was that little girl who didn’t do well in high school math. Don’t give up. Keep pushing yourself.”

A Real Problem

Childhood obesity is rampant: there are 158 million obese children around the world. In the U.K., the National Health Service found that 20% of 10- and 11-year-olds are obese. Analytics to the rescue? That’s what the National Health Service and Guy’s and St. Thomas’ Charity hope. They are using Tableau and other tools to explore and map data on the prevalence of obesity in London neighborhoods. Among other things, the geo-data allow the charity to apply its funds where the need is greatest.

Rethinking Fire

“In sports, as in fire, decision-making was a job largely reserved for the trained eyes and gut instincts of grizzled veterans.” So says Matthew Thompson, a research forester at the U.S. Forest Service. With changing climates, expanding human development, and accumulating fuels, new approaches are needed in fighting deadly forest fires. For example, some experts are calling for suppressing fewer fires and accelerating forest restoration, according to MIT News. Thompson is helping bring data-driven decision-making to wildland fire management, including helping fire managers use data to prioritize where they put resources and determine where suppression efforts are likeliest to succeed.

Reference Shelf

Which data science technologies are hot and which are not? It might not matter, according to Medium’s Jeff Hale. Rather than trying to master every technology on his list, it's best to "focus on learning one technology at a time." OK, but in what order? Glad you asked:

  1. Python (for general programming)
  2. Pandas (for data manipulation)
  3. Scikit-learn library (for learning ML)
  4. SQL (for querying)
  5. Tableau (for data visualization)
  6. Cloud platform (for running models/applications)
  7. TensorFlow (most popular) or PyTorch (growing fastest) (for deep learning)

Mapping Data Science

The three biggest AI mistakes businesses make are (1) not having data people in the room when launching machine learning projects; (2) thinking too far ahead; and (3) not understanding how machine learning works. Perhaps more important: Are your company’s leaders and data scientists on the same page? That's the question HBR asks. Misalignment is common. “The best mental picture of this dynamic is an inverted pyramid. The wide top reflects the C-suite’s oversized expectations for data science impact. The small point at the bottom represents the data science team’s current capabilities, which are often far more modest and develop over time,” writes Joel Shapiro of Kellogg.

Overfit

Gartner launched its “magic quadrant” for data science and machine learning … Would a differently abled person be able to function in your lab or research group? A Northeastern University professor asks the question. … Astrologers were the first data scientists?! A new book by Alexander Boxer makes the claim. … In case you were wondering, the National Institutes of Health has 30 petabytes of biomedical research in the cloud. … I’m sure you’ve heard: Larry Tesler, the creator of cut, copy, and paste, passed away at 74. Where did he do his pioneering work? Where else? Xerox. … Finally, Davide Zilli of Mind Foundry wrote an excellent piece for us on “Keeping Machine Learning Algorithms Humble and Honest.”

Big Jobs

Aviso, the pioneer in AI-powered sales guidance and forecasting, announced the appointment of veteran artificial intelligence and machine learning researcher Joy Mustafi as chief scientist. The former principal researcher for Salesforce's Einstein platform will leverage his decades of experience to help Aviso customers accelerate deal-closing and expand revenue opportunities. … Rand Group (no, not the policy think tank) launched a new data sciences practice that it says will combine research, modeling, software engineering, and database development. … CrowdStrike is looking for a lead data scientist.

Money Flow$

Deepnote announced that it has raised a $3.8 million seed round led by Index Ventures and Accel, with participation from YC and Credo Ventures. The company wants to provide data scientists with a cloud-based platform that allows them to focus on their work by “abstracting away all of the infrastructure.” A lot of teachers already use Deepnote to publish interactive exercises for their students, according to the company’s founders. … Old-school IT consulting firms continue to invest to position themselves as leaders in data science. Capgemini bought Swedish BI firm Advectas on February 20. Advectas provides data management, data science, and planning and simulation services to clients.

Visual Sweets

People who live in Idaho and Nebraska must be fuming about this map. On the other hand, Alabamans (or Alabamians, if you prefer) are beaming ear to ear.

Have a news item? Email us at DATAx.events@gmail.com

Lakers photo by Katharine Lotze/Getty Images

