Continuing my interviews with some speakers ahead of this year's Machine Learning Summit in San Francisco, I spoke to Sam Zimmerman, Freebird Inc. CTO and co-founder.
Zimmerman is a backend software developer and data scientist with extensive experience in the commercial application of machine learning algorithms. Prior to Freebird, Sam worked as a quantitative risk analyst in the currency markets and as a team leader automating a large-scale data classification problem for an energy intelligence company. Sam is a Duke University graduate and works on a grant with MIT’s Computational Cognitive Science group to extend decision theory using advancements in machine learning and artificial intelligence.
Can you tell us a bit about how Freebird is revolutionizing the transportation industry?
Freebird empowers travelers to instantly rebook their flight in three touches on their mobile phone in the event of a cancellation, delay, or missed connection. For decades, the travel industry has relied on reactive travel insurance claim filing. However, Freebird has introduced a proactive rebooking solution based on data science and mobile technology to help get folks to the places that matter most.
Since my co-founder Ethan Bernstein and I launched Freebird in 2015, we’ve raised $8.5 million from Accomplice and General Catalyst, partnered with companies that book over $30 billion in travel annually, and have grown our team to over 20 exceptionally talented and curious people in Cambridge, MA.
That's very impressive. So in your opinion, what can other companies do to best position themselves to adopt machine learning?
It’s important to know what you need. Most companies have a need for the tools and techniques variously labeled as data science right now, but each company's needs can be quite different.
I like to think that a company navigating this confusing mashup of skills and techniques must feel a bit like being one of the first people to own a car at the beginning of the 20th century. You’re likely to have car problems, but you might not know which problems you can fix, which problems a local mechanic can fix, and which problems you need to send to an expert.
Common problems like changing a tire or replacing your oil can be handled by yourself or by paying a bit of money to have a consultant take care of them quickly. These are data science tasks like basic analytics, business intelligence, and simple regressions. You can likely teach people already in your organization how to solve these problems, much like you can teach anyone how to change a tire.
More complex problems like changing a brake line or having your alternator go are a bit more tricky, but still quite common. You really want someone who works full time on these problems to help. These are data science problems like standard classification tasks, recommender systems, and basic image labeling.
Finally, some car problems are actually within the design of the car itself and need to be sent back. This is ultimately the responsibility of the manufacturer, takes a ton of time and expertise, and should only be attempted by a professional engineer. In data science, many new applications and research techniques are best tackled by experienced teams with access to massive resources and a large group of collaborators. These are often found at large tech companies working on problems like self-driving cars and cybersecurity.
At our core, Freebird is a data science company - which is fairly unique for a consumer tech product. Many of our core risk management problems require the development of new methods. So, in the car analogy above, Freebird’s more of a boutique car manufacturer.
Do you think the evolution of machine learning will be hampered by the lack of the talent in the field? If so, what can be done to solve it?
I do not think the evolution of machine learning will be hampered by the lack of research talent in the field. The machine learning field moves incredibly quickly thanks to open source norms, large amounts of attention from a variety of thinkers, and capital from VCs and large companies. That being said, I do think machine learning is currently limited by two things:
- A lack of people who are incentivized to validate the incredible amount of fantastic research that is being published.
- A lack of talent to apply the aforementioned research to messy real-world problems.
In order to solve the validation issues, we need to find ways to incent talented individuals to replicate and generalize research that has already been completed. This can be done by rewarding the quality and extensibility of research over speed and novelty. To solve the second problem, we need to develop more specialized skill sets and job descriptions as an industry. For example, we need engineers who can handle the idiosyncratic issues involved with deploying and maintaining machine learning models, as well as something like 'Data Science Product Managers' in charge of owning the product the model is powering. As a field, we’ve focused too much on research. We need to develop other important roles in the data science ecosystem like 'data science product' and 'data science strategy' in order for data science to have maximum impact.
How has the industry’s attitude towards machine learning and other AI changed from when you first entered the field compared to today?
When I first entered the field in 2011, machine learning was just beginning to extend outside of advertising and finance into domains like sentiment analysis and computer vision. Largely this was a migration from quite clear optimizations of well-defined outcome variables (like click-through rates and PnL) to much more abstract, subjective, and ill-defined outcome variables (like the 'sentiment' of a sentence or the 'setting' of a photo). This shift was made possible through two contemporaneous changes around that time:
- Cheaper computing power
- The development of several very important regularization techniques (like dropout)
Both changes propelled the successful application of deep learning into new domains. However, I think it was just as important that the industry also developed massive training sets around these softer outcome variables to power these models. Without this influx of (largely manual) effort to create massive, well-labeled datasets defining these softer outcome variables, we certainly would not be where we are today.
What do you think is the biggest myth around AI and machine learning being propagated in your industry?
Personally, I believe the biggest myth right now is that we are close to Type IV or General Intelligence - meaning machine intelligence that demonstrates human-like learning and intelligence - with these models. We are not.
I'm not sure if that is good or bad news, to be honest. Finally, what will you be discussing in your presentation?
I’m excited to share some of the work Freebird has done applying Deep Learning techniques to the U.S. Airspace to help predict the likelihood and severity of flight disruptions. I’ll also highlight some of the weaknesses of these Deep Learning approaches to this problem, and share some of the research we’ve done to address those issues.