Predictive Analytics: Not All Sports Were Born Equal

Some sports are more difficult to quantify


Baseball has always been a numbers game. Ask any batter about their own ability and they’ll reel off statistics: on-base percentage, batting average, slugging percentage and so on. Baseball is obsessed with figures in a way that other sports just aren’t. You’ll rarely hear an iconic soccer midfielder remembered for their long-ball accuracy or even their goals per game. It’s unsurprising, then, that if any sport has embraced data analytics, it’s baseball.
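For readers who have not met these figures before, the three statistics named above reduce to simple ratios. The sketch below uses the standard definitions; the `BattingLine` class, its field names and the sample season are invented for illustration and do not describe any real player.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical season totals for one batter (illustrative only)."""
    at_bats: int
    hits: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int


def batting_average(b: BattingLine) -> float:
    # Hits per at-bat.
    return b.hits / b.at_bats if b.at_bats else 0.0


def on_base_percentage(b: BattingLine) -> float:
    # Times on base (hit, walk, hit by pitch) per qualifying plate appearance.
    reached = b.hits + b.walks + b.hit_by_pitch
    chances = b.at_bats + b.walks + b.hit_by_pitch + b.sacrifice_flies
    return reached / chances if chances else 0.0


def slugging_percentage(b: BattingLine) -> float:
    # Total bases per at-bat: singles count 1, doubles 2, triples 3, home runs 4.
    singles = b.hits - b.doubles - b.triples - b.home_runs
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.home_runs
    return total_bases / b.at_bats if b.at_bats else 0.0


# Made-up season used only to exercise the functions.
season = BattingLine(at_bats=500, hits=150, doubles=30, triples=5,
                     home_runs=20, walks=60, hit_by_pitch=5, sacrifice_flies=5)
print(f"AVG {batting_average(season):.3f}")
print(f"OBP {on_base_percentage(season):.3f}")
print(f"SLG {slugging_percentage(season):.3f}")
```

Each figure isolates one thing a batter does, which is part of why baseball is so easy to count.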

Baseball has been synonymous with the use of data since Billy Beane’s work with the Oakland Athletics in the early 2000s. The sport has led the way since, and we have seen what Vince Gennaro, president of SABR, calls ‘an absolute explosion of baseball data.’ Each game of baseball now generates close to 1TB of data, a ‘10-million fold increase’ that sees just about every ballpark action you could think of measured.

The data collected and analyzed today makes Beane’s sabermetrics look basic. On-base percentage changed the game, but analytics teams can now see how hard a player hits the ball, how lucky a player has been on balls put in play, and the average exit angle off the bat. So much can be learnt about a batter’s style that coaching teams will be judged on their ability to turn that insight into actionable improvements. A tweak to a batter’s average exit angle could have a knock-on effect on their batting average.
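Luck on balls in play is usually measured with batting average on balls in play (BABIP), which strips out home runs and strikeouts to look only at contact the fielders had a chance to reach. A minimal sketch using the standard formula follows; the function name, parameter names and sample totals are assumptions made for illustration.

```python
def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sacrifice_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sacrifice_flies
    if balls_in_play <= 0:
        return 0.0
    return (hits - home_runs) / balls_in_play


# Made-up season totals, not a real player.
value = babip(hits=150, home_runs=20, at_bats=500,
              strikeouts=100, sacrifice_flies=5)
print(f"BABIP {value:.3f}")
```

A figure well above or below a batter’s own long-run norm is often read as a sign that luck, rather than skill, has shaped their recent results.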

Other sports find it harder to do this. Baseball, essentially, is a series of micro-actions. The pitcher throws, the batter swings, and those on base run. It’s a sport in which you can isolate actions, pinpoint exactly where things went right or wrong and work on minute elements of a player’s game. In a sport like soccer, though, a player can have a significant impact on the game without registering on any of the key metrics: goals, assists, key passes, tackles and so on. Exactly why a team is improving on passing accuracy or chances created isn’t easy to pin down, either.

The difference lies in the nature of the games themselves. Sports as structured as baseball lent themselves to data collection even before the analytical revolution swept through sports. The challenge posed by more fluid sports like hockey or soccer, on the other hand, is far greater. There’s a reason hockey is so far behind baseball in terms of analytics. Similarly, there’s a reason no one in soccer has cracked a Moneyball-esque formula for toppling existing hierarchies, financial or otherwise. Make no mistake, Leicester City’s Premier League triumph involved the use of analytics, but myriad other factors contributed to the club’s miraculous victory.

HockeyTech, a Canadian analytics company, has developed a puck and player tracking system that it hopes will eventually allow for proper predictive analytics in hockey. The challenge is immense, though. It plans to combine data like shots and passes with a plethora of other metrics (where other players were on the ice, how fast they were moving and so on) to build a more complete picture of why certain events occur on the ice.
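To give a sense of what combining event data with tracking data involves, here is a rough sketch. It is not HockeyTech’s method. The `Frame` structure, the event dictionary and every function name are hypothetical, and the approach (match each event to the nearest tracking frame, then estimate speed from two consecutive frames) is only one simple way to do it. It assumes at least two frames, sorted by time.

```python
from bisect import bisect_left
from dataclasses import dataclass
from math import hypot


@dataclass
class Frame:
    """One hypothetical tracking snapshot."""
    t: float          # seconds since the start of play
    positions: dict   # player id -> (x, y) in metres


def nearest_frame_index(frames: list, t: float) -> int:
    """Index of the frame whose timestamp is closest to t."""
    times = [f.t for f in frames]
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1


def player_speeds(frames: list, i: int) -> dict:
    """Speed of each player in m/s, from frame i and its neighbour."""
    j = i - 1 if i > 0 else i + 1
    earlier, later = frames[min(i, j)], frames[max(i, j)]
    dt = later.t - earlier.t
    speeds = {}
    for pid, (x1, y1) in later.positions.items():
        if pid in earlier.positions and dt > 0:
            x0, y0 = earlier.positions[pid]
            speeds[pid] = hypot(x1 - x0, y1 - y0) / dt
    return speeds


def enrich_event(event: dict, frames: list) -> dict:
    """Attach player positions and speeds to a shot or pass event."""
    i = nearest_frame_index(frames, event["t"])
    return {**event,
            "positions": frames[i].positions,
            "speeds": player_speeds(frames, i)}


# Invented data: one player skating in a straight line, one shot.
frames = [Frame(0.0, {"A": (0.0, 0.0)}),
          Frame(1.0, {"A": (3.0, 4.0)}),
          Frame(2.0, {"A": (6.0, 8.0)})]
shot = enrich_event({"type": "shot", "t": 1.2}, frames)
print(shot)
```

Even this toy version hints at the scale of the problem: every event drags in the position and speed of every player on the ice, many times a second.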

Similar systems also exist for soccer, although few are effectively utilizing a machine learning platform to link never-before-connected events on the pitch. For the more chaotic and less formulaic sports, technology will need to develop to a point where these connections can be made, and the challenge for analytics teams will be not just how to interpret vast amounts of data, but how to suggest changes with all the predicted knock-on effects in mind. Far from simplifying sports analytics, improved technology simply adds a necessary complexity to an industry with incredible scale.