Bringing fancy stats to college hockey


Earlier this spring, I came across this article about the emerging field of hockey analytics. I don’t follow the NHL terribly closely, but I’m a huge college hockey fan. The NCAA season was coming to an end, but rather than be bummed that it was almost over, I was immediately intrigued and excited that this kind of analysis might be applied to college hockey.

But I couldn’t find any fancy stats for college hockey. No worries, I figured, I’ll just learn the NHL methods and do it myself. So I started to read all the articles and papers I could find, subscribed to a few stats blogs, and burned through a copy of Rob Vollman’s Hockey Abstract 2014, which just came out in July. I like analytics, and I love hockey. Piece of cake, right?

Well, it’s not that easy.

NHL advanced stats are truly coming of age. As in baseball almost thirty years ago, hobbyists and amateurs are now proving their legitimacy as NHL teams start hiring statisticians and number crunchers in the quest for the Stanley Cup. The new stats are confirming conventional wisdom about the game and giving us new ways to look at player talent, usage and potential. Suddenly even the media and fans are asking about Corsi and PDO. Pretty soon every pro team will have a Director of Analytics, or they’ll be permanent division cellar dwellers.

At the college level? Not so much. Individual schools are tracking and calculating their own stats, but nothing is happening at the hobbyist level. Why? There’s one big difference between the NHL and NCAA when it comes to statistics. It’s not the analytic tools – all the calculations and formulas would translate well to the college game. It’s the data.

Consider this: Since the 2008-2009 season, the NHL has published on its website detailed summaries of every single game played each season (82 games per team per season, or 1,230 games). These summaries list every event that happens on the ice, as well as every player who was on the ice for a given event. The league also publishes shift data, including the game clock time when each shift started, how long it lasted, and when it ended. So for any NHL game of the past six years, I can know who was on the ice and what was happening at any given second (assuming 60 minutes per game, that’s 4,428,000 seconds per season).
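
To make the shift-data idea concrete, here’s a minimal Python sketch of the lookup those reports make possible. The `Shift` record, its field names, and the use of elapsed seconds (rather than the counting-down game clock the reports show) are my own assumptions for illustration, not the NHL’s actual report format. The constants just redo the season arithmetic: 30 teams times 82 games, divided by two teams per game, times 3,600 regulation seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    """One player shift (hypothetical format, not the NHL's actual report layout)."""
    player: str
    team: str
    start: int  # elapsed game seconds when the shift began
    end: int    # elapsed game seconds when the shift ended


def on_ice(shifts, second):
    """Return {team: set of players} on the ice at a given elapsed second."""
    result = {}
    for s in shifts:
        # Half-open interval: a player whose shift ends at second t is off the ice at t.
        if s.start <= second < s.end:
            result.setdefault(s.team, set()).add(s.player)
    return result


SECONDS_PER_GAME = 60 * 60          # regulation only, no overtime
GAMES_PER_SEASON = 30 * 82 // 2     # 30 teams x 82 games, two teams per game
SECONDS_PER_SEASON = SECONDS_PER_GAME * GAMES_PER_SEASON

shifts = [
    Shift("A", "HOME", 0, 45),
    Shift("B", "HOME", 45, 90),
    Shift("X", "AWAY", 0, 60),
]
print(on_ice(shifts, 50))
print(SECONDS_PER_SEASON)  # 4428000
```

Join a lookup like this against the event list and every shot, hit or faceoff gets tagged with the skaters who were out there, which is exactly the piece college hockey is missing.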

This kind of data doesn’t even begin to exist for college hockey teams, let alone in any public repository. And that’s where the real problem is. Even calculating something as simple as Corsi or PDO is damn near impossible without some of these details. If there’s ever going to be fancy stats at the college level, the NCAA is going to have to coerce all of its teams to track and publish this data. Or a forward-thinking conference with a high interest in fan development should step up and promote data sharing among its teams (looking at you, NCHC).
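
For anyone new to these two stats: Corsi counts every shot attempt (goals, saved shots, misses and blocked attempts) a team makes while a player is on the ice, minus the attempts made against it. PDO adds on-ice shooting percentage to on-ice save percentage, usually scaled by 1,000 so that league average works out to 1,000. The minimal sketch below uses a hypothetical `Event` record to show why both stats need the on-ice player list for every event. Analysts usually restrict both to 5-on-5 play; I’ve skipped that filter to keep it short.

```python
from dataclasses import dataclass

ATTEMPTS = {"goal", "shot", "miss", "block"}  # every kind counts toward Corsi
ON_GOAL = {"goal", "shot"}                    # shots on goal, used for SH% and SV%


@dataclass(frozen=True)
class Event:
    """A shot attempt (hypothetical format).

    kind: "goal", "shot" (saved shot on goal), "miss", or "block"
          (an attempt by `team` that an opponent blocked).
    on_ice: every player, from both teams, on the ice at the time.
    """
    team: str
    kind: str
    on_ice: frozenset


def on_ice_stats(events, player, team):
    """Corsi, Corsi For %, and PDO for one player, from events he was on the ice for."""
    cf = ca = sf = sa = gf = ga = 0
    for e in events:
        if player not in e.on_ice or e.kind not in ATTEMPTS:
            continue
        ours = e.team == team
        if ours:
            cf += 1
        else:
            ca += 1
        if e.kind in ON_GOAL:
            if ours:
                sf += 1
                if e.kind == "goal":
                    gf += 1
            else:
                sa += 1
                if e.kind == "goal":
                    ga += 1
    sh_pct = gf / sf if sf else None       # on-ice shooting percentage
    sv_pct = 1 - ga / sa if sa else None   # on-ice save percentage
    pdo = None
    if sh_pct is not None and sv_pct is not None:
        pdo = 1000 * (sh_pct + sv_pct)
    return {
        "corsi": cf - ca,
        "corsi_for_pct": cf / (cf + ca) if cf + ca else None,
        "pdo": pdo,
    }
```

Without the `on_ice` field, all you can compute is team totals, which is roughly where public college data stops.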

What’s the advantage? Well, for one, it would boost interest in college hockey. Hockey’s fan base skews toward high earners and the well educated, and being on college campuses surrounded by professors and researchers means you’ve got a built-in audience for advanced hockey analytics. In addition, these advanced stats could improve the game, affect recruiting, and provide data for pro scouts. What college team wouldn’t like to tell its recruits that it publishes the kind of detailed data NHL front offices are now ravenous for?

But the truth is that the NHL is a behemoth and only has 15-ish games per night to track. There are 60+ teams in college hockey, all on shoestring budgets (ignore Minnesota, Boston College and North Dakota for a minute), and games happen all over the place in all kinds of rinks. In addition, the players are only around for 3-4 years max, and play a full schedule for only 2-3 of those years. Oh, and a full schedule is about 30 games. So forget about getting any large sample sizes or longitudinal data. Collection is always going to be more difficult at this level, and the analysis will be inherently more error-laden, but that’s no reason not to try.
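
Here’s a rough way to see why the small samples hurt. If you treat each shot attempt as an independent coin flip (a big simplification), the standard error of a percentage stat like Corsi For % shrinks only with the square root of the number of attempts. The 100 attempts per game below is a made-up round number for illustration, not a measured rate.

```python
import math

ATTEMPTS_PER_GAME = 100  # hypothetical round number: both teams' attempts in one game


def pct_standard_error(p, attempts):
    """Binomial standard error of a proportion estimated from `attempts` trials."""
    return math.sqrt(p * (1 - p) / attempts)


for games in (30, 82):
    se = pct_standard_error(0.5, games * ATTEMPTS_PER_GAME)
    print(f"{games} games: Corsi For % +/- {se:.2%}")
```

Under those assumptions, a 30-game season leaves about 1.65 times (the square root of 82/30) the noise of an 82-game one, and an individual player is on the ice for only a fraction of his team’s attempts, so his numbers are noisier still.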

So what to do and where to start? Let’s identify a few stats college teams could track relatively easily. And let’s limit these to those that would be most useful for the college game in terms of analysis. As I see it, there are two areas of focus:

  • Track zone entries and defended entries: This paper gives you the idea. Basically, tracking whether a team carries the puck into the offensive zone or dumps it in can tell you about the quality of scoring chances they’re getting. Even more interesting, tracking which defenders force dump-ins and which let opponents carry the puck in and set up plays has big implications for player usage and talent evaluation. The challenge is mining this data, which is hard to do live and slow to do from video.
  • Work toward player usage charts: What’s a player usage chart? It’s a visual guide to how teams use their roster, and how successful those deployments are. Even better, you don’t need an extended time series to do this – you can create a chart from two or three seasons’ worth of data (or even one!). However, it has some heavy input requirements – offensive zone start percentage, a shot-based plus/minus (like Corsi or Fenwick), and a Quality of Competition measure (built from opponents’ shot-based metrics and the ice time each opponent shares with the player). Basically, you need to track who is on the ice any time a shot is taken. Not impossible to get live, but certainly more difficult. And very hard to do after the fact from video.
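
Here’s a minimal sketch of what a zone-entry log might look like, along with the two summaries described in the first item: shot attempts per entry by entry type, and how often each defender lets opponents carry the puck in. The `Entry` fields are a hypothetical tracking scheme I made up for illustration, not the one used in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One offensive-zone entry, as a tracker with a notepad might log it (hypothetical)."""
    team: str      # team entering the zone
    method: str    # "carry" (controlled entry) or "dump" (dump-in)
    defender: str  # opposing defender responsible for the blue line
    shots: int     # shot attempts generated before the puck left the zone


def shots_per_entry(entries, team):
    """Average shot attempts per entry for one team, split by entry method."""
    totals = defaultdict(lambda: [0, 0])  # method -> [entries, shots]
    for e in entries:
        if e.team == team:
            totals[e.method][0] += 1
            totals[e.method][1] += e.shots
    return {m: shots / n for m, (n, shots) in totals.items()}


def carry_in_rate_allowed(entries, defender):
    """Share of entries against this defender that were carried in rather than dumped."""
    faced = [e for e in entries if e.defender == defender]
    if not faced:
        return None
    return sum(e.method == "carry" for e in faced) / len(faced)


log = [
    Entry("A", "carry", "D1", 2),
    Entry("A", "dump", "D1", 0),
    Entry("A", "dump", "D2", 1),
]
print(shots_per_entry(log, "A"))
print(carry_in_rate_allowed(log, "D1"))
```

A defender with a low carry-in rate allowed is forcing dump-ins, which is the kind of thing that never shows up in a traditional box score.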

Until we get the data behind these metrics, college hockey won’t see the same level of analysis that the NHL does. And maybe that’s what the NCAA and the conferences prefer. Sooner or later, though, the players, coaches and fans are going to want to know this stuff, and it’s better to get out ahead than to play catch-up.
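
To show what the usage-chart inputs involve, here’s a sketch of the three numbers, assuming you already have each player’s faceoff zone starts, on-ice shot attempts, and shared ice time with each opponent. Defining zone-start percentage as offensive starts over offensive plus defensive starts (dropping neutral-zone starts), and Quality of Competition as a shared-ice-time-weighted average of opponents’ Corsi For %, are common choices but not the only ones; the function names are hypothetical.

```python
def oz_start_pct(zone_starts):
    """Offensive-zone start share, ignoring neutral-zone starts.

    zone_starts: "O", "N", or "D" for each of a player's faceoff shift starts.
    """
    oz = zone_starts.count("O")
    dz = zone_starts.count("D")
    return oz / (oz + dz) if oz + dz else None


def quality_of_competition(shared_toi, opponent_cf_pct):
    """Ice-time-weighted average of opponents' Corsi For %.

    shared_toi: {opponent: seconds on the ice together with the player}
    opponent_cf_pct: {opponent: that opponent's own Corsi For %}
    """
    total = sum(shared_toi.values())
    if total == 0:
        return None
    return sum(t * opponent_cf_pct[o] for o, t in shared_toi.items()) / total


def usage_point(zone_starts, cf, ca, shared_toi, opponent_cf_pct):
    """One player's usage-chart point: (zone starts, competition, Corsi For %)."""
    return (
        oz_start_pct(zone_starts),
        quality_of_competition(shared_toi, opponent_cf_pct),
        cf / (cf + ca) if cf + ca else None,
    )


print(usage_point(["O", "O", "N", "D"], 30, 20,
                  {"X": 600, "Y": 200}, {"X": 0.55, "Y": 0.45}))
```

As I understand the usual layout, zone starts go on one axis, competition on the other, and the shot-based metric is shown by each player’s bubble. Every one of those inputs depends on knowing who was on the ice for each faceoff and shot.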

This week, the NCHC (the youngest-but-best NCAA conference) announced a streaming service for nearly every game its teams play. That’s a huge start. Making this game footage accessible would theoretically allow a data miner with a humongous amount of free time to get at this data. A great next step for a conference would be to start tracking and publishing some game data. That gets you added fan interest and attention from the existing pro stats crowd, who would suddenly have college data to play with. But until then, there’s not a lot anyone can do.

As for me? I’ll keep lobbying for the data, and in the meantime, I’ll continue studying the pro methods. I’ve seen some really exciting new stats develop just this summer, and this NHL season should be the most exciting yet for data geeks. My heart will always be with college hockey, though, so I’ll probably be in my arena seats with a pencil and notepad furiously trying to track zone entries and on-ice shot stats.
