It’s that time of year when NCAA hockey writers are prognosticating about the upcoming season. Naturally, NCHC fans are all abuzz about their 2015 NCHC prediction. I’m not a hockey writer, but I do enjoy the advanced analytics that are swiftly entering the hockey universe. I also love NCHC hockey.
It’s difficult to do any fancy stat work for NCAA hockey (mostly because of a lack of data), but on a team level, there’s enough to work with. Therefore, I’m going to attempt to make an NCHC prediction that’s based purely on the numbers (aka the actual expected performances of the teams) instead of qualitative assessment.
(Also I’m going to rely on my wife Taylor, a PhD student who is much more skilled with statistics and regression than I ever will be. While I had an ok handle on the theoretical framework, I was coming up with some pretty wacky-but-reliable models, and she destroyed them and found models that were both statistically sound and even better fits than mine, so I owe this whole thing to her.)
The first step in this process will be to design the simplest-possible model that accurately predicts last year’s NCHC final standings. What a season, right? Miami, picked by writers to win the league, finished dead last. UNO, picked last, finished third. Denver, who finished sixth in the league, won the conference tournament! Surprise!
But were these results all that surprising? Let’s find out, and let’s use some actual, publicly available game data to do so.
I’m going to look at only regular season NCHC games, and I’m going to look mostly at two summary stats NHL number crunchers have found useful in describing team performance. One of them measures the chances a team makes for itself. The other measures that other important factor in hockey – the luck. Between these components, maybe we can build a model for predicting NCHC standings.
Let’s start with the luck – the bounces. NHL analysts measure this in a stat called PDO (I don’t know why it’s called that. No one does. Just go with it). I’ll let Arctic Ice Hockey explain it better than I ever could:
On-ice shooting percentage shows what percentage of shots are going into their opponents net when a player is on the ice. On-ice save percentage shows what percentage of shots are not going into their own net when a player is on the ice. PDO is the two being combined. When you see these numbers think about bounces. While talent, effort, and skill drive results, sometimes a player might get a few more or less bounces than the other guy, or what the player had previously and likely to receive in the future.
If a player has an on-ice shooting percentage substantially lower (or higher) than he’s accustomed to in the past, there is a chance his point production may be a bit lower (or higher) due to bounces. If a player has a PDO substantially lower (or higher) than he’s accustomed to in the past, there is a chance that his goal differential (plus/minus for 5v5 only) may be a bit lower (or higher) due to bounces.
Alright, easy enough. PDO serves us best as an indicator of when a player/team is significantly above or below their normal performance. That’ll be important in determining what to expect for NCHC teams in 2015. Now, there are some NHL stats people who believe PDO is all you’ll ever need to know. That may be true in the pros, but I don’t think that will hold up for NCAA hockey – there’s so much more variance in the talent level that bounces won’t explain everything. Playing style and pure talent are going to be fundamentally different for Boston College and Alabama-Huntsville.
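To make the definition concrete, here’s a rough sketch of how a team-level PDO falls out of season totals. The function name and the totals are made up for illustration (they are not real NCHC numbers), and I’m using all-situations totals rather than even-strength ones.

```python
def pdo(goals_for, shots_for, goals_against, shots_against):
    """Return (shooting %, save %, PDO) as fractions from team totals."""
    shooting_pct = goals_for / shots_for          # share of our shots that went in
    save_pct = 1 - goals_against / shots_against  # share of their shots that stayed out
    return shooting_pct, save_pct, shooting_pct + save_pct


# Hypothetical season totals for one team
sh, sv, team_pdo = pdo(goals_for=80, shots_for=800,
                       goals_against=60, shots_against=750)
print(f"Sh% {sh:.3f}  Sv% {sv:.3f}  PDO {team_pdo:.3f}")
```

A result above 1.00 suggests the bounces went that team’s way; below 1.00 suggests they didn’t.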
Let’s take a look at last year’s NCHC standings, along with shooting percentage, save percentage, and the PDO that results. Now, ideally I’d look at just even-strength situations here, but I don’t have some of that information, so let’s just go with game-level data:
2013-14 NCHC PDO
|Team|Finish|Points|Shot %|Save %|PDO|PDO Rank|
Ok, not bad. St. Cloud has the highest PDO, and also won the league. Miami and CC have the two worst, and they’re in the bottom. The middle is a little jumbled, though. Denver has the third best? They finished sixth in the league. UNO at fifth? Duluth at sixth? As expected, we need something more. And keep in mind, this is mostly measuring whether a team performed above or below expectations. Knowing that, this makes a lot of sense. I would say it’s generally true that all the teams above 1.00 did better than expected, and all the teams below 1.00 did not live up to their potential.
Let’s break the PDO down a bit more. Did a team’s bounces come more from quality shooting or quality goaltending? Let’s look at those differentials:
2013-14 NCHC PDO Differential
|Team|PDO|From Sh%|From Sv%|
This shows you a few things we probably already knew by watching the games. Denver had superior goaltending. Meanwhile, UNO’s was pretty bad. St. Cloud had incredible shooting. North Dakota was pretty good all around. CC and Miami were terrible all around. Western Michigan was about as average as a team could be.
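One plausible way to compute that split is to compare each team’s shooting and save percentages against the league averages. That definition is my assumption here, and the three-team league below is invented. Because only conference games are counted, every shot for one team is a shot against another, so the league shooting and save percentages add up to exactly 1.00 and the two differentials add up to PDO minus 1.00.

```python
# Hypothetical totals: (goals for, shots for, goals against, shots against)
teams = {
    "Team A": (30, 300, 20, 250),
    "Team B": (25, 250, 25, 250),
    "Team C": (20, 250, 30, 300),
}

# League averages, assuming a closed league (conference games only)
league_goals = sum(gf for gf, sf, ga, sa in teams.values())
league_shots = sum(sf for gf, sf, ga, sa in teams.values())
league_sh = league_goals / league_shots
league_sv = 1 - league_sh


def pdo_split(gf, sf, ga, sa):
    """Return (PDO, part from shooting, part from goaltending)."""
    sh = gf / sf
    sv = 1 - ga / sa
    return sh + sv, sh - league_sh, sv - league_sv


splits = {name: pdo_split(*totals) for name, totals in teams.items()}
for name, (team_pdo, from_sh, from_sv) in splits.items():
    print(f"{name}: PDO {team_pdo:.3f}  from Sh% {from_sh:+.4f}  from Sv% {from_sv:+.4f}")
```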
Ok, but obviously bounces aren’t the only story, and the numbers above still don’t explain the final standings. What a team actually does to get scoring chances on the ice matters a great deal. The NHL analysts have focused on possession to quantify this. Generally this is measured in stats called Corsi or Fenwick, but I don’t have enough player-level data to build those for NCAA teams. So I’m going to simply look at the ratio between shots for and shots against for each team. Shots have tended to be a good approximation of possession. Generally speaking, if your team is getting more shots, you have the puck more than the other team. Makes sense, right?
Here are shots for, shots against, possession percentage, and possession ratio:
2013-14 NCHC Possession
|Team|Shots For|Shots Against|Possession %|Possession Ratio|
So as you can see, UNO was a possession monster, taking nearly 30% more shots than opponents. Minnesota-Duluth wasn’t far behind. In fact, these two teams sucked up so much puck time, that 5 of the other 6 teams are underwater in terms of possession. Denver took the brunt of it. They could only generate 81% of the offense that their opponents could – ouch. That might help explain why they finished so low in the league?
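For anyone who wants to reproduce the two possession columns, here’s a minimal sketch. The ratio is simply shots for divided by shots against. The percentage formula (shots for as a share of all shots in a team’s games) is my assumption about how that column is defined, and the shot totals are invented.

```python
def possession(shots_for, shots_against):
    """Return (possession %, possession ratio) from shot totals."""
    pct = shots_for / (shots_for + shots_against)  # assumed definition of Possession %
    ratio = shots_for / shots_against              # 1.30 means 30% more shots than opponents
    return pct, ratio


# Hypothetical totals for a team that outshoots its opponents by 30%
pct, ratio = possession(shots_for=650, shots_against=500)
print(f"Possession % {pct:.1%}  Possession ratio {ratio:.2f}")
```

A ratio under 1.00 (like Denver’s .81) means the team was outshot over the season.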
Well, maybe. The possession numbers aren’t predicting anything on their own. This chart is a mess compared with the standings. So why even bring it up?
Intuitively, if a team has an average shooting percentage, but takes more shots than any other team, they might score just as many goals as a team with a great shooting percentage who has an average possession game. Likewise, if a team’s shooting sucks and they can never get the puck anyway, they’re not going to win many games.
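A quick bit of arithmetic shows the trade-off. The numbers are invented, and `goals_scored` is just an illustrative helper.

```python
def goals_scored(shots, shooting_pct):
    """Goals are shot volume times the share of shots that go in."""
    return shots * shooting_pct


# Hypothetical teams: one wins on volume, the other on finishing
volume_team = goals_scored(shots=1000, shooting_pct=0.08)
sniper_team = goals_scored(shots=800, shooting_pct=0.10)
print(round(volume_team), round(sniper_team))
```

Both teams end up with the same goal total despite very different styles, which is why neither shooting percentage nor possession tells the whole story on its own.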
There may be some combination of these three factors – shot percentage, save percentage and possession – that is greater than the sum of the parts. One of these alone isn’t going to win games, but when we look at the three together, we might see a fuller picture of how a team performs on ice. I don’t know exactly what that relationship might be, but a regression analysis would.
So let’s regress final points on these three variables and see what happens. Our model of SH% + SV% now becomes:
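Assuming a plain linear regression with an intercept, the step from PDO to the three-variable model would look something like the following. The coefficients are whatever the statistics package estimates, and the symbol names are my own shorthand.

```latex
\begin{aligned}
\text{PDO} &= \text{SH\%} + \text{SV\%} \\
\text{Points} &\approx \beta_0 + \beta_1\,\text{SH\%} + \beta_2\,\text{SV\%} + \beta_3\,\text{Possession}
\end{aligned}
```

The difference is that PDO forces shooting and goaltending to count equally, while the regression lets each of the three inputs carry its own weight.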
Feeding this into a statistics package, we get this:
2013-14 NCHC Predictive Model
Well that’s interesting. SCSU, North Dakota, and UNO, all pretty decisively in that order. Duluth, Western Michigan and Denver are all separated by only one point, and were separated by just two in the actual standings (although Denver has two fewer instead of one more). Colorado College and Miami are tied for last (CC earned a few more standings points because of ties). The correlation between standings points and our model is .981, giving it an R-squared of .9639. That’s suspiciously high, but it might make sense because these things are obviously related. This is looking pretty good, for present purposes at least.
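If you want to try this kind of fit yourself, here’s a minimal sketch using NumPy’s least-squares solver. It assumes an ordinary linear regression with an intercept, which may not match the exact model used above, and every number in the arrays is a placeholder rather than the real 2013-14 data.

```python
import numpy as np

# Placeholder inputs for eight teams (NOT the real 2013-14 numbers)
sh_pct = np.array([0.105, 0.098, 0.091, 0.094, 0.089, 0.092, 0.084, 0.083])
sv_pct = np.array([0.915, 0.912, 0.895, 0.903, 0.908, 0.918, 0.896, 0.893])
poss_ratio = np.array([1.02, 1.05, 1.29, 1.20, 0.97, 0.81, 0.95, 0.93])
points = np.array([48, 45, 40, 36, 35, 34, 25, 23], dtype=float)

# Design matrix: intercept column plus the three predictors
X = np.column_stack([np.ones_like(sh_pct), sh_pct, sv_pct, poss_ratio])

# Ordinary least squares: find the coefficients minimizing squared error
coef, *_ = np.linalg.lstsq(X, points, rcond=None)
predicted = X @ coef

# Correlation between actual and predicted points, and its square
r = np.corrcoef(points, predicted)[0, 1]
r_squared = r ** 2
print(f"r = {r:.3f}, R-squared = {r_squared:.4f}")
```

Keep in mind that this fits four coefficients to only eight teams, so a high R-squared is easy to come by and says little about how well the model would predict a season it hasn’t seen.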
The problem here is that I only have one season’s worth of NCHC data to work with. Does this model simply fit results we already had? Does it work for previous years? Does it work for other leagues? And does it even predict anything, or does it just tell us what we already know? Good questions. To answer them I need more data. But this post is already getting a little long, so I’ll post Part 2 on Tuesday.