In Part One earlier this week, I attempted to devise a model to predict last season’s NCHC standings, using puck possession (shots) and bounces (shot and save percentage) as the input variables. Not only did the formula reasonably predict the regular-season standings, but it also appeared to fit the data well. Once again, that model is:
However, I only have one season of NCHC data to work with. It’s possible this is an anomaly, or it’s possible I fashioned a model to fit the standings that we already knew. What I really need is another couple of data sets (aka a few more NCHC seasons) to test this exact model. I don’t have that for the NCHC because there has only been one season so far, but I have something close.
Six of the eight NCHC teams played at least three seasons in the WCHA, so I applied this framework to those three previous seasons, building a single regression model that considers all three seasons at once. The resulting model is significant beyond the .0001 level. The results:
|Team|10-11 Points|10-11 Rank|10-11 Model|10-11 Model Rank|
|---|---|---|---|---|
|St. Cloud State|26|9|29|6|

|Team|11-12 Points|11-12 Rank|11-12 Model|11-12 Model Rank|
|---|---|---|---|---|
|St. Cloud State|28|6|32|4|

|Team|12-13 Points|12-13 Rank|12-13 Model|12-13 Model Rank|
|---|---|---|---|---|
|St. Cloud State|37|1|37|2|
Ok, not bad, not bad. Some of the predicted standings are a little jumbled, but the actual predicted points are still very close. Furthermore, the predictive power is remarkably stable over three seasons, explaining roughly 93% of the variance. So far, so good.
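For anyone who wants to check a "variance explained" figure by hand, here is a minimal sketch of the usual R-squared calculation from actual and predicted points. It uses only the St. Cloud State rows shown in the tables above, so the number it produces is an illustration of the arithmetic and not the league-wide 93% figure. The function name `explained_variance` is my own, not something from Part One.

```python
def explained_variance(actual, predicted):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# St. Cloud State, 2010-11 through 2012-13, as listed in the tables above.
actual_points = [26, 28, 37]
model_points = [29, 32, 37]

# Residual = actual minus model; negative means the team earned fewer
# points than the model expected.
residuals = [a - p for a, p in zip(actual_points, model_points)]
print(residuals)  # [-3, -4, 0]
print(round(explained_variance(actual_points, model_points), 3))
```

Running the same function over every team-season in the three tables, rather than one team, is what would be comparable to the figure quoted above.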
Does it work elsewhere?
What if there’s something about the style of play around shooting, goaltending and possession that makes this work? Does this model hold up for a totally different set of teams? Let’s look at the conference of 2014 national champion Union College, ECAC Hockey:
|Team|13-14 Points|13-14 Rank|13-14 Model|13-14 Model Rank|
|---|---|---|---|---|
Hmm… not as good, but not terrible. For ECAC 2013-14, the model only explains 81% of what happened. There are a lot fewer points available, too, which lessens the predictive power. Also, according to this, the race should have been a lot tighter between Union and Quinnipiac. Why did Quinnipiac earn four fewer points than expected?
Aha – Quinnipiac is NOISY. When they win, they tend to pile on goals against bad goalies. They’re padding their own numbers, so the model thinks they should have more points than they do. Interesting. Well, what about Harvard – they should have finished 9th instead of 11th:
NOISE. In close situations (ties as a proxy), their possession numbers are TERRIBLE. Just one more win would have had them at 7th. Ok, one more: Clarkson finished 5th but the model says they should have been 8th:
Look at that – their goaltending went on lockdown in close games (according to this data, at least), so they won a few more points than they should have. Clarkson is a clutch team. Clutch = NOISE.
Our model doesn’t take to noise very well. It assumes every team has a baseline performance that’s consistent. These three teams did not have consistent play, so they introduce error and throw off the results. We’ll call this the “Quinnipiac Effect.” The model survives, but it’s bruised. We’ll have to remember and consider noisy teams in evaluating predictions at the end of the season.
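One rough way to keep a list of noisy teams is to compare each team's actual finish with its model finish and flag the big gaps. The sketch below uses only the two rank pairs quoted in the text (Harvard and Clarkson). The `threshold` of two places and the function names are assumptions for illustration, not part of the model.

```python
# (actual rank, model rank) for ECAC 2013-14, as quoted in the text above.
ranks = {"Harvard": (11, 9), "Clarkson": (5, 8)}


def rank_gap(actual_rank, model_rank):
    """Positive: finished lower than the model expected. Negative: higher."""
    return actual_rank - model_rank


def flag_noisy(team_ranks, threshold=2):
    """Return teams whose finish differs from the model by `threshold` places or more."""
    gaps = {team: rank_gap(a, m) for team, (a, m) in team_ranks.items()}
    return {team: gap for team, gap in gaps.items() if abs(gap) >= threshold}


print(flag_noisy(ranks))  # {'Harvard': 2, 'Clarkson': -3}
```

A gap in points (actual minus model) would work the same way, and is probably the better measure when the standings are tightly bunched.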
Inputs over time
I also want to take a look at just how much shot%, save% and possession ratio change over time. Let’s start with possession ratio. Here’s a graph for WCHA-NCHC teams, 2010-2014:
A bit of noise, but these seem pretty stable, or at least have clear trends. Minnesota-Duluth and Nebraska-Omaha have been strong possession teams all four years. Last year might have been a bit of an anomaly for UNO, with all their prolific shooters. North Dakota had an off year, it seems. Bad news for Denver: Generally the possession numbers change dramatically around a coaching change (which they had before the 2013-14 season). My theory is that, in college hockey, possession is primarily driven by playing style, and thus coaching. So a downward trend after a coaching change probably isn’t a good sign. It’ll be interesting to see what happens for Colorado College this year. Could be a hard call for a 2015 NCHC prediction.
Ok, now shot% and save%:
Hmm, shot percentage seems to be all over the map. Other than St. Cloud, I don’t see any real consistency here. My guess? Your top six are going to drive shot percentage, and that changes nearly every year. So we’ll have to pay attention to that in making a 2015 NCHC prediction.
And the goaltending? Well, individual talent pops out pretty strikingly. No surprises with Denver or North Dakota. No surprises with Nebraska-Omaha either (sorry, Mav fans). For next year, we can probably expect a returning goalie to maintain his save percentage. The new guys will be the tough part.
An aside: Why this model?
One last thing before we get to forecasting. I want to examine the model and think through its logic. The math checks out, we think, but are the inputs at all grounded in reality? Does our model translate into coach speak?
Here’s the model again:
Essentially, this breaks down into three parts. The first term is offense – shot%. In the various models it carries an r-squared of about .4. The next term is obviously goaltending (save%), and it’s also relevant with an r-squared of .45-.47. Finally, the possession term could be considered the defense portion of the equation. It seems to carry the highest predictive power – r-squared of .54. But none of these terms predicts the standings on its own. They’re all equally important, but they must be considered in conjunction to get an accurate estimation of how a team will finish.
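To see why the terms have to be considered together, it helps to fit each input on its own and then all three at once. Note that the single-term figures quoted above (.4, about .45, and .54) already sum to more than 1, so they cannot simply be added up. The sketch below assumes a plain linear model with an intercept, which may differ from the exact form used in Part One, and every number in `inputs` and `points` is a made-up placeholder, not real team data.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(features, target):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    x = [[1.0, *row] for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, target)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(features, target):
    """Share of the variance in `target` explained by a linear fit on `features`."""
    beta = fit_ols(features, target)
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in features]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1 - ss_res / ss_tot


# HYPOTHETICAL team-seasons: (shot %, save %, possession ratio) and points.
inputs = [(8.5, 90.1, 0.95), (9.8, 91.5, 1.10), (10.4, 89.7, 1.02), (7.9, 92.3, 0.88),
          (9.1, 90.8, 1.21), (11.0, 91.9, 0.97), (8.8, 90.4, 1.05), (10.1, 91.1, 1.15)]
points = [24, 35, 27, 26, 33, 36, 27, 37]

for name, col in (("shot%", 0), ("save%", 1), ("possession", 2)):
    single = [[row[col]] for row in inputs]
    print(name, round(r_squared(single, points), 2))
print("all three", round(r_squared(inputs, points), 2))
```

With real data in place of the placeholders, the last line is the number to compare against the single-term values: adding inputs to a least-squares fit never lowers R-squared, and the interesting question is how much it rises.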
So if you want to win the league, score on more of your shots, stop more of the shots against you, and/or prevent shots against you in the first place. Sounds like the basics of hockey to me.
A 2015 NCHC Prediction
All I’ve done so far is explain why the standings ended up the way they did. I haven’t attempted to predict a season that hasn’t happened yet. So what about a 2015 NCHC prediction? These teams won’t be the same this year, nor will their bounces or possession game. Without a few more back seasons in this league, it’s going to be a little difficult to predict the second season. I’m going to need to make some calls about how to adjust these stats for 2014-2015. I’ll try to do that in Part Three on Thursday.