2015 NCHC Prediction: Testing the Model

How will Miami and Duluth fare in a 2015 NCHC prediction?

In Part One earlier this week, I attempted to devise a model to predict last season’s NCHC standings, using puck possession (shots) and bounces (shot and save percentage) as the input variables. Not only did the formula reasonably predict the regular-season standings, it also appeared to fit the data closely. Once again, that model is:

 \text{POINTS} = b + b_{1}(\frac{g_{f}}{sh_{f}})+b_{2}(\frac{sh_{a}-g_{a}}{sh_{a}})+b_{3}(\frac{sh_{f}}{sh_{a}})+e
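To make the three inputs concrete, here is a minimal Python sketch of how they fall out of a team's season totals. The `SeasonTotals` container and the function names are made up for illustration, and the coefficients are left as arguments because the fitted values of b, b1, b2 and b3 aren't listed in this post.

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """Season totals for one team (hypothetical container for this sketch)."""
    goals_for: int
    shots_for: int
    goals_against: int
    shots_against: int


def model_inputs(t):
    """Return (shot%, save%, possession ratio) as defined in the formula."""
    shot_pct = t.goals_for / t.shots_for                              # g_f / sh_f
    save_pct = (t.shots_against - t.goals_against) / t.shots_against  # (sh_a - g_a) / sh_a
    possession = t.shots_for / t.shots_against                        # sh_f / sh_a
    return shot_pct, save_pct, possession


def predicted_points(t, b0, b1, b2, b3):
    """Apply POINTS = b + b1*shot% + b2*save% + b3*possession (error term omitted).

    b0..b3 are placeholders: pass in whatever a fitted regression produced.
    """
    shot_pct, save_pct, possession = model_inputs(t)
    return b0 + b1 * shot_pct + b2 * save_pct + b3 * possession
```

Note that the possession term is a ratio of shots for to shots against, so an evenly matched team sits at 1.0 rather than 50%.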

However, I only have one season of NCHC data to work with. It’s possible this is an anomaly, or it’s possible I fashioned a model to fit the standings that we already knew. What I really need is another couple of data sets (aka a few more NCHC seasons) to test this exact model. I don’t have that for the NCHC because there has only been one season so far, but I have something close.

Six of the eight NCHC teams played at least three seasons in the WCHA, so I applied this framework to those three previous seasons, building a single regression that pools all three seasons at once. The resulting model is significant beyond the .0001 level. The results:

WCHA 2010-2011

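| Team | 10-11 Points | 10-11 Rank | 10-11 Model Points | 10-11 Model Rank |
| --- | --- | --- | --- | --- |
| North Dakota | 43 | 1 | 45 | 1 |
| Denver | 37 | 2 | 34 | 3 |
| Nebraska Omaha | 36 | 3 | 35 | 2 |
| Minnesota Duluth | 35 | 4 | 33 | 4 |
| Minnesota | 31 | 5 | 31 | 5 |
| Colorado College | 28 | 6 | 27 | 8 |
| Wisconsin | 27 | 7 | 28 | 7 |
| Alaska-Anchorage | 26 | 8 | 22 | 10 |
| St. Cloud State | 26 | 9 | 29 | 6 |
| Bemidji State | 21 | 10 | 23 | 9 |
| Minnesota State | 20 | 11 | 21 | 11 |
| Michigan Tech | 6 | 12 | 7 | 12 |
| ALL WCHA | 336 | | 336 | |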

WCHA 2011-2012

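| Team | 11-12 Points | 11-12 Rank | 11-12 Model Points | 11-12 Model Rank |
| --- | --- | --- | --- | --- |
| Minnesota | 40 | 1 | 39 | 1 |
| Minnesota-Duluth | 37 | 2 | 38 | 2 |
| Denver | 36 | 3 | 33 | 3 |
| North Dakota | 33 | 4 | 31 | 5 |
| Colorado College | 31 | 5 | 31 | 5 |
| St. Cloud State | 28 | 6 | 32 | 4 |
| Nebraska Omaha | 27 | 7 | 27 | 7 |
| Michigan Tech | 26 | 8 | 27 | 7 |
| Bemidji State | 25 | 9 | 23 | 10 |
| Wisconsin | 24 | 10 | 26 | 9 |
| Minnesota State | 18 | 11 | 19 | 11 |
| Alaska-Anchorage | 11 | 12 | 11 | 12 |
| ALL WCHA | 336 | | 336 | |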

WCHA 2012-2013

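| Team | 12-13 Points | 12-13 Rank | 12-13 Model Points | 12-13 Model Rank |
| --- | --- | --- | --- | --- |
| St. Cloud State | 37 | 1 | 37 | 2 |
| Minnesota | 37 | 1 | 38 | 1 |
| North Dakota | 35 | 3 | 34 | 4 |
| Wisconsin | 33 | 4 | 29 | 6 |
| Denver | 33 | 4 | 31 | 5 |
| Minnesota State | 33 | 4 | 35 | 3 |
| Nebraska Omaha | 30 | 7 | 28 | 7 |
| Colorado College | 26 | 8 | 26 | 8 |
| Minnesota Duluth | 25 | 9 | 26 | 8 |
| Michigan Tech | 20 | 10 | 22 | 10 |
| Bemidji State | 17 | 11 | 18 | 11 |
| Alaska-Anchorage | 10 | 12 | 11 | 12 |
| ALL WCHA | 336 | | 336 | |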

Ok, not bad, not bad. Some of the predicted standings are a little jumbled, but the actual predicted points are still very close. Furthermore, the predictive power is remarkably stable over three seasons, explaining roughly 93% of the variance. So far, so good.
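For anyone who wants to check the closeness of the fit against the published numbers, here is a rough sketch that pools the actual and model points from the three WCHA tables and computes a simple R-squared and average miss. The function names are made up for this example. Because the table values are rounded, and because the 93% figure may be a different statistic (adjusted, or computed on unrounded predictions), the result need not match it exactly.

```python
# (actual points, model points) pairs copied from the three WCHA tables.
WCHA_2010_11 = [(43, 45), (37, 34), (36, 35), (35, 33), (31, 31), (28, 27),
                (27, 28), (26, 22), (26, 29), (21, 23), (20, 21), (6, 7)]
WCHA_2011_12 = [(40, 39), (37, 38), (36, 33), (33, 31), (31, 31), (28, 32),
                (27, 27), (26, 27), (25, 23), (24, 26), (18, 19), (11, 11)]
WCHA_2012_13 = [(37, 37), (37, 38), (35, 34), (33, 29), (33, 31), (33, 35),
                (30, 28), (26, 26), (25, 26), (20, 22), (17, 18), (10, 11)]


def r_squared(pairs):
    """1 - SS_residual / SS_total, treating the model points as predictions."""
    actual = [a for a, _ in pairs]
    mean = sum(actual) / len(actual)
    ss_res = sum((a - m) ** 2 for a, m in pairs)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_abs_error(pairs):
    """Average number of points by which the model misses a team."""
    return sum(abs(a - m) for a, m in pairs) / len(pairs)


pooled = WCHA_2010_11 + WCHA_2011_12 + WCHA_2012_13
print(f"pooled R^2: {r_squared(pooled):.3f}")
print(f"mean absolute error: {mean_abs_error(pooled):.2f} points")
```

Calling `r_squared` on a single season's list gives a per-season figure, which is one way to see whether the fit is as stable from year to year as it looks.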


Does it work elsewhere?

What if there’s something about the WCHA’s style of play, in shooting, goaltending and possession, that makes this work? Does this model hold up for a totally different set of teams? Let’s look at the conference of 2014 national champion Union College, ECAC Hockey:

ECAC 2013-2014

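| Team | 13-14 Points | 13-14 Rank | 13-14 Model Points | 13-14 Model Rank |
| --- | --- | --- | --- | --- |
| Union | 37 | 1 | 35 | 1 |
| Colgate | 29 | 2 | 28 | 3 |
| Quinnipiac | 28 | 3 | 32 | 2 |
| Cornell | 26 | 4 | 22 | 6 |
| Clarkson | 24 | 5 | 19 | 8 |
| Yale | 24 | 5 | 24 | 4 |
| Rensselaer | 21 | 7 | 24 | 4 |
| St. Lawrence | 18 | 8 | 20 | 7 |
| Brown | 17 | 9 | 16 | 10 |
| Dartmouth | 16 | 10 | 15 | 11 |
| Harvard | 16 | 11 | 19 | 9 |
| Princeton | 8 | 12 | 10 | 12 |
| ALL ECAC | 264 | | 264 | |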

Hmm… not as good, but not terrible. For ECAC 2013-14, the model explains only 81% of what happened. There are a lot fewer points available, too (264 versus 336 in the WCHA), so that lessens the predictive power. Also, according to this, the race should have been a lot tighter between Union and Quinnipiac. Why did Quinnipiac earn four fewer points than expected?

| Team | Result | Sh% | Sv% | Possession |
| --- | --- | --- | --- | --- |
| Quinnipiac | W | 14.06% | 95.42% | 60.83% |
| Quinnipiac | L | 6.91% | 87.10% | 54.81% |
| Quinnipiac | T | 6.72% | 89.29% | 61.47% |

Aha – Quinnipiac is NOISY. When they win, they tend to pile on goals against bad goalies. They’re padding their own numbers, so the model thinks they should have more points than they do. Interesting. Well, what about Harvard – they should have finished 9th instead of 11th:

| Team | Result | Sh% | Sv% | Possession |
| --- | --- | --- | --- | --- |
| Harvard | W | 10.13% | 96.61% | 47.16% |
| Harvard | L | 6.48% | 90.29% | 43.47% |
| Harvard | T | 8.82% | 94.04% | 40.32% |

NOISE. In close situations (ties as a proxy), their possession numbers are TERRIBLE. Just one more win would have had them at 7th. Ok, one more: Clarkson finished 5th but the model says they should have been 8th:

| Team | Result | Sh% | Sv% | Possession |
| --- | --- | --- | --- | --- |
| Clarkson | W | 11.08% | 91.19% | 56.13% |
| Clarkson | L | 7.46% | 84.62% | 48.00% |
| Clarkson | T | 5.66% | 94.55% | 49.07% |

Look at that – their goaltending went into lockdown in close games (according to this data, at least), so they won a few more points than they should have. Clarkson is a clutch team. Clutch = NOISE.
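Splits like the ones in these W/L/T tables can be produced from game-by-game box scores. The sketch below is one way to do it, under two assumptions that the tables don't confirm: that each row pools totals across all games with that result (rather than averaging per-game percentages), and that the possession column is shots for divided by total shots. The function name and the input layout are hypothetical.

```python
from collections import defaultdict


def split_by_result(games):
    """Aggregate Sh%, Sv% and possession share by game result.

    `games` is an iterable of (result, goals_for, shots_for,
    goals_against, shots_against) tuples, where result is "W", "L" or "T".
    """
    totals = defaultdict(lambda: [0, 0, 0, 0])  # gf, sf, ga, sa per result
    for result, gf, sf, ga, sa in games:
        t = totals[result]
        t[0] += gf
        t[1] += sf
        t[2] += ga
        t[3] += sa

    splits = {}
    for result, (gf, sf, ga, sa) in totals.items():
        splits[result] = {
            "sh_pct": gf / sf,               # share of own shots that score
            "sv_pct": (sa - ga) / sa,        # share of opponent shots stopped
            "possession": sf / (sf + sa),    # assumed: share of all shots taken
        }
    return splits
```

A team whose three rows look alike is playing to a consistent baseline; big gaps between the W and L rows are the kind of inconsistency labeled noise here.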

Our model doesn’t take to noise very well. It assumes every team has a baseline performance that’s consistent. These three teams did not have consistent play, so they introduce error and throw off the results. We’ll call this the “Quinnipiac Effect.” The model survives, but it’s bruised. We’ll have to remember and consider noisy teams in evaluating predictions at the end of the season.
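One simple way to spot candidates for this effect at the end of a season is to rank teams by the gap between actual and model points. The sketch below does that with the ECAC 2013-2014 numbers from the table; the function names and the 3-point cutoff are arbitrary choices for illustration.

```python
# (team, actual points, model points) from the ECAC 2013-2014 table.
ECAC_2013_14 = [
    ("Union", 37, 35), ("Colgate", 29, 28), ("Quinnipiac", 28, 32),
    ("Cornell", 26, 22), ("Clarkson", 24, 19), ("Yale", 24, 24),
    ("Rensselaer", 21, 24), ("St. Lawrence", 18, 20), ("Brown", 17, 16),
    ("Dartmouth", 16, 15), ("Harvard", 16, 19), ("Princeton", 8, 10),
]


def residuals(table):
    """Actual minus model points; positive means the team beat the model."""
    return {team: actual - model for team, actual, model in table}


def flag_noisy(table, threshold=3):
    """Teams that miss the model by at least `threshold` points either way.

    The default threshold of 3 is an arbitrary cutoff for this sketch.
    """
    res = residuals(table)
    flagged = [(team, r) for team, r in res.items() if abs(r) >= threshold]
    # Largest gaps first; ties keep their table order.
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


for team, gap in flag_noisy(ECAC_2013_14):
    print(f"{team:12s} {gap:+d}")
```

A large gap only says that a team deviated from the model, not why. The W/L/T splits are still needed to tell padded numbers from clutch goaltending.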


Inputs over time

I also want to take a look at just how much shot%, save% and possession ratio change over time. Let’s start with possession ratio. Here’s a graph for WCHA-NCHC teams, 2010-2014:


A bit of noise, but these seem pretty stable, or at least have clear trends. Minnesota-Duluth and Nebraska-Omaha have been strong possession teams all four years. Last year might have been a bit of an anomaly for UNO, with all their prolific shooters. North Dakota had an off year, it seems. Bad news for Denver: Generally the possession numbers change dramatically around a coaching change (which they had before the 2013-14 season). My theory is that, in college hockey, possession is primarily driven by playing style, and thus coaching. So a downward trend after a coaching change probably isn’t a good sign. It’ll be interesting to see what happens for Colorado College this year. Could be a hard call for a 2015 NCHC prediction.

Ok, now shot% and save%:



Hmm, shot percentage seems to be all over the map. Other than St. Cloud, I don’t see any real consistency here. My guess? Your top six forwards are going to drive shot percentage, and that group changes nearly every year. So we’ll have to pay attention to that in making a 2015 NCHC prediction.

And the goaltending? Well, individual talent pops out pretty strikingly. No surprises with Denver or North Dakota. No surprises with Nebraska-Omaha either (sorry, Mav fans). For next year, we can probably expect a returning goalie to maintain his save percentage. The new guys will be the tough part.


An aside: Why this model?

One last thing before we get to forecasting. I want to examine the model and think through its logic. The math checks out, we think, but are the inputs at all grounded in reality? Does our model translate into coach speak?

Here’s the model again:

 \text{POINTS} = b + b_{1}(\frac{g_{f}}{sh_{f}})+b_{2}(\frac{sh_{a}-g_{a}}{sh_{a}})+b_{3}(\frac{sh_{f}}{sh_{a}})+e

Essentially, this breaks down into three parts. The first term is offense (shot%). In the various models it carries an r-squared of about .4. The next term is goaltending (save%), and it’s also relevant, with an r-squared of .45-.47. Finally, the possession term could be considered the defense portion of the equation. It seems to carry the highest predictive power, with an r-squared of .54. But none of these terms predicts the standings on its own. All three matter, and they must be considered together to get an accurate estimate of how a team will finish.

So if you want to win the league, score on more of your shots, stop more of the shots against you, and/or prevent shots against you in the first place. Sounds like the basics of hockey to me.
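The way the three terms work together can be sketched as an ordinary least-squares fit. The version below uses only the Python standard library and assumes each team-season is supplied as a (shot%, save%, possession ratio, points) tuple. The function name and input layout are hypothetical, and this is not necessarily how the numbers in this post were produced. In practice a package such as NumPy or statsmodels would do the same job in a line or two.

```python
def fit_pooled(seasons):
    """Fit POINTS = b + b1*shot% + b2*save% + b3*possession by least squares.

    `seasons` is a list of seasons; each season is a list of
    (shot_pct, save_pct, possession, points) tuples, one per team.
    All seasons are stacked into one regression. Returns [b, b1, b2, b3].
    """
    rows = [row for season in seasons for row in season]
    # Design matrix with a leading 1.0 for the intercept b.
    X = [[1.0, sh, sv, poss] for sh, sv, poss, _ in rows]
    y = [pts for *_, pts in rows]
    k = len(X[0])

    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]

    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]

    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (A[i][k] - tail) / A[i][i]
    return beta
```

Fitting one term at a time and comparing it with the joint fit is a direct way to see why no single input is enough on its own.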


A 2015 NCHC Prediction

All I’ve done so far is explain why the standings ended up the way they did. I haven’t attempted to predict a season that hasn’t happened yet. So what about a 2015 NCHC prediction? These teams won’t be the same this year, nor will their bounces or possession game. Without a few more back seasons in this league, it’s going to be a little difficult to predict the second season. I’m going to need to make some calls about how to adjust these stats for 2014-2015. I’ll try to do that in Part Three on Thursday.

Part One: Building a NCHC predictive model

Part Three: A 2014-15 NCHC Prediction
