Welcome to a long, thorough analysis of my analysis – the NCHC model for predicting final standings. Before last season, I created a model to predict the NCHC final standings. In this post, I'm re-examining my methods and assessing their validity. Honestly, I think I did OK last year, considering I outperformed most of the expert writers and media. However, my model didn't exactly nail the final standings, so there's room for improvement. Today I'll look at the components of the model and whether they worked as intended. Later this week, I'll bring it all together for a 2016 prediction.
Be warned – this post is methods heavy. If you have an interest in NCAA hockey analytics, read on. If not, turn back now, and come back later in the week for my stats-based NCHC predictions. Still here? Ok, here we go.
First of all, kudos to the National Collegiate Hockey Conference for providing full-season and in-conference shot statistics for 2014-15. It’s a good start and a huge improvement over last year.
If you'd like to get familiar with the theory behind this model, I suggest reading my post from last year in which I created it. Also, before we get started, the usual disclaimers: this analysis uses data taken from official NCHC records and considers only intra-conference play during the regular season. Non-conference games, NCHC tournament games, and NCAA tournament games are not included.
2013-14 vs. 2014-15
| Team | '14 Sh% | '15 Sh% | Δ | '14 Sv% | '15 Sv% | Δ | '14 Poss. | '15 Poss. | Δ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| St. Cloud State | 12.46% | 9.18% | -3.28% | 91.14% | 91.37% | 0.23% | 49.15% | 53.70% | 4.55% |
The adage says the best predictor of future performance is past performance. Looking at this table, that holds. St. Cloud lost some shooting prowess, and Omaha had better goaltending, but not much jumps out otherwise. Performance was fairly steady across the three categories from ’14 to ’15 – except possession. Wild swings there, eh? Omaha lost nearly 10 percentage points in possession share. Meanwhile, Denver and Miami saw great improvements, which was a big reason for their good finishes. I’ll take a look at what might be driving possession swings a little later. For now, when comparing ’14 and ’15 performance, you can group the eight teams into four categories:
- Greatly improved: Miami
- Somewhat improved: Denver, Minnesota-Duluth, Omaha, St. Cloud
- Somewhat declined: North Dakota, Western Michigan
- Greatly declined: Colorado College
Note also that goaltending generally improved across the board, while shooting dipped. That’s not surprising – those numbers are likely to fluctuate around the 9.5%/90.5% mark from year to year.
2014-15 Predictions vs. Reality
So we've looked at actual performance, '14 vs. '15. Now, how well did the model predict '15 based on '14 data?
| Team | P-Sh% | '15 Sh% | ε* | P-Sv% | '15 Sv% | ε* | P-Poss. | '15 Poss. | ε* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| St. Cloud State | 10.02% | 9.18% | -0.84% | 90.21% | 91.37% | 1.16% | 49.00% | 53.70% | 4.70% |
*error calculated same as Δ in ’14 to ’15 comparison, but technically not a “change.”
In the table above, green indicates the statistical prediction was a better indicator of NCHC performance than ’14 actual NCHC performance, while red indicates the stat prediction was worse. Generally, the model improves upon simply looking at past performance (15 better vs 11 worse). While good, that’s probably the minimum expectation any model should have. So while this method holds up, there’s certainly room for improvement, particularly on possession.
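The green/red comparison above boils down to one check per stat: is the model's absolute error smaller than the error of simply carrying forward last year's number? A minimal sketch of that check, using only St. Cloud's shooting-percentage figures from the tables above:

```python
def better_than_baseline(last_year, predicted, actual):
    """True if the model's prediction beat the naive baseline of
    carrying forward last year's actual number for this stat."""
    return abs(predicted - actual) < abs(last_year - actual)

# St. Cloud shooting %, from the tables above:
# '14 actual 12.46, model prediction 10.02, '15 actual 9.18.
# Prediction missed by 0.84; the carry-forward baseline missed by 3.28.
print(better_than_baseline(12.46, 10.02, 9.18))  # True
```

Run across all eight teams and three stats, this is how the 15-better-vs-11-worse tally is produced.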
Of course, I used these stats predictions to build a regression model that would predict final points for each NCHC team before the season started (predicted points, or P-Points) and after the season had completed (model points, or M-Points). We can compare each of these to the actual final points each NCHC team earned in conference play.
| Team | Points | P-Points | ε | M-Points | ε |
| --- | --- | --- | --- | --- | --- |
| St. Cloud State | 34 | 36 | +2 | 42 | +8 |
First, some general observations on the points predictions. Generally, the model worked; that is, M-Points were much more accurate than P-Points. What that means is my stat predictions were off, but the mathematical way of predicting points has a solid level of validity. I just need to improve the inputs.
Because the model relied on data from just one previous year of NCHC play, there wasn't much variance in the range of points totals, so the model didn't know how to handle a team as good as North Dakota or, more importantly, a team as bad as Colorado College. That led to the model awarding five fewer points than it should have. Also, St. Cloud had 3-4 very lopsided games in terms of goals scored, which skewed their shot and save percentages enough to throw off their points total. In addition, the teams they played in those lopsided games (WMU and CC) suffered for it. With a new year of data, we should be able to correct these things and make the model even better. Ideally, no team's M-Points would be off by more than 2 points.
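The points model is an ordinary least-squares regression of conference points on the three rate stats. I'm not reproducing the actual coefficients here, so the sketch below fits on synthetic data purely to show the mechanics (the eight-team stat arrays and the "true" coefficients are placeholders, not real NCHC numbers):

```python
import numpy as np

# Placeholder stat lines for eight teams: Sh%, Sv%, possession share.
# Illustrative values only, not the actual NCHC table.
rng = np.random.default_rng(0)
sh = rng.uniform(7, 13, 8)      # shooting %
sv = rng.uniform(88, 93, 8)     # save %
poss = rng.uniform(44, 56, 8)   # possession %

# Suppose points really were a linear function of the three stats.
true_coefs = np.array([2.0, 1.5, 1.0])
points = np.column_stack([sh, sv, poss]) @ true_coefs - 150

# Fit: design matrix with an intercept column, solved by least squares.
X = np.column_stack([np.ones(8), sh, sv, poss])
coefs, *_ = np.linalg.lstsq(X, points, rcond=None)

m_points = X @ coefs             # "M-Points": in-sample fitted values
ss_res = np.sum((points - m_points) ** 2)
ss_tot = np.sum((points - points.mean()) ** 2)
r2 = 1 - ss_res / ss_tot         # R-squared of the fit
```

P-Points come from running pre-season stat predictions through the same fitted equation; M-Points come from plugging in the season's actual stats, which is why M-Points track reality more closely.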
Team-Level Assessment of Predictions
Ok, so looking at these three tables above, let’s analyze what was predicted vs. what actually happened, team by team. This might help determine what areas need adjustment for next year.
Colorado College: The Tigers were not expected to do well in '14-15, and they did even worse than that. CC saw significant declines from '14 to '15 in shooting percentage, save percentage and possession. I predicted minor improvement in almost every area for CC, simply because they were already the worst team in the league on all measures and some regression to the mean seemed due. However, CC lost 32% of its team from '14 to '15 (33% of its possession), and could've used a few lucky breaks that never came. CC's 2014-15 is probably an outlier, but a good example that not every bad team can be expected to improve. The big question now: do they finally get better in 2015-16?
Denver: From ’14 to ’15, Denver made big improvements in shooting (9.08% to 10.49%) and possession (44.18% to 50.90%), while goaltending was slightly down. I predicted each of those stats to a reasonable degree, but I underestimated just how much possession would improve. Denver didn’t lose many players from ’14 to ’15, and those they lost were generally not great at possession or shooting, so it makes sense that those areas would improve, especially with the arrival of Danton Heinen. I predicted Denver would finish tied for third in the NCHC. They finished fourth. Not bad.
Miami: The Redhawks represent a case in which past performance doesn’t predict future performance, and I admittedly fell for the trap. Bringing back almost exactly the same team from ’14, Miami greatly improved in ’15 in shooting percentage, goaltending and possession, thanks to experience and better luck. While I did anticipate upticks in every area for Miami, I was off in the degree of improvement, especially possession. Nailed the goaltending at exactly 90.90%, though. Nevertheless, I picked Miami to finish sixth, and they almost won the league. So, again, probably need to take a look at how I’m predicting possession.
Minnesota-Duluth: Overall, the Bulldogs achieved about what I expected in '15. None of their stats fluctuated much from '14, though I was slightly off in predicting possession and netminding. I expected possession to increase slightly, but instead it decreased slightly. As for goaltending, well, I didn't see Kaskisuo coming (but who did, really?). I thought they would be one of three teams tied for third, and they actually finished fifth. So a bit off there, but that has more to do with other teams performing better than expected than anything Duluth did.
Omaha: Not many thought UNO would be very good in '15 – but I had them finishing second in the NCHC. They actually finished third, and here's why my optimism was justified: I predicted their improved shooting percentage (though I was a little too optimistic), I predicted their improved save percentage (though I underestimated the improvement), and I predicted their drop-off in possession (though I didn't realize it would be so huge). Looking back, I should have seen the possession decline coming. UNO lost 42.9% of its team from '14 to '15, which equated to 58.24% of its '14 possession. That's a lot to lose. I'm starting to see a trend here, and I'll get to that soon.
North Dakota: This is another team where the model and my predictions were mostly vindicated. Goaltending got better as predicted, and possession improved as predicted. I did think shooting percentage would increase, but I should have known better than to pick an already high-shooting-percentage team to go even higher. I thought North Dakota would earn 48 points and win the league – they earned 50 points and won the league. Good enough for me.
St. Cloud State: The Huskies were one of my missteps in predicting '15. I thought they'd finish tied for third, and they finished sixth. Even after the '15 season, my predictive model still thought they should have been somewhere between second and fifth. This was likely because a) SCSU beat up on the weaker teams in the conference, and b) they had their fair share of bad bounces. Both meant they mathematically looked like they should have earned more points than they did. Not much I could have done there. However, I really underestimated their possession. I thought it would slightly decline – instead it greatly improved. I also thought their goaltending would slightly suffer, but again, improvement. Some of this, however, relates back to a) and b) above. SCSU's feast-or-famine '15 made them a hard team for the model to assess.
Western Michigan: Last but not least, I did perfectly fine with the Broncos, but considering they inhabited the dead space between Colorado College and the rest of the NCHC, that wasn’t hard to do. I predicted their shooting ability would decline, and it did. I also predicted their possession would improve, and it did. But I expected goaltending to be better than it was. I thought WMU would earn 30 points, slightly ahead of the 27 they actually earned, which is close enough for a relative ranking of the teams, but ideally I would have liked to be just one or two points closer. Still, not bad.
What stands out most from league- and team-level analysis is the difficulty of predicting possession. Had I been able to foresee some of those huge swings in possession from ’14 to ’15, I would have been able to make more accurate predictions. The good news is, with an additional year’s data, I think I can better predict possession changes. Here, I’m going to pull in a table I considered in making last year’s predictions, and I’m going to add some fields for actual ’15 performance:
| Team | % of team lost | Lost Sh% | Lost Poss.% | '15 Poss. Δ |
| --- | --- | --- | --- | --- |
| St. Cloud State | 21.7% | 14.94% | 22.06% | 4.55% |
This table looked at what every team lost heading into '15: the percentage of its roster, the shooting percentage of the departed players and the share of team possession they made up. I've now added the actual change in possession from '14 to '15. Notice anything? Yeah, that's exactly right: there's a -91% correlation between lost possession and the year-over-year change. UNO, which lost 58.24% of its possession, dropped 9.99 percentage points. Meanwhile, the team that retained 92% of its possession from '14 saw a 4.65% increase. Furthermore, teams that retained roughly the NCHC-average amount of possession saw virtually no change in their possession from '14 to '15. This is good news for extrapolation to future seasons. It's only one season so far, but it's looking safer to assume that a team that loses a great number of players loses much of its possession, and a team that retains its players builds its possession game. It also appears that retention has a greater correlation with possession growth than attrition has with possession decline. We can even calculate a formula translating possession loss into expected possession change. Though we don't want to overfit one year's worth of data, I can use this concept in predicting 2015-16.
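That correlation is straightforward to reproduce once the two columns are in hand. A sketch with NumPy's `corrcoef`; note that only UNO's pair (58.24% lost, -9.99 change) is quoted in the text above, so the other seven entries here are illustrative placeholders, not the real table:

```python
import numpy as np

# % of '14 possession lost to departing players, and the resulting
# change in possession share, one entry per team. Only UNO's pair
# (58.24, -9.99) comes from the post; the rest are placeholders.
lost_poss = np.array([58.24, 22.06, 8.0, 30.0, 35.0, 25.0, 40.0, 15.0])
poss_delta = np.array([-9.99, 4.55, 4.65, -1.0, -2.5, 0.5, -4.0, 6.0])

# Pearson correlation between possession lost and possession change.
r = np.corrcoef(lost_poss, poss_delta)[0, 1]
print(round(r, 2))  # strongly negative with these placeholder values
```

With the real eight-team columns, the same two lines give the -0.91 figure cited above.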
Freshman Imputation Method
As I mentioned earlier, shot percentage was fairly close for each team (I predicted 5 of 8 better than ’14 performance alone, and 6 of 8 in the right direction). However, there’s always room for improvement, and one method from last year that needs to be examined is the freshman imputation method.
Because we have no idea how incoming players might perform in the year ahead, I needed a way to estimate data for freshman skaters. For '15, I simply took the average performance of each team's 2014 freshmen and applied it to the incoming 2015 freshmen. It's a theoretically defensible method, given that the quality of a freshman class often depends on recruiting ability, prestige of the program, playing style of the team, etc., and assuming no major coaching changes, those things won't vary much from year to year.
Below is a table of the imputed data for '15 freshmen, the actual performance of those players, how far off from reality my educated guess was, and how much of an effect it had on overall shooting percentage (measured by freshman share of possession):
| Team | '15 Fr-Imputed | '15 Fr-Actual | ε | Poss. |
| --- | --- | --- | --- | --- |
| St. Cloud State | 5.33% | 10.84% | 5.51 | 22.52% |
Oof. That didn't work at all. Only one of these estimates was even close. Again, it's only one year, but this strongly suggests past freshman performance is not a good predictor of future freshman performance (or at least not the preceding year alone). Unfortunately, we don't have many options for determining how incoming players will perform – we could examine their juniors or high school records player by player, but that data isn't always available, and the translation between levels isn't always clear. So I'm kind of stuck. However, I think a better-than-nothing way to improve freshman estimates is to average the last few years of freshman records. Since I only have two years, I'll use those two years together to determine a realistic value for each team's incoming freshmen. The good news is that for most teams, freshmen account for less than 20% of on-ice activity. It's enough to move the needle, but in most cases, not by much. Ugh, rough.
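The proposed tweak – average every available freshman class rather than carry forward a single year – amounts to a tiny helper. A sketch (the function name is mine; only St. Cloud's two numbers come from the table above):

```python
def impute_freshman_sh(prior_class_sh):
    """Estimate an incoming freshman class's shooting % as the mean
    of all available prior freshman classes for that program."""
    return sum(prior_class_sh) / len(prior_class_sh)

# St. Cloud: the '14 class shot 5.33% (the old one-year imputation)
# and the '15 class actually shot 10.84%. Averaging hedges the guess
# for the incoming '16 class.
estimate_16 = impute_freshman_sh([5.33, 10.84])
print(round(estimate_16, 3))  # 8.085
```

As more seasons accumulate, the list simply grows, and a single fluky class drags the estimate around less.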
Adjusting the Model
You still with me? You must be really bored at work or something. Either that or you are a true stats geek. Either way, thanks for reading this far. We’re almost done – this is the last phase of the assessment.
Finally, we’ll look at the model itself. Last year, that model was:
… and it did a decent job predicting the final rankings. With an extra year of data, however, we should be able to adjust the specific coefficients of the model to get an even more accurate model. Is that the case?
| Team | '15 Sh% | '15 Sv% | '15 Poss. | Points | '15 Model | New Model |
| --- | --- | --- | --- | --- | --- | --- |
| St. Cloud State | 9.18% | 91.37% | 53.7% | 34 | 42 | 42 |
The new model is a tiny bit better, but not by much, and on its face it doesn't seem to matter. However, the model now has an R-squared value of 0.9148, meaning it explains about 91% of the variance in points (in English, only about 9% of the points are due to unexplained factors). In addition, with an extra year of data, all of the variables are now highly significant (p < 0.001). In other words, it's a good model.
So why does it get some teams so wrong, like St. Cloud last year? Well, St. Cloud had three games in which they scored six or more goals against teams that scored one or fewer. Because of this, the model wants to award the Huskies extra points, but we know that in reality, even if a team wins a league game 100-0, it still earns only three points. The model can't see that cap in the point structure, so it reads those lopsided performances as indicators of strength across the whole season. In short, the model gets duped by outliers. Now, I could institute a rule that artificially suppresses large point differentials and outlier performances, but it would be just that – artificial, and somewhat arbitrary.
Long story short, I'd rather the model be methodologically sound and slightly incorrect than produce near-perfect results through manipulation. And the longer we do this (the more NCHC seasons there are), the better this model will get. In year two, we can hopefully do better. By year five or ten, we should be golden.
Phew. Ok, I'm approaching 2,800 words, so that's it for me. I think I've sufficiently explored the model and identified tweaks to it. Check in later this week, when I run the numbers for this year and attempt to predict the 2016 final NCHC standings.
Hey, I can at least hope to do better than the writers…