The Science Behind the NHLe Tool

For proprietary reasons, I won’t provide the weightings in the predictive models or explain exactly how it all works, but I will provide a taste of the models that fuel my probability scores in the Player Comparison Tool.

The model relies on, among other things, the NHLe tier the player finds themselves in at a given age. When you add it all up, what are the player’s chances of being a star and/or an NHLer, based on similar players who have come before them?

The way to answer this is through a predictive model. There are dozens of different models you can apply (linear regression, logistic regression, random forest, boosted trees, simulations, etc.). I tried about five different models, eventually settling on the best fit for my purposes.

There are thousands of great articles online that will give you plenty of detail on predictive modelling and the various models one can choose from, so I’ll spare you those details.

To show how well my models work, however, I will explain one concept of predictive modelling. When doing any sort of predictive modelling, you generally split the data into two halves. You use one half to ‘train’ the model (i.e., to determine what the significant variables are and how they combine to predict the outcome) and then you ‘test’ on the other half (i.e., new data that the model hasn’t seen yet). If you run the model you developed in training on the new data, do you get the same results?
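The train/test idea can be sketched in a few lines. This is only an illustration, not the code behind the tool: the function name `train_test_split`, the 50/50 default and the toy `players` records are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.5, seed=42):
    """Shuffle the rows and split them into a training set and a held-out test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical player records: (player_id, became_nhler)
players = [(i, i % 3 == 0) for i in range(100)]
train, test = train_test_split(players)
print(len(train), "players to train on,", len(test), "held back for testing")
```

The model is fitted on `train` only. Its predictions are then compared against the known outcomes in `test`, which it has never seen.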

What I’ve done is take all the data from 1995 to 2012 and split it into a train group and a test group. I’ve also used the data from 1990 to 1994 as an additional test group (to see if the model works on players from decades ago), and the data from 2013 to present to see whether the players it rates highly look, in the early going of their careers, like they’re heading toward being stars and NHLers. Backwards and forwards, it all works.
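As a rough sketch of that bucketing, each player can be assigned to a sample by year. The function name `assign_sample`, the bucket labels and the choice of which year to use (draft year, season, etc.) are assumptions for illustration. Only the year ranges come from the text.

```python
from collections import Counter

def assign_sample(year):
    """Map a player's cohort year to the sample it belongs to."""
    if 1990 <= year <= 1994:
        return "backtest"        # older hold-out: does the model work decades ago?
    if 1995 <= year <= 2012:
        return "train_test"      # main sample, later split into train and test
    if year >= 2013:
        return "forward_check"   # recent players whose careers are still unfolding
    return "excluded"            # anything before 1990 is outside the data described

# Hypothetical cohort years, one per player
years = [1988, 1991, 1994, 1995, 2003, 2012, 2013, 2020]
print(Counter(assign_sample(y) for y in years))
```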

Here are a few examples, using the 1995 to 2012 data, that show how effective the models are:

First, the Forward NHLer Model. I’ve sorted the players by probability, highest first, and split them into 20 equal groups (ventiles). In each ventile, I’m looking at how many players turned into NHLers. You can see that the top ventile is picking up ~80% NHLers, the next ~70% NHLers, then ~40% NHLers, etc. So it’s doing exactly what you’d expect: at the top end, most players become NHLers, and past the 10th ventile (the 50th percentile) there are very few NHLers.
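Here is a minimal sketch of that ventile check, assuming each player is reduced to a (model probability, outcome) pair. The function name `ventile_rates` and the simulated data are made up for the example and are not the real model output.

```python
import random

def ventile_rates(scored, n_groups=20):
    """scored: list of (probability, outcome) pairs, with outcome 1 or 0.
    Returns the share of positive outcomes in each group, highest-scored group first."""
    ranked = sorted(scored, key=lambda r: r[0], reverse=True)
    n = len(ranked)
    rates = []
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        hits = sum(outcome for _, outcome in group)
        rates.append(hits / len(group) if group else float("nan"))
    return rates

# Simulated players whose outcomes follow their score, purely for illustration
rng = random.Random(0)
demo = []
for _ in range(2000):
    p = rng.random()
    demo.append((p, 1 if rng.random() < p else 0))

for i, rate in enumerate(ventile_rates(demo), start=1):
    print(f"ventile {i:2d}: {rate:.0%} NHLers")
```

If a model is ranking players sensibly, the rates should fall steadily from the first ventile to the last.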

Second, the Forward Star Model. In this case I’ve also split the data into ventiles, sorting them the same way. However, so few stars emerge (~4% of all players drafted will turn into stars) that it doesn’t make sense to look at the data in the same way. Instead, I’m looking at the entire population of stars and asking what percent of all stars each ventile makes up. If the model is working correctly, most should be in the top ventile.

We can see that the top 5 percent of all scores picks up ~60% of all stars, the next ventile picks up another ~20%, and the next another ~10%. The model is picking up ~90% of all stars in the top 15 percent of the data. The ones it isn’t picking up are one-off outliers that come almost out of nowhere to become stars (e.g., Blake Wheeler, Brad Marchand and Mikko Koivu).
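The same calculation can be sketched as a capture table: each ventile’s share of all stars, plus a running total. The function name `star_capture` and the toy data (10 stars among 100 players) are assumptions for illustration, not the real results.

```python
from itertools import accumulate

def star_capture(scored, n_groups=20):
    """scored: list of (probability, is_star) pairs, with is_star 1 or 0.
    Returns (share of all stars per group, cumulative share), highest-scored group first."""
    ranked = sorted(scored, key=lambda r: r[0], reverse=True)
    total_stars = sum(star for _, star in ranked)
    if total_stars == 0:
        raise ValueError("no stars in the data, so shares are undefined")
    n = len(ranked)
    shares = []
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        shares.append(sum(star for _, star in group) / total_stars)
    return shares, list(accumulate(shares))

# Toy data: 100 players ranked by score, stars placed at hand-picked ranks
star_ranks = {0, 1, 2, 3, 4, 5, 7, 8, 12, 40}
toy = [(1 - i / 100, 1 if i in star_ranks else 0) for i in range(100)]
shares, cumulative = star_capture(toy)
for i in range(3):
    print(f"ventile {i + 1}: {shares[i]:.0%} of stars, {cumulative[i]:.0%} cumulative")
```

The cumulative column is the number to watch: it shows how far down the ranking you have to go before most of the stars are accounted for.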

In nearly every case, if a forward is going to be a star in the NHL, they had better be in the top 15 percent in this model after their D+3 season, or there’s very little chance they’re going to become a star! We see similar results for defensemen.

It’s important to note that this particular model knows nothing about where the player was drafted. Bringing in the position or round in which the player was drafted improves the model (i.e., it picks up more NHLers and stars in the top ventile), as 1st rounders are much more likely to make the NHL than 2nd rounders, 3rd rounders, etc. This is partly because 1st rounders tend to be more talented overall (hence why consensus rankings often have many of the same players ranked in the 1st round). But 1st rounders also get much more opportunity to make the NHL. A team can afford to be wrong about a late-round pick, since those players aren’t expected to make it. 1st rounders are expected to.

I hope this brief look behind the curtain gives you the confidence and knowledge that the probabilities in the NHLe Player Comparison Tool are highly reliable and accurate.