dCORSI
(Jul 3, 2014)
INTRODUCTION TO dCORSI
(July 19, 2014)
At this point in the hockey analytics (or #fancystats) community one of the biggest problems with analyzing and assessing hockey play from a possession – and particularly defensive – perspective is tracking which players actually make a significant impact within a given system. Generally this has been tackled by assessing a player’s results within the context of their usage, then comparing a variety of statistical metrics to those of their “usage” peers.
In an effort to determine a skater’s personal impact on shot-based possession metrics (Corsi), I conducted a multivariate linear regression using R to estimate each player’s Expected Corsi For and Expected Corsi Against. The residuals (the differences between Expected and observed Corsi For, and between Expected and observed Corsi Against) are then combined into a single dCorsi (delta Corsi) score. This dCorsi value represents the seasonal average level above or below Expected Corsi a player has produced for every 20 minutes of 5v5 game play in a given season when usage is taken into account.
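The regression-and-residual idea can be sketched roughly as follows. This is a minimal, dependency-free illustration in Python rather than the original R code. The usage variables, the function names (solve, fit_expected, d_corsi) and the sign convention (dCF minus dCA, so that positive is good) are all assumptions made for the example, not details confirmed by the source.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_expected(usage_rows, observed):
    """Ordinary least squares with an intercept; returns the fitted (expected) values."""
    X = [[1.0] + list(row) for row in usage_rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, observed)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in X]


def d_corsi(usage_rows, cf20, ca20):
    """Combine the CF and CA residuals into one score per player-season.

    Assumed convention: dCorsi = (CF - expected CF) - (CA - expected CA).
    """
    exp_cf = fit_expected(usage_rows, cf20)
    exp_ca = fit_expected(usage_rows, ca20)
    return [(cf - ecf) - (ca - eca)
            for cf, ecf, ca, eca in zip(cf20, exp_cf, ca20, exp_ca)]


# Made-up data: each row is (offensive zone start %, teammate quality index).
usage = [[50.0, 0.2], [45.0, -0.1], [55.0, 0.5], [48.0, 0.0], [60.0, -0.3], [52.0, 0.1]]
cf20 = [18.0, 16.5, 20.1, 17.2, 19.0, 18.4]   # observed Corsi For per 20 minutes
ca20 = [17.5, 18.9, 16.0, 18.1, 17.0, 17.6]   # observed Corsi Against per 20 minutes
scores = d_corsi(usage, cf20, ca20)
```

Because each regression includes an intercept, the residuals average out to zero across the sample, so a dCorsi of zero corresponds to a player who performs exactly as his usage predicts.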
The logical justification for separating out Corsi For and Corsi Against is built upon an examination of the correlation between the two. They were found to have a Pearson correlation (r) of -0.13, with a coefficient of determination (R²) of 0.02. Thus Corsi For explains very little of the variation in Corsi Against, and vice versa (approximately 2%). The linkage between the two has been overstated in many corners in the past; at the individual skater level, assuming a strong link appears to be a mistake.
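The step from the correlation to the "approximately 2%" figure is just a squaring, shown here as a short worked example. The variable names are illustrative only.

```python
r = -0.13                          # Pearson correlation between Corsi For and Corsi Against
r_squared = r ** 2                 # coefficient of determination
share_explained = 100 * r_squared  # percent of variance in one explained by the other

print(f"R^2 = {r_squared:.4f} ({share_explained:.1f}% of variance explained)")
```

The squared value is 0.0169, which rounds to the 0.02 quoted above.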
dCorsi represents the unexplained residual portion of Corsi results observed for a given skater in a given season. Admittedly variation in these numbers can arise from a combination of randomness, skill, and other factors not assessed in this regression such as coaching, defensive or offensive system structures, and injury. Also, it should be noted that smaller samples are inevitably prone to greater variation due to random occurrences, and are thus less reliable as a descriptor of skill. As we move to larger sample sizes (i.e. greater time on ice) we develop a clearer picture of a player’s impact on their team’s shot differential.
WHAT IS dCORSI & HOW TO USE IT
Here we are using delta to represent differential, specifically the differential between a skater’s Expected Corsi and Observed Corsi.
Expected Corsi (as determined by regression) captures the “usage effects”: it is what you’d expect out of a perfectly average player in an average season if he were handed the same minutes with the same players against the same opposition. The other half of this coin is what the player in question does with their minutes. How do the shot differentials work out, and how far are they from the expected result?
In an individual year the number will shift due to random factors and effects, but over time (particularly multiple seasons) a lot of this washes away and you’re left with a pretty solid measure of player skill relative to their usage.
There are outside factors that impact upon Corsi. We have adjustments for a bunch of them. Zone Starts and Faceoffs? Yep. What other things affect Corsi? Winning or Losing faceoffs. How about the effects of teammates or the team system itself? Well for that we use Corsi REL (apologies for linking to HEOTP)… but that has its own potential problems.
So how do we deal with the issues presented by these outside factors? Well we have adjustments for all of the above and we also adjust for other situations like game state (i.e. score effects). The problem that arises with all of these various adjustment factors and distinct statistics is that it becomes amazingly onerous to delve into comparison between players on the same team, let alone across the NHL.
Basically, players with extremely high or consistently high dCorsi values are playing above their usage, while players with extremely low or consistently low dCorsi values are in over their heads with respect to their usage. In either case, their deployment should probably be adjusted, if possible, to make better use of them.
dCORSI & USAGE COMPARISON
https://numberpuck.wordpress.com/2015/02/19/calder-trophy-race-analysis-dcorsi-and-usage-comparison/
dCorsi allows us to look not only at how players are being deployed (through their Expected Corsi numbers), but also to compare players using their delta numbers, which are the differences between their expected and observed Corsi For and Corsi Against. This lets us compare players across different usages in order to see which players are actually under- or over-performing.
A high dCorsi shows that a player has contributed more to the possession game than expected, and vice versa for a low (negative) dCorsi number. The dCorsiImpact number simply shows the cumulative sum of “dCorsi” a player contributes, based on their total ice time for the season.
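One plausible way to read "cumulative sum based on total ice time" is to scale the per-20-minute rate by the number of 20-minute blocks played. The sketch below assumes exactly that; the function name and the formula are assumptions for illustration, not a published definition.

```python
def dcorsi_impact(dcorsi_per20, toi_minutes):
    """Scale a per-20-minute dCorsi rate to a season total (assumed formula)."""
    return dcorsi_per20 * toi_minutes / 20.0


# Hypothetical skater: +1.5 dCorsi per 20 minutes over 1,200 minutes of 5v5 play.
season_impact = dcorsi_impact(1.5, 1200)
```

Under this reading, two players with the same dCorsi rate can have very different impact numbers if one plays far more minutes.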
The dCorsi number is only one way that Burtch has allowed us to view the usage of individual players. The dCorsiImpact number is very helpful when trying to quantify the possession impact of a player over an entire season or over a span of games.
The smart and clever guys at WarOnIce have split this number into two separate categories: dCFImpact and dCAImpact. This allows us to see if a player is getting that big impact number as a result of strong offensive play or as a product of good defensive possession work.
A positive dCFImpact number is good, and a negative dCAImpact number is good. The best players, or most well-rounded, should fall in the lower-right quadrant of the graph. This would mean they are driving both more possession for than expected and less possession against than expected. But really what this plot shows is where their dCorsiImpact number is being derived from.
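The reading of the plot can be sketched as a simple classification, assuming dCFImpact is on the horizontal axis and dCAImpact on the vertical axis (which is what makes the lower-right quadrant the good one). The function name and labels are hypothetical.

```python
def impact_profile(dcf_impact, dca_impact):
    """Label where a player's dCorsiImpact comes from (labels are illustrative)."""
    good_offense = dcf_impact > 0   # more shot attempts for than expected
    good_defense = dca_impact < 0   # fewer shot attempts against than expected
    if good_offense and good_defense:
        return "offense and defense"   # lower-right quadrant
    if good_offense:
        return "offense only"          # upper-right quadrant
    if good_defense:
        return "defense only"          # lower-left quadrant
    return "neither"                   # upper-left quadrant


profile = impact_profile(dcf_impact=35.0, dca_impact=-20.0)
```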
DELTA (SOT)
INTRODUCING DELTA SOT (2 ARTICLES)
Delta is a Corsi-type number that uses weighted shots instead of raw shots as its plus-minus data.
The benefits of Corsi are numerous: Corsi numbers rely on a much larger sample than plus-minus, between 12X and 25X depending on whether you also include missed and blocked shots, which means that they are much more reliable and less noise-prone. They also naturally factor out the effect of goaltending, which causes plus-minus to be even more team-dependent than it already is.
However, Corsi numbers also have their weaknesses. While Corsi is useful as a proxy for territorial advantage, it does not factor in the quality of the shots being generated or allowed. My historical preference for plus-minus stemmed from the fact that a player may be doing several subtle things right that will, in the long-term, show up in his plus-minus. This remains true: however, the long-term can be several seasons or even longer, and most of us would prefer a metric that allows us to judge a player before he has retired, not after. The classic bugaboo of Corsi is Scott Gomez: a player who generates a large number of low-quality shots, thus artificially inflating his Corsi without generating a commensurate number of scoring chances. Corsi is also influenced by game score, as trailing teams generate more shots.
The main factors that affect shot quality (distance, rebounds, game situation, game score) are well known. Simply weight each shot by its expected chance of resulting in a goal, take the difference of for and against as in Corsi, and voila! Delta, the love-child of Corsi and plus-minus, is born.
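The difference between raw Corsi and a weighted-shot Delta can be sketched as below. The Shot type and the goal probabilities are invented for illustration; the actual shot-quality model behind Delta is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    for_team: bool     # True if taken by the player's team while he is on the ice
    goal_prob: float   # modelled chance the shot becomes a goal (made-up values)


def corsi(shots):
    """Raw shot-attempt differential: every shot counts as 1."""
    return sum(1 if s.for_team else -1 for s in shots)


def delta(shots):
    """Weighted differential: every shot counts as its goal probability."""
    return sum(s.goal_prob if s.for_team else -s.goal_prob for s in shots)


# Three low-quality shots for, one high-quality shot against.
shift = [Shot(True, 0.02), Shot(True, 0.02), Shot(True, 0.02), Shot(False, 0.15)]
raw = corsi(shift)        # positive: more attempts for than against
weighted = delta(shift)   # negative: about -0.09 expected goals
```

The example mirrors the concern about players who pile up low-quality shots: the raw count looks favourable while the weighted total does not.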
Is Delta a better predictor of plus-minus than Corsi is? Luckily the answer is yes, otherwise you probably wouldn’t be reading this right now.
Having established a raw benchmark that has some value, we can now apply the golden rule of hockey statistics, as per the Oilogosphere: context is everything. In terms of influencing Delta, the most important factors are the number of faceoffs taken in the offensive and defensive zone, and the percentage of faceoffs that your team wins. We can now correct Delta to produce DeltaS (for Situation), which gives Delta working in a zone-neutral environment.
Obviously, the starting zone of faceoffs is not the only factor that influences a player’s result. The strength of his teammates and strength of his opponents also factors in. Note: compensating for teammate strength is fraught with peril, and must be done with caution.
The magnitude of this effect is smaller than that of faceoffs: you can start 70% of your shifts in the defensive zone, but you can’t start 70% of your shifts against Getzlaf (except in the playoffs, but that’s a story for another day). We now have DeltaSO (for Opponents).
All that remains is to adjust Delta for Teammates, and we will have a first stab at a complete measure of a player’s contribution to scoring chance advantage, which I have baptized DeltaSOT.
DeltaSOT is not meant to be an all-encompassing statistic. It doesn’t factor in many things, most notably offensive or shooting ability, which is why players like Jason Blake and Markus Naslund are found among the leaders.
Overall, I’m happy with DeltaSOT. It still requires some adjustments to take into account the varying level at which teammates play, but overall it has the right feel to it, and given the scale of the problem it attempts to solve it is a valiant try.
YET ANOTHER ADVANCED STAT – DELTA SOT
Plus/Minus seems relevant to a lot of people but stat geeks are moving away from it because of its inability to account for individual contributions. It has more value over time but it’s hard to see how much sense it makes in the smaller sample size of an individual season.
Corsi numbers are being used more and more to assess puck possession but they don’t do a wonderful job of accounting for the quality of scoring chances. Again, we might want to use something slightly different.
Delta: Basically a statistical combination of the ideas of Corsi numbers and +/-. Delta weights every shot on the ice (the Corsi portion) by the expectation that said shot will result in a goal (the +/- portion). The shot quality is determined using models built around shot distance and location on the ice as well as the game score.
The whole purpose behind the exercise of determining this new statistic is basically adjusting the concepts around puck possession in terms of ability to generate and prevent good scoring chances. I am particularly interested in adjusted Delta Values, which control for the various players on the ice with a given player, as well as the game situation in terms of current score. In the end, we get a fairly solid determination of the quality of a given player.
A value above zero indicates that the number of expected goals with the player on the ice was in his team’s favour rather than for the opposition.
A Delta SOT score of zero would be a perfectly neutral player. They give up and produce chances of identical quality and number. Obviously virtually no players will accomplish this feat, so any player on the positive side of the docket produces more good scoring chances than they allow, while players on the negative side allow more good scoring chances than they produce.
SHOT QUALITY PLAYER EVALUATION
Can we improve on Corsi by factoring in shot location and type?
Not all shots are the same, you know. The stat community relies heavily on shot differential measures (Corsi and Fenwick) that don’t make any effort to account for the quality of those shots. But of course shot quality also has to matter a little bit, so why not try to factor it in?
Instead of treating all shots as equal, we put a weight on each shot based on how often shots of that type and location go in.
The correlation between a player’s Delta one year and his Delta in the next year is 0.35. That’s a modest figure – enough that a player’s Delta in one year tells us something about what to expect next year, but players obviously bounce up and down a fair bit from year to year.
But it’s not a strong relationship – players who are near the bottom of the league in year 1 are all over the map in year 2, so our points scatter into more of a blob than a line.
Is Corsi more repeatable? Yes, it is; the repeatability is 0.56.
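Repeatability here is a year-over-year correlation, which could be computed along these lines. The player names and values are made up, and the helper names are hypothetical; only players present in both seasons are compared.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def repeatability(year1, year2):
    """Correlate a metric across seasons; inputs map player name -> metric value."""
    players = sorted(year1.keys() & year2.keys())
    return pearson_r([year1[p] for p in players], [year2[p] for p in players])


# Made-up values for four skaters; "D" only played in the second season.
season_a = {"A": 4.0, "B": -2.0, "C": 1.0}
season_b = {"A": 1.5, "B": -3.0, "C": 2.5, "D": 0.0}
r_delta = repeatability(season_a, season_b)
```

Running the same function on each metric gives directly comparable figures like the 0.35 and 0.56 quoted above.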
The question is really whether factoring in shot quality leads to a stronger correlation to future scoring – is the added information more important than the added noise?
No matter what form we put the data in, Corsi does a better job of predicting next year’s goal differential with the player on the ice than Delta does.
This is because the shot quality factor in Delta has a lot of randomness in it. In fact, the variability is so bad that not only is Delta a worse predictor than Corsi of future goal differential; Delta is even a slightly worse predictor of future Delta than Corsi is.
Remember, the only difference between Delta and Corsi is that Delta accounts for the location and type of each shot. So if it’s doing significantly worse at predicting the future, that implies that including this shot quality factor really isn’t helping.
The shot quality factor just appears to add more noise than value over sample sizes of ~82 games, which is why these metrics have never really caught on.
Shot quality measures have a lot of random fluctuations, but they can add value in some instances.
However, to get the most out of them, they must be regressed properly – we have to pull our estimates in towards the average so that those random fluctuations don’t have a large influence on our assessments.
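Pulling an estimate toward the average can be sketched with a simple shrinkage formula. Using the year-to-year correlation as the shrinkage weight is one common simplification and is an assumption here, not necessarily the method the author has in mind; the function name is hypothetical.

```python
def regress_to_mean(observed, league_mean, reliability):
    """Shrink an observed value toward the league mean.

    reliability is a weight between 0 (all noise, use the mean)
    and 1 (no noise, trust the observation fully).
    """
    return league_mean + reliability * (observed - league_mean)


# Hypothetical player with an observed Delta of +10 in a league averaging 0,
# using the 0.35 year-over-year correlation as the weight.
estimate = regress_to_mean(observed=10.0, league_mean=0.0, reliability=0.35)
```

With a weight of 0.35, roughly two thirds of the gap between the player and the league average is treated as noise, so the estimate lands at about +3.5.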
iCF & iFF
Corsi and Fenwick can be applied to an individual in two ways. The concepts can be used to measure a player’s iCF or iFF — their individual Corsi For and individual Fenwick For — in addition to playing a role as an ‘on-ice’ measure. By doing this, you can gain a better understanding of what happens when a player is on the ice compared to their peers and competitors. We’ll get into this more when we look at relative stats.
If a player piles up an “8 iCF” at 5-on-5 in a game, that means they attempted eight shots during 5-on-5 play that either went on net, were blocked, missed the net or resulted in a goal. When we look at what happens when a player is on the ice, we look at the ratio between two events, i.e. Goals For versus Goals Against (GF%), Corsi For versus Corsi Against (CF%) or rate stats (we’ll get there too).
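The individual counts and the on-ice ratios can be sketched as follows. The event records, their field names ("shooter", "type") and the type labels are hypothetical; real play-by-play data will be shaped differently.

```python
CORSI_TYPES = {"goal", "shot_on_net", "missed", "blocked"}
FENWICK_TYPES = CORSI_TYPES - {"blocked"}   # Fenwick excludes blocked attempts


def individual_attempts(events, player, types):
    """Count shot attempts of the given types taken by one player."""
    return sum(1 for e in events if e["shooter"] == player and e["type"] in types)


def pct_for(events_for, events_against):
    """Share of events in the player's favour, as a percentage (e.g. CF% or GF%)."""
    total = events_for + events_against
    return 100.0 * events_for / total if total else None


# Made-up 5-on-5 events from one game.
game = [
    {"shooter": "Smith", "type": "shot_on_net"},
    {"shooter": "Smith", "type": "blocked"},
    {"shooter": "Smith", "type": "goal"},
    {"shooter": "Jones", "type": "missed"},
]
icf = individual_attempts(game, "Smith", CORSI_TYPES)     # 3
iff = individual_attempts(game, "Smith", FENWICK_TYPES)   # 2
cf_pct = pct_for(55, 45)                                  # 55.0
```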