INVESTIGATING THE MULTIVARIATE NATURE OF NHL PLAYER PERFORMANCE WITH STRUCTURAL EQUATION MODELING
(April 2017)
Sean N. Riley
Abstract
Overall, it was found that the concepts of offense, defense, and possession are best understood via a small constellation of measured variables, and that offense mediates the relationship between possession and defense such that higher levels of offense lead to poorer defensive performance.
Introduction
Hockey performance metrics
Level 1: Raw performance metrics.
At the most fundamental level sit raw performance metrics such as goals, assists, shots, and so forth.
corsi, points, goals for, goals against, assists, and faceoff location.
Level 2: Relative to team.
The focus at this level is on taking raw metrics and situating them within the context of the entire team.
Overall, the goal of τ-metrics is to get an idea of whether a player helps or hinders their team’s overall performance.
Level 3: Relative to linemates.
As beneficial as τ-metrics are, it is also helpful to know how a player performs relative to their linemates.
This approach is needed to strip away as much of the interaction between players as possible.
Structural equation modeling
Structural equation modeling (SEM) is a relatively new and increasingly popular statistical technique designed to address the issues outlined above by combining factor analysis with tools such as regression and analysis of variance.
Overall, the goal of SEM is to specify a model whose estimated means and covariances (referred to as parameter estimates) fit the observed data.
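To make the idea of "fit" concrete, the sketch below computes the standard maximum-likelihood discrepancy between an observed covariance matrix and a model-implied one. It is only an illustration: the text does not state which estimator was used, the function name `ml_discrepancy` and the 3x3 matrix are hypothetical, and only the covariance part (not the means) is considered.

```python
import numpy as np

def ml_discrepancy(sample_cov, implied_cov):
    """Maximum-likelihood fit function for covariance structures.

    F_ML = log|Sigma| + tr(S Sigma^-1) - log|S| - p
    The value is 0 when the implied matrix reproduces the sample matrix
    exactly and grows as the two matrices diverge.
    """
    s = np.asarray(sample_cov, dtype=float)
    sigma = np.asarray(implied_cov, dtype=float)
    p = s.shape[0]
    _, logdet_sigma = np.linalg.slogdet(sigma)
    _, logdet_s = np.linalg.slogdet(s)
    return logdet_sigma + np.trace(s @ np.linalg.inv(sigma)) - logdet_s - p

# Hypothetical covariance matrix for three measured variables (made-up values).
observed = np.array([
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])

perfect = ml_discrepancy(observed, observed)   # implied == observed
misfit = ml_discrepancy(observed, np.eye(3))   # model that ignores all covariances
```

A model that reproduces the observed covariances yields a discrepancy near zero, while a model that wrongly treats the variables as uncorrelated yields a positive value.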
Aims of current research
offense cannot be fully captured by a single measure such as goals or points, nor can possession be fully captured by corsi for percentage, nor defense by goals against.
Moreover, the concepts of offense, defense, and possession are best described by a constellation of measured variables; it is therefore beneficial for assessments of performance to include enough measured variables to capture the concepts in question.
only a small number of measured variables are needed to sufficiently capture the multivariate concepts of offense, defense, and possession, and that a system of regressions whereby offense acts as a mediator between possession and defense will generate parameter estimates that fit the data.
The structural model (Fig 1) has paths from possession to offense, possession to defense, and from offense to defense.
Further, all latent variables have disturbances to account for any unspecified predictors. These disturbances are uncorrelated under the premise that defense disturbances can largely be attributed to goaltender skill, which has no impact on offense; and that offense disturbances can largely be attributed to individual skills such as shooting percentage (how often a shot leads to a goal), which have no bearing on defense.
Moreover, possession disturbances can largely be attributed to metrics such as offensive zone faceoff win percentage and offensive zone entry metrics, which have no bearing on the skill metrics of offense and defense.
The theory behind each SEM is simple: (i) if a team/line/player spends more time in possession of the puck, then they are more likely not only to score more goals/points, but also to have fewer goals scored against them; and (ii) players/lines with a high level of offensive output are more likely to have goals scored against them, possibly due to missed defensive coverages brought about by an overemphasis on offense.
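The mediation logic described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the model estimated in this work: the path coefficients are hypothetical, higher defense values are assumed to mean better defense, the three concepts are treated as directly observed scores rather than latent variables, and ordinary least squares stands in for SEM estimation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical path coefficients (not estimates from this research).
a = 0.6    # possession -> offense
b = -0.4   # offense -> defense (more offense, poorer defense)
c = 0.5    # possession -> defense (direct path)

# Simulate zero-mean scores; the disturbances are drawn independently,
# mirroring the assumption of uncorrelated disturbances.
possession = rng.normal(size=n)
offense = a * possession + rng.normal(scale=0.8, size=n)
defense = c * possession + b * offense + rng.normal(scale=0.8, size=n)

def ols(y, *predictors):
    """Least-squares slopes with no intercept (the variables are zero-mean)."""
    X = np.column_stack(predictors)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

(a_hat,) = ols(offense, possession)
c_hat, b_hat = ols(defense, possession, offense)

indirect = a_hat * b_hat      # effect of possession on defense via offense
total = c_hat + indirect      # direct effect plus indirect effect

# Regressing defense on possession alone recovers the same total effect.
(total_check,) = ols(defense, possession)
```

With these made-up values the direct effect of possession on defense is positive, while the indirect effect through offense is negative, so the total effect is smaller than the direct effect.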
Ranking player performance
a hockey player’s overall performance. That is, a player’s overall performance is simply a composition of their scores on latent variables.
a player’s overall score exists as some combination of possession, offense, and defense. How these three scores are combined, however, depends on how much emphasis a person places on each factor: someone who believes offense is more important than defense will assign more weight to offense scores.
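One simple way to combine the three scores is a weighted average. The sketch below assumes that form purely for illustration, since the text only says "some combination"; the player names, factor scores, and weights are all hypothetical.

```python
def overall_scores(factor_scores, weights):
    """Weighted average of latent-variable scores for each player."""
    total_weight = sum(weights.values())
    normalized = {name: w / total_weight for name, w in weights.items()}
    return {
        player: sum(normalized[name] * scores[name] for name in normalized)
        for player, scores in factor_scores.items()
    }

def rank_players(factor_scores, weights):
    """Player names ordered from highest to lowest overall score."""
    scores = overall_scores(factor_scores, weights)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical factor scores (made-up values, not results from this research).
factor_scores = {
    "Player A": {"possession": 0.2, "offense": 1.5, "defense": -0.8},
    "Player B": {"possession": 0.5, "offense": 0.3, "defense": 0.9},
    "Player C": {"possession": -0.1, "offense": 0.4, "defense": 0.2},
}

equal_weights = {"possession": 1, "offense": 1, "defense": 1}
offense_heavy = {"possession": 1, "offense": 3, "defense": 1}

balanced_ranking = rank_players(factor_scores, equal_weights)
offense_ranking = rank_players(factor_scores, offense_heavy)
```

With equal weights the well-rounded Player B ranks first; tripling the weight on offense moves the offense-heavy Player A to the top, which shows how the choice of weights changes the ranking.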
Conclusion
offense mediates the relationship between possession and defense, and that this mediation occurs under multiple measurement models.
Having identified a model that conveys the multivariate nature of hockey, and that is applicable across multiple seasons, we are able not only to generate factor scores for latent variables, but also to combine these scores into an overall score. These scores, be they for possession, offense, defense, or overall, can then be used to rank players in a more nuanced way than if we were to rely on measured variables alone.
Moreover, the ability to generate different overall scores by applying different weightings to latent variables allows us to prioritize components of player performance.