Expected Goals 1 - Introduction To Expected Goals

tmlblueandwhite
Dec 24, 2022

Updated: Dec 27, 2022

GLOSSARY

https://web.archive.org/web/20170328004825/http:/blog.war-on-ice.com/

It’s important to know that xG models differ depending on which source you’re using (I’m using evolving-hockey.com in this example). Basically, these models try to combine the quantity and quality of chances. Expected goals depend on shot location (distance and angle), shot type (slap shot, wrist shot, etc.), whether the shot is a rebound and whether it is a one-timer. Breakaways, odd-man rushes and cross-ice passing are not accounted for in xG models.

In other words, xG models are not perfect, but at least they try to account for the quality of the chances.

xGF% correlates better with GF% (R-squared = 0.4771) than Corsi does. This sounds about right, as we’re now factoring in the quality of each shot attempt. It’s important to note that goaltending and shooting ability are still not factored in. This means you can interpret xG as the expected number of goals given average goaltending and average shooting. So if your team has an elite goaltender or elite shooters (like Ovechkin and Laine), you should expect its GF% to be higher than its xGF%.

Expected goal models are generally built with one of two methods:

· Logistic Regression: A statistical modeling technique that uses predictor variables to predict the probability of a binary outcome. In this case, the binary outcome is whether a shot becomes a goal, and the predictor variables are distance, shot angle, game strength state, whether the shot was a rebound, etc.

· Gradient Boosting: A machine learning technique for regression and classification problems.

The key trade-off is that models built on gradient boosting produce slightly better results than those built on logistic regression, but logistic regression is much easier to code and implement and still provides good results.
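To make the logistic-regression route concrete, here is a minimal sketch in plain Python (no libraries), fitted by batch gradient descent. The two features (distance and angle, rescaled), the made-up toy shots, and the learning-rate and epoch settings are illustrative assumptions, not the specification of any public model.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Goal probability for one shot; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Fit an intercept plus one weight per feature by gradient descent on log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict(weights, xi) - yi  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Made-up toy shots for illustration only: (distance_ft, angle_deg, is_goal)
SHOTS = [(5, 10, 1), (8, 20, 1), (10, 0, 1), (15, 15, 1), (22, 5, 1), (30, 10, 1),
         (6, 5, 0), (10, 45, 0), (12, 30, 0), (18, 10, 0), (20, 40, 0), (25, 10, 0),
         (30, 50, 0), (35, 5, 0), (40, 20, 0), (45, 60, 0), (50, 30, 0)]
X = [(d / 10.0, a / 45.0) for d, a, _ in SHOTS]  # rescale so one step size suits both features
y = [g for _, _, g in SHOTS]
weights = fit_logistic(X, y)
print(f"10 ft: {predict(weights, (1.0, 0.0)):.3f}   40 ft: {predict(weights, (4.0, 0.0)):.3f}")
```

At convergence a logistic model with an intercept reproduces the observed goal total, so summing its probabilities over the training shots gives back the actual number of goals. In practice a library fitter (or a gradient-boosting library) would replace the hand-written loop.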

SHOT QUALITY

http://hockeyanalytics.com/Research_files/Shot_Quality.pdf

EXPECTED GOALS

http://www.hockeyanalytics.com/Research_files/NHL-Expected-Goals-Brian-Macdonald.pdf

ASSESSING SHOT QUALITY

https://puckstopshere.blogspot.com/2007/08/assessing-shot-quality.html?m=1

(Aug 2007)

The NHL records six different kinds of shots: wrap-around, tip-in, backhand, slap shot, wrist shot and snap shot. It also records the distance of each shot.

From this data, it is possible to produce a model that will give the expected number of goals from a given group of shots.

By taking into account shot type and location, it is possible to produce a spreadsheet model that gives the expected number of goals for a given group of shots against.
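A spreadsheet model of this kind is essentially a lookup table. The sketch below assumes shots arrive as `(shot_type, distance_ft, is_goal)` tuples; the distance bin edges are hypothetical choices, not the ones from the original post.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical distance bins: upper edges in feet (a final bin catches everything beyond).
DISTANCE_EDGES = [10, 20, 30, 40, 60]


def distance_bin(distance_ft: float) -> int:
    """Index of the first bin whose upper edge exceeds the distance."""
    for i, edge in enumerate(DISTANCE_EDGES):
        if distance_ft < edge:
            return i
    return len(DISTANCE_EDGES)


def build_table(history: Iterable[Tuple[str, float, int]]) -> Dict[Tuple[str, int], float]:
    """Observed goal rate for each (shot type, distance bin) cell of the 'spreadsheet'."""
    shots: Dict[Tuple[str, int], int] = {}
    goals: Dict[Tuple[str, int], int] = {}
    for shot_type, distance_ft, is_goal in history:
        key = (shot_type, distance_bin(distance_ft))
        shots[key] = shots.get(key, 0) + 1
        goals[key] = goals.get(key, 0) + is_goal
    return {key: goals[key] / shots[key] for key in shots}


def expected_goals(table: Dict[Tuple[str, int], float],
                   shots_against: Iterable[Tuple[str, float]],
                   default: float = 0.0) -> float:
    """Sum the cell rates over a group of shots; unseen cells fall back to `default`."""
    return sum(table.get((t, distance_bin(d)), default) for t, d in shots_against)
```

Every cell needs enough historical shots for its rate to be stable, which is the statistical-validity caveat raised next.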

To be meaningful, any spreadsheet model of expected goals from a group of shots must include enough shots to be statistically valid, but must not go back so far in history that changing circumstances are averaged into the data (as an obvious example, shot quality from the 1950s, when goalies went maskless, will be very different from that of today).

GOALTENDER PERFORMANCE

https://web.archive.org/web/20130701180326/http://www.behindthenet.ca/blog/2007/12/2007-08-5v5-goaltender-performance.html

I’ve put together a simple first-order system to analyze goaltender performance. First, I calculated the probability of scoring from each 2x2-foot square in the offensive zone (OZ).

I then took every shot faced by each goaltender and calculated the number of goals we’d expect an average goaltender to allow if he faced the exact same set of shots.
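The two steps above can be sketched as follows. The `(x, y, is_goal)` shot format and the coordinate convention are assumptions; only the 2x2-foot grid comes from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CELL_FT = 2.0  # 2x2-foot squares, as described above


def cell(x_ft: float, y_ft: float) -> Tuple[int, int]:
    """Map rink coordinates (assumed to be in feet) to a grid square."""
    return (int(x_ft // CELL_FT), int(y_ft // CELL_FT))


def league_rates(shots: Iterable[Tuple[float, float, int]]) -> Dict[Tuple[int, int], float]:
    """League-wide scoring probability per square from (x, y, is_goal) shots."""
    attempts: Dict[Tuple[int, int], int] = defaultdict(int)
    goals: Dict[Tuple[int, int], int] = defaultdict(int)
    for x, y, is_goal in shots:
        c = cell(x, y)
        attempts[c] += 1
        goals[c] += is_goal
    return {c: goals[c] / attempts[c] for c in attempts}


def goalie_vs_average(rates: Dict[Tuple[int, int], float],
                      shots_faced: Iterable[Tuple[float, float, int]],
                      default: float = 0.0) -> Tuple[float, int, float]:
    """Goals an average goalie would allow, goals actually allowed, and the difference."""
    shots_faced = list(shots_faced)
    expected = sum(rates.get(cell(x, y), default) for x, y, _ in shots_faced)
    actual = sum(is_goal for _, _, is_goal in shots_faced)
    return expected, actual, expected - actual  # positive = better than average
```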

At any rate, the original post includes a table showing the number of goals allowed by each NHL goaltender this season (minimum: 300 minutes) and the number of goals an average goaltender would be expected to allow on the same shots.

SH% BY DISTANCE

https://web.archive.org/web/20160509004817/http://www.boysonthebus.com/2013/10/28/shooting-percentage-by-shot-distance/

I’ve been exploring the concepts of shot distance, how they relate to shot quality, and what that can tell us about teams and players.

I think it’s important generally to be able to tell the difference between harmful shot situations and more harmless ones.

First, I wanted to find shooting percentage (and therefore save percentage as well) by shot distance.

We can see that there’s an obvious relationship between shot distance and scoring goals. Without getting into the details about the kinds of shots, rebounds, etc., I don’t think it’s a controversial proposition to suggest that getting closer to the net to take shots is better than taking shots from farther out.

Think of each offensive attack (or sortie) as an effort in getting into the best scoring position possible before attempting to get a shot off. Many times, this involves getting closer to the net, like a rugby team getting closer to a try. When you’ve exhausted your options in getting any closer, you shoot.

Logic would suggest that bad teams and players do worse than good ones at this objective of getting closer before shooting. Getting closer not only helps a shot on net go in, it also helps your shot attempt hit the net:

The closer you are to the net, generally, the higher your odds of hitting the net are.

Now just think about those two engines for a second:

· Closer shots are more likely to hit the net

· Closer shots that hit the net are more likely to be goals

· These two forces compound on each other quickly to end in pretty dramatic results
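One way to write the two engines down, with d the shot distance, is as a product of two probabilities. Because both factors fall as d grows, their product falls faster than either factor alone, which is the compounding described here:

```latex
$$
P(\text{goal} \mid d) \;=\; P(\text{on net} \mid d) \cdot P(\text{goal} \mid \text{on net},\, d)
$$
```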

Every foot of ground you can gain before letting your shot go increases your odds of scoring in a non-linear fashion. 8.7% of shot attempts from 20 feet result in goals. That percentage increases to 11.4% at 15 feet. But if you take a shot at 10 feet, it has a 17.4% chance of going in.

Shots start to go in at a below-league-average rate at about 27 feet. Anything nearer than that goes in at an above-league-average rate; looked at another way, about 27 feet is the point where shots become dangerous.

By using this past data, we’ll be able to build expected goal probabilities for each shot taken on the ice, no matter how far away they are. Then we’ll be able to add these all up to show the expected number of goals a player is on the ice for and against based on shot distances.
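A minimal sketch of distance-only expected goals, using just the three shot-attempt shooting percentages quoted above (10, 15 and 20 feet). Linear interpolation between those points, clamping outside that range, and the `(distance_ft, is_for)` shot format are assumptions for illustration; a real table would come from the full league data and cover every distance.

```python
from bisect import bisect_left
from typing import Iterable, Tuple

# Shot-attempt SH% by distance, from the figures quoted above.
SH_PCT_BY_DISTANCE = [(10.0, 0.174), (15.0, 0.114), (20.0, 0.087)]


def goal_probability(distance_ft: float) -> float:
    """Linear interpolation between known points. Outside 10-20 ft this simply
    clamps to the nearest known value, a placeholder rather than real data."""
    xs = [d for d, _ in SH_PCT_BY_DISTANCE]
    ps = [p for _, p in SH_PCT_BY_DISTANCE]
    if distance_ft <= xs[0]:
        return ps[0]
    if distance_ft >= xs[-1]:
        return ps[-1]
    i = bisect_left(xs, distance_ft)
    x0, x1, p0, p1 = xs[i - 1], xs[i], ps[i - 1], ps[i]
    return p0 + (p1 - p0) * (distance_ft - x0) / (x1 - x0)


def on_ice_expected(shots_while_on_ice: Iterable[Tuple[float, bool]]) -> Tuple[float, float]:
    """Expected goals for and against while a player is on the ice."""
    xgf = xga = 0.0
    for distance_ft, is_for in shots_while_on_ice:
        if is_for:
            xgf += goal_probability(distance_ft)
        else:
            xga += goal_probability(distance_ft)
    return xgf, xga
```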

PIECING THE SHOT PUZZLE TOGETHER

https://web.archive.org/web/20131116225427/http://www.boysonthebus.com/2013/11/12/piecing-the-shot-puzzle-together/

Over the last few weeks I’ve been experimenting with new formulas to determine how shot distance, shot type, and shot quantity can all be intertwined into a new shot-based statistic. Basically, I’m coming up with a way to weight any given shot taken on the ice based on its objective “danger” of going in. To do this, I’ve mined 5 years of data between 2007 and 2012. I’ve taken to calling this concept “Expected Goals”, as giving shot attempts a probability of going in also handily aggregates into how many goals you’d expect a team to create or allow given league-average goaltending.

Different shot types have different likelihoods of going in based on distances.

Now that I’ve found the observed shooting percentages, I’ll need to create mathematical functions to model them.

Add up all the probabilities of the shot attempts your team takes, divide by the combined probabilities of the shots taken by both teams, and that’s your team’s Expected Goal %.
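Written as a formula, with p_i the goal probability assigned to shot attempt i:

```latex
$$
\text{xG\%} \;=\; \frac{\sum_{i \in \text{for}} p_i}{\sum_{i \in \text{for}} p_i + \sum_{j \in \text{against}} p_j} \;=\; \frac{\text{xGF}}{\text{xGF} + \text{xGA}}
$$
```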

Now why even do this? Some rationale:

· It helps to even out the randomness of goaltending seen in goal scoring ratios.

· This expected goal measure takes out something a player has little control over (his goaltender) and concentrates on the things he can control, namely attacking and defending.

· It also takes out the randomness of who’s doing the shooting.

This metric measures how good each team is at creating and denying danger. It has the added benefit of expressing things in a unit everyone can understand — goals.

Further, compare a team’s actual goals to its expected goals and you start to get a sense of what influence non-average goaltending has had on a team’s fortunes.

If we think a player should have been on the ice for 10 goals for and 15 goals against, but he’s actually been on for 10 goals for and 5 goals against, we know he’s been incredibly lucky to get great goaltending behind him.

This works at both the team and player level, just like how traditional shot metrics like Corsi or Fenwick can be applied.

It’s important to keep some things in mind. This doesn’t account for any contextual factors such as quality of team, quality of competition, or zone starts. It’s just like raw shot differentials in that regard.

These formulas assume that a wrist shot taken from 12 feet during sustained zone pressure is the same as one taken on a breakaway. It’s the best we can do at this point.

DTMAH EXPECTED GOAL MODEL (2 ARTICLES)

http://donttellmeaboutheart.blogspot.com/2015/05/nhl-expected-goals-model.html?m=1

Shot quality and possession metrics have always been somewhat of a point of contention. Expected Goals (ExpG) helps to combine these two facets in hopes of providing better information about the game.

The model works by assigning a value to each shot taken over the course of a season based on the model’s predicted probability of that shot resulting in a goal. To calculate a team’s final ExpG all you have to do is sum up all of these probabilities and there you have it.

Basically, it uses a set of independent variables to produce the odds of a binary outcome occurring: in our case, either a goal was scored or it wasn’t.

Factors used:

· Adjusted Distance

The farther away the shot, the less likely it is to result in a goal

· Type of Shot

Snap/Slap/Backhand/Wraparound/etc…

· Rebound – Yes/No?

A rebound is defined as a shot taking place less than 4

· Score Situation

Up a goal/down a goal/tied/etc…
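Categorical factors like these have to be turned into numbers before a logistic model can use them. A minimal sketch of one common encoding (one-hot groups with a dropped baseline) follows; the category lists and the weights passed in are hypothetical, since the exact encoding and coefficients of this model are not given here.

```python
import math
from typing import List, Sequence

# Hypothetical category lists; the original model's exact encoding is not given here.
SHOT_TYPES = ["wrist", "snap", "slap", "backhand", "wraparound", "tip-in"]
SCORE_STATES = ["down 2+", "down 1", "tied", "up 1", "up 2+"]


def score_state(score_diff: int) -> str:
    """Bin the shooting team's goal differential."""
    if score_diff <= -2:
        return "down 2+"
    if score_diff >= 2:
        return "up 2+"
    return {-1: "down 1", 0: "tied", 1: "up 1"}[score_diff]


def encode_shot(distance_ft: float, shot_type: str, is_rebound: bool, score_diff: int) -> List[float]:
    """Distance, one-hot shot type, rebound flag, one-hot score state.
    The first category of each one-hot group is the dropped baseline."""
    features = [distance_ft]
    features += [1.0 if shot_type == t else 0.0 for t in SHOT_TYPES[1:]]
    features.append(1.0 if is_rebound else 0.0)
    state = score_state(score_diff)
    features += [1.0 if state == s else 0.0 for s in SCORE_STATES[1:]]
    return features


def expected_goals(encoded_shots: Sequence[Sequence[float]],
                   weights: Sequence[float], intercept: float) -> float:
    """ExpG: sum of logistic goal probabilities over all encoded shots."""
    total = 0.0
    for x in encoded_shots:
        z = intercept + sum(w * xi for w, xi in zip(weights, x))
        z = max(-500.0, min(500.0, z))  # keep math.exp within range
        total += 1.0 / (1.0 + math.exp(-z))
    return total
```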

There will always be some outliers in a given season, but I think the model does a relatively good job. The chart in the original post shows that ExpG comes out on top when compared to Corsi and Scoring Chances in terms of correlation to actual goals for and against in a given season.

http://donttellmeaboutheart.blogspot.com/2015/05/updated-nhl-expected-goals-model.html?m=1

The only substantial change from the previous version is that this one now includes rush shots.

It has previously been shown that rush shots, simply by virtue of being rush shots, result in a higher shooting percentage.

The ExpGF correlation rose slightly from 0.58 to 0.61, while the ExpGA correlation stayed consistent at 0.60. That isn’t to say adding rush shots didn’t affect the model. There are definitely some differences, both positive and negative, for certain teams, typically within 10 goals.

XTRAHOCKEYSTATS eGF EXPLAINED

https://archive.is/XNLkC

(Jan 2016)

The reason I created eGF is because the concept that all Corsi events are equal drives me crazy. They are so obviously not equal, and players play the game (i.e. generate and receive Corsi events) in such vastly different manners that it’s insane (to me) to think that the difference in events ‘even out in the long run’. So my first inclination was to include some kind of shot location metric in my eGF model to compensate for that.

What I actually did was break the zone into 27 separate areas and then look at SH% from each of those areas, further broken down by shot type. Each area/type combination was then given a score based on its conversion rate.

But it still didn’t feel right.

Hockey is a sport where there are so many moving parts that it just feels wrong to look at a particular instant in time (in this case a shot event) and pretend that we have really any idea about what’s going on in the game.

Basic logic tells you that the position of defenders relative to attackers (or defender readiness versus attacker readiness) plays a key role in determining chance conversion: we see higher shooting percentages (i.e. fewer chances required to score the same number of goals) in shootouts than on power plays, which in turn have higher percentages than even-strength play.

Factors used: length of play, rebounds, rushes, giveaways/takeaways, faceoffs.

Length Of Play:

The longer a play the harder it is to score. This kind of flies in the face of conventional wisdom – which says that possession leads to goals. While that’s true, when you bury your opponent in their end it becomes difficult to score.

Basically, other than small sample weirdness, long plays don’t convert to goals anywhere near as often as quick ones do.

Start Of Play:

A play must start with either a faceoff, a shot attempt against, or a giveaway/takeaway.

How the play starts has a reasonable impact on the chance for that play to convert to a goal.

Rebounds & Rushes:

I won’t bother going into rebound or rush work – since others have already proven that those are valuable contributors to determining conversion success.

Shot Quantity:

Adding more shots to a play does not necessarily improve the chance to score by much. If you think about that for a minute, it means each additional shot lowers SH% and raises SV%. The very interesting thing about this is its effect on goaltender analysis: a Randy Carlyle-coached team (for example) would have a very high expected SV%, as it is more likely to allow multi-shot plays.

Shot Location:

Shot location matters a lot. Everyone knows it and everyone believes it – so I’m not going to go over it.

Next Step:

All we need to do is design an algorithm that classifies chances appropriately based on the known quantities that we have (play start, length, shot locations, rush, and rebounds).

In effect, that’s all that eGF is, putting those factors into a black box and putting each chance into a bucket based on the (expected) odds that it will convert to a goal.

This assumes an average shooter shooting at an average goalie. I think it makes no sense to include shooter (or goalie) quality in the calculation for eGF. In an expected goals model you shouldn’t have the target of exactly matching GF – we already have a stat for that – GF!

What you are trying to do is find players who are putting themselves (or teammates) in good scoring positions while limiting opponent chances. If you include shooter quality you immediately will find that the players with high historical SH% will be at or near the top of your eGF model all the time, while players who face them will have a higher eGA (all else equal). To me that doesn’t help you learn about who is playing well or not – it’s basically like putting in a reverse QualComp measure – the harder your comp the worse your eGF% (all else equal)!

Number Of Chances By Quality:

The absolute number of chances per season is relatively constant (by chance type, by season), and chances become less frequent as they progress from low quality to high quality.

The vast majority of plays are just garbage where nothing much happens (Q0 and Q1).

Number Of Goals By Quality:

Despite the huge number of total chances in the Q0 and Q1 buckets, very few result in actual goals and again the absolute number per bucket is relatively constant in each season. A massive number of the actual goals (>50% if memory serves) are the result of extremely good chances (Q4/Q5), despite the low total number of those types of chances.

Conversion Rates By Quality:

The conversion rates are very steady on a season to season basis.

So to recap to this point – basically we view each play as a whole rather than discrete events. We then examine as many factors as we can to determine the overall quality of the chance and place that chance into a quality bucket (0 to 5). We then have a very good idea of the conversion rate for each chance so we can count the number of chances each team (or player) had and multiply by the appropriate conversion rate to reach eGF.
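The final counting step can be sketched as below. The "black box" that assigns each play to a quality bucket is not shown; bucket labels are assumed to be given, and the conversion rates are derived from whatever historical chances are supplied.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def conversion_rates(history: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Goal conversion rate per quality bucket from past (bucket, scored) chances."""
    chances: Counter = Counter()
    goals: Counter = Counter()
    for bucket, scored in history:
        chances[bucket] += 1
        goals[bucket] += scored
    return {b: goals[b] / chances[b] for b in chances}


def egf(buckets: Iterable[int], rates: Dict[int, float]) -> float:
    """Count chances in each bucket and multiply by that bucket's conversion rate."""
    counts = Counter(buckets)
    return sum(n * rates[b] for b, n in counts.items())
```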

CORSICA - SHOT QUALITY & EXPECTED GOALS

https://corsica.hockey/blog/2016/03/03/shot-quality-and-expected-goals-part-i/

https://corsica.hockey/blog/2016/08/13/shot-quality-and-expected-goals-part-1-5/

(March 2016)

First and foremost, it’s important to understand that nobody (worth listening to) has argued or will argue that shot quality does not exist.

That some shots are better than others is a core tenet of hockey and indeed any such sport.

The crux of people’s skepticism towards the relevance of shot quality in hockey analysis is the variance in this measure that has been observed at both the team and player levels.

Hockey is fraught with randomness and this imposes limitations on one’s ability to predict future outcomes or performance.

Despite this, we expect some semblance of persistent influence on certain aspects we assume to be driven by talent. When this persistence or repeatability is absent, I believe it’s fair to question a metric’s worth as an evaluative tool.

xG stats are by-products of assigning goal expectancy to shots based on the following variables:

· Shot type (Wrist shot, slap shot, deflection, etc.)

· Shot distance (Adjusted distance from net)

· Shot angle (Angle in absolute degrees from the central line normal to the goal line)

· Rebounds (Boolean – Whether or not the shot was a rebound)

· Rush shots (Boolean – Whether or not the shot was a rush shot)

· Strength state (Boolean – Whether or not the shot was taken on the powerplay)

The rationale here is that each shot subset should respond to the variables differently. Namely, distance and angle do not influence slap shot quality in the same manner as they do, say, deflections. In addition, the relationship between goal expectancy and distance or angle is not assumed to be linear. That is to say, the model is not bound by the idea that shot quality changes at a constant rate along the scales of distance and angle.
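One generic way to get both properties (a separate response per shot type, and a slope that can change along the distance scale) is to fit one model per shot type on piecewise-linear "hinge" features. This is a sketch of that general technique, not Corsica's actual implementation; the knot locations are hypothetical, and angle could be expanded the same way.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical knot locations (feet) where the distance slope is allowed to change.
KNOTS: Sequence[float] = (10.0, 20.0, 35.0)


def hinge_features(distance_ft: float, knots: Sequence[float] = KNOTS) -> List[float]:
    """Piecewise-linear basis: distance plus max(0, distance - k) for each knot."""
    return [distance_ft] + [max(0.0, distance_ft - k) for k in knots]


def design_matrices(shots: Iterable[Tuple[str, float]]) -> Dict[str, List[List[float]]]:
    """Group shots by type so each type can be fitted with its own coefficients."""
    groups: Dict[str, List[List[float]]] = {}
    for shot_type, distance_ft in shots:
        groups.setdefault(shot_type, []).append(hinge_features(distance_ft))
    return groups
```

Each group's matrix would then feed its own classifier, so a deflection's distance curve never has to share coefficients with a slap shot's.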

Factors not included in the model: most notably shooter and goaltender talent, as well as pure luck, which we know is pervasive in what we’re attempting to measure.

The logical progression from having developed a method with which to assign goal expectancy to shots is to apply it to the end of player and/or team evaluation. Let xG define Expected Goals, the sum of goal fractions expected from observed unblocked shots. ixG will denote the xG value of unblocked shots taken by a player, while xGF and xGA will represent the xG value of on-ice shots For and Against respectively. xGF% is analogous to GF%, where goals have been substituted with xG. We can easily observe how the inclusion of a shot quality element yields a measure closer to true goal share.
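The definitions of ixG, xGF, xGA and xGF% translate directly into sums over shot records. The `Shot` record layout (shooting team, shooter, xG value, skaters on ice per team) is a hypothetical structure for illustration.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Shot(NamedTuple):
    team: str                          # team that took the unblocked shot
    shooter: str
    xg: float                          # model goal probability for this shot
    on_ice: Dict[str, FrozenSet[str]]  # skaters on ice, keyed by team


def player_xg(shots: List[Shot], player: str, team: str) -> Dict[str, float]:
    """Individual xG plus on-ice xGF, xGA and xGF% for one player."""
    on = [s for s in shots if player in s.on_ice.get(team, frozenset())]
    ixg = sum(s.xg for s in shots if s.shooter == player)
    xgf = sum(s.xg for s in on if s.team == team)
    xga = sum(s.xg for s in on if s.team != team)
    share = 100.0 * xgf / (xgf + xga) if xgf + xga > 0 else float("nan")
    return {"ixG": ixg, "xGF": xgf, "xGA": xga, "xGF%": share}
```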

This idea of projecting future outcomes is of great importance in analyses relating to hockey and indeed a great number of fields. In its present condition, 5v5 xGF% is not a better predictor of future 5v5 GF% than CF% at the player level.

Though descriptive of shot quality, the xG model has not yet been shown to be appreciably predictive of future shot quality or goals at the on-ice level.

MONEYPUCK xG MODEL

http://moneypuck.com/about.htm

This model predicts the probability of each shot being a goal. Factors such as the distance from the net, angle of the shot, type of shot, and what happened before the shot are key factors in the model. By adding up all the probabilities of a team’s shots during a game, we can calculate the team’s expected goals in that game. The model was built using gradient boosting.

MoneyPuck’s expected goals model uses a different variable strategy than other expected goals models, such as those from Corsica Hockey or HockeyGraphs.com. The MoneyPuck model does not explicitly use variables for rebounds or rush shots. Rather, it looks at the ‘speed’ between events: the distance on the ice between the shot and the event before it, divided by the time elapsed. Also, for rebound shots, the model looks at the change in angle between the two shots divided by the time between them.
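Both "speed" features are simple ratios. In this sketch the `Event` record is hypothetical, and flooring the elapsed time at one second (event times are recorded to the second, so a gap of zero is possible) is an assumption rather than MoneyPuck's documented handling.

```python
import math
from typing import NamedTuple


class Event(NamedTuple):
    x: float      # feet
    y: float      # feet
    t: float      # game time in seconds
    angle: float  # shot angle in degrees (only meaningful for shots)


def speed_from_previous(shot: Event, prev: Event, min_dt: float = 1.0) -> float:
    """Distance travelled since the previous event divided by elapsed time."""
    dt = max(shot.t - prev.t, min_dt)  # assumption: floor to avoid dividing by zero
    return math.hypot(shot.x - prev.x, shot.y - prev.y) / dt


def rebound_angle_speed(shot: Event, prev_shot: Event, min_dt: float = 1.0) -> float:
    """Change in shot angle between a rebound and the original shot, per second."""
    dt = max(shot.t - prev_shot.t, min_dt)
    return abs(shot.angle - prev_shot.angle) / dt
```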

In general, the shots with the highest goal probability are quick rebound shots close to the net where there has been a large change in shot angle from the original shot.

Variables In The Model:

1.) Shot Distance From Net

2.) Time Since Last Game Event

3.) Shot Type (Slap, Wrist, Backhand, etc)

4.) Speed From Previous Event

5.) Shot Angle

6.) East-West Location on Ice of Last Event Before the Shot

7.) If Rebound, difference in shot angle divided by time since last shot

8.) Last Event That Happened Before the Shot (Faceoff, Hit, etc)

9.) Other team’s # of skaters on ice

10.) East-West Location on Ice of Shot

11.) Man Advantage Situation

12.) Time since current Powerplay started

13.) Distance From Previous Event

14.) North-South Location on Ice of Shot

15.) Shooting on Empty Net

Flurry Adjusted Expected Goals:

Flurry adjusted expected goals is a statistic that discounts the expected goal value of the 2nd, 3rd, 4th, etc. shots in a flurry of shots. These shots are discounted because they only had the opportunity to occur because the team did not score on a previous shot; otherwise the puck would be back at center ice. This concept was discussed in a presentation at the Vancouver Hockey Analytics conference. Flurry adjusted expected goals have been found to be more repeatable and also more predictive of future winning than regular expected goals.

The definition of a flurry adjusted expected goal is:

Flurry Adjusted Expected Goal Value = Chance of Not Scoring in Flurry Yet * Regular Expected Goal Value of Shot
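Applied to one flurry, the definition above is a running product. This sketch assumes the shots have already been grouped into a flurry and ordered in time; how a flurry is delimited is not specified here.

```python
from typing import List, Sequence


def flurry_adjusted(xg_values: Sequence[float]) -> List[float]:
    """Discount each shot in a flurry by the chance no earlier shot scored."""
    adjusted = []
    p_no_goal_yet = 1.0
    for xg in xg_values:
        adjusted.append(p_no_goal_yet * xg)
        p_no_goal_yet *= 1.0 - xg  # the flurry continues only if this shot misses
    return adjusted
```

A useful property: the adjusted values of a flurry sum to the probability that the flurry produces at least one goal, so a long sequence of shots can never be credited with more than one expected goal.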

Shooting Talent Adjusted xG:

By simulating each player’s career shots thousands of times, we can see the probability that a simulated player set to any true shooting talent level (such as 10% above average) would do as well as that player actually has.

MoneyPuck ran over 10,000,000 simulations in total for the 800+ players who played in the 2019-2020 NHL season. The result is a probability distribution graph of the likelihood of each player truly being at a given shooting talent level, which you can see at the bottom of the player shooting maps. We then take the weighted average of the probabilities to get the best estimate of the player’s true shooting talent.
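A minimal sketch of one reading of this simulation approach. Treating talent as a multiplier on each shot's xG, and weighting each talent level by how often its simulations reproduce the player's actual goal total, are assumptions; MoneyPuck's exact weighting is not described above.

```python
import random
from typing import List, Sequence, Tuple


def simulate_goals(xgs: Sequence[float], talent: float, rng: random.Random) -> int:
    """One simulated career: each shot scores with probability min(1, talent * xG)."""
    return sum(1 for p in xgs if rng.random() < min(1.0, talent * p))


def talent_estimate(xgs: Sequence[float], actual_goals: int, talents: Sequence[float],
                    n_sims: int = 2000, seed: int = 0) -> Tuple[float, List[float]]:
    """Weight each talent level by how often its simulations match the actual goals,
    then return the weighted-average talent and the (unnormalized) weights."""
    rng = random.Random(seed)
    weights = []
    for talent in talents:
        hits = sum(simulate_goals(xgs, talent, rng) == actual_goals for _ in range(n_sims))
        weights.append(hits / n_sims)
    total = sum(weights)
    if total == 0:
        raise ValueError("no simulation matched the actual goals; widen the talent grid")
    estimate = sum(t * w for t, w in zip(talents, weights)) / total
    return estimate, weights
```

For example, a player with 200 shots of 0.1 xG each (20 expected goals) who actually scored 30 ends up with an estimate well above 1.0, but pulled back from the raw 1.5 by the talent levels on either side.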

Expected Rebounds and ‘Created’ Expected Goals

Just as every shot has an expected goal value, it can also have an expected rebound value. This is the probability that the shot will generate a rebound. Rebounds are modeled in the same way expected goals are using the same variables. If a goalie gives up more rebounds than this model predicts, it may be a sign that the goalie has poor rebound control or that goalie plays for a team that struggles clearing out the front of the net. Cole Anderson of Crowd Scout Sports has also done research into expected rebounds, with a focus on the goaltending side.

We can also calculate the expected goals that are likely to come from a rebound of a shot. This metric is called ‘expected goals of expected rebounds’ (xGoals of xRebounds). The rebound shot does not need to be taken by the same player. In fact, the rebound does not need to actually even occur. The shot just needs to have attributes that are more likely to generate a rebound. As there is a lot of luck in getting a rebound or not, this metric credits players who have shots that are likely to produce rebounds in general.

Expected Goals Of Expected Rebounds = Probability of the Shot Generating a Rebound * The Expected Goals of The Possible Rebound Shot

Some shots actually have a higher xGoals of xRebounds than the xGoals of the shot itself. These are usually shots that occur far from the net by defensemen.

By combining xGoals from non-rebound shots and xGoals of xRebounds, we can create a metric called ‘Created Expected Goals’. This metric attempts to give credit to the player who does the work generating the xGoals. Compared to the xGoals metric, it punishes players who just feed on the rebounds of others’ shots. Defensemen tend to do better in this metric than in xGoals, while some centres often do worse. While we cannot always accurately assign credit for ‘creating’ an xGoal, this metric tries to make it fairer than just giving all the credit to the shooter. xGoals from rebounds are given no direct credit in this metric. Rather, credit is given to players who take shots that are likely to generate juicy rebounds.

Created Expected Goals = xGoals of Non-Rebound Shots + xGoals of xRebounds
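The two formulas above combine as follows. The `ShotXG` record is hypothetical, and counting xRebounds for every shot (including shots that were themselves rebounds) is an assumption about how the sum is taken.

```python
from typing import Iterable, NamedTuple


class ShotXG(NamedTuple):
    is_rebound: bool
    xg: float             # xGoals of the shot itself
    p_rebound: float      # modeled probability the shot generates a rebound
    xg_if_rebound: float  # xGoals of the possible rebound shot


def xgoals_of_xrebounds(shot: ShotXG) -> float:
    return shot.p_rebound * shot.xg_if_rebound


def created_expected_goals(shots: Iterable[ShotXG]) -> float:
    """xGoals of non-rebound shots plus xGoals of xRebounds."""
    total = 0.0
    for s in shots:
        if not s.is_rebound:
            total += s.xg  # rebound shots get no direct credit for their own xG
        total += xgoals_of_xrebounds(s)
    return total
```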

BARLOWE - NHL EXPECTED GOALS MODEL

https://rstudio-pubs-static.s3.amazonaws.com/311470_f6e88d4842da46e9941cc6547405a051.html

Mainly coding stuff

GAME THEORY & xG MODELS

https://crowdscoutsports.com/game-theory/expected-goal-xg-model/

Hockey is also a zero-sum game. Goals (and expected goals) only matter relative to league average. Original iterations of the expected goal model built on a decade of data show that goals were becoming dearer compared to what was expected. Perhaps goaltenders were getting better, or league data-scorers were recording events to make things look harder than they were, or defensive structures were impacting the latent factors in the model or some combination of these explanations.

Modeling each season separately, total season xG will be very close to actual goals. This also grades goaltenders on a curve against other goaltenders each season. If you are stopping 92% of shots, but others are stopping 93% of shots (assuming the same quality of shots) then you are on average costing your team a goal every 100 shots. This works out to about 7 points in the standings assuming a 2100 shot season workload and that an extra 3 goals against will cost a team 1 point in the standings. Using xG to measure goaltending performance makes sense because it puts each goalie on equal footing as far as what is expected, based on the information that is available.

Crudely, each goal prevented is worth about 1/3 of a point in the standings. Inferring how many goals a goalie prevents compared to average allows us to compute how many points a goalie might create for or cost their team. However, a more sophisticated analysis might compare the goal support the goalie receives to the expected goals faced.
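The arithmetic from the previous paragraphs, as a quick check: one save-percentage point below a 93% league rate over a 2,100-shot workload is 21 goals, and at roughly 3 goals per standings point that is about 7 points. The function names are illustrative.

```python
def goals_vs_average(sv_pct: float, league_sv_pct: float, shots_faced: int) -> float:
    """Goals prevented (+) or allowed (-) relative to a league-average goalie on the same shots."""
    return (sv_pct - league_sv_pct) * shots_faced


def standings_points(goals: float, goals_per_point: float = 3.0) -> float:
    """Crude conversion: roughly 3 goals per standings point."""
    return goals / goals_per_point


goals = goals_vs_average(0.92, 0.93, 2100)
print(round(goals, 1), round(standings_points(goals), 1))  # -21.0 -7.0
```

With an xG model, the same comparison uses expected goals against minus actual goals against in place of the save-percentage gap, which also removes the "same quality of shots" assumption.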

xGs are also important because they begin to frame the uncertainty that goes along with goals, chance, and performance. What does the probability of a goal represent? Think of an expected goal as a coin weighted to represent the chance that a shot is a goal. Historically, a shot from the blueline might end up a goal only 5% of the time. After 100 shots (or coin flips), will there be exactly 5 goals? Maybe, but maybe not. The same goes for a rebound from in tight to the net with a 50% probability of being a goal. After 10 such shots, we might not see 5 goals scored, as ‘expected.’ 5 goals is the most likely outcome, but anywhere from 0 to 10 is possible on only 10 shots (or coin flips).
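The coin-flip framing can be made exact: the distribution of total goals from shots with different probabilities can be built up one shot at a time. This sketch assumes shots are independent, which real sequences (rebounds, flurries) are not.

```python
from typing import List, Sequence


def goal_distribution(probs: Sequence[float]) -> List[float]:
    """P(exactly k goals) for k = 0..n, treating shots as independent weighted coins."""
    dist = [1.0]  # before any shots: zero goals with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1.0 - p)  # this shot misses
            nxt[k + 1] += q * p      # this shot scores
        dist = nxt
    return dist


ten_rebounds = goal_distribution([0.5] * 10)
print(f"P(exactly 5 of 10) = {ten_rebounds[5]:.3f}")  # 252/1024, about 0.246
```

So even the single most likely outcome of ten 50% chances happens less than a quarter of the time.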

Goaltending is a simple position, but the range of outcomes, particularly in small samples, can vary due to random chance regardless of performance. Results can vary due to performance (of the goalie, teammates, or opposition) as well, and since we only have one season that actually exists, separating the two is painful. Embracing the variance is helpful and expected goals help create that framework.

It is important to acknowledge that results do not necessarily reflect talent or future or past results. So it is important to incorporate uncertainty into how we think about measuring performance. Expected goal models and simulations can help.

FBG EXPECTED GOALS MODEL (2 ARTICLES)

https://fooledbygrittiness.blogspot.com/2018/01/expected-goals-model.html?m=1

1. Distance: Distance of shot from the net

2. Angle: Angle of shot

3. Shot Type: Slap Shot, Snap Shot, Wrist Shot, Deflected, Tip-In, Wrap-Around, Backhand

4. Off Wing: If the player took the shot from his off wing

5. Empty Net: If the net was empty

6. Strength: 5v5, 4v5, 3v3, etc. for the shooting team

7. Score Category: Score differential for the shooting team. It spans from -3+ to 3+ (I just bin everything above 3 and below -3)

8. Is Forward: If the shooter is a forward

9. Is Home: If the shooter plays for the home team

10. Distance Change: Distance from previous event

11. Time Elapsed: The difference in time from the last event

12. Angle Change: The change in angle if it’s a rebound shot (last event was an SOG <= 2 seconds ago)

13. Previous Event & Team: Whether the previous event was a Fac, Sog, Block/Miss, or a Take/Hit (I changed gives to takes for the other team) and for which team. This is represented by eight dummy variables (the four choices for both teams).
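Several of these features can be derived directly from event data. The sketch below assumes coordinates in feet, normalized so the shooting team always attacks the net at x = 89; that convention and the event label "SOG" as a string are assumptions. The 2-second rebound rule and the -3 to +3 score bins come from the list above.

```python
import math
from typing import Tuple

NET_X = 89.0  # assumption: goal line at x = 89 ft, shooter attacking toward +x


def distance_and_angle(x_ft: float, y_ft: float) -> Tuple[float, float]:
    """Distance (ft) to the centre of the net and angle (degrees) off the centre line."""
    dx = NET_X - x_ft
    distance = math.hypot(dx, y_ft)
    angle = math.degrees(math.atan2(abs(y_ft), dx))  # above 90 means from behind the net
    return distance, angle


def is_rebound(prev_event: str, seconds_since_prev: float) -> bool:
    """Previous event was a shot on goal no more than 2 seconds earlier."""
    return prev_event == "SOG" and seconds_since_prev <= 2


def score_category(score_diff: int) -> int:
    """Clip the shooting team's score differential to the -3..+3 bins."""
    return max(-3, min(3, score_diff))
```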

As you can see, unlike most other models, I chose not to model rushes and rebounds explicitly (similar to MoneyPuck). I can’t imagine it makes too much of a difference, but this is how I personally like it. I also looked into incorporating shift info into the model, as Macdonald did (how long the shooter had been on the ice, and the average shift length for both the Ev. Team and the Opp. Team). Some early testing suggested the importance of those features was small, and since it takes a while to calculate the info for every shot, I chose not to include them.

Another thing to note is I chose not to include “shooting talent” as a model feature. I plan on writing more on this in the near future so I’ll keep this brief but I think whether or not to include it as a model feature depends on what you are trying to measure. I also think more care could be taken in how it’s calculated.

Nothing really new was done here but it’s always good to go over your methodology if you make the model outputs public. I’m also sure that the model could be improved in certain areas. Some inputs could be added (like shooter talent) and some parts could possibly be cleaned up a little.

https://fooledbygrittiness.blogspot.com/2018/03/shooter-talent-and-expected-goals.html?m=1

While I do generally agree with this, I do have a concern. I guess the best way to explain it is by talking about what we want our “shooter talent” input to do. A standard xG model controls for the situational factors (sans the shooter) and tries to determine the probability of a shot being a goal. But we know this isn’t enough. If a shot with the same xG is taken by Steven Stamkos and by Zac Rinaldo… I think we’d all agree one shot has a better chance of going in (if not, I guess you can just stop reading now). So we want the shooter talent input to account for how much better than expected we think this player is.

Sh% is a combination of the standard probability of one’s shots (“does this player get high-quality chances?”) and the player’s shooter talent (how much better than expected they do). A player could have a high Sh% because of the quality of shots he takes or because of his actual shooting ability. We need to distinguish between the two and focus only on how much better than expected this player is, controlling for the quality of shots he gets (since the other variables in our model attempt to control for the quality of the chance itself).

Well, so how do we do that? I talked about it a while back (somewhat clumsily) and it really just comes down to Goals/xGoals (this would be our “shooter talent” multiplier). Then for each player we could use his previous data and regress it to get our best estimate of his multiplier.
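A minimal sketch of a regressed Goals/xGoals multiplier. Shrinking toward 1.0 by adding a fixed block of league-average shooting (`prior_xg` expected goals, and the same number of goals) to both sides is one common choice and an assumption here, as is the value of `prior_xg`; the post does not specify how the regression is done.

```python
def shooter_talent_multiplier(goals: float, xgoals: float, prior_xg: float = 50.0) -> float:
    """Goals/xGoals regressed toward 1.0 (league average).
    prior_xg is a hypothetical tuning constant: larger values shrink harder."""
    return (goals + prior_xg) / (xgoals + prior_xg)


def talent_adjusted_xg(xg: float, multiplier: float) -> float:
    """Scale a shot's xG by the shooter's multiplier, capped at 1."""
    return min(1.0, xg * multiplier)
```

A player with 30 goals on 20 xGoals has a raw ratio of 1.5; with `prior_xg=20` the estimate is (30 + 20) / (20 + 20) = 1.25, and a much larger sample would let the estimate approach the raw ratio.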

https://github.com/HarryShomer/xG-Model

More code.

EW – xG MODEL WITH CODE

https://rpubs.com/evolvingwild/395136

(June 2018)

Note: lots more code here too.

Currently, all public models use all-situations Fenwick shots with strength state as a specific feature variable (a “categorical” variable). Given the significant differences in play styles and scoring rates between even-strength, powerplay offense, shorthanded offense, and empty net situations in hockey, it seemed like a good idea to build a separate model for each of these four play states. In the initial stages of testing, we determined that there was a benefit to doing so. Additionally, we hoped to incorporate multiple models in an ensemble; however, the additional training time and size of the data made this impractical. Our final model consists of four separate models: even-strength (5v5, 4v4, 3v3), powerplay and man-advantage situations (5v4, 4v3, 5v3, 6v5, 6v4), shorthanded offense (4v5, 3v4, 3v5), and shots directed towards an empty net. We used an algorithm called “eXtreme Gradient Boosting”, better known as “XGBoost”, for all four of these models.
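Routing shots to the four models follows directly from the strength states listed above. This sketch only does the split; the `(shooting_skaters, defending_skaters, empty_net)` tuple format is an assumption, and training one gradient-boosted classifier per bucket is left out.

```python
from typing import Dict, Iterable, List, Tuple

ShotState = Tuple[int, int, bool]  # (shooting skaters, defending skaters, empty net)


def model_bucket(shooting_skaters: int, defending_skaters: int, empty_net: bool) -> str:
    """Route a shot to one of the four strength-state models."""
    if empty_net:
        return "empty_net"
    if shooting_skaters == defending_skaters:
        return "even_strength"  # 5v5, 4v4, 3v3
    if shooting_skaters > defending_skaters:
        return "powerplay"      # 5v4, 4v3, 5v3, 6v5, 6v4
    return "shorthanded"        # 4v5, 3v4, 3v5


def split_shots(shots: Iterable[ShotState]) -> Dict[str, List[ShotState]]:
    """Group shots so that each bucket can be trained as a separate model."""
    buckets: Dict[str, List[ShotState]] = {}
    for shot in shots:
        buckets.setdefault(model_bucket(*shot), []).append(shot)
    return buckets
```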

Clearly, shot distance is the most “important” variable in the model, regardless of strength state.

Variable Importance:

· Shot Distance (By far most important)

· Seconds Since Last Shot

· Shot Angle

· Distance From Last Shot

· Game time (Middling)

· Shot Type (Middling)

· Score Effects (negligible outside third)

We wanted to include a shooting talent variable; however, the algorithm did not feel the same. This variable (in each model) was never used in any decision tree that was generated – essentially, it determined that this variable added no value for predicting goals.

It makes intuitive sense that shooting talent should be included in xG, at least we think it does. We both have our theories about this, but this warrants additional research. Harry Shomer developed a very clever method in which an initial xG model was built and its xG values were incorporated into a second model as a shooting talent variable. He found that this increased the model’s performance. This is a method we would like to potentially explore in a future version of this model.

The shots with the highest probability of being a goal are those that are taken directly in front of the net that follow another shot. The model has identified this sequence as being an extremely high-danger situation.

Future Work: While we’d really like complete passing data, the above statements are mostly conjecture at this point. Both of us are fairly confident that any xG model would benefit somewhat significantly from knowing where a pass came from and when it occurred… and we haven’t even touched on zone entries/exits. A lot of this data is currently being tracked publicly, and this could be incorporated into future versions of xG models. Regardless, we feel the model still performs well given the data available.