
WAR 1 - Euclidean Distance

Writer: tmlblueandwhite

HOCKEY AND EUCLID — CALCULATING STATISTICAL SIMILARITY BETWEEN PLAYERS

(Mar 29, 2015)

 

The starting point was a plot of all regular skaters since the 2005-2006 season, with forwards and defensemen separated, along two measures (typically Rel CF% and P/60 at 5v5).

 

I could then show the position of a particular skater on the graph, and more interestingly, generate a list of the skaters closest to that position.  These would be the player’s closest statistical comparables according to the two dimensions chosen.

 

The method I used to identify the points closest to a given player’s position was simply to take the shortest distances as calculated by the Pythagorean theorem.  This method worked fine for two variables, but the real fun begins when you expand to four or more.
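The two-variable lookup described above can be sketched as follows. This is a minimal illustration, not the author's code: the player names, the (Rel CF%, P/60) values, and the function name `closest_comparables` are all made up for the example.

```python
import math

# Hypothetical (Rel CF%, P/60) values, for illustration only.
players = {
    "Player A": (2.0, 1.8),
    "Player B": (2.5, 1.9),
    "Player C": (-1.0, 1.2),
    "Player D": (5.0, 2.6),
}

def closest_comparables(target, players, k=2):
    """Return the k players nearest to `target` by straight-line distance."""
    tx, ty = players[target]
    distances = [
        # math.hypot applies the Pythagorean theorem to the two differences.
        (math.hypot(x - tx, y - ty), name)
        for name, (x, y) in players.items()
        if name != target
    ]
    return [name for _, name in sorted(distances)[:k]]

print(closest_comparables("Player A", players))  # ['Player B', 'Player C']
```

Note that with raw, unscaled values the measure with the larger numeric range dominates the distance, which is one reason to relate distances to the range of each dimension.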

 

In order to generalize the player similarity calculation for n-dimensional space, we need to work in the Euclidean realm. 

 

Euclidean space is an abstraction of the physical space we’re familiar with, and is defined by a set of rules.  Abiding by these rules can allow us to derive a function for “distance,” which is analogous to the one used above.  In simple terms, we’re calculating the distance between two points in imaginary space, where the n dimensions are given by the measures by which we’ve chosen to compare players.

 

Similarity is one minus the ratio of the distance between the two points in Euclidean n-space to the maximum allowable distance for that function.

 

The nature of the Similarity equation means that a 98% similarity between players indicates that the “distance” between them is 2% of the maximum allowable distance.
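As a rough sketch of that calculation in n dimensions, the snippet below computes one minus distance over maximum distance. The function name `similarity`, the four rescaled measures, and the player values are assumptions chosen so the arithmetic is easy to follow.

```python
import math

def similarity(a, b, max_distance):
    """Return 1 - d(a, b) / max_distance for two points in n dimensions."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 - distance / max_distance

# Assumption: four measures, each already rescaled to run from 0 to 1,
# so the largest possible distance is the diagonal sqrt(1 + 1 + 1 + 1) = 2.
max_distance = math.sqrt(4)

# Hypothetical players that differ by 0.02 in every measure.
player_a = (0.50, 0.40, 0.70, 0.30)
player_b = (0.52, 0.42, 0.72, 0.32)

# The distance is sqrt(4 * 0.02**2) = 0.04, which is 2% of the maximum of 2.
print(f"{similarity(player_a, player_b, max_distance):.0%}")  # prints 98%
```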

 

HOCKEY AND EUCLID — INTRODUCTION TO BOMBAY RATINGS

(April 2015)

 

The similarity calculation evaluates “distance” between players, each occupying a position in imaginary space.  This space has as many dimensions as there are categories by which you choose to compare players, and the limits of each dimension are set by the maximum and minimum recorded values.
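One way to picture those limits: take the minimum and maximum recorded value in each category, then measure the diagonal of the resulting box. The sketch below assumes three categories and uses made-up player-seasons; the helper names are hypothetical.

```python
import math

# Hypothetical player-seasons: (G/60, A/60, Rel CF%), for illustration only.
seasons = [
    (0.5, 1.0, -2.0),
    (1.5, 2.0, 4.0),
    (0.9, 0.6, 1.0),
]

def dimension_limits(rows):
    """Return the (minimum, maximum) recorded value for each dimension."""
    return [(min(column), max(column)) for column in zip(*rows)]

def max_allowable_distance(limits):
    """Length of the diagonal of the box spanned by the limits."""
    return math.sqrt(sum((hi - lo) ** 2 for lo, hi in limits))

limits = dimension_limits(seasons)
print(limits)  # [(0.5, 1.5), (0.6, 2.0), (-2.0, 4.0)]
print(max_allowable_distance(limits))
```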

 

HOCKEY AND EUCLID: CALCULATING STATISTICAL SIMILARITY BETWEEN PLAYERS PART 2

(Feb 2016)

 

This post uses the Euclidean distance formula to determine similarity between players.

 

Imagine plotting each NHL player-season by dimensions corresponding to their goal and assist rates. Neighbours clustering around a given point would represent that point’s (player’s) closest comparables in these two measures. Recall that similarity between data points falls as the distance between them grows.

 

Transitioning to Euclidean space allows us to generalize this approach for numbers of dimensions beyond the three we can visualize and interpret. We may now derive a standard similarity formula that functions in any number of dimensions.

 

One significant wrinkle in this approach is that we want to break free from the assumption that all dimensions are equally important. Thankfully, weights can easily be included in our formula. This allows for a great deal of flexibility.
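A minimal sketch of how weights might enter the calculation is shown below. The weighting scheme (multiplying each squared difference and each squared range by its weight) is an assumption, as are the function name, the limits, and the player values.

```python
import math

def weighted_similarity(a, b, limits, weights):
    """Similarity with per-dimension weights (an assumed weighting scheme).

    For points inside the limits and non-negative weights, the result
    stays between 0 and 1.
    """
    distance = math.sqrt(
        sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))
    )
    max_distance = math.sqrt(
        sum(w * (hi - lo) ** 2 for w, (lo, hi) in zip(weights, limits))
    )
    return 1.0 - distance / max_distance

# Hypothetical limits for (G/60, A/60) and two hypothetical players
# who differ only in goal rate.
limits = [(0.0, 2.0), (0.0, 3.0)]
player_a = (1.0, 1.5)
player_b = (1.2, 1.5)

equal = weighted_similarity(player_a, player_b, limits, (1, 1))
goals_heavy = weighted_similarity(player_a, player_b, limits, (4, 1))
print(round(equal, 4), round(goals_heavy, 4))
```

Because the two players differ only in goals, weighting goals more heavily lowers their similarity, which is the kind of flexibility the weights are meant to provide.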

 

The Distance Formula and Similarity Formula go here
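One form the two formulas could take, consistent with the verbal definitions in these posts, is sketched here. The symbols are assumptions: a and b are the two players' positions, w_i is the weight on dimension i, and max_i and min_i are the largest and smallest recorded values in that dimension. The placement of the weights inside the sum is a guess rather than something the posts confirm.

```latex
d(a, b) = \sqrt{\sum_{i=1}^{n} w_i \,(a_i - b_i)^2}

d_{\max} = \sqrt{\sum_{i=1}^{n} w_i \,(\max\nolimits_i - \min\nolimits_i)^2}

\text{Similarity}(a, b) = 1 - \frac{d(a, b)}{d_{\max}}
```

With all weights equal to one and n = 2, the distance reduces to the Pythagorean theorem.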

 

The Similarity values may be interpreted as follows: subtracting the Similarity between two players (as a percentage) from 100 gives the distance between their two positions as a percentage of the maximum allowable distance.

 

In other words, a 98% similarity between players would mean the distance between their positions in imaginary space is 2% of the largest possible distance that could exist between two points in the space bounded by observed values of each dimension. If you had plotted players according to G/60 and A/60 as described above, this maximum distance would be the length of the diagonal between two opposite corners of the plot.

 
