A peek behind the curtain: how do numbers get analyzed?

"Six goals of goal differential equals one win."

"Shot differential is about 55% of winning."

"You can't predict a team's playoff results based on the momentum they showed in their last five or ten games."

We cite a lot of this kind of analysis on here. Some of it is done by people with statistics backgrounds, but very little of it actually requires a statistics background. As part of my ongoing effort to bring a wider audience into the stat-based conversation, I'm going to take a look at the very simple basis for some of these analyses.

Let's imagine you were wondering how valuable a goal is. A goal scored in a close game might be worth a point in the standings, but a goal scored in a blowout isn't. So how do we decide how much a guy adds to the team by creating 15 more goals per year than his replacement would?

Well, the Flyers have a goal differential of +44 this year, and they have 17 points more than the league average. So maybe each goal is worth 17/44 = 0.39 points. But last year they had a goal differential of +11 and finished 4 points below the league average. So unless you think scoring goals costs the team points, it should be clear that we can't just look at one or two data points to figure this out; we need a way to find a trend across a larger data set.
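To see how badly one season at a time can mislead, here is a minimal sketch of that arithmetic in Python. The function name `points_per_goal` is made up for illustration; the four numbers are the two Flyers seasons quoted above.

```python
def points_per_goal(goal_diff, points_vs_avg):
    """Naive one-season estimate: points above average per goal of differential."""
    return points_vs_avg / goal_diff

# The two Flyers seasons quoted in the text.
this_year = points_per_goal(goal_diff=44, points_vs_avg=17)
last_year = points_per_goal(goal_diff=11, points_vs_avg=-4)

print(f"this year: {this_year:+.2f} points per goal")  # +0.39
print(f"last year: {last_year:+.2f} points per goal")  # -0.36
```

Two seasons, two estimates with opposite signs. That is the whole argument for using a bigger data set.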

The good news is that this isn't hard -- if you have Excel, it won't even require arithmetic. Just paste a few years' worth of standings into Excel, insert a scatterplot chart where the x-data is goal differential and the y-data is points, and then right-click to add a trendline. Here's how it looks:


That black trendline is the best you can do if asked to draw a straight line through or near the data points (it's called a linear regression, and yes, that'll be on the quiz). Excel tells us that the equation for the line is y = 0.3415x + 91.447, which means that on average, a team with a goal differential of 0 will finish the season with 91 or 92 points, and that every extra goal is worth about 0.34 points. Since 0.34 is about 1/3, we'll say three goals is roughly a point and six goals is roughly a win. It's that simple.
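If you don't have Excel, the same trendline can be computed by hand. The sketch below is a plain least-squares fit, which is what a linear trendline is. The five data points are HYPOTHETICAL numbers made up to keep the example short, not real standings, and the names `fit_line` and `predicted_points` are mine. Only the 0.3415 and 91.447 defaults come from the chart described above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style sum divided by variance-style sum gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# HYPOTHETICAL illustration data, not real standings.
goal_diff = [-40, -20, 0, 20, 40]
points = [78, 85, 91, 99, 105]

slope, intercept = fit_line(goal_diff, points)
print(f"y = {slope:.4f}x + {intercept:.3f}")

def predicted_points(gd, slope=0.3415, intercept=91.447):
    """Expected points from goal differential, using the article's trendline."""
    return slope * gd + intercept

print(round(predicted_points(6)))  # a +6 team lands at about 93 points
```

With several real seasons of standings pasted in place of the toy lists, `fit_line` should give you the same slope and intercept Excel prints on the chart.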

Obviously, that's not an exact rule. Teams with a goal differential of +6 will sometimes finish at the predicted 93 points, but sometimes they'll be at 98 and sometimes they'll be at 85. That's where that other equation on the chart comes in, the one that says R² = 0.90154.

R-squared is the answer to the question "does what we put on the x-axis determine the thing on the y-axis completely, not at all, or somewhere in between?" Or to put it in more visual terms, "in our graph, how close are the dots to the line?"

In this case, the R² of 0.90 means that about 90% of the variation in teams' point totals is accounted for by goal differential and about 10% comes from other factors; the dots are pretty close to the line, but not right on it. We can't say from this what those other factors are that move some dots a little bit above or below our trendline. There might be a lot of them -- luck, total goals per game, how early a coach likes to go with an empty net, clutchness, determination -- but we know that together they account for only about 10% of the differences between teams' point totals.
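For the curious, here is a rough sketch of how that number is computed. For a straight-line fit, R-squared is the square of the ordinary (Pearson) correlation between the x and y values. The function name `r_squared` is mine, and both data sets are HYPOTHETICAL numbers invented to show a tight cloud of dots versus a loose one.

```python
def r_squared(xs, ys):
    """R-squared of a straight-line fit: the squared Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# HYPOTHETICAL illustration data, not real standings.
goal_diff = [-40, -20, 0, 20, 40]
tight_points = [78, 85, 91, 99, 105]  # dots hug the line
loose_points = [88, 80, 95, 90, 105]  # same x values, much more scatter

print(f"tight: R^2 = {r_squared(goal_diff, tight_points):.2f}")
print(f"loose: R^2 = {r_squared(goal_diff, loose_points):.2f}")
```

The tight set comes out very close to 1 and the loose set much lower, even though both trend upward. That gap is what "how close are the dots to the line?" looks like as a number.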

So that's all you need to start doing some basic analysis of your own. Interested to see how important faceoffs are? Figure out what the slope is when you plot faceoff differential against points or goal differential (a faceoff win looks at first glance to be worth about 1/75 of a game win). Curious about whether more physical teams tend to have less skill? Look for a correlation (high R²) between hits and giveaways (there's almost none; if anything, teams with more hits show fewer giveaways).

Let us know what you find out!