
Judging Goalies: Should We Include PK Save Percentage?


I've seen a number of discussions lately about the best way to predict future goaltender performance. The analytical community showed long ago that because a goalie doesn't see very many PK shots per year, simple luck doesn't come anywhere near balancing out and a goalie's PK Sv% bounces around almost completely randomly from year to year.

From that, it was natural to infer that the penalty kill just adds noise to our measurement of goalies and that we should focus on even strength save percentage (ES Sv%) instead of total save percentage. This would also presumably remove any unfair advantage a goalie gets in total save percentage by playing for a team that doesn't take many penalties. And so it became a widespread belief that ES Sv% was the best measure of goalie talent.

I made that argument myself just a few days ago, claiming that James Reimer's ES Sv% is a better predictor of his future results than his overall Sv%. And yet when I went looking for an article that showed this directly, one that made the leap from theoretical to empirical, I couldn't find any.


I asked around, and most of the people I talked to thought they'd seen such an article, but nobody could quite put their finger on it. Finally, Kent Wilson of Flames Nation tipped me off to an article by Tom Awad. The article basically asked the question, "If I want to see how a guy's going to do next year, which of this year's numbers should I look at?"

The answer was that whether you are trying to predict a guy's overall performance or just his even strength performance, you are better off looking at his total numbers for this year than his even strength numbers.

This is consistent with the idea that PK Sv% and ES Sv% measure largely the same talent, and that the variability of the PK Sv% comes mostly from the small sample sizes. If that were the case, removing the penalty kill results would be kind of like removing the last five games of each year -- the goalie's performance in the last five games isn't reproducible from year to year, so the data doesn't have much value on its own, but it still helps improve the overall sample size.
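The sample-size point can be put in rough numbers. The sketch below uses the standard binomial approximation for the spread of an observed save percentage. The shot counts and talent levels are made-up round numbers for illustration, not figures from this article or from Tom Awad's.

```python
import math


def sv_pct_standard_error(true_sv_pct: float, shots: int) -> float:
    """Binomial standard error of an observed save percentage."""
    return math.sqrt(true_sv_pct * (1.0 - true_sv_pct) / shots)


# Hypothetical single-season workload; illustrative assumptions only.
ES_SHOTS, PK_SHOTS = 1500, 300
ES_TALENT, PK_TALENT = 0.920, 0.870

se_es = sv_pct_standard_error(ES_TALENT, ES_SHOTS)
se_pk = sv_pct_standard_error(PK_TALENT, PK_SHOTS)

print(f"ES Sv% luck (1 SE): +/- {se_es:.4f}")
print(f"PK Sv% luck (1 SE): +/- {se_pk:.4f}")
```

Under these assumptions the luck-driven spread in PK Sv% is well over twice the spread in ES Sv%, which is enough to make one season of PK results look nearly random even if the underlying talent is real.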

I wanted to figure out how much difference this really makes, and whether the answer to that question is dependent on how much of a sample we have to work with. To answer that, I started with the even strength and overall data for every goalie who has played since ES data became available in 1997-98. At the end of each season, I logged each goalie's career totals up to that point and his totals from that date forward. I could then look at how the career numbers predicted the future as a function of how many career starts the player had.
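For readers who want to try something similar, here is a minimal sketch of the career-to-date versus rest-of-career split described above. The `Season` record, the field names, and the sample numbers are all hypothetical, and the sketch assumes every season has at least one shot. How the resulting rows were grouped by career starts before computing correlations is not spelled out in the text, so that step is left to the reader.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Season:
    starts: int
    shots: int      # all situations
    saves: int
    es_shots: int   # even strength only
    es_saves: int


def career_future_splits(seasons):
    """At each season end, pair career-to-date totals with everything after."""
    rows = []
    for i in range(1, len(seasons)):
        past, future = seasons[:i], seasons[i:]
        rows.append({
            "career_starts": sum(s.starts for s in past),
            "career_sv": sum(s.saves for s in past) / sum(s.shots for s in past),
            "career_es_sv": sum(s.es_saves for s in past) / sum(s.es_shots for s in past),
            "future_shots": sum(s.shots for s in future),
            "future_sv": sum(s.saves for s in future) / sum(s.shots for s in future),
        })
    return rows


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# One made-up goalie with three seasons.
example = [
    Season(50, 1500, 1380, 1200, 1110),
    Season(60, 1800, 1650, 1450, 1340),
    Season(55, 1600, 1470, 1300, 1200),
]
rows = career_future_splits(example)
```

With rows pooled across many goalies, `pearson` can be run once with `career_sv` as the input and once with `career_es_sv`, each against `future_sv`, for goalies with a similar number of career starts.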

[Figure: how well career Sv% and career ES Sv% predict future overall Sv%, as a function of career starts]

The blue curve represents how well we do at predicting a goalie's future overall Sv% by looking at his current career overall Sv% -- it trends upwards because the more games he has played so far, the better we know his true talent and the better our predictions are. The red curve shows how we do by looking at his current career ES Sv% instead when we make our predictions, and the green curve is the difference between the two (red minus blue).

It turns out that until the goalie has about 150-175 starts, the two measures perform almost identically in predicting a goalie's future -- it doesn't matter whether you use career Sv% or career ES Sv% early in his career. Once a guy gets up towards 200 starts, ES Sv% does start to look like a better measure (the green curve rises above 0), although the fact that the gap closes again by 300 starts leaves me wondering if this is just a statistical quirk.

The above plot includes all of the goalies who played in the last 14 years, even the ones who didn't play much. We're trying to predict a guy's future save percentage, but if the guy only plays 6 more games, it won't really matter whether he has a 20- or 200-game history for us to look at; we'll probably lose to the randomness over that 6-game sample.

To see whether that kind of noise was what made ES Sv% look like the better predictor over large sample sizes, I filtered the data to only include guys who went on to face at least 2000 more shots and repeated the analysis. Here's what we see in that case:
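The filter itself is simple to express. This sketch assumes each career/future split is stored as a dictionary with a `future_shots` entry. That layout is a hypothetical convenience, not a description of how the data was actually stored, and the demo rows are invented.

```python
MIN_FUTURE_SHOTS = 2000  # threshold taken from the text


def keep_large_future_samples(rows, min_future_shots=MIN_FUTURE_SHOTS):
    """Drop splits where the goalie faced too few shots afterwards."""
    return [r for r in rows if r["future_shots"] >= min_future_shots]


# Invented rows: one goalie who barely played again, two who kept playing.
demo_rows = [
    {"career_starts": 20, "future_shots": 180},
    {"career_starts": 200, "future_shots": 2500},
    {"career_starts": 310, "future_shots": 2000},
]
kept = keep_large_future_samples(demo_rows)
```

The correlations are then recomputed on the kept rows only, exactly as before.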

[Figure: the same comparison, limited to goalies who went on to face at least 2000 more shots]

Now we see a much steadier rise in predictive power as a function of games played. This is partly because we have reduced the noise, but also partly because we have introduced some selection bias: a goalie is only likely to eclipse 300 starts if he plays fairly well, and he is only likely to face 2000 more shots if he continues playing well, so the filter itself pushes the correlations up.

However, that bias probably affects both inputs equally, so the difference between the correlations shouldn't be impacted much. And now we see that the noise wasn't causing the rise in the green curve at higher sample sizes; it was obscuring it. It's probably fair to say that ES Sv% does appear to be a better predictor than overall Sv% over large sample sizes (150+ games).

So the overall picture then is that with small sample sizes you want to include all available data, but with large sample sizes you want to focus on the most relevant data. Tom Awad showed that overall save percentage will give the best outcomes if you are using only a single year to make your predictions. Up to about 100-150 games of career numbers, overall save percentage and even strength save percentage perform similarly. And in the long run, after 150+ games, even strength save percentage is the better predictor of a goalie's future success.



Statistical post-script

I've chosen to look at the overall Sv% (rather than ES Sv%) as the measure of future performance because I think that's what we're trying to maximize when we pick a goalie -- ES Sv% comes into the conversation because we think it might be a better input, not because we think it's a more important output of the prediction. However, this could conceivably inflate the predictive power of overall Sv%; if a goalie is on a team that consistently takes a lot of penalties, his overall Sv% might be consistently lower than ES Sv% would predict.

As a check, I also looked at how ES Sv% and overall Sv% do when predicting future ES Sv%, and the results were almost exactly the same as when predicting future overall Sv%. So I'm not worried about this as a possible confounding factor. Here's the analogous plot to the first one above:

[Figure: how well career Sv% and career ES Sv% predict future ES Sv%, as a function of career starts]

Comments

Wow, the difference between the filtered and unfiltered results is amazing. That’s awesome.

Lightning strikes once, Hextall strikes twice!
"I think there is virtue in pissing off idiots." - Fehr and Balanced

by hintzy64 on Jan 25, 2012 12:23 PM EST

I kinda

want to see the results of the data that was filtered out, by itself…

A lot of things make sense now.

by Sparki on Jan 25, 2012 3:50 PM EST

Even I

thought about that bump at around 200 starts. My immediate thought was, “we need a significant number of goalies with 600 starts!! It’s weather vs. climate!”

Then Eric put in a clever filter and all was right with the world once again.

I also assume that including PPSV% is completely useless, due to an extreme dearth of shots, perhaps until about 300 starts.

by Georgia_Flyer on Jan 25, 2012 12:31 PM EST

In overall save percentage, I included both PK and PP. I doubt the PP matters much, and that way we’re evaluating the widely-available Sv% numbers.

by Eric T. on Jan 25, 2012 1:03 PM EST

Is this the stat that shows that we don’t need to worry about Bryz sitting on the bench for the next 8 years?

by thasmin on Jan 25, 2012 1:43 PM EST

Yes. It’s also the stat that shows that we could have had the same peace of mind for just a few pennies less.

Driving Play - The Blog with Three First Lines

by Chase W on Jan 25, 2012 4:38 PM EST

Regression

Another fascinating post :-)

Two questions:

1) Have you, or anyone else, ever tried regressing future save pct. on past ES, PP, PK? Some adjustments would have to be made for repeated sampling on the same goalies (clustering, probably), and some other covariates would probably help (years in the league, etc.), but I think it would be interesting to see.

2) Is there evidence of score effects on sv%, like there seem to be on corsi/fenwick?

by wndowd on Jan 25, 2012 3:38 PM EST

1) I haven’t done anything more complicated than calculate the single-variable correlations given here. I tend to shy away from deeper statistical analysis because I’m not an expert and I know there are a lot of ways to screw it up.

2) The article that comes to mind is http://www.arcticicehockey.com/2011/3/14/2041124/how-does-the-defensive-shell-work which saw a slight decrease for the trailing team, from about 7.0% when tied to about 6.5% when down by one late (I presume those numbers include missed shots).

by Eric T. on Jan 25, 2012 3:48 PM EST

They do, and you also have this link: http://www.arcticicehockey.com/2009/10/29/1105149/shooting-percentage-by-game-state, which shows beyond just the 1-goal state.

Blueshirt Banter - Where Rangers' Fans Matter
Tracking the Rangers - Numbers don't lie. They just don't agree with you.
Twitter: RangerSmurf
"Oh, that sensible and sober* Rangers fan guy who is cool, actually" - Dominik, Lighthouse Hockey
*Statement has not been verified nor regressed

by George E. Ays on Jan 25, 2012 4:28 PM EST

This

GMAT verbal section question, Philadelphia sports version.
In 2015, which one of the following will prove to be a better investment?
(a) Ilya Bryzgalov's contract (b) Ryan Howard's extension (c) Mike Vick's extension (d) Greek bonds from 2009 (e) Papelbon's bloat deal

by Bud in TN on Jan 25, 2012 9:27 PM EST

So the overall picture then is that with small sample sizes you want to include all available data, but with large sample sizes you want to focus on the most relevant data.

Stratify!

GMAT verbal section question, Philadelphia sports version.
In 2015, which one of the following will prove to be a better investment?
(a) Ilya Bryzgalov's contract (b) Ryan Howard's extension (c) Mike Vick's extension (d) Greek bonds from 2009 (e) Papelbon's bloat deal

by Bud in TN on Jan 25, 2012 9:28 PM EST

Nice work.

I was about to point out that overall would have PK frequency built in and, boom, you went over that in the postscript.

Driving Play - The Blog with Three First Lines

by JaredL on Jan 25, 2012 10:05 PM EST

I really enjoy this guy’s articles. I’m into stats myself.

by StevenKerwood on Jan 27, 2012 1:20 PM EST via iPhone app

Late addition, courtesy of Twitter. Matt Fenwick’s comment led to Tyler Dellow taking a look back in ’08 at how an extremely high PK Sv% in one year can predict a decline of overall Sv% in the following year.

by Eric T. on Jan 27, 2012 3:48 PM EST

It would have been a bit disconcerting if Total Save% failed to best quantify the situation. Power plays are a big component of the game, and taking them out of consideration as was previously done feels like, well, like leaving out fly balls or walks when quantifying a pitcher. It’s not the best analogy; there’s probably a better one with UZR, I just can’t think of one.

by j reed on Jan 28, 2012 9:14 AM EST

It’s not that PK Sv% was taken out completely; it’s that since PK Sv% fluctuates wildly from year to year, it was better to weight it less. But like anything, if you get enough of a sample size on PK Sv% to get a baseline, it’s useful.

In other words, if you looked at Ryan Miller’s Vezina-winning year, you’ll see a 0.919 PK Sv% helping him get a 0.929 Sv%. We see the 0.929, say that’s very high, spot the 0.919 and say “that’s not indicative of his skill, so don’t evaluate him on it.”

Man-crushin' on Boucher since 1999 and Matt Calvert since May 2010
Broad Street Hockey - Makin' it look mean since 1967.
SB Nation Philly - Associate Editor

by Geoff Detweiler on Jan 29, 2012 12:44 AM EST

