The stat guys talk a lot about variance, luck, and regression to the mean. A lot of it doesn't mesh with our intuition, and I wanted to meander through its implications a little bit.
I'll start with a true parable. A friend of mine is a high school math teacher, and every year when his class started the unit on probability theory, he would give his students a homework assignment: flip a coin 100 times and write down the results.
Put yourself in the shoes of a tenth grader in his class. Are you really going to waste your time actually doing this? No, you're just going to write down some H's and T's and turn it in. You know it's random, so you're not stupid enough to just alternate HTHTHT. No, you're clever -- you mix it up a little, maybe something like HTTHTHHTTH.
The next day, my friend the teacher walks over to your desk, asks to see your homework, and announces that you cheated. He goes around the room checking the assignments and calling out the cheaters. How did he know?
It turns out that people's intuition about this kind of thing is just really, really bad, so much so that a high school teacher can look at your homework and tell in three seconds whether you really flipped the coin or just pretended you did. When you filled it out, you knew not to just alternate heads and tails, but to include some runs of two in a row. Maybe even a couple of runs of three.
What you didn't realize is that an actual coin flipped 100 times will, on average, give you three streaks of at least five in a row, and more often than not it will include a streak of at least seven. There's no way you wrote down the same result that many times in succession when you were trying to be random, because our minds -- unlike the coin -- have memory. That memory influences the outcomes: after you've flipped four heads in a row in real life, you're still 50-50 to get another heads, but in your mental simulation you're very likely to put down a tails because you think this streak is getting too long and you're due for a tails.
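If you'd rather not take the streak numbers on faith, they're easy to check. Here's a minimal Python sketch (the function names are hypothetical helpers, not from any library) that computes the exact expected number of streaks of a given length in 100 fair flips, and estimates by simulation how often the longest streak reaches a given length. It assumes a perfectly fair coin with independent flips, which is exactly what the students' fake sequences fail to imitate.

```python
import random


def expected_streaks(n_flips: int, k: int) -> float:
    """Exact expected number of streaks (all heads or all tails) of length >= k
    in n_flips fair coin flips, counting each maximal streak once."""
    if k > n_flips:
        return 0.0
    # A streak starting at flip 1 needs flips 1..k to match: probability (1/2)**(k-1).
    # A streak starting at flip i >= 2 needs flip i to differ from flip i-1 and
    # flips i..i+k-1 to match: probability (1/2)**k, for n_flips - k positions.
    return 0.5 ** (k - 1) + (n_flips - k) * 0.5 ** k


def longest_streak(flips) -> int:
    """Length of the longest run of identical results, e.g. "HHTTTH" -> 3."""
    best = current = 0
    prev = None
    for f in flips:
        current = current + 1 if f == prev else 1
        best = max(best, current)
        prev = f
    return best


def prob_longest_at_least(n_flips: int, k: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(longest streak >= k) in n_flips fair flips."""
    rng = random.Random(seed)
    hits = sum(
        longest_streak(rng.choices("HT", k=n_flips)) >= k for _ in range(trials)
    )
    return hits / trials


print(expected_streaks(100, 5))  # 3.03125
print(expected_streaks(100, 6))  # 1.5
print(prob_longest_at_least(100, 7))  # roughly 0.54; varies slightly with the seed
```

The simulation is the teacher's trick in reverse: a made-up sheet whose longest run is three or four looks nothing like what a real coin tends to produce.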
Real coins are streakier than we think. Which brings me to the point of this article (finally).
NHL hockey, like coin flips, is a probabilistic exercise. There's luck involved. That doesn't mean teams are equally likely to win -- talent can tilt the odds in your favor, but on any given play, on any given night, there's luck all over the place. Sometimes the puck bounces over a defenseman's stick when he's about to get a much-needed clear. Sometimes a funny bounce off the boards comes out in the crease. Sometimes an assignment gets blown. Sometimes a goalie loses sight of the puck at a key moment.
If these things happen to you over and over, then you just aren't very good; how often they happen to you in the long run is the talent portion of the deal. But whether, by chance, you happen to make the right plays tonight, or tomorrow, or the next night, or all three nights...that's the luck component.
We looked at this a couple of weeks ago, when someone published a comparison of an actual player's game-by-game results with those of a simulated coin-flip player with the same career stats. What we found, much like my friend's coin-flip homework, was that a record completely in line with what you'd expect from a random process felt streaky to the average viewer, because we expect more consistency than random processes really give.
Let's compare the results of the actual Flyers, who are 39-15-6, with a simulated Flyers team that has a 39/60 chance of winning, 15/60 chance of losing in regulation, and 6/60 chance of losing in overtime on any given night.
Well, look at that. The simulated random Flyers, who have exactly the same chance of winning on any given night, look much streakier than our actual Flyers. The SimFlyers had a terrible first 15 games, then fixed their system or shook up the lines or something and it really started to come together. They hit their stride at game 26 and won 11 straight and 19 of 20. But they got overconfident and the league figured them out and they stumbled...
No, my computer program did not get overconfident. Sometimes the simbounces just don't go your way. Don't try to impose psychology on the randomness, just understand it and try to see the underlying talent.
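The SimFlyers themselves take only a few lines to build. This sketch (hypothetical helper names; a sketch of the same idea, not necessarily the program behind the results described here) treats every game as an independent draw with 39/15/6 weights, with no momentum and no slumps, and prints a few seasons as strings of W, L and O so you can eyeball the streaks.

```python
import random

# Per-game probabilities copied from the real Flyers' 39-15-6 record.
OUTCOMES = ["W", "L", "OTL"]
WEIGHTS = [39, 15, 6]


def simulate_season(n_games: int = 60, seed=None) -> list:
    """One SimFlyers season: each game is an independent draw; no momentum, no slumps."""
    rng = random.Random(seed)
    return rng.choices(OUTCOMES, weights=WEIGHTS, k=n_games)


def longest_win_streak(results) -> int:
    best = current = 0
    for r in results:
        current = current + 1 if r == "W" else 0
        best = max(best, current)
    return best


def points(results) -> int:
    # 2 points for a win, 1 for an overtime/shootout loss, 0 for a regulation loss.
    return sum(2 if r == "W" else 1 if r == "OTL" else 0 for r in results)


for seed in range(5):  # arbitrary seeds; each one is a different "season"
    season = simulate_season(seed=seed)
    print(seed, "".join(r[0] for r in season), points(season), longest_win_streak(season))
```

Run it a few times and you'll find plenty of "slumps" and "hot streaks" in a team whose odds never change.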
But how do we know what the underlying talent really is? How do we know we aren't just looking at a hot streak?
Well, that brings me back to my teacher friend. Whether they counted or not, the students would typically end up with somewhere between 47 and 53 heads in their fake sequences. But real random processes aren't just streakier than you imagine; they also don't settle at the expected value as fast as you might imagine.
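How far off were the students' 47-to-53 instincts? For a fair coin, the number of heads in 100 flips has a standard deviation of 5, and an exact binomial calculation shows that only about half of real 100-flip sequences land in that window at all. A short sketch (the helper name is hypothetical):

```python
from math import comb


def prob_heads_between(n: int, lo: int, hi: int) -> float:
    """Exact probability that n fair flips give between lo and hi heads, inclusive."""
    return sum(comb(n, k) for k in range(lo, hi + 1)) / 2 ** n


print(round(prob_heads_between(100, 47, 53), 3))  # 0.516
```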
For example, since I didn't fix the point total in my simulation (just the probability for any given game), I had to run the simulation 38 times to get one where the SimFlyers ended up with the same 84 points that the real Flyers currently have. In those 38 runs, the SimFlyers ended up as high as 102 points and as low as 75 points, despite having the same 39-15-6 expected record as the real Flyers. So from a small sample like this, we never really know what the true talent is.
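The spread in those 38 runs isn't a fluke. Treating each game as an independent draw with 39/15/6 weights (the same assumption as the simulation), a season's point total has a mean of 84 and a standard deviation of about 6.7 points, so a season ten points away from 84 in either direction is hardly rare. A minimal sketch that computes this exactly and then checks it by brute-force simulation (helper names are hypothetical):

```python
import random
from statistics import mean, pstdev

P_WIN, P_OTL, P_REG_LOSS = 39 / 60, 6 / 60, 15 / 60


def exact_points_mean_sd(n_games: int = 60):
    """Mean and standard deviation of season points when every game is independent."""
    mu = 2 * P_WIN + 1 * P_OTL             # expected points per game (1.4)
    var = 4 * P_WIN + 1 * P_OTL - mu ** 2  # E[X^2] - E[X]^2 per game
    return n_games * mu, (n_games * var) ** 0.5


def simulate_points(n_seasons: int = 10_000, n_games: int = 60, seed: int = 0) -> list:
    """Season point totals from many independent simulated seasons."""
    rng = random.Random(seed)
    return [
        sum(rng.choices([2, 1, 0], weights=[P_WIN, P_OTL, P_REG_LOSS], k=n_games))
        for _ in range(n_seasons)
    ]


mu, sd = exact_points_mean_sd()
print(round(mu, 1), round(sd, 2))  # 84.0 6.66
totals = simulate_points()
print(min(totals), max(totals), round(mean(totals), 1), round(pstdev(totals), 2))
```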
This is why we like to focus on the statistics where the sample size is larger. There aren't very many goals per game, so it takes a long time for goals scored to level off at the true talent level. But there are a lot of shot attempts in each game, so that evens out pretty quickly and gives a good sense of whether you should expect your team to control play and outshoot its opponents for the rest of the year. This is the Corsi number -- it's the difference between the number of shot attempts your team takes and the number the opponents take (whether they hit the net or not; counting missed and blocked shots helps get that sample size up faster!)
OK, back to our real Flyers. What do we know about their talent level? What can we expect from them going forward? I'll look at three things:
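To see why shot attempts settle down faster than goals, here's a rough sketch. It treats goals and shot attempts as Poisson counts (an approximation) and uses hypothetical round-number rates, about 3 goals and 55 shot attempts per team per game, purely for illustration. The relative noise in a count shrinks like one over the square root of the number of events, so a stat with roughly eighteen times as many events is roughly four times less noisy over the same stretch of games.

```python
from math import sqrt

# Hypothetical per-team, per-game rates, picked only to show the scale of the
# difference: about 3 goals versus about 55 shot attempts (on goal, missed, blocked).
GOALS_PER_GAME = 3.0
ATTEMPTS_PER_GAME = 55.0


def corsi(attempts_for: int, attempts_against: int) -> int:
    """Corsi: shot attempts for minus shot attempts against."""
    return attempts_for - attempts_against


def relative_noise(rate_per_game: float, n_games: int) -> float:
    """Relative standard deviation of a season total, assuming Poisson-distributed
    events: sd / mean = sqrt(n * rate) / (n * rate) = 1 / sqrt(n * rate)."""
    return 1 / sqrt(rate_per_game * n_games)


for n in (10, 30, 60):
    print(f"{n} games: goals +/-{relative_noise(GOALS_PER_GAME, n):.0%}, "
          f"attempts +/-{relative_noise(ATTEMPTS_PER_GAME, n):.0%}")
```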
Controlling play: The Flyers have the fifth-best Corsi in tied situations (we focus on tied situations because teams get desperate and shoot more when they're behind, which skews the numbers). The only Eastern Conference team ahead of them is New Jersey, and only by a hair. There have been enough shot attempts for this number to reflect true talent, so we should expect the Flyers to continue to control the play more often than not -- though not always, of course.
Scoring goals: We expect the Flyers to keep winning the shot battle, but will it keep turning into goals? Remember, it takes longer for goal-scoring luck to even out -- it takes about five years to establish a player's true shooting talent. Assuming everyone's career numbers reflect their true talent, the Flyers have been a bit lucky so far this year: they're shooting 10.2%, when you'd expect 9.8% based on their career numbers. That difference is worth about 8 extra goals, or maybe three points in the standings (a goal differential of ~5-6 goals is worth about one win).
Saves: Again, it takes a couple of years to establish a goalie's true talent. This makes things tough, because our starting goalie only has a fraction of a season under his belt. We saw in yesterday's post that goalies who had similar rookie seasons ended up with a median career even-strength save percentage of .924, so let's assume that's Bob's true talent, and we'll use Boosh's established .910 average. That would give the Flyers an expected even-strength save percentage (ESS%) of .918, while their actual ESS% has been .928, a difference of about 14 goals, or maybe five points in the standings.
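Filtering to tied situations is just a matter of keeping only the shot attempts taken while the score was level. This sketch assumes a hypothetical event record (`ShotAttempt`, holding who shot and the score at the time); real play-by-play data will be laid out differently. It returns the tied Corsi differential and the Flyers' share of tied shot attempts.

```python
from dataclasses import dataclass


@dataclass
class ShotAttempt:
    # Hypothetical event record: who took the attempt and the score at that moment.
    by_flyers: bool
    flyers_score: int
    opp_score: int


def tied_corsi(events):
    """Corsi differential and Flyers' share of attempts, counting only tied-score events."""
    tied = [e for e in events if e.flyers_score == e.opp_score]
    cf = sum(e.by_flyers for e in tied)  # attempts for
    ca = len(tied) - cf                  # attempts against
    share = cf / len(tied) if tied else float("nan")
    return cf - ca, share


events = [
    ShotAttempt(by_flyers=True, flyers_score=0, opp_score=0),
    ShotAttempt(by_flyers=False, flyers_score=0, opp_score=0),
    ShotAttempt(by_flyers=True, flyers_score=1, opp_score=1),
    ShotAttempt(by_flyers=False, flyers_score=0, opp_score=1),  # Flyers trailing: excluded
]
print(tied_corsi(events))  # (1, 0.6666666666666666)
```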
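The shooting-luck arithmetic is simple enough to write down. In this sketch, `luck_in_points` is a hypothetical helper, 5.5 goals per win is the midpoint of the ~5-6 rule of thumb above, and the 2000-shot total is an assumption (no shot total is given; it's simply the count that turns a 0.4-point gap in shooting percentage into about 8 goals).

```python
GOALS_PER_WIN = 5.5  # midpoint of the ~5-6 goals-per-win rule of thumb


def luck_in_points(shots: int, actual_pct: float, expected_pct: float,
                   goals_per_win: float = GOALS_PER_WIN):
    """Goals (and standings points) gained by converting shots above the expected rate."""
    extra_goals = shots * (actual_pct - expected_pct)
    extra_points = 2 * extra_goals / goals_per_win  # 2 standings points per win
    return extra_goals, extra_points


# Hypothetical shot total (assumption): ~2000 shots makes a .102 vs .098
# shooting-percentage gap worth about 8 goals.
goals, pts = luck_in_points(2000, 0.102, 0.098)
print(round(goals, 1), round(pts, 1))  # 8.0 2.9
```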
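The goaltending numbers can be reproduced the same way. The .918 blend isn't broken down here, so this sketch assumes it's a shot-weighted average of the two goalies, which would mean Bob faced a bit over half the even-strength shots. The 1400 even-strength shots against is likewise an assumption, chosen because it is the count at which a ten-point save-percentage gap (.928 versus .918) equals 14 goals. Helper names are hypothetical.

```python
def blended_save_pct(sv_a: float, sv_b: float, share_a: float) -> float:
    """Shot-weighted team save percentage when goalie A faces share_a of the shots."""
    return share_a * sv_a + (1 - share_a) * sv_b


def implied_share(sv_a: float, sv_b: float, blended: float) -> float:
    """Inverse of blended_save_pct: the share of shots goalie A must have faced."""
    return (blended - sv_b) / (sv_a - sv_b)


GOALS_PER_WIN = 5.5  # midpoint of the ~5-6 goals-per-win rule of thumb

share_bob = implied_share(0.924, 0.910, 0.918)  # Bob's assumed .924 vs Boosh's .910
extra_saves = 1400 * (0.928 - 0.918)  # hypothetical 1400 even-strength shots against
extra_points = 2 * extra_saves / GOALS_PER_WIN
print(round(share_bob, 2), round(extra_saves, 1), round(extra_points, 1))  # 0.57 14.0 5.1
```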
All told, the Flyers have probably been playing a bit over their heads so far. Take roughly eight points of luck away (about three from shooting and five from goaltending) and they're still right in the mix for the #1 seed, but it's a dogfight. Fortunately, the puck won't remember that they got lucky earlier; those eight points are already in the bank and we're playing with house money from here on out. And the rest of the season is a small sample too, so it's entirely possible that the team plays even better down the stretch.
But the takeaway message is that if the Flyers only score 10 or 12 points over the next ten games, it doesn't mean they're not trying hard enough or that the coach is screwing things up; these things happen. We've seen enough to know that our coin is a special one that should come up heads more than 50% of the time, but there'll be plenty of tails mixed in along the way.
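How unusual would a 10- or 12-point stretch be? This sketch computes the exact distribution of points over ten games, assuming each game is independent and using the Flyers' raw 39-15-6 rates, before taking any luck away. Even for a team that good, 12 or fewer points in ten games comes up a bit under three times in ten. `points_distribution` is a hypothetical helper, not a library function.

```python
from collections import Counter


def points_distribution(n_games: int, p_win: float, p_otl: float) -> dict:
    """Exact probability of each points total over n_games independent games,
    built up one game at a time (a discrete convolution)."""
    p_reg_loss = 1 - p_win - p_otl
    dist = {0: 1.0}
    for _ in range(n_games):
        nxt = Counter()
        for pts, p in dist.items():
            nxt[pts + 2] += p * p_win       # win
            nxt[pts + 1] += p * p_otl       # overtime/shootout loss
            nxt[pts] += p * p_reg_loss      # regulation loss
        dist = dict(nxt)
    return dist


dist = points_distribution(10, 39 / 60, 6 / 60)
expected = sum(pts * p for pts, p in dist.items())
p_12_or_fewer = sum(p for pts, p in dist.items() if pts <= 12)
print(round(expected, 1), round(p_12_or_fewer, 2))  # 14.0 0.29
```

Shave off the eight points of luck and the true per-game rates drop, which only makes a slow stretch more likely.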