Statsketball: A New Age of Statistics

Exploring the implications of Kirk Goldsberry and Harvard Statistics' "Expected Possession Value."

David Richard-US PRESSWIRE

Until three days ago, my plan for this week's Statsketball was to update my playoff projections using a correspondingly updated model. Instead, you'll be seeing that article next week.

Everyone. We need to talk.

The game done changed.

Three days before the time of this writing, Kirk Goldsberry published this piece on Grantland, detailing the development and use - by Goldsberry and two Harvard PhD students in spatial statistics - of a new stat: Expected Possession Value, or EPV.

This is the point in the article where I would usually say, "this concept is fairly simple," but, honestly, the concept here is not fairly simple, which is why Goldsberry uses a several-thousand-word article to explain it. I will sum up Goldsberry's article, as well as the statistic, but it will probably behoove you to read the original piece before reading this any further.

So, the SportVU cameras that have been installed in every arena track each player's position on the court 25 times every second. These cameras aren't like film cameras, though. They keep data every time they track a player - data that includes how fast a player runs, how quickly a player rotates, how far players tend to stay relative to other players, where players tend to drift relative to the basket. All of these positions are spat out as thousands and thousands of numbers that can only be made sense of with a supercomputer. But, thanks to the cameras, we know how each player tends to move.

We know, too, roughly, the expected value of almost any shot in the game. One site, for example, has the expected value of every single shot on the court from every single player. By that, I mean it has every shot, not just shots broken into zones (though it only has data from last year). On top of that, we also know how inclined some players are to pass, and where they are inclined to pass to, thanks to assist percentage stats (which estimate the share of teammates' made baskets a player assists while on the court), new SportVU data, and data trackers who kept track of the most frequent recipients of assists from each player.

All of this gave Goldsberry et al. an idea: if we know where each player is on the court at all calculable moments, we know how likely each player is to pass, and we know the expected value of almost every shot, why can't we use the SportVU cameras to calculate the likely expected value of each individual possession on the court? Or, why can't we estimate how many points a team is likely to score by fielding "X" lineup with "Y" formation that moves in "Z" manner?

As an example, if the Mavericks trotted out a lineup of Jose-Monta-Marion-Dirk-Dalembert, and then they ran a Horns set into a backscreen curl for Calderon, could we get an estimate of the expected value of that possession, based on who gets open, the passing lanes available, who's passing, who's likely to shoot, and so on and so forth?

The answer, as it turns out: yes, we absolutely can do that. We can, and Goldsberry and his team already have. Not only that, but the data gets better, because it updates that expected value calculation at least for every tenth of a second on the court, yielding a new expected value for each step of each player.
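To make the basic arithmetic concrete, here is a toy sketch in Python. It is not Goldsberry's model, which is far more involved; it only shows the idea of weighting each thing the ball handler might do by how likely it is and how many points it tends to produce. The function name, the options, and every number below are made up for illustration.

```python
def possession_value(options):
    """Probability-weighted average of expected points.

    options: list of (probability, expected_points) pairs describing what
    the ball handler might do next. Probabilities must sum to 1.
    """
    total_prob = sum(p for p, _ in options)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in options)


# Hypothetical snapshot of one instant in a possession (all numbers invented).
snapshot = [
    (0.30, 0.90),  # ball handler takes a contested jumper
    (0.50, 1.20),  # pass to an open shooter in the corner
    (0.20, 0.00),  # turnover
]

epv = possession_value(snapshot)
print(f"Expected value at this instant: {epv:.2f} points")
```

Run the same calculation again a tenth of a second later with updated probabilities, and you get the moving picture the article describes.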

Before I start talking about the implications of that kind of data, I'd like to make a few important points:

1) This is absolutely not a predictive model. The point of EPV is not to predict how well a team will do, or how much it will actually score on any given play. The idea is not to say, "well, they're using a Vince-Wright pick and roll, so go ahead and throw up 1.5 points on the board." It's a descriptive model, in that it tells us how effective "X" lineup is at doing "Y", and at what points, and under what conditions, that play actually becomes effective.

Finding the EPV of a player or a game is remarkably similar to finding the points per 100 possessions. In fact, if you were to find the average EPV for a whole game, that's exactly what you'd be finding: the points per possession for that game. The difference between the two, and the thing that makes EPV so valuable, is that EPV gives us the ability to understand how efficiency changes as a play progresses. EPV isn't an attempt to say, in one stat, how good a team or player is. It is, however, an attempt to understand the impact of a team's geometry and playcalling in terms of offensive efficiency, and how that all changes over time, which we have never understood very well before, quantitatively.
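A rough Python sketch of that difference, using invented numbers: the whole-game figure collapses everything into a single points-per-possession number, while the per-possession trace shows where the value actually moved. The field names (`epv_trace`, `points`) and the data are assumptions for illustration, not anything from Goldsberry's work.

```python
# Four hypothetical possessions: EPV readings over time, plus points scored.
possessions = [
    {"epv_trace": [1.00, 1.05, 1.30, 1.10], "points": 2},
    {"epv_trace": [1.00, 0.90, 0.70], "points": 0},
    {"epv_trace": [1.00, 1.20, 1.45], "points": 3},
    {"epv_trace": [1.00, 0.95, 1.00], "points": 0},
]


def points_per_possession(possessions):
    """The one-number summary: total points divided by possessions."""
    return sum(p["points"] for p in possessions) / len(possessions)


def biggest_swing(trace):
    """Largest change between two consecutive EPV readings in one play."""
    return max(abs(b - a) for a, b in zip(trace, trace[1:]))


ppp = points_per_possession(possessions)
per_100 = ppp * 100
swing = biggest_swing(possessions[0]["epv_trace"])

print(f"Points per possession: {ppp:.2f} ({per_100:.0f} per 100)")
print(f"Biggest single-step swing in the first play: {swing:.2f}")
```

The first two numbers are what we already had. The last one is the kind of thing only a moment-by-moment stat can tell you.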

2) This is going to be the same note as note #1, because I cannot stress it enough: yes, there are things this stat doesn't do, including: indicate how good a player is, indicate how good a team is, account for every single scenario when it calculates averages, predict wins, and so on and so forth. What it can do, and do very well, is give us an idea of the team's offensive efficiency, and why that efficiency is what it is, in terms of team geometry, how that efficiency changes over even a tenth of a second, and how it changes over the course of a play. It isn't designed to do anything else. Don't try and make it do anything else.

3) This isn't a stat that's great for describing individual players, I imagine, because it's a stat whose utility is primarily derived from its ability to account for every player at once. Simplifying that data into a single player seems, to a degree, to defeat the purpose. However, as Goldsberry points out, it has a lot of utility with guards because it gives us a better idea of the impact of those players who can create for others, which we've never seen before, but I'll write more about that later. The real point here is that when we use EPV to describe an individual, it's useful to a limited extent with guards and shooters, but it doesn't tell us as much when it comes to big men and post players. Once again, that's fine. Ultimately, that's not its design.

4) If you were, at any point, inclined to say, "how is this useful? Can't we just watch the game to learn this stuff?" then I'm just going to tell you, right now, to follow this link and read my explanation about why we can't trust the eye test for a lot of basketball analysis. I'm not gonna go there, because I already have, and because it's well understood at this point. Just because we have a major new milestone in basketball statistics doesn't mean we have to re-hash the "eye test" argument all over again.

Ok, with that out of the way, I want to talk about how this stat has the potential to move us forward to a completely new era of statistical analysis in basketball.

There are two things, mainly, that we've never understood very well about the game of basketball; or, at least, that we've never been able to quantify well enough to justify our assertions. The first is defense, and what exactly makes a good defense and how defensive players have to move in order to be effective in a team scheme. The second is "creation," and what exactly makes a good "shot creator" and how valuable those players actually are.

To that last point: basketball has had a cult of the shot creator that has suffered some blows over the last few years. Five years ago, no one questioned that Rudy Gay and Monta Ellis were a net benefit to a team, and that Kobe, the king of hard shots, was the best player in the NBA. Now, we've moved towards appreciating efficiency but we've never really been able to answer the question: do shot creators, even low efficiency ones, add or take away value, and if so who, why and how? Is there a value to being able to manufacture a shot out of nowhere that's slightly better than a hard shot, but still not as good as a wide open one?

For both things, though, we have some idea of how they work. Most analysts know how to read an NBA defense and can tell you what makes a good one and what doesn't (as in Zach Lowe's most recent piece); though the value of "shot creation" is still hotly debated.

There's still a difference between "having a pretty good idea how they work" and being able to prove it, however. Goldsberry's new statistic gives us that power.

How, though?

Well, the issue with both things has been that the value of defenders and their actions and the value of shot creators is dependent entirely upon space and the usage thereof. The problem, correspondingly, is that the stats that we have that describe defense describe only the results of the plays: how many points does Devin Harris allow? How many points do the Mavs allow per 100 possessions?

EPV, however, describes the process of defending. You could use it to go through a game log, for example, and see that, at the precise moment of a pick, the Mavs are allowing "X" EPV. Knowing that, we can ask, "how are they defending the pick and roll?" What's the EPV at that moment, and why?
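As a sketch of what that kind of lookup might look like, assume a game log where each row records an event and the EPV the defense was allowing at that instant. The log format, the event labels, and the numbers are all hypothetical; this is only meant to show the shape of the question.

```python
from statistics import mean

# Hypothetical defensive game log (all values invented).
game_log = [
    {"event": "pick", "epv_allowed": 1.10},
    {"event": "pass", "epv_allowed": 0.95},
    {"event": "pick", "epv_allowed": 1.30},
    {"event": "drive", "epv_allowed": 1.20},
    {"event": "pick", "epv_allowed": 0.90},
]


def epv_allowed_at(log, event):
    """Average EPV allowed at the instant a given event occurs."""
    readings = [row["epv_allowed"] for row in log if row["event"] == event]
    if not readings:
        raise ValueError(f"no '{event}' events in log")
    return mean(readings)


pick_epv = epv_allowed_at(game_log, "pick")
print(f"Average EPV allowed at the moment of a pick: {pick_epv:.2f}")
```

From there, the interesting work is comparing that number across coverages, lineups, or opponents.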

Where Points Allowed per 100 possessions gives us an indication of how successful a defensive possession was, EPV-allowed can give an indication of why those possessions were or weren't successful, because we've finally been given a statistic that accounts for the position of the players at all times. Up until now, the best we've been able to do is to look at film to try and give our best explanation for why teams were succeeding or not succeeding on defense. Now, however, we are close to having access to data that allows us to prove (or disprove) what film room specialists theorize.

The same movement from instinct to theory can now be used to find the value of shot creators. After all, the value of a "guy who can create his own shot" is inherently the difference between the odds of success during a broken play and the odds of success the moment before said "shot creator" shoots or passes. The shot creator theoretically has the ability to raise the odds of success for scoring, meaningfully, on his own.

With EPV, we can finally calculate that value in easy-to-understand ways. We can look, first, at the EPV of a play as it enters the shooter's hands, after a play has broken down. Then, after the player has "created space" or "created an opportunity," we can look at the EPV again, and the difference is the value of his shot creation. Eventually, we can find the average value of shot creators on the whole.
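In code, that subtraction might look something like the sketch below. The pairs of readings are invented, and "EPV at the catch" and "EPV at release" are my labels for the two moments described above, not terms from Goldsberry's piece.

```python
from statistics import mean


def creation_value(epv_at_catch, epv_at_release):
    """EPV just before the shot or pass, minus EPV when the broken play
    landed in the creator's hands. Positive means he improved the play."""
    return epv_at_release - epv_at_catch


# Hypothetical broken plays for one player: (EPV at catch, EPV at release).
broken_plays = [
    (0.80, 0.95),  # created a better look
    (0.75, 0.70),  # dribbled into a worse one
    (0.85, 1.10),  # drew help and found space
]

values = [creation_value(catch, release) for catch, release in broken_plays]
average_value = mean(values)

print(f"Average shot-creation value: {average_value:+.3f} points per broken play")
```

A player whose average sits above zero is adding value on broken plays; one below zero is making bad situations worse.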

We can take that even further, too. There's the theory that the presence of shot creators (Rudy Gay when he was in Memphis is a great example of this) creates a built-in safety valve that encourages broken plays to happen. Once again, with EPV, we can look at the rate at which plays break down with the "shot creator" in the lineup, versus the rate at which plays break down without such a player, and determine whether or not this is accurate.
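Here is a minimal sketch of that comparison, assuming each play has already been tagged with whether the shot creator was on the floor and whether the play broke down. Deciding what counts as "broken down" is the hard part and is not attempted here; the tags and the data are invented.

```python
# Hypothetical play log (all values invented).
plays = [
    {"creator_on_floor": True, "broke_down": True},
    {"creator_on_floor": True, "broke_down": False},
    {"creator_on_floor": True, "broke_down": True},
    {"creator_on_floor": True, "broke_down": True},
    {"creator_on_floor": True, "broke_down": False},
    {"creator_on_floor": False, "broke_down": False},
    {"creator_on_floor": False, "broke_down": True},
    {"creator_on_floor": False, "broke_down": False},
    {"creator_on_floor": False, "broke_down": False},
]


def breakdown_rate(plays, creator_on_floor):
    """Share of plays that broke down, with or without the creator."""
    subset = [p for p in plays if p["creator_on_floor"] == creator_on_floor]
    if not subset:
        raise ValueError("no plays in that group")
    return sum(p["broke_down"] for p in subset) / len(subset)


rate_with = breakdown_rate(plays, creator_on_floor=True)
rate_without = breakdown_rate(plays, creator_on_floor=False)

print(f"Breakdown rate with creator:    {rate_with:.0%}")
print(f"Breakdown rate without creator: {rate_without:.0%}")
```

With only nine plays, a gap like this proves nothing. The real version would need a full season of plays and some care about who else is on the floor.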

But, of course, calling the dawn of EPV a "new age of basketball statistics" doesn't make sense if all it does is illuminate us a bit about defense and shot creation. It does far more than that.

First things first, this stat is a triumph of database management and number crunching in a way that has implications that go far beyond the basketball world. Goldsberry's article has gotten press and attention from people in every major computer science field, and if they can really make this experiment work, then it's going to be a breakthrough for far more than just basketball. Its potential in terms of the processing efficiency in all sorts of fields honestly makes the basketball findings semi-trivial.

But, that aside, quantifying defense and shot creation are more or less the last frontiers of basic, descriptive NBA statistics. They may seem like small things, especially because we have a decent grasp on both as it is, but this is incredible in part because if we can finally put these pieces together, then we've done it. We've quantified the basic stuff. Every basic part of the game can be given a number to describe it.

That last bit isn't a bad thing either: having these stats will never replace watching the games, and that's not what I mean. It's exciting, mainly, because we will finally have a number to describe everything we see. Every moment that we watch that is incredible and frantic and amazing will have a number ascribed to it that we can point to in order to say, "hey, look how freakin amazing this was!"

But, correspondingly, from then on, all the statistics we do will become much more complicated. Most of the relevant calculations for figuring out what happens on a basketball court stop being a simple matter of adjusting points for pace - division and multiplication, mostly - and start being about correlation regressions and normal distributions and econometrics.

When Goldsberry says that "the days of the armchair analyst might be numbered," he's saying that the average analyst doesn't have access to the type of Harvard supercomputer that can make sense of EPV data. Still, I wonder to what degree he might be right in the sense that his stat means the end of basic, descriptive analytics - paving the road for much harder predictive data and projections which have a much higher barrier to entry, and to basic comprehension.

That, though, is what we're looking at. The development of EPV into what it can ultimately be, and then on to the next era, is something completely new in the world of basketball. That, alone, is worth being excited about.