The analytics movement in basketball has been gaining steam over the last couple of years, but Grantland's Zach Lowe recently said, "The debate on this stuff is over. Math has won." Sometimes it seems like Lowe is the arbiter on these kinds of judgements, and it's hard not to feel like his proclamation carries a weight of legitimacy that someone else's might not have.
Nonetheless, I feel like a lot of the debate that I see around "advanced stats" and their usage revolves around a fundamental misunderstanding of what exactly these stats are, what they aim to find, and why they're useful in the first place (and I see this misunderstanding from both sides, too). So, since this is a column whose intent is to use these stats to their utmost utility, I thought it might behoove me and my readership to include a one-column-has-it-all primer on what these advanced stats are and why we care.
For what it's worth, this particular column is intended to be used as a long-term resource just as much as it is intended to be a one-read article.
So, why do we need advanced stats? What, exactly, do stats tell us that watching a game can't? Why do we keep spouting out numbers about a sport that we can just turn on our TVs and watch?
The answer to this question is somewhat dependent on why you watch basketball. Basketball, remember, is a fundamentally meaningless way to have fun, and if you're enjoying the sport purely as a spectator and you don't find that statistics add to the enjoyment of the game, then really, you don't need stats at all. Leave them alone and go watch, it's fine, there's absolutely nothing wrong with that.
If you want to make an argument about how you think basketball works, though -- no matter who you are or what you're trying to argue -- the dynamic between the game and stats changes, and you're going to need to know your way around metrics to make a compelling argument.
The problem is, if you're gonna try and figure out how the game of basketball works objectively (the way that analysts try to), your eyes are going to lie to you. You're going to miss things. This is scientifically proven. No matter how much of a master you think you are at following the game, your brain is hard-wired to cling onto certain things you see that aren't indicative of long-term play, and miss other things that are. We're set up to, instinctively, be wrong.
Psychologists were onto the "advanced metrics" debate long before the basketball world was; the debate just came in a different form. Basically, way back in the early 1600s, it occurred to Sir Francis Bacon that, maybe, if we're ever going to claim that we understand something scientifically, we should probably come up with a rigorous system of testing our observations to confirm or refute the ideas we get from observing the natural world. Thus, the Scientific Method was born, and it's pretty much the same now as it was then: get an idea, isolate only the elements you need to test, test it, see if the idea holds up.
Somewhere around the early 1970s, psychologists started to wonder: "Well, hey, the Scientific Method is great, and it's really the only way to prove our ideas, but why, exactly, do we need it in the first place? Why is it that our observations aren't enough? Why are we wrong so often?" What they came up with is the scientific delineation of cognitive bias; we have a bunch of natural and constant tendencies that make us often irrational and bad at making judgements.
It's those biases that make scientific rigor necessary to prove hypotheses, and it's those biases that make an objective understanding of really any facet of basketball impossible from just watching.
For example: one of the most dangerous biases to our understanding of basketball is Confirmation Bias -- the tendency to seek out only evidence that proves our ideas, and to ignore the evidence that may disprove those ideas.
So if a player (like, say, Jae Crowder) gets drafted under the impression that he has a certain skillset (shooting, rebounding), someone who's just watching the game will probably latch onto the made 3's and offensive rebounds that they see, and use this to confirm the idea that Jae is a solid shooter and rebounder, when in reality, he's fairly bad at both. It's a part of why it often takes a really long time to shake reputations that guys have coming into the league.
It's not worth going into all the different biases that affect, mostly negatively, our ability to draw valid conclusions from watching games, because this article would be, roughly, infinitely long. Which should also tell you why you're likely to miss stuff pretty often.
Stats, then, are basketball analysts' way of testing our hypotheses to make sure that we're not being blinded by any of this bias. A corollary to this, too, is that stats should never, ever be used before, or in lieu of, watching the game, but always as a defense or refutation of what we see already. And, for what it's worth, I don't believe that I've ever seen stats used as a replacement for watching the game, but rather, I've seen them as a resource for better understanding what we see.
It's also important to remember what exactly these stats are. So often, it feels like I see stats discussed -- on both sides -- as some weird, amorphous way of seeing basketball differently that's either a confusing scourge on an otherwise pure, gorgeous sport, or a brilliant, door-opening way into seeing a confusing sport for more of what it actually is.
Advanced Stats are simply numbers that do neither of these things by themselves, and that generally are not particularly complicated. Essentially, a group of people who are good at math and also big fans of basketball gradually realized over the last few years that if they were going to keep trying to prove things that are happening in the sport analytically, they needed better proof than what was in the box score. What they came up with are new sets of numbers that are better at breaking down what is actually happening on the basketball court, moment to moment.
Another thing to remember is that none of these numbers are ever "wrong." For the numbers to be "wrong," math as a construct would have to be invalid, and I promise, it isn't. Every single number we use does accurately reflect what that stat is supposed to measure. However, some stats, and their corresponding measurements, can be (and often are) interpreted poorly.
For example, when the Mavericks have had Shawn Marion on the court this season, mathematically and without question, the Mavs have been worse. This is a true statement. It would be a poor interpretation, though, to say that Marion is a drain on the Mavs. A more accurate interpretation might be that he's been in some really awful lineups, and that it's probably in Dallas' best interest to stop playing him at Power Forward.
With all of that in mind, it's probably best to now talk about and explain the major advanced stats that you'll see both in this column and in statistically minded columns all throughout the blogosphere. Hopefully, if you ever need a reminder about a certain type of statistic, you can refer back to this column as a refresher.
For consistency's sake, I've broken the current metrics down into three major categories: Pace Adjusted Stats, Aggregate Stats, and Advanced Stats.
Pace Adjusted Stats
These were really the first major advanced stats, along with John Hollinger's PER, and they're pretty much the most useful by a wide margin. They're by far the most widely used as well.
The idea behind Pace Adjusted Stats is simple: normal measures of offensive and defensive efficacy (points per game and points allowed per game) were not good enough because, simply, some teams play faster than others. Meaning that some teams have more possessions in a game than others, which means that some players have more chances to score, assist, rebound, etc. than others get. After all, most of the teams that lead the league in points per game aren't necessarily the best, but they are the fastest.
What Pace Adjusted Stats do is give us a way to measure how good an offense or defense is by taking speed of play out of the equation. The solution became Points per 100 Possessions and Points Allowed per 100 Possessions (or Points per Possession, depending on what you prefer).
Instead of taking the average points scored in a game, we can take the average points scored every time a team takes the ball up or down the court. Same goes for Rebounds, Assists, Steals, Turnovers, and Blocks. All of those metrics are normally influenced by speed of the game, and the Pace Adjusted versions give us a better measure of how well a player assists, steals, and blocks the ball on every play.
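To make the pace adjustment concrete, here's a minimal sketch in Python. The two teams and their totals are made up for illustration, and the function names are my own, not anything from NBA.com or Basketball-Reference. The point is only that two teams with very different points per game can be exactly as efficient once you divide by possessions instead of games.

```python
def per_possession(stat_total: float, possessions: float) -> float:
    """A counting stat (points, assists, steals...) divided by possessions played."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return stat_total / possessions


def per_100_possessions(stat_total: float, possessions: float) -> float:
    """The same per-possession rate, scaled up to 100 possessions."""
    return 100 * per_possession(stat_total, possessions)


# Hypothetical season totals: identical efficiency, very different pace.
fast_team = {"points": 8200, "possessions": 8000, "games": 80}
slow_team = {"points": 7380, "possessions": 7200, "games": 80}

for name, team in (("fast", fast_team), ("slow", slow_team)):
    ppg = team["points"] / team["games"]
    rating = per_100_possessions(team["points"], team["possessions"])
    print(f"{name}: {ppg:.1f} points per game, {rating:.1f} per 100 possessions")
```

The fast team scores about ten more points per game, but both teams land on the same per-100 number, which is the comparison we actually care about.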
- Team Points Per Possession (Synergy Sports): The average number of points that a team scores, not per game, but per each possession played. This measure attempts to take pace out of consideration in order to give the ability to directly compare different teams' offenses, without having to worry about how fast they play. Points Per Game is found by dividing Total Points by Games Played. Points per Possession is found by dividing Total Points by Possessions Played (or, less accurately but more easily found, by dividing Points Per Game by Possessions per Game).
- Team Points Per 100 Possessions (OffRtg or ORtg; NBA.com, Basketball-Reference.com): The exact same thing as Points Per Possession, but arbitrarily bigger because big numbers are more fun and decimals are yucky. To find this, you multiply Points Per Possession by 100. An average offense scores about 103-104 points per 100 possessions. 115 is unbelievable, and below 100 is pretty bad.
NBA.com also includes Points per 100 Possessions that a team scores while Player X or Lineup Y is on the court. This is a good measure of an individual's contribution to team offense.
- Team Points Allowed Per 100 Possessions (DefRtg or DRtg; NBA.com, Basketball-Reference.com): How many points that a team typically allows, pace adjusted for possessions instead of "per game." Best measure of a defense, the same way that Points per 100 Possessions is the best measure of offense. Below 95 is unbelievably good, below 100 is solid, 102 is average, and 105 or above is awful.
NBA.com also includes the Points per 100 Possessions that a team allows while Player X or Lineup Y is on the court. This is a good measure of an individual's contribution to team defense.
- Player Points per 100 Possessions (ORtg; Basketball-Reference.com): This is an attempt to see how well a player scores on his own, which is more difficult to calculate. Instead of Points Per Game, it's more like the "average number of points scored over the course of 100 possessions where this player tried to score." It's imperfect, but a good measure nonetheless. 100 is bad, 105 is pretty average, 115 is great, 120 or above is elite.
- Player Points Allowed Per 100 Possessions (DRtg; Basketball-Reference.com): Same as Player Points per 100 Possessions, but a defensive measure. Closer to "average number of points that opponents scored when they tried to score on this player." 110 or above is really bad, 104 is average, 100 is solid, and 95 or below is elite.
- Assist Ratio (NBA.com): Same as Points per 100 Possessions, except with assists. How many Assists a team or player accumulates, on average, over the course of 100 possessions played.
- Turnover Ratio (NBA.com; TOV%, Basketball-Reference): Turnovers per 100 possessions.
- Assist Percentage (AST%; NBA.com, Basketball-Reference): This is the start of a different way to measure contributions than per 100 possessions. Instead of measuring assist per possession, Assist Percentage finds the portion of assists a player contributed out of the total assists that the team had. Put another way, AST% is the percentage of assists that a player made out of all the assists that the team had while that player was on the court.
- Rebound Percentage (TREB%, NBA.com, Basketball-Reference): Same thing as assist percentage, but with rebounding. It's the portion of rebounds that a player collects out of the total rebounds that are available while that player is on the court. This can also be broken into Defensive Rebounding Percentage (DREB%) and Offensive Rebounding Percentage (OREB%).
- Steal Percentage (STL%, Basketball-Reference): Steals per 100 possessions.
- Block Percentage (BLK%, Basketball-Reference): The percentage of opponent two-point shot attempts that a player blocks while on the court.
- Usage Rate (USG%, NBA.com, Basketball-Reference): Out of 100 possessions, the number of possessions that an individual will end with a shot, assist, or turnover, while on the court. Put another way, this is the percentage of a team's plays that a player uses while on the court.
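The "percentage" stats above all share one shape: what the player did, divided by what was available while he was on the court. Here's a rough sketch of that idea for Rebound Percentage. It assumes you already have the on-court totals in hand, which is a simplification on my part (the sites estimate those totals from minutes played), and every number below is hypothetical.

```python
def share_pct(player_total: float, available_total: float) -> float:
    """Percentage of the available total that the player accounted for."""
    if available_total <= 0:
        raise ValueError("available_total must be positive")
    return 100 * player_total / available_total


# Hypothetical on-court totals for one player over a stretch of games.
player_rebounds = 120
team_rebounds_on_court = 400      # includes the player's own rebounds
opponent_rebounds_on_court = 400

rebound_pct = share_pct(
    player_rebounds, team_rebounds_on_court + opponent_rebounds_on_court
)
print(f"TREB%: {rebound_pct:.1f}")
```

Swap in assists and the team's on-court assists, and the same helper gives a simplified Assist Percentage.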
Aggregate Stats
Along with Pace Adjusted Stats, Aggregate Stats are generally pretty simple measures. The idea, here, is that these are stats that are linearly constructed out of other statistics to give you one single, holistic measure of something like on-court contribution or shooting percentage.
Sometimes, too, these stats need to come with a caveat: eFG% and TS% are just fine by themselves, but things like PER and NetRtg both attempt to say, with one number, how good a player is, and neither of them totally succeeds. These measures are all really valuable, but no one should look at a PER and just think "welp, that's it, that's a player's value in one number," because most of these metrics have pretty glaring flaws.
Nonetheless, they're simple enough to still be transparent: PER may not be as perfect a reflection of a player's contribution as it tries to be, but it is a great reflection of how integral a player is to a team's offense -- a different, but still highly valuable, measurement.
It's also probably worth noting that there are lots more of these. PER-like aggregate stats pop up all over the place, but these are the ones that a) have found more or less universal acceptance and b) I'm likely to use.
- Effective Field Goal Percentage (eFG%; NBA.com, B-Ref): It's always been hard to get a good, single stat to measure how good a player is at shooting. Both eFG% and TS% try to account for this by weighting the fact that 3-pointers are worth more than 2-pointers. eFG% is a single shooting percentage that calculates a player's FG% as if a made 3-pointer were 1.5 made shots, since a 3-pointer is worth one and a half 2's (though the player's attempted shots remain the same). So if, for example, Dirk made 6 shots out of 10 (60%) in a game, but 2 of those were 3-pointers, his eFG% would be 7 makes out of 10 shots, or 70%.
- True Shooting Percentage (TS%; NBA.com, B-ref): TS% is very similar to eFG%, though True Shooting % also tries to account for the value of free throws in addition to 3-pointers. The formula is a bit more complicated, though its functionality, and even the philosophy behind its execution, is the same as in eFG%. Its formula is (Points Scored) / (2 x (Field Goal Attempts + 0.44 x Free Throw Attempts)). The idea is that "points" will account for the extra value of Free Throws and 3-pointers (as opposed to Field Goals Made). The Field Goal Attempts and Free Throw Attempts are multiplied by 2 because a shot is worth two points, and Free Throw Attempts are weighted by 0.44 because (after some fairly complicated regression) it turns out that a Free Throw is worth approximately 0.44 of a single attempt in eFG%.
- Expected Points Per Shot (XPPS; Hickory-High.com): XPPS was developed by the head of Hickory-High -- and Mavs Outsider Report contributor -- Ian Levy. XPPS is ingeniously simple: every major shot (in the restricted area, in the midrange, etc.) has an average shooting percentage for the entire league from that location. By extension, then, every major shot has an expected value, or, how many points you're likely to get, on average, from any given shot (found by multiplying the average shooting percentage and shot value). Ian takes all those expected values, sees how many shots a team or player takes in each location, and calculates the expected number of points a team will get per shot.
For example, here's a graph of each team's Actual Points per Shot (Points divided by Attempts) over an axis of XPPS to see how each team is doing in terms of shot selection, and in terms of how well they shoot relative to their shot selection:
- Player Efficiency Rating (PER; B-Ref): PER is probably the poster boy of advanced stats: it's by far the most complicated of these basic measures, and it's the baby of advanced stats' poster boy, John Hollinger -- a basketball expert, actual statistician, and now the VP of the Memphis Grizzlies' Basketball Operations. PER is Hollinger's attempt at a one-number-tells-it-all stat; a stat that in one number boils down a player's entire contribution to a team. For the most part, PER is calculated by adding the different "per 100" stats (points, assists, rebounds, steals, blocks per 100 possessions) and subtracting turnovers per 100 possessions, though there are a few tweaks here and there.
PER is, for what it's worth, a very, very good measure of offensive efficiency and contribution, which shouldn't be much of a surprise, given that Hollinger has contributed scholarly research on the subject of per 100 stats and their value in terms of overall team success. What you may notice, however, is that very few of these stats involve defense. That's PER's big flaw. It's a great measure of offense, but just offense. It's a really awful indicator of defensive efficiency.
- Net Points per 100 Possessions (NetRtg; NBA.com, B-Ref): This is a way to measure overall performance; it's simply Points Scored per 100 Possessions with Points Allowed per 100 Possessions taken away, to tell you how much better or worse the offense is than the defense. This tells you, roughly, how much better or worse a team or player is than the average team or person.
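Since the aggregate stats above are all simple arithmetic, they're easy to sketch in a few lines of Python. The eFG% and TS% functions follow the formulas given in the text. The XPPS function is only my reading of the description above (league-average percentage times shot value, averaged over the shots taken), not Ian Levy's actual code. The zone names, league averages, free throw line, and ratings below are all made up for illustration.

```python
def effective_fg_pct(fg_made, threes_made, fg_attempts):
    """eFG%: each made 3-pointer counts as 1.5 made shots."""
    return (fg_made + 0.5 * threes_made) / fg_attempts


def true_shooting_pct(points, fg_attempts, ft_attempts):
    """TS%: points divided by twice the estimated scoring attempts."""
    return points / (2 * (fg_attempts + 0.44 * ft_attempts))


def expected_points_per_shot(attempts, league_pct, shot_value):
    """XPPS-style average: league-average value of each shot taken, per shot."""
    total_attempts = sum(attempts.values())
    expected_points = sum(
        attempts[zone] * league_pct[zone] * shot_value[zone] for zone in attempts
    )
    return expected_points / total_attempts


def net_rating(off_rtg, def_rtg):
    """NetRtg: points scored minus points allowed, both per 100 possessions."""
    return off_rtg - def_rtg


# The Dirk example from the text: 6-of-10 shooting with two made 3-pointers.
dirk_efg = effective_fg_pct(fg_made=6, threes_made=2, fg_attempts=10)

# Hypothetical addition: he also went 4-of-4 at the line, for 14 + 4 = 18 points.
dirk_ts = true_shooting_pct(points=18, fg_attempts=10, ft_attempts=4)

# Made-up shot chart and league averages, purely for illustration.
attempts = {"restricted_area": 30, "midrange": 20, "three": 30}
league_pct = {"restricted_area": 0.60, "midrange": 0.40, "three": 0.36}
shot_value = {"restricted_area": 2, "midrange": 2, "three": 3}
xpps = expected_points_per_shot(attempts, league_pct, shot_value)

print(f"eFG%: {dirk_efg:.3f}  TS%: {dirk_ts:.3f}  XPPS: {xpps:.3f}")
print(f"NetRtg: {net_rating(104.0, 101.5):+.1f}")
```

Comparing a team's actual points per shot against its XPPS is what separates shot selection (which shots you take) from shot making (how well you hit them).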
Advanced Stats
Despite the fact that every category prior to this one is ostensibly termed an "advanced stat," that's kind of a misnomer. The previous stats, as I've said, are mathematically quite simple, and are small adjustments to things we already know. True Advanced Stats are far more complicated. They usually involve fairly complicated statistical regressions, calculus, increasingly impressive computer systems, or a combination of all of them.
Most of these are taken with lots of caveats, not because there's anything dubious about them, but because that kind of intensive math tends to find quite specific bits of information. For example, I can use career averages and recent streaks to calculate a probability for how well a player is likely to shoot over the course of another week, month, or year. But, it's only possible to find a probability, and it's only a range. There's a limit to what that can do, even if it opens new doors.
That said, no statistician finding this information will typically present it without a description of the kinds of caveats one should keep in mind when using this data.
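To show what "it's only a range" looks like in practice, here's a small sketch that turns a shooting sample into a plausible range for a player's underlying percentage. To be clear about the assumptions: this uses a standard Wilson score interval and treats every shot as an independent coin flip, which is far simpler than a real projection built from career averages and streaks. It's a stand-in for the idea, not the method any particular analyst uses, and the samples are hypothetical.

```python
import math


def wilson_interval(makes: int, attempts: int, z: float = 1.96):
    """Approximate 95% range (for z=1.96) for a player's true shooting rate."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p = makes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / attempts + z**2 / (4 * attempts**2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical samples: the same 40% shooting over a hot month vs. a full season.
for makes, attempts in ((40, 100), (400, 1000)):
    low, high = wilson_interval(makes, attempts)
    print(f"{makes}/{attempts}: somewhere between {low:.1%} and {high:.1%}")
```

The bigger sample gives a much tighter range, which is exactly why small-sample numbers come with so many caveats.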
I'm only including a few specific bits of data in here, but anything that uses probability distributions or regressions probably qualifies. Typically these kinds of analytics are not conducive to formulas, and so are hard to boil down here. If you're interested in more of these kinds of stats, you can look at some articles I've written using these stats before, as well as the Wages of Wins blog, among others.
- SportVU Player Tracking Data (NBA.com): A lot of to-do has been made about the new SportVU cameras that have been installed in every arena, and for good reason. There's a lot of potential in the SportVU player tracking data to tell us new things we haven't even dreamt of seeing before, thanks to a set of motion tracking cameras that track each player's position on the court 25 times per second. Thanks to those cameras, NBA.com has new data available, ranging from how far and fast each player travels each game to how many potential rebounds a player grabs. The data available to us at the moment pales in comparison to what could potentially become available, and the probability is high that teams already have access to some of this Pandora's box. If you'd like to know more about the cameras themselves you can look here, or if you'd like to read on the potential statistical landscape that the SportVU cameras might promise, you can look here.
- Wins Produced (Wagesofwins.com): Win Shares and Wins Produced are aiming to calculate the same thing, but are actually calculated completely differently. Wins Produced is a Wages of Wins production based almost entirely off of the academic analysis of David Berri, correlating different per 100 stats and Wins, and seeing approximately how many Wins each major stat adds. For example, each three pointer made is worth approximately 4/999 of a win. They use this research to calculate approximately how many wins an individual player might be worth. For a more detailed explanation of this process, head here.
- Win Shares (Basketball-Reference): Win Shares has the same goal as Wins Produced: calculating the number of Wins an individual player is worth. Win Shares are calculated using a lot of the same research, as well, but the process is different, and it's mostly grounded in the seminal work of Dean Oliver in his must-read book for statheads, "Basketball on Paper." For all intents and purposes, it's a more accessible version of a "wins produced" stat. Like most of these tell-all stats, its major issue is accurately measuring defense, since there is still no good set of statistical measures for actual defensive impact beyond Points Allowed per 100 Possessions. If you'd like to dig more into Win Shares and their calculation, head here, and to Dean Oliver's book.
I apologize for the length, but I wanted a single primer to be able to refer to for the rest of this column's life.
Statsketball is a registered trademark.