inside the numbers: April 2009

Tuesday, April 28, 2009

free throw trends over the years

My brother sent me a link to the New York Times Freakonomics blog (you know, the one with the logo of an apple that tastes like an orange, which is kind of odd because bananas might soon taste like apples) that has some new insight into an earlier New York Times article about how free throw shooting percentages have not improved over time. In both the NBA and NCAA, overall free throw percentages have remained fairly constant for several decades. Freakonomics adds that while overall percentages remain constant, the very best from each NBA season (the 20 players with the top free throw percentages) show an upward trend. As the post mentions, this trend must eventually plateau, since you can’t shoot higher than 100%.

Both of these trends are interesting and, I think, deserve a little more research. If the overall percentage is staying the same and the best are getting better, does that mean the worst are getting worse? Or could this be a product of the NBA growing from 9 teams in the 1960s to 30 today (more players could push the extremes out while keeping the same mean)? Or could free throw attempts have shifted toward the worse free throw shooters? I don’t know if anyone in the 60s instituted the hack-a-Wilt strategy (probably not, since it doesn’t sound as good as hack-a-Shaq), but it’s clear today that intentionally fouling a bad free throw shooter is a commonly used tactic in certain situations. Also, if defenses are getting smarter or more informed, they would probably be less likely to foul Ray Allen on the open break than Rajon Rondo.
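To make the expansion idea concrete, here is a minimal sketch of how the top-20 average can rise with league size even when the underlying talent distribution never changes. Everything in it is an assumption for illustration: the normal distribution, the 75% mean, the 8-point spread, and the league sizes of 100 and 450 players are made-up stand-ins, not real NBA data.

```python
from statistics import NormalDist, mean

def league_and_top_mean(n_players, k=20, mu=0.75, sigma=0.08):
    """Return (league mean, mean of the top k) for a hypothetical league.

    Players are placed at evenly spaced quantiles of one fixed
    distribution, so the only thing that changes is the league size.
    """
    dist = NormalDist(mu, sigma)
    # Cap at 1.0 because nobody can shoot better than 100%.
    pcts = [min(dist.inv_cdf((i + 0.5) / n_players), 1.0)
            for i in range(n_players)]
    return mean(pcts), mean(sorted(pcts)[-k:])

small_mean, small_top = league_and_top_mean(100)   # assumed small league
large_mean, large_top = league_and_top_mean(450)   # assumed large league

print(f"100 players: league {small_mean:.3f}, top 20 {small_top:.3f}")
print(f"450 players: league {large_mean:.3f}, top 20 {large_top:.3f}")
```

In this toy setup the league mean stays put while the top-20 average climbs, simply because 20 players are a smaller slice of a bigger league. It says nothing about whether that is what actually happened in the NBA.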

Provided I could get the free throw statistics for every player in every NBA season since the 1950s, I would look at how the distribution of free throw attempts has changed, and also at the league average when each player is given equal weight (with some minimum number of free throw attempts required). Giving each player equal weight would standardize the NBA seasons regardless of which type of free throw shooter is taking the most shots.
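As a rough sketch of the two averages described above, the snippet below compares the usual attempt-weighted league percentage with an equal-weight average over qualified players. The function name, the 100-attempt cutoff, and the four-player season are all hypothetical; the numbers are invented only to show how the two averages can differ.

```python
def free_throw_averages(players, min_attempts=100):
    """players: list of (made, attempted) pairs for one season.

    Returns (attempt-weighted league pct, equal-weight pct among
    players with at least min_attempts attempts).
    """
    total_made = sum(made for made, _ in players)
    total_attempted = sum(att for _, att in players)
    league_pct = total_made / total_attempted

    # Each qualified player counts once, no matter how often he shoots.
    qualified = [made / att for made, att in players if att >= min_attempts]
    equal_weight_pct = sum(qualified) / len(qualified)
    return league_pct, equal_weight_pct

# Invented season: a poor shooter takes the most attempts.
season = [(450, 500), (300, 600), (80, 100), (5, 10)]
league_pct, equal_weight_pct = free_throw_averages(season)
print(f"attempt-weighted: {league_pct:.3f}, equal-weight: {equal_weight_pct:.3f}")
```

If attempts drift toward worse shooters over the decades, the attempt-weighted figure would sag relative to the equal-weight one, which is the comparison worth checking against real data.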

Monday, April 27, 2009

updated series odds

Eventually I want this table to be on a sidebar, but until then I'll just keep updating it with new posts.

team	games won	% chance of winning series	frequency
CLE	4	100.0	33
DET	0	0.00	33
BOS	2	63.64	11
CHI	2	36.36	11
ORL	2	60.00	10
PHI	2	40.00	10
ATL	1	40.00	20
MIA	2	60.00	20
LAL	3	97.73	44
UTH	1	2.27	44
DEN	2	91.67	84
NOR	1	8.33	84
SAS	1	18.18	22
DAL	3	81.82	22
POR	1	18.18	22
HOU	3	81.82	22

The most influential game over the weekend was Miami winning Game 3 of their series. While the more difficult task may have been winning Game 2 in Atlanta, following that win with a Game 3 victory bumped them up to a 60% chance of winning the series, from 31.11% before that game. After a team without home court advantage wins Game 2 to tie the series, it is more likely than not to lose Game 3 (55.56% chance), even though Game 3 is on its home floor.

Other notes:

Orlando swings the series back in their favor with a win at Philadelphia.

While New Orleans doesn't improve their odds a whole lot, they keep themselves from essentially losing the series, since no NBA team has come back from down 0-3 in a best-of-7 series.

Chicago's exciting win only increases their odds by 7.79%, although a loss would've been devastating. In 5 tries no team without home court advantage has come back down 1-3 after leading 1-0.

Not statistically interesting, but Cleveland is the first team on to the next round.

Friday, April 24, 2009

playoff odds

As a follow-up to my earlier post, I’m going to keep updating the odds of each team winning its series. (I also sent a brief email to Henry Abbott discussing the playoffs along these same lines, which he posted on TrueHoop earlier this week.) In the table below I calculated the current odds based on historical data for all best-of-7 series going back to 1977, excluding the NBA Finals because the format in that series switches from 2-2-1-1-1 (i.e. 2 home, 2 away, 1 home, 1 away, 1 home) to 2-3-2. My data goes back only to 1977 because prior to that year the 7-game formats did not consistently match the current format.

team	games won	% chance of winning series	frequency
CLE	2	94.81	135
DET	0	5.19	135
BOS	2	71.43	14
CHI	1	28.57	14
ORL	1	45.83	48
PHI	1	54.17	48
ATL	1	68.89	45
MIA	1	31.11	45
LAL	2	91.67	84
UTH	1	8.33	84
DEN	2	94.81	135
NOR	0	5.19	135
SAS	1	31.25	32
DAL	2	68.75	32
POR	1	45.83	48
HOU	1	54.17	48

As we break down these numbers, there are a few interesting things to note. One is that order matters. Orlando, Portland and Atlanta all have home court advantage and all are tied 1-1 in their respective series. However, Orlando and Portland each have a 45.83% chance of winning the series while Atlanta has a 68.89% chance. The difference is that Atlanta lost Game 2 of their series while Orlando and Portland each lost Game 1. While it would seem logical that being tied 1-1 is the same no matter how you got there, history suggests otherwise. We can also run a test of significance to see whether the difference is more than random chance. To do this, we start with a null hypothesis, which states that a team with home court advantage that loses Game 1 and wins Game 2 is just as likely to win the series as a team with home court advantage that wins Game 1 and loses Game 2. The alternative hypothesis is that these two situations are not equally likely. We can write these hypotheses as follows:

H₀: P₁ = P₂
H₁: P₁ ≠ P₂
Where H₀ is our null hypothesis and H₁ is our alternative hypothesis. P₁ is the probability of winning the series for a team that has home court advantage, won Game 1 and lost Game 2. P₂ is the probability of winning the series for a team that has home court advantage, lost Game 1 and won Game 2. From our data table above, we can calculate the probability of observing these values, or more extreme ones, under the assumption that P₁ = P₂. In other words, assuming that P₁ and P₂ have the same value, we calculate the probability of observing a difference of 23.06 percentage points (68.89% - 45.83%) or more in either direction. This probability is called a p-value. To calculate it we use the equation for a Z-score, which we can then convert to a probability.

Test statistic

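Z = (p₁ - p₂) / SE

Where:

SE = sqrt(p(1 - p)) × sqrt((n₁ + n₂) / (n₁n₂))

And:

p = (n₁p₁ + n₂p₂) / (n₁ + n₂)

Here p₁ and p₂ are the observed series-winning proportions for the two groups, n₁ and n₂ are the number of series in each group, and p is the pooled proportion across both groups.

So in our case the values are as follows:

p₁ = 0.6889
p₂ = 0.4583
n₁ = 45
n₂ = 48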

After plugging in the values we get a Z-score of 2.2447, giving us a two-sided p-value of 0.0248. In other words, if P₁ and P₂ were equal, there would be a 2.48% chance of observing a difference of at least 0.2306. Since this is below the conventional 5% significance level, we reject the null hypothesis and conclude that P₁ and P₂ are not equal. Because the observed P₁ is the larger of the two, the data suggest that a team with home court advantage that wins Game 1 and loses Game 2 is more likely to win its series than one that loses Game 1 and wins Game 2.
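For anyone who wants to check the arithmetic, the test above can be sketched in a few lines of Python using only the standard library. The function name is my own, and the two-sided p-value comes from the standard normal distribution via `math.erfc`; the inputs are the rounded values listed above.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Return (Z, two-sided p-value) for H0: P1 = P2 with a pooled proportion."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled)) * math.sqrt((n1 + n2) / (n1 * n2))
    z = (p1 - p2) / se
    # For a standard normal, P(|Z| >= z) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Won Game 1, lost Game 2: 68.89% of 45 series.
# Lost Game 1, won Game 2: 45.83% of 48 series.
z, p_value = two_proportion_z_test(0.6889, 45, 0.4583, 48)
print(f"Z = {z:.4f}, p-value = {p_value:.4f}")
```

Run as written, this should land very close to the Z-score of 2.2447 and p-value of 0.0248 reported above; tiny differences can appear depending on how the percentages are rounded.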

Thursday, April 23, 2009

lies, damned lies, and statistics

"There are three kinds of lies: lies, damned lies, and statistics."

This quotation, which Mark Twain borrowed from Benjamin Disraeli, is probably the most popular saying about statistics (I have no statistical evidence to back up this claim). However, the statistics themselves are not lies; it's the interpretation of them that sometimes is. Uncovering the lie can be tricky: it could be a data collection error, a poorly worded survey, an assumption of causation when only a correlation is observed, or just bad logic.

Yesterday, Mike Bianchi of the Orlando Sentinel wrote an article about the impossibility of a team with home court advantage winning a best-of-7 series after losing the first two games at home. This article was written before Game 2 of the Orlando/Philadelphia series and he was discussing the odds of Orlando being able to come back down 0-2 if they lost Game 2 (which they didn’t). Here’s his reasoning:

"Dating to 1947, there have been 378 seven-game series played in the NBA and only three teams have lost their first two playoff games at home and ended up winning the series. Translation: the Magic would have less than a 1 percent chance of winning the series should they lose tonight."

In this scenario, the statistic isn’t wrong; Mike Bianchi is wrong. It does measure something, just not what he says it does. Essentially, it is a rough estimate of the probability that a team with home court advantage both loses Games 1 and 2 at home and wins the series. This is not the probability that a team with home court advantage wins the series given that it has already lost Games 1 and 2 at home. For that, instead of using the total number of series played as the denominator, you have to use the total number of series in which the team with home court advantage lost Games 1 and 2. Let’s call that number x, which is undoubtedly much smaller than 378. Therefore, 3/x is much larger than 3/378. The subtle distinction in phrasing creates a significant difference in numerical value.

To illustrate the point, I’ll use data that goes back 20 years (partly because I haven’t collected data that goes back to 1947, and partly because a lot of the formats were significantly different from what they are now; instead of two games at home and then two away, teams would often alternate every game). Out of 188 series played, there were two occurrences in which the team with home court advantage lost Games 1 and 2 at home and went on to win the series, and eight occurrences in which the team with home court advantage lost Games 1 and 2 at home and went on to lose the series. Applying Bianchi’s logic, had Orlando lost last night, they would’ve had a 1.06% (2/188) chance of winning the series. If we continue to use his logic to calculate Philadelphia’s odds of winning the series had they won last night, it comes to 4.26% (8/188). It now becomes clear that there’s an error in his logic, since that leaves a 94.68% (100 - [4.26 + 1.06]) chance that no one wins the series in this scenario.

Bianchi’s error is that he takes a given event (that the team with home court advantage loses Games 1 and 2), which is highly improbable, and folds that low probability into his calculation. In the example above, rather than using 188 as the denominator, we would replace it with the total number of series in which the team with home court advantage lost Games 1 and 2, which is 10. So the actual historical value is 20%, or 2/10, rather than the 1% chance Bianchi would give a team in this situation.
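The same comparison can be written out as a short Python sketch, using only the 20-year counts quoted above (188 series, 2 comebacks, 8 failures). The variable names are mine; the point is just to put the two denominators side by side.

```python
total_series = 188      # all series in the 20-year sample
won_after_0_2 = 2       # lost Games 1 and 2 at home, then won the series
lost_after_0_2 = 8      # lost Games 1 and 2 at home, then lost the series

def pct(count, denominator):
    """Express count / denominator as a percentage."""
    return 100 * count / denominator

# Bianchi's approach: divide by every series ever played.
bianchi_home = pct(won_after_0_2, total_series)
bianchi_road = pct(lost_after_0_2, total_series)
unaccounted = 100 - (bianchi_home + bianchi_road)

# Conditional approach: divide only by series that reached this situation.
series_down_0_2 = won_after_0_2 + lost_after_0_2
cond_home = pct(won_after_0_2, series_down_0_2)
cond_road = pct(lost_after_0_2, series_down_0_2)

print(f"Bianchi: home {bianchi_home:.2f}%, road {bianchi_road:.2f}%, "
      f"nobody {unaccounted:.2f}%")
print(f"Conditional: home {cond_home:.2f}%, road {cond_road:.2f}%")
```

With the conditional denominator the two outcomes add up to 100%, which is a quick sanity check that the right question is being asked.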