inside the numbers: 2009

Thursday, May 14, 2009

crazy stat of the night

Check out last night's box score: Carmelo Anthony had 30 points on 13-22 shooting and zero free throw attempts. I'd be interested to know the last time a player scored 30 or more points with zero free throw attempts in a playoff game. If it has happened before, I'm guessing there are very few occurrences.

Thursday, May 7, 2009

playoff suspensions

Check out these videos and see if you can find any consistency in Stu Jackson's decisions on NBA playoff fouls.

Robert Horry body checks Steve Nash into the scorer's table

Punishment: suspended 2 games (Amare and Diaw were each suspended for leaving the bench area).

Rajon Rondo throws Hinrich into the scorer's table then tries to elbow him in the head

Punishment: nothing

Kenyon Martin shoves Dirk Nowitzki

Punishment: $25,000 fine

Dwight Howard elbows Samuel Dalembert in the head

Punishment: suspended one game

Kobe Bryant elbows Ron Artest somewhere between his chest and throat

Punishment: Essentially nothing (upgraded to a flagrant 1)

Derek Fisher doing an NFL hit on Luis Scola

Punishment: Suspended one game

Rafer Alston slaps Eddie House in the back of the head

Punishment: Suspended one game

Rajon Rondo slaps Brad Miller in the face

Punishment: nothing

And most recently, Kendrick Perkins elbows Mickael Pietrus in the throat

Punishment: nothing

For the record, the Celtics are 3 for 3 in avoiding punishment on questionable plays reviewed by the league this postseason.

I wanted to look at the numbers to see if having a player that is suspended helps or hurts a team's chances of winning. This year the Magic won Game 6 vs Philadelphia when Howard was suspended and last night they won Game 3 vs Boston when Alston was suspended. The Lakers also won last night without Fisher. The Spurs won Games 5 and 6 when Robert Horry was suspended (although in Game 5 Phoenix didn't have Amare or Diaw due to suspension). I also remember Phoenix winning a Game 5 over the Lakers in 2006 after Raja Bell choke-slammed Kobe in Game 4. I haven't been able to find historical suspension data in the playoffs to fully investigate but it appears that no suspension for Perkins may be a bad thing for Boston.

Wednesday, May 6, 2009

winning games 1 and 2

I was reading this post over at thirdqquartercollapse.com (an Orlando Magic blog) and thought a statistic that erivera7 quoted was worth some additional analysis.

[W]hen an NBA team nets a 2-0 series lead, the series victory probability is 93.5% (203-14).

This got me wondering if it matters whether the winner of Games 1 and 2 has home court advantage (meaning they won Games 1 and 2 at home) or not (meaning they won Games 1 and 2 on the road), especially since both the Magic and Rockets are in that situation. It seems that a team that won Games 1 and 2 on the road would be more likely to win the series than a team that won Games 1 and 2 at home. If you win on the road you have 3 of the next 5 possible games at home, whereas if you win Games 1 and 2 at home only 2 of the next 5 are at home.

Using data from best-of-7 series going back to 1977 and excluding the NBA Finals, I got roughly the same overall statistic as erivera7. Out of 147 series where a team has led 2-0, that team has won 138 times or 93.88% of the time. However, the team with home court advantage won the series 128 out of 135 times (94.81%) when leading 2-0 while the team without home court advantage won 10 out of 12 times (83.33%). Interestingly, a team with home court advantage that leads 2-0 appears to have a better chance of winning the series than a team without home court advantage that leads 2-0, even though the latter has more home games remaining. However, this difference is not statistically significant, as it results in a p-value of 0.336. For a review on proportion tests you can look at my earlier post here or you can use a proportion test calculator found here. Since there are very few times in history when a road team has won Games 1 and 2, it is likely that the difference observed is due to random chance. So if Orlando and/or Houston win tonight there's no evidence that Boston or LA have any better chance of winning their series than Dallas has of beating Denver.

Friday, May 1, 2009

overtime madness

The most talked about matchup so far has been the Boston/Chicago series. Four out of six of their games have gone to overtime, one of those went to double overtime, and the game last night went to triple overtime. To put it another way, out of 13 times that the buzzer has sounded at the end of an overtime period or at the end of a 4th quarter, 7 times the score has been tied.

To emphasize just how unlikely a series is to have 4 overtime games out of the first 6 played, I estimated the probability of any one game going to overtime using this year's regular season data. Since I couldn't easily find the total number of games that went to overtime, I estimated it by using team data and looking at total minutes, subtracting the total number of minutes in regulation, then dividing by 5 to see how many overtime periods were played in total. The minute statistic is subject to rounding error and I ended up getting 81.76 overtime periods played in the regular season. I use this exact number because without additional information there's no way to know if the true number is really 82, 81, or some other number close to that. Our estimate for the probability of a game going to overtime is then the number of overtime periods divided by the total number of games played, or 81.76/1230 = 0.06647. This is most likely a high estimate since it assumes that every overtime game had exactly one overtime period, which is almost certainly false, but it should still be a very close estimate. So if any one game has a 6.647% chance of going to overtime, the chances of exactly 4 of the first 6 games going to overtime can be calculated using this equation:

(n!/((k!)*(n-k)!))*(p^k)*(1-p)^(n-k)

where:

n=the number of games (6)

k=the number of games that go into overtime (4)

p=the probability any one game goes into overtime (0.06647)

After plugging in the values we get the result of 0.00025518 or 0.025518%. Basically, about 2.5 out of every 10,000 series that go at least 6 games will have 4 overtime games in the first 6 games played. If you add in the possibility of having 5 or 6 overtime games the result is only slightly higher at 0.00026254. We can also calculate the chances that, out of 13 possible times to close out a game, 7 end with the score tied. The value for p will change slightly since we are including overtime periods, so instead of 81.76/1230 we use 81.76/(1230+81.76) = 0.06233. The value for n changes to 13 and the value for k changes to 7. In this scenario the probability ends up being 0.00000426282 or 0.000426282%, which is just over 4 times out of every 1 million. Again, if we add in the possibility of having more than 7 ties out of 13 attempts, the probability increases slightly to 0.00000448341.
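The binomial calculations above can be reproduced with a few lines of Python. This is only a sketch: the function names `binom_pmf` and `binom_tail` are my own, the inputs are the rounded probabilities quoted in the post, and, like the equation itself, it assumes every game (or period) is independent with the same chance of ending tied.

```python
from math import comb

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(n, i, p) for i in range(k, n + 1))

# Rounded probabilities quoted in the post.
p_game = 0.06647    # 81.76 / 1230: a game goes to overtime
p_period = 0.06233  # 81.76 / (1230 + 81.76): a regulation or overtime period ends tied

exactly_4_of_6 = binom_pmf(6, 4, p_game)
at_least_4_of_6 = binom_tail(6, 4, p_game)
exactly_7_of_13 = binom_pmf(13, 7, p_period)
at_least_7_of_13 = binom_tail(13, 7, p_period)

print(f"4 of 6 overtime games:  {exactly_4_of_6:.8f} (4 or more: {at_least_4_of_6:.8f})")
print(f"7 of 13 ties at buzzer: {exactly_7_of_13:.3e} (7 or more: {at_least_7_of_13:.3e})")
```

The "or more" values come from summing the same formula over every k from the target up to n, which is how the slightly higher figures in the paragraph above are obtained.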

As a result, I don't think we're going to see a series like this ever again.

Tuesday, April 28, 2009

free throw trends over the years

My brother sent me a link to the New York Times Freakonomics blog (you know, the one with the logo of an apple that tastes like an orange, which is kind of odd because bananas might soon taste like apples) that has some new insight into an earlier New York Times article about how free throw shooting has a roughly constant, non-improving percentage over time. In both the NBA and NCAA, overall free throw percentages have remained fairly constant for several decades. Freakonomics adds that while overall percentages remain constant, the very best from each NBA season (the 20 players with the top free throw percentages) show an upward trend. As is mentioned, this trend must eventually plateau since you can’t shoot higher than 100%.

Both these trends are interesting and I think deserve a little more research. If the overall percentage is staying the same and the best are getting better, does that mean the worst are getting worse? Or could this be a product of the NBA growing from 9 teams in the 1960s to 30 today (more players could push the extremes out while keeping the same mean)? Or could this be a result of free throw attempts being redistributed toward the worse free throw shooters? I don’t know if anyone in the 60s instituted the hack-a-Wilt strategy (probably not since it doesn’t sound as good as hack-a-Shaq), but it’s clear today that intentionally fouling a bad free throw shooter is a commonly used tactic in certain situations. Also, if defenses are getting smarter or more informed they would probably be less likely to foul Ray Allen on the open break than Rajon Rondo.

Provided I could get the free throw statistics for every player in every NBA season since the 1950s, I would look at how the distribution of free throw attempts has changed, as well as at the league average when each player is given equal weight (with some minimum number of free throw attempts required). Giving each player equal weight would standardize the NBA seasons regardless of which types of free throw shooters are taking the most attempts.
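As a rough sketch of that comparison, the snippet below computes both averages for one season. Everything here is an assumption for illustration: the function name `league_ft_pct`, the (made, attempted) input format, the default minimum of 50 attempts, and the toy numbers, which are made up and are not real player data.

```python
def league_ft_pct(players, min_attempts=50):
    """Return (attempt_weighted, equal_weighted) free throw percentages.

    players: iterable of (made, attempted) tuples for one season.
    Only players with at least min_attempts attempts are counted.
    """
    qualified = [(m, a) for m, a in players if a >= min_attempts]
    if not qualified:
        raise ValueError("no players meet the minimum attempts requirement")
    total_made = sum(m for m, _ in qualified)
    total_att = sum(a for _, a in qualified)
    # Usual average: every attempt counts once, so high-volume shooters dominate.
    attempt_weighted = total_made / total_att
    # Standardized average: every qualifying player counts once.
    equal_weighted = sum(m / a for m, a in qualified) / len(qualified)
    return attempt_weighted, equal_weighted

# Made-up season for illustration only (not real player data).
toy_season = [(90, 100), (250, 500), (8, 10)]
weighted, equal = league_ft_pct(toy_season, min_attempts=50)
print(f"attempt-weighted: {weighted:.3f}, equal-weighted: {equal:.3f}")
```

In the toy season the poor shooter takes most of the attempts, so the attempt-weighted average sits well below the equal-weighted one. A widening gap of that kind across real seasons would point toward attempts shifting to worse shooters.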

Monday, April 27, 2009

updated series odds

Eventually I want this table to be on a sidebar, but until then I'll just keep updating it with new posts.

team	games won	% chance of winning series	frequency
CLE	4	100.0	33
DET	0	0.00	33
BOS	2	63.64	11
CHI	2	36.36	11
ORL	2	60.00	10
PHI	2	40.00	10
ATL	1	40.00	20
MIA	2	60.00	20
LAL	3	97.73	44
UTH	1	2.27	44
DEN	2	91.67	84
NOR	1	8.33	84
SAS	1	18.18	22
DAL	3	81.82	22
POR	1	18.18	22
HOU	3	81.82	22

The most influential game over the weekend was Miami winning Game 3 of their series. While the more difficult task may have been winning Game 2 in Atlanta, coming off that win and winning Game 3 bumped them up to a 60% chance of winning the series from 31.11% before that game. After a team without home court advantage wins Game 2 to tie the series they are likely to lose Game 3 (55.56% chance) even though Game 3 is on their home floor.

Other notes:

Orlando swings the series back in their favor with a win at Philadelphia.

While New Orleans doesn't improve their odds a whole lot, they avoid essentially losing the series, since no NBA team has come back from down 0-3 in a best-of-7 series.

Chicago's exciting win only increases their odds by 7.79 percentage points, although a loss would've been devastating. In 5 tries no team without home court advantage has come back from down 1-3 after leading 1-0.

Not statistically interesting, but Cleveland is the first team on to the next round.

Friday, April 24, 2009

playoff odds

As a follow-up to my earlier post, I’m going to keep updating the odds of each team winning the series. (I also sent a brief email to Henry Abbott discussing the playoffs along these same lines, which he posted on TrueHoop earlier this week). In the table below I calculated the current odds based on historical data for all best-of-7 series going back to 1977, excluding the NBA Finals because the format in that series switches from 2-2-1-1-1 (i.e. 2 home, 2 away, 1 home, 1 away, 1 home) to 2-3-2. My data goes back only to 1977 because prior to that year, the 7-game formats were not consistently the same as the current format.

team	games won	% chance of winning series	frequency
CLE	2	94.81	135
DET	0	5.19	135
BOS	2	71.43	14
CHI	1	28.57	14
ORL	1	45.83	48
PHI	1	54.17	48
ATL	1	68.89	45
MIA	1	31.11	45
LAL	2	91.67	84
UTH	1	8.33	84
DEN	2	94.81	135
NOR	0	5.19	135
SAS	1	31.25	32
DAL	2	68.75	32
POR	1	45.83	48
HOU	1	54.17	48

As we break down these numbers, there are a few interesting things to note. One is that order matters. Orlando, Portland and Atlanta all have home court advantage and all are tied 1-1 in their respective series. However, Orlando and Portland each have a 45.83% chance of winning the series while Atlanta has a 68.89% chance. The difference is that Atlanta lost Game 2 of their series while Orlando and Portland each lost Game 1. While it would seem logical that being tied 1-1 is the same no matter how you got there, history suggests otherwise. We can also run a test of significance to see whether the difference is more than random chance. To do this, we start with our null hypothesis, which states that a team with home court advantage that loses Game 1 and wins Game 2 is just as likely to win the series as a team with home court advantage that wins Game 1 and loses Game 2. The alternative hypothesis is that these two situations are not equally likely. We can write these hypotheses as follows:

H₀: P₁ = P₂
H₁: P₁ ≠ P₂
Where H₀ is our null hypothesis and H₁ is our alternative hypothesis. P₁ is the probability of a team winning the series that has home court advantage and has won Game 1 and lost Game 2. P₂ is the probability of a team winning the series that has home court advantage and has lost Game 1 and won Game 2. From our data table above, we can form an equation to calculate the probability of observing these values or more extreme ones based on the assumption that P₁=P₂. Or in other words, assuming that P₁ and P₂ have the same value we calculate the probability of observing a difference of 23.06% (68.89% - 45.83%) or more. This probability is called a p-value. In order to calculate this we use the equation for Z-score, which we can then convert to a probability.

Test statistic

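Z = (p₁ - p₂)/SE

Where:

SE = sqrt(p(1 - p)) × sqrt((n₁ + n₂)/(n₁n₂))

And:

p = (n₁p₁ + n₂p₂)/(n₁ + n₂)

Here p₁ and p₂ are the observed series-win proportions for the two situations, n₁ and n₂ are the number of series in each, and p is the pooled proportion. In our case the values are as follows:

p₁ = 0.6889
p₂ = 0.4583
n₁ = 45
n₂ = 48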

After plugging in the values we get a Z-score of 2.2447, giving us a two-sided p-value of 0.0248. In other words, there is a 2.48% chance of observing a difference of at least 0.2306 in either direction if P₁ and P₂ were equal. Since this is below the conventional 5% threshold, we reject the null hypothesis and conclude that P₁ and P₂ are not equal. Therefore, historically, a team with home court advantage that wins Game 1 and loses Game 2 is more likely to win its series than a team that loses Game 1 and wins Game 2.
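For anyone who wants to check the arithmetic, here is a minimal Python sketch of the pooled two-proportion Z-test described above. The function name `two_proportion_z_test` is my own, and the p-value is taken from the standard normal distribution using the standard library's `math.erfc` rather than a statistics package.

```python
from math import erfc, sqrt

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-proportion Z-test; returns (z, two_sided_p_value)."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled)) * sqrt((n1 + n2) / (n1 * n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Rounded proportions and series counts from the table above.
z, p_value = two_proportion_z_test(0.6889, 45, 0.4583, 48)
print(f"Z = {z:.4f}, p-value = {p_value:.4f}")
```

With the rounded inputs from the post, the result should land very close to the Z-score and p-value quoted above. Small differences in the last digit can come from rounding the proportions.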

Thursday, April 23, 2009

lies, damned lies, and statistics

"There are three kinds of lies: lies, damned lies, and statistics."

This quotation, which Mark Twain borrowed from Benjamin Disraeli, is probably the most popular saying about statistics (I have no statistical evidence to back up this claim). However, the statistics themselves are not lies. It's the interpretation of them that sometimes is. Uncovering the lie can be tricky; it could be a data collection error, a poorly worded survey, placement of causation when only a correlation is observed, or just bad logic.

Yesterday, Mike Bianchi of the Orlando Sentinel wrote an article about the impossibility of a team with home court advantage winning a best-of-7 series after losing the first two games at home. This article was written before Game 2 of the Orlando/Philadelphia series and he was discussing the odds of Orlando being able to come back down 0-2 if they lost Game 2 (which they didn’t). Here’s his reasoning:

"Dating to 1947, there have been 378 seven-game series played in the NBA and only three teams have lost their first two playoff games at home and ended up winning the series. Translation: the Magic would have less than a 1 percent chance of winning the series should they lose tonight."

In this scenario, the statistic isn’t wrong; Mike Bianchi is wrong. It does measure something, just not what he says it does. Essentially, it is a rough estimate of the probability that a team with home court advantage both loses Games 1 and 2 at home and goes on to win the series. This is not the probability that a team with home court advantage wins the series given that it has already lost Games 1 and 2 at home. For that, instead of using the total number of series played as the denominator, you have to use the total number of series where the team with home court advantage lost Games 1 and 2. Let’s call that number x, which is undoubtedly much smaller than 378. Therefore, 3/x is much larger than 3/378. The subtle distinction in phrasing creates a significant difference in numerical value.

To illustrate the point, I’ll use data that goes back 20 years (partly because I haven’t collected data that goes back to 1947, and partly because many of the older formats were significantly different from the current one; instead of two games at home and then two away, teams would often alternate every game). Out of 188 series played, there were two occurrences in which the team with home court advantage lost Games 1 and 2 at home and went on to win the series and eight occurrences where the team with home court advantage lost Games 1 and 2 at home and went on to lose the series. Applying Bianchi’s logic, had Orlando lost last night, they would’ve had a 1.06% (2/188) chance of winning the series. If we continue to use his logic to calculate Philadelphia’s odds of winning the series had they won last night, it comes to 4.26% (8/188). It now becomes clear that there’s an error in his logic, since it leaves a 94.68% (100 - [4.26 + 1.06]) chance that no one wins the series in this scenario.

Bianchi’s error is that he treats a highly improbable event (the team with home court advantage losing Games 1 and 2) as part of the outcome rather than as a given, so its low probability gets folded into his calculation. In the example above, rather than using 188 as the denominator, we would replace it with the total number of series where the team with home court advantage lost Games 1 and 2, which is 10 (2 + 8). So the actual historical value is 20%, or 2/10, rather than the 1% chance Bianchi would give a team in this situation.
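The gap between the two numbers is easy to see side by side. This small Python sketch uses the 20-year counts from the example above; the function name `comeback_probabilities` and its argument names are my own labels, not anything from Bianchi's article.

```python
def comeback_probabilities(won_after_0_2, lost_after_0_2, total_series):
    """Return (joint, conditional) probabilities for the home-court team.

    joint: chance of falling behind 0-2 at home AND winning the series.
    conditional: chance of winning the series GIVEN an 0-2 deficit at home.
    """
    down_0_2 = won_after_0_2 + lost_after_0_2
    joint = won_after_0_2 / total_series
    conditional = won_after_0_2 / down_0_2
    return joint, conditional

# Counts from the 20-year sample: 2 comebacks, 8 failures, 188 series.
joint, conditional = comeback_probabilities(2, 8, 188)
print(f"joint: {joint:.2%}, conditional: {conditional:.2%}")
# prints "joint: 1.06%, conditional: 20.00%"
```

The only difference between the two results is the denominator: all 188 series for the joint figure, and only the 10 series that actually started 0-2 for the conditional one.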