Monday, March 3, 2008

Why We're Here

The idea for this blog stems from an incredibly interesting map of the birthplaces of NBA players that I stumbled across here. The map appealed to two of my passions: the NBA and amateur data analysis. With its help, I've composed the best possible starting five of NBA players from each state and constructed a March Madness-style single-elimination playoff between those teams. On this blog, we'll go through the tournament together, matchup by matchup, each weekday from today until April 15th, to pick a "state champion." We'll do this with the help of basketball simulation software that I created for my senior research project in college. This introductory post describes how the teams and bracket were constructed; the following technical post will say a bit about the simulator and describe how I see the blog unfolding; then this evening I'll present the first matchup, between Kansas and Washington D.C.

While 22 states produced enough players for a full squad, some states can call no current NBA player their own. Apologies to Alaska, Arizona (somewhat surprisingly), Hawaii, Maine, Montana, Nebraska, New Mexico, North Dakota, Rhode Island, and Vermont. Sometimes the 'best' five was easy to determine; sometimes I had to make tough decisions about which players to include, as some states have produced quite a few players. I informed my judgments with statistics, but I also tried, insofar as it was possible, to create a team balanced by position. I'll note a tough decision between players where it seems relevant.

There were 18 states that had a few NBA players born there, but fewer than a full starting five. I paired those 18 states off, creating 9 full squads. This process was somewhat arbitrary. I tried first to match numbers: if one state had three players, I matched it with a state that had two. Beyond that, I looked at positions: if a state had four players and no point guard, for example, I matched it with a state that had produced one point guard. I didn't consider the quality of the players at all when I made these decisions, so a different arrangement of partnered states might change the results of this tournament significantly.
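For readers who like to see a rule spelled out, here is a minimal Python sketch of the pairing idea described above: match player counts first, then use positions as a tie-breaker. It is an illustration under assumptions, not a record of how the pairings were actually made. The state names and rosters are placeholders, and the function name `pair_states` is hypothetical.

```python
def pair_states(rosters, squad_size=5):
    """Greedily pair states whose player counts add up to squad_size.

    rosters maps a state name to a list of position labels, one per player.
    Among partners with the right count, prefer the one that contributes
    the most positions the first state is missing.
    """
    unpaired = sorted(rosters)  # alphabetical order keeps the result deterministic
    pairs = []
    while unpaired:
        first = unpaired.pop(0)
        need = squad_size - len(rosters[first])
        candidates = [s for s in unpaired if len(rosters[s]) == need]
        if not candidates:
            pairs.append((first, None))  # no partner with the right count
            continue
        have = set(rosters[first])
        # Tie-breaker: how many new positions does the partner bring?
        best = max(candidates, key=lambda s: len(set(rosters[s]) - have))
        unpaired.remove(best)
        pairs.append((first, best))
    return pairs


# Placeholder data, invented for illustration only.
example_rosters = {
    "State A": ["SG", "SF", "PF", "C"],  # four players, no point guard
    "State B": ["C"],
    "State C": ["PG"],
    "State D": ["PG", "SG", "SF"],
    "State E": ["PF", "C"],
    "State F": ["SG", "SF", "PF", "C"],
}

pairs = pair_states(example_rosters)
print(pairs)
```

In this toy example, State A (four players, no point guard) is matched with State C, the one-player state that supplies a point guard, rather than with State B. Like the by-hand process, the sketch ignores player quality entirely.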

At this point I had a running total of 31 squads. I then made two executive decisions. First, the District of Columbia produced enough players for a full squad, so I added it to make the total 32. Second, California has so many more people than other states, and has produced so many good players, that I decided to cut the state in two: Northern California and Southern California field separate squads, for a final total of 33 squads.

This number lends itself to an NCAA bracket-style competition, almost. Since 32 is a power of 2, we have one team too many to make a clean bracket, so we'll have a play-in game. I've seeded the teams by total state population or, in the case of the combined teams, by the combined population of the two states. (For California, I divided the major metropolitan areas between north and south and assumed that the rural areas come out in the wash, which is accurate enough for my purposes.) The two teams with the lowest populations, Kansas and Washington D.C., meet tomorrow, with the winner facing top-seeded Texas on Wednesday. The full bracket, with seedings and dates for contests, can be found here. I'm not thrilled with the presentation at Bracket Maker, but it seems to me to be the best available option that is free and Mac-compatible. If anyone has a better solution, please leave a comment on this post. Thanks for stopping by, and read on for more details!
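For anyone curious about the bracket arithmetic above, here is a small Python sketch of the squad tally, the play-in count, and seeding by population. The squad counts come from this post; the team names and population figures in the toy example are placeholders invented for illustration, and the function names are hypothetical.

```python
def bracket_size(n_teams):
    """Largest power of two that does not exceed n_teams."""
    size = 1
    while size * 2 <= n_teams:
        size *= 2
    return size


def play_in_games(n_teams):
    """Games needed to trim the field down to a clean power-of-two bracket."""
    return n_teams - bracket_size(n_teams)


def seed_by_population(populations):
    """Team names ordered from most to least populous (seed 1 first)."""
    return sorted(populations, key=populations.get, reverse=True)


def play_in_matchups(seeds):
    """Pair off the lowest seeds so that each play-in game removes one team."""
    extra = play_in_games(len(seeds))
    if extra == 0:
        return []
    bottom = seeds[-2 * extra:]
    return [(bottom[i], bottom[-1 - i]) for i in range(extra)]


# Tally from the post: 22 full squads, 18 states paired into 9 squads,
# plus D.C., plus one extra squad from splitting California in two.
total_squads = 22 + 18 // 2 + 1 + 1
print(total_squads, bracket_size(total_squads), play_in_games(total_squads))

# Toy field of five teams with made-up populations (not real figures).
toy_populations = {"Team A": 50, "Team B": 40, "Team C": 30, "Team D": 20, "Team E": 10}
toy_seeds = seed_by_population(toy_populations)
print(play_in_matchups(toy_seeds))
```

With 33 squads the sketch gives a 32-team bracket and a single play-in game between the two lowest seeds, which matches the setup described above.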
