Wednesday, April 30, 2014

Syrian Refugees with googleVis

In a previous post I showed the timeline of refugees leaving Syria for other countries. The number of refugees leaving Syria is remarkable, and the strain this places on host countries is difficult to grasp. Below I've attempted to visualize the population of each host country and the proportion of refugees it hosts. In the tree map below, color shows how many refugees each country hosts compared with the other countries (red = fewer, green = more), and the size of each square represents the host country's population. To the right of the tree map is a Sankey diagram showing the relative volume of refugees in different countries at different times. Notice how the volume to Lebanon increases dramatically from 2012 to 2013.
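For readers curious about how the tree map's data is laid out: googleVis draws it with gvisTreeMap, which wraps the Google Charts TreeMap. That chart expects one row per node (an id, a parent id, a size value, and a color value) plus exactly one root row with no parent. Below is a minimal Python sketch of building those rows, assuming the input is a dictionary of {country: (population, refugees)}. The function name `treemap_rows`, the root label, and the countries in the usage block are hypothetical, and the root's size and color are only placeholders.

```python
from typing import Optional

# One treemap row: (node id, parent id, size value, color value).
Row = tuple[str, Optional[str], float, float]


def treemap_rows(
    hosts: dict[str, tuple[float, float]], root: str = "Host countries"
) -> list[Row]:
    """Build treemap rows from a hypothetical {country: (population, refugees)} dict.

    Square size encodes host population; color encodes refugee count.
    """
    # Exactly one root row with no parent; its size/color are placeholders.
    rows: list[Row] = [(root, None, 0.0, 0.0)]
    for country, (population, refugees) in sorted(hosts.items()):
        # Every country hangs directly under the root.
        rows.append((country, root, float(population), float(refugees)))
    return rows


if __name__ == "__main__":
    toy = {"Country A": (4_000_000, 1_000_000), "Country B": (75_000_000, 1_000_000)}
    for row in treemap_rows(toy):
        print(row)
```

In googleVis the same four columns would be handed to gvisTreeMap through its idvar, parentvar, sizevar, and colorvar arguments.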

Though the number of refugees would strain any country's economy, Lebanon is bearing a disproportionate share of the burden. Refugees now make up a considerable percentage of the country's overall population. Hosting this many refugees places particular strain on Lebanese laborers, rent prices, and the country's other resources.
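The burden point is about proportion rather than raw counts. A minimal sketch, assuming the same hypothetical {country: (population, refugees)} input, ranks hosts by refugees as a share of population. The numbers in the usage block are made up for illustration (they are not the figures behind the charts); they only show how the same refugee count can be a very different burden depending on the host's size.

```python
def refugee_share(hosts: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (country, refugees / population) pairs, largest share first."""
    shares = []
    for country, (population, refugees) in hosts.items():
        if population <= 0:
            raise ValueError(f"population must be positive for {country}")
        shares.append((country, refugees / population))
    # Sort so the most heavily burdened host comes first.
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical hosts: identical refugee counts, very different populations.
    toy = {"Country A": (4_000_000, 1_000_000), "Country B": (75_000_000, 1_000_000)}
    for country, share in refugee_share(toy):
        print(f"{country}: {share:.1%}")
    # Country A: 25.0%
    # Country B: 1.3%
```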

Thursday, April 24, 2014

Boston Marathon Winner

In a previous post I described the current level of competition between countries/continents in the marathon, and specifically in the Boston Marathon. I discussed Africa's dominance of the marathon at length and showed how Boston Marathon winning times have descended in recent years as the winners have increasingly come from African countries. I also noted that the range of expected winning times had recently widened to the point where, I thought, an athlete representing a non-African country could win the race.

On Monday, this came true.

Meb Keflezighi (USA) won the Boston Marathon and set a personal record in the process (not bad for an almost-39-year-old). A couple of things about his win could, I think, shift the way spectators and participants view this race.

As the graph below shows, spectators and runners alike shared an understanding that, of all the countries represented, a non-African country's chances of winning were slim, given the sheer number of African runners who can run the times needed to win. Meb's win changes that expectation for participants who, on their best day, could run in the 2:07-2:10 range.

For spectators, it shifts the understanding of how much race results can vary. Seven of the runners in the elite field had run a marathon in under 2:05. Relative to the field's potential and past times, this year's race was not run remarkably fast, at least for these athletes. As the graph below shows, the winning time was very close (within 1 minute) to the time Greg Meyer ran in 1983, the last time an American won this race.
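Comparing finishing times ("within 1 minute") is easiest after converting H:MM:SS clock times to seconds. A small sketch, with hypothetical helper names `to_seconds` and `within` and a tolerance that defaults to 60 seconds:

```python
def to_seconds(clock: str) -> int:
    """Convert an 'H:MM:SS' marathon time to total seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"bad clock time: {clock!r}")
    return hours * 3600 + minutes * 60 + seconds


def within(a: str, b: str, tolerance_s: int = 60) -> bool:
    """True if two finishing times differ by at most tolerance_s seconds."""
    return abs(to_seconds(a) - to_seconds(b)) <= tolerance_s


if __name__ == "__main__":
    print(within("2:08:00", "2:08:45"))  # True: 45 seconds apart
    print(within("2:08:00", "2:09:30"))  # False: 90 seconds apart
```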

In 1983, Greg Meyer ran faster than what could be "expected" to win the race at that time ("expected" meaning within the confidence interval). Thirty years later, expectations have shifted again, to the point where the times of 30 years ago can win races.

I think this kind of variance is healthy for a sport. It increases spectators' interest because the possibilities are wider-ranging and (arguably) the race is more entertaining. It gives hope to all marathon runners who can run times within what could be "expected". Meb's win also provides inspiration across age groups, to those who may have seen running as a mid-20s to early-30s sport. Next year's Boston Marathon will no doubt draw a very competitive field, as previous races have. The shifting expectations about this race will hopefully lead to more participation from athletes and heightened interest from spectators.

Friday, April 18, 2014

The Current Golden Era of the NBA

We are in the midst of what many are calling a "golden age" of the NBA. When you are living through it, a period of seemingly increased attention to the sport is difficult to quantify. For most people who have followed the NBA for a long time, the current state of the game just "feels" different from recent years. Being aware of the caliber of play we are witnessing helps us more fully appreciate the games and the players we get to see perform.

In the last couple of years we have seen two giants of the game (Durant/LeBron) emerge as contenders, reminding many of the Bird/Magic years: friendship coupled with competition, the kind that keeps you glued to the screen when they play. The Most Valuable Player (MVP) award, I would argue, is a reasonable way to gauge the quality of play in the league. The site basketball-reference provides an enormous amount of data on the sport and is a great place to begin looking at the MVP as a metric for determining the "era" of current play in the NBA.

Below is a heatmap showing different statistics collected from the website for the players who were awarded the MVP in different years. The colors show how the players ranked relative to each other on these yearly stats (red is higher than blue). On the left side of the heatmap is a dendrogram showing how players could be grouped based on these stats.
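A dendrogram like this comes from hierarchical (agglomerative) clustering: every player starts as a group of one, and the two closest groups are merged repeatedly until a single group remains; the merge heights are what the tree draws. Below is a minimal Python sketch. Two choices are assumptions rather than necessarily what produced the plot: each stat is z-scored first so that, for example, total games does not swamp free throw percentage, and groups are joined by complete linkage (the distance between their farthest pair of members), which is the default method of R's hclust, the clustering function R's heatmap uses. Function names are hypothetical.

```python
import math
from itertools import combinations


def zscore_columns(rows: list[list[float]]) -> list[list[float]]:
    """Standardize each column (one stat) to mean 0 and standard deviation 1."""
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        # A constant column has sd 0; fall back to 1 to avoid dividing by zero.
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col)) or 1.0
        scaled_cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def complete_linkage(
    rows: list[list[float]], labels: list[str]
) -> list[tuple[list[str], list[str], float]]:
    """Agglomerative clustering; returns each merge as (group_a, group_b, height)."""
    clusters = {i: [i] for i in range(len(rows))}  # cluster id -> member row indices
    next_id = len(rows)
    merges = []

    def linkage(a: int, b: int) -> float:
        # Complete linkage: distance between the farthest pair across two clusters.
        return max(math.dist(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        # Merge the closest pair of clusters (first pair wins on ties).
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        height = linkage(a, b)
        merges.append(([labels[i] for i in clusters[a]], [labels[i] for i in clusters[b]], height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges
```

Cutting the tree at a given height (or at a chosen number of groups) is what turns the dendrogram into clusters.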

The stats are total games (Games), field goal % per game (FGoalPercen), free throw % (FTPercen), assists per game (Assists), rebounds per game (Rebounds), minutes per game (Minutes), average points per game (Pts), and player age (Age). Next, we take this same dendrogram and divide the players into clusters using k-means on the above statistics. The red lines outline the clusters we get when we create 5 of them. Again, these groups are based on how similar the players' stats are.
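k-means works differently from the dendrogram: you pick k up front (5 here), place k centroids, then alternate between assigning each player to the nearest centroid and moving each centroid to the mean of its players, until the assignments stop changing. Because the result depends on the starting centroids, the sketch below runs several random starts and keeps the one with the smallest within-cluster sum of squares, similar in spirit to the nstart argument of R's kmeans(). It assumes the stats have already been standardized; the names and defaults are hypothetical.

```python
import math
import random


def _lloyd(points, k, rng, max_iter):
    """One k-means run (Lloyd's algorithm) from random starting centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an emptied cluster's centroid where it was
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    wss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wss


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Best of n_init runs, judged by within-cluster sum of squares."""
    rng = random.Random(seed)
    runs = [_lloyd(points, k, rng, max_iter) for _ in range(n_init)]
    labels, centroids, _ = min(runs, key=lambda run: run[2])
    return labels, centroids
```

For the MVP table, `points` would be one standardized row of stats per player and `k=5`.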

What results is a set of data in which we can see how MVPs could be grouped based on the stats in the heatmap above. Kevin Durant hasn't been awarded the MVP yet, but let's assume he is, and that his current stats (all on a per-game basis) don't change after these 81 regular-season games.

Clearly, cluster/group 2, which I will call the "Golden Era Group", is the largest. Even though some players arguably shouldn't be in this group, it is mostly made up of players whom reflective NBA watchers would agree were part of what many have called the "golden age(s)" of the NBA. Also interesting to note are the Bill Russell and Wilt Chamberlain clusters. Wilt Chamberlain's rebounding and shooting numbers were much higher than those of his peers, whereas Bill Russell is placed in his own group because his free-throw percentage was in the 50%-60% range, much lower than that of his MVP peers.

Here are the other players in the "Golden Era Group", with their points per game plotted against year. Notice how comparable Durant is to the other giants of the "Golden Era", and how amazing Jordan was compared to his MVP peers.

In general, we can see that recent MVPs are grouped with Bird, Magic, and Jordan. Using MVPs as a proxy for the level of play across the league, their performance suggests that the current level of play among the NBA's best is comparable to these bygone eras of greatness. In many ways, this is intuitive just from watching the game, without looking at the numbers. But the feeling that this is a "golden age" of the NBA also has numbers to support it.

Time for the playoffs.....

Wednesday, April 2, 2014

Boston Marathon Winners and Challenging Africa

The marathon is dominated by African runners. In a relatively recent interview, David Epstein said of a specific tribe in Kenya, the Kalenjin: "There are 17 American men in history who have run under 2:10 in the marathon...there were 32 Kalenjin who did it in October of 2011". Both the times themselves and the number of African runners reaching times their competitors rarely achieve are impressive. Below is a graph showing the top 50 times recorded by the Association of International Marathons and Distance Races (AIMS) over the past few years.

The Boston Marathon is perhaps the most sought-after race for marathon runners. At least in the US, qualifying for the Boston Marathon can be the pinnacle of an avid runner's career. As one would expect, this race draws runners from all over the world who seek the prestige and purse of winning it. Over time the winners of this race have changed; arguably, runners' physiology (and culture) has become more of a factor as access to the race has become easier (for more on Kenyan physiology and culture as determinants of running success, see this Radiolab podcast). As in all marathons, the times are getting faster, and African runners have shown clear dominance over the last several years. In the graph below you can see Boston Marathon winning times descending to levels that were unimaginable 30-40 years ago.

Here are the same times and years broken out by continent instead of country. Notice the shift in dominance of this marathon from Europe/North America to Africa in the mid-1980s. Before then, a greater variety of countries and continents won the race.

The grayish band running through these points is a 95% confidence interval. One could interpret any point within this gray area as a time that would not be a statistical outlier, that is, a time that could be expected to win the Boston Marathon. The interesting thing about this graph is how the gray area has widened over the past few years. This is partly because the fastest marathon ever run is included in this graph (Geoffrey Mutai's 2:03:02 in 2011, which did not formally count as a world record because of the elevation drop on the Boston Marathon course). Setting this time aside, we also see recent winning times of the kind that North Americans, Australians, Asians, and Europeans have run historically. Though Africa clearly dominates this marathon and others, the times African participants have been running are not insurmountable from a historical perspective.
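A sketch of how such a band can be computed, assuming a straight-line trend of winning time (in seconds) against year; the post doesn't say which smoother produced the gray area, so the linear fit is an assumption. One nuance worth keeping in mind: a 95% confidence interval around a trend line describes uncertainty about the average winning time for a given year, while a prediction interval, which is wider, is the range a single new winning time would be expected to fall in. The sketch returns both, and it uses a normal quantile (about 1.96) in place of the slightly larger t quantile, an approximation that matters most when there are few races. `trend_bands` is a hypothetical name.

```python
import math
from statistics import NormalDist


def trend_bands(years, seconds, year0, level=0.95):
    """Least-squares line of time vs. year; returns (fit, confidence band, prediction band) at year0."""
    n = len(years)
    if n < 3:
        raise ValueError("need at least 3 races")
    xbar = sum(years) / n
    ybar = sum(seconds) / n
    sxx = sum((x - xbar) ** 2 for x in years)
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(years, seconds)) / sxx
    intercept = ybar - slope * xbar
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, seconds))
    s = math.sqrt(sse / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to the t quantile
    fit = intercept + slope * year0
    leverage = 1 / n + (year0 - xbar) ** 2 / sxx
    ci = z * s * math.sqrt(leverage)  # uncertainty of the mean trend
    pi = z * s * math.sqrt(1 + leverage)  # spread of one new winning time
    return fit, (fit - ci, fit + ci), (fit - pi, fit + pi)
```

In the post's sense, a winning time is "expected" when it lies between the lower and upper bound of the chosen band.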

I believe this "widening" of race-time expectations provides opportunities to continents and countries whose runners have raced at this speed in the past. The question now becomes how many runners from these continents/countries can currently run at these paces. There are some. Both Ryan Hall and Dathan Ritzenhein are US runners who have run marathons in 2:08, which would make them both very competitive with the recent winners of the Boston Marathon.

With the African countries removed, we can see the times of other continents over the past several years. In fact, all of these times fall within the range of "expectation" (the 95% confidence interval) of the most recent races.

Running a race this fast depends on many other factors, such as weather and injury. However, based on the data from previous races, the times produced by the runners in the graph above would have been very competitive in, if not won, recent years' marathons. There may not be as many challengers from other continents, but those who do challenge African runners stand a chance. In recent years, if a non-African runner had run the Boston Marathon in what might have been their best race, they would have had a great chance of winning.