Friday, March 24, 2017

Neural Networks for Learning Lyrics

I created a Twitter account inspired by a couple of Twitter accounts that applied a particular machine learning technique to learn how two (at the time) presidential hopefuls spoke. I thought, why not see what a model like this could do with lyrics from my favorite rock 'n' roll artist?
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that can learn from text and then be used to produce sentences or phrases. The two Twitter accounts that inspired this, @deeplearnthebern and @deepdrumpf, use this technique to produce their phrases and sentences.
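For readers curious about what makes an LSTM different from a plain RNN, here is the standard textbook formulation of one LSTM cell step (mxnet's implementation may differ in small details, such as how the biases are arranged). Here $x_t$ is the input at step $t$ (for lyrics, for example, a one-hot encoded character), $h_{t-1}$ is the previous hidden state, $c_t$ is the cell state or "memory", $\sigma$ is the logistic sigmoid, and $\odot$ is elementwise multiplication. The forget gate $f_t$ decides how much of the old memory to keep, which is what lets the network carry information across long stretches of text.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{new cell state} \\
h_t &= o_t \odot \tanh(c_t) && \text{new hidden state}
\end{aligned}
```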
I scraped a little more than 300 of his songs and fed them to an LSTM model using R and the mxnet library. Primarily I used the to build and train the model… great site and tools. The tutorials on their site are very helpful, particularly this one.
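As a rough sketch of how lyrics become training data, here is the usual character-level setup, assuming a char-RNN style model like those in the mxnet tutorials (an assumption; this is not the code from the project repository, and it is written in Python rather than R for brevity). Each character gets an integer id, and the encoded text is cut into fixed-length windows whose target is the same window shifted one character ahead, so the model learns to predict the next character. The function names are hypothetical.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (sorted, so ids are reproducible)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def make_training_pairs(text, seq_len):
    """Cut encoded text into non-overlapping (input, target) windows.

    The target is the input shifted one character ahead, so at every
    position the model is asked to predict the next character.
    """
    vocab = build_vocab(text)
    ids = [vocab[ch] for ch in text]
    pairs = []
    # Each window needs seq_len + 1 ids: seq_len inputs plus one extra for the last target.
    for start in range(0, len(ids) - seq_len, seq_len):
        window = ids[start:start + seq_len + 1]
        pairs.append((window[:-1], window[1:]))
    return vocab, pairs


if __name__ == "__main__":
    # Placeholder text standing in for the scraped lyrics.
    vocab, pairs = make_training_pairs("hello world, hello", seq_len=5)
    print(len(vocab), "characters,", len(pairs), "training windows")
```

A real model would then feed these id sequences (usually one-hot encoded or passed through an embedding layer) to the LSTM in mini-batches.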

The repository containing the code for the scraper and other information is here.
Follow @deeplearnbruce for tweets that will hopefully entertain Springsteen fans and anyone else.

Saturday, December 10, 2016

Supreme Court Politics

I had wanted to post this before the US election, but time constraints didn't allow it. With the potential for new Supreme Court Justices in the next four years, many voters, especially single-issue voters, rallied behind Donald Trump for his apparent support of a conservative justice. Most of the people I spoke with were primarily concerned with the potential appointment of a justice who could help efforts to overturn legalized abortion. For a more in-depth look at Trump's stances and commentary on overturning abortion, I found this article helpful.

I have been curious how the political leanings of the Justices have changed over time, and whether opportunities like a conservative bench paired with a Republican president have occurred before. I wondered whether an appointment during the Trump administration would change anything.

The graph below shows the number of abortions over time under different presidents, and each point shows the political split of the bench (Democrat-Republican).
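One way such a split label could be computed is sketched below, assuming each Justice's lean is taken from the party of the president who appointed them (the post doesn't say how the split was determined). The function name and the example bench are hypothetical, not historical data.

```python
from collections import Counter


def bench_split(appointing_parties):
    """Return a bench's Democrat-Republican split as a "D-R" label.

    appointing_parties holds one entry per sitting Justice: "D" or "R",
    assumed here to be the party of the appointing president.
    """
    counts = Counter(appointing_parties)
    unexpected = set(counts) - {"D", "R"}
    if unexpected:
        raise ValueError(f"unexpected party codes: {sorted(unexpected)}")
    # Counter returns 0 for a party with no Justices on the bench.
    return f"{counts['D']}-{counts['R']}"


if __name__ == "__main__":
    # Hypothetical nine-member bench, not historical data.
    print(bench_split(["R"] * 5 + ["D"] * 4))  # -> 4-5
```

Because the label only counts seats, a bench with a vacancy simply produces a total below nine (for example "4-4").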

Political leanings on the bench aren't necessarily indicative of a pro-life vote. However, leaning more conservative politically does create the potential for a favorable pro-life vote. I found this interesting since the court has predominantly leaned conservative for the last 40 years, and only recently has it become more liberal. The graph speaks for itself, and the code for it is available here.

I'm not sharing this information to change minds on this issue; that's best done around a table where both sides can listen. I did think it was helpful for understanding some of the numbers and history behind this particular circumstance.

Wednesday, January 20, 2016

State of the Union Speeches and Data

I've done a couple of posts on SOTU speeches. In the past these dealt with word count, approval, and the vague notion that the applause the president receives is related to his approval rating at the time (in fact, that correlation was lower this year).

Wired had a good article highlighting the sentiment in the current and previous State of the Union (SOTU) speeches. They went through the speeches from several past years, highlighted the events of each year, and showed how often each speech used terms that conveyed the impact of those events. This post doesn't duplicate the article; I did see their graph, though, and wanted to check whether I would get similar sentiment scores for the speeches. I used the 'syuzhet' package in R to conduct the analysis (big thanks to Matthew Jockers for the package).
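syuzhet's dictionary-based methods score text against a sentiment lexicon in which each word carries a positive or negative value. A minimal Python sketch of that lexicon-based idea follows; the tiny lexicon is made up and stands in for syuzhet's real dictionaries, and the function name is hypothetical. Each sentence is scored as the sum of its words' values.

```python
import re

# Tiny made-up lexicon for illustration only; syuzhet ships far larger dictionaries.
LEXICON = {"hope": 0.5, "strong": 0.5, "good": 0.75,
           "fear": -0.75, "war": -0.75, "crisis": -1.0}


def sentence_scores(text, lexicon=LEXICON):
    """Split text into sentences and score each one as the sum of its words' lexicon values."""
    # Split after ., ! or ? followed by whitespace; drop empty fragments.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scores = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Words missing from the lexicon contribute 0.
        scores.append(sum(lexicon.get(w, 0.0) for w in words))
    return sentences, scores


if __name__ == "__main__":
    _, scores = sentence_scores("We have hope. The crisis is real! Strong and good.")
    print(scores)  # -> [0.5, -1.0, 1.25]
```

Running this over a whole speech yields one score per sentence, which is the raw series a sentiment-over-time graph is built from.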

The graph is similar to the one in the Wired article, but not identical. Some smoothing was involved, and perhaps a different sentiment analysis technique. We do see a similar finding for the most recent SOTU speech: it ended with the highest sentiment score of all the speeches. Several of the speeches in my analysis curve upward toward the end, which generally fits the idea of "ending on a positive note." Additionally, one can see "valleys," or lower sentiment values, between time intervals 50 and 75. This isn't too surprising, since the same speechwriter was used and the SOTU perhaps follows a fairly standard sentiment arc (another analysis, perhaps?).
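The "time intervals" on the x-axis come from normalizing each speech to the same length so that speeches of different lengths can be compared. Below is a minimal sketch, assuming 100 equal slices with each slice's sentence scores averaged, followed by a simple centered moving average as one possible smoothing step. The smoothing actually used for the graph isn't specified, and syuzhet has its own helpers for this; both functions here are hypothetical illustrations.

```python
def percentage_bins(scores, bins=100):
    """Average per-sentence scores into `bins` equal slices of a speech.

    Normalizing every speech onto the same 0..bins axis means, for example,
    that interval 50 is the midpoint of each speech regardless of its length.
    """
    n = len(scores)
    if n < bins:
        raise ValueError("need at least as many sentences as bins")
    averages = []
    for b in range(bins):
        # Integer slice boundaries; every slice gets at least one sentence when n >= bins.
        chunk = scores[b * n // bins:(b + 1) * n // bins]
        averages.append(sum(chunk) / len(chunk))
    return averages


def moving_average(values, window=5):
    """Centered moving average; near the edges it averages whatever neighbors exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    print(percentage_bins([1, 2, 3, 4], bins=2))  # -> [1.5, 3.5]
    print(moving_average([0, 3, 6], window=3))    # -> [1.5, 3.0, 4.5]
```

A wider window gives a smoother curve but can flatten short dips like the valleys around intervals 50 to 75.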

The same package has a function that assigns words to emotional categories. Its 10 categories include positive and negative. Alongside these, I added the applause count for each speech and the president's approval rating at the time of each speech. The matrix below shows the correlation between each pair of these variables, color-coded by value. I also computed a p-value for each relationship; those with p > .1 were given bubbles.
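The emotion counts work by looking up each word in an emotion lexicon (syuzhet's get_nrc_sentiment uses the NRC Emotion Lexicon) and adding one count for every category the word belongs to, so a single word can count toward several categories. The sketch below uses a three-word made-up lexicon purely for illustration; its category assignments are not taken from NRC, and the function name is hypothetical.

```python
import re

CATEGORIES = ["anger", "anticipation", "disgust", "fear", "joy",
              "sadness", "surprise", "trust", "negative", "positive"]

# Made-up three-word lexicon for illustration; these assignments are NOT from NRC.
EMOTION_LEXICON = {
    "threat": {"anger", "fear", "negative"},
    "hope": {"anticipation", "joy", "trust", "positive"},
    "proud": {"joy", "trust", "positive"},
}


def emotion_counts(text, lexicon=EMOTION_LEXICON):
    """Count, per category, how many words in `text` belong to that category."""
    counts = dict.fromkeys(CATEGORIES, 0)
    for word in re.findall(r"[a-z']+", text.lower()):
        # A word can belong to several categories, so it may add to more than one count.
        for category in lexicon.get(word, ()):
            counts[category] += 1
    return counts


if __name__ == "__main__":
    print(emotion_counts("Hope, not threat. We are proud and full of hope!"))
```

Doing this once per speech gives ten per-speech counts, which can then be lined up with applause and approval for the correlation matrix.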

There's a lot that could be said about the speeches, but I'll mention only a few things I found interesting. The applause/approval rating correlation was weaker than last year's (-.5), which isn't too surprising since the relationship is probably spurious anyway. The negative word category correlated more strongly with applause than the positive category did. That is, across speeches, applause counts and negative word counts varied in a similar way (a higher applause count went with a higher negative word count, and vice versa). Counts of words categorized as "anger" or "fear" had only a weak correlation with applause count. Conversely, counts of words in categories like "joy", "surprise", and "trust" showed a stronger correlation with applause count in those same speeches. So perhaps, to get more applause in general, certain positive words are better than others? Yoda's advice about fear ("Fear leads to anger") would make sense here, in that words associated with fear tend to vary similarly to words associated with anger.
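Each cell of the correlation matrix is a Pearson correlation between two per-speech series (for example, applause count and "joy" word count). A stdlib-only Python sketch follows. Note the swapped-in technique: R's cor.test derives p-values from a t-distribution, while this sketch uses a permutation test (shuffle one series and count how often the shuffled correlation is at least as strong as the observed one), which needs no statistics library. The function names are hypothetical and the data in the usage lines are made up.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y.

    Shuffles y repeatedly and counts how often the shuffled |r| is at least
    the observed |r|; the +1 terms count the observed ordering itself.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point noise doesn't undercount ties.
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up per-speech values, e.g. applause count vs. "joy" word count.
    applause = [1, 2, 3, 4, 5, 6, 7, 8]
    joy = [2, 1, 4, 3, 6, 5, 8, 7]
    r = pearson(applause, joy)
    p = permutation_p_value(applause, joy)
    print(f"r = {r:.3f}, p = {p:.4f}, bubble: {p > 0.1}")
```

With only one speech per year, each series is short, so p-values are noisy; that is one reason a strong-looking correlation like applause versus approval can still be spurious.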

We also see a fair amount of correlation among the more positive emotions, and likewise among the more negative emotions. This points back to the common "curve" these speeches may share: the sentiment used year over year tends to be similar, or at least the emotional categorization of words follows similar patterns.

Thanks to Matthew Jockers, Taiyun Wei, and Hadley Wickham for their work on the 'syuzhet', 'corrplot', and 'ggplot2' packages, respectively. Code for the above analysis is on my GitHub page.