UPDATED: Sentiment Analysis with “sentiment”

I was looking for a quick way to do sentiment analysis for comments from an employee survey. I came across this post here by Gaston Sanchez.

The guide is a little dated now (the “sentiment” package needs to be manually downloaded, ggplot2 has been updated, setting up a Twitter API has changed, etc). Since I found Gaston’s guide useful, I’ve included some updated steps to effectively get the same output that they provided previously.

This example looks for the sentiment of tweets about the #UCLfinal.

NOTE: R version 3.1.2 through R Studio

Step 1 – Install packages

You will only be required to install these packages the first time.

# Required packages for the plots
install.packages(c("plyr","ggplot2","wordcloud","RColorBrewer","httr","slam","mime","R6"," Rcpp"))

#Required packages to connect to your Twitter API
install.packages(c("twitteR", "bit","bit64","rjson","DBI")

# Required packages for sentiment
install.packages(c("NLP","tm","Rstem"))

Step 2 – Install ‘sentiment’

The sentiment package is not available from all the CRAN server, so you can install it manually. Download “sentiment_0.2.tar.gz” from http://cran.r-project.org/src/contrib/Archive/sentiment/

# Update [directory] with the location where you have saved "sentiment_0.2.tar.gz"
install.packages("[directory]", repos = NULL, type = "source")

Step 3 – Load all your packages

You will need to load these packages for each new session.

library(plyr)
library(ggplot2)
library(wordcloud)
library (RColorBrewer)
library(httr)
library(slam)
library(mime)
library(R6)
library(twitteR)
library(bit)
library(bit64)
library(rjson)
library(DBI)
library(tm)
library(Rstem)
library(NLP)
library(sentiment)
library(Rcpp)

Step 4 – Set up your API with Twitter

Go to https://apps.twitter.com/ and sign in (you’ll need to create a twitter account if you haven’t already)

Click on ‘Create New App’

Complete the compulsory fields, accept the Developer Agreement (note you can enter a placeholder Website if you don’t have one) and click ‘Create your Twitter Application’.

After the application management page loads click ‘Keys and Access Tokens’ and note your consumer key and secret.

Click ‘Create my access token’ and note your access token and token secret.

Step 5 – Connect to Twitter

Enter the authentication details below

# Authenticate with Twitter

api_key <- "[your key]"
api_secret <- "[your secret]"
token <- "[your token]"
token_secret <- "[your token secret]"
setup_twitter_oauth(api_key,api_secret,token,token_secret)

If you get the following prompt:

[1] "Using direct authentication"
Use a local file to cache OAuth access credentials between R sessions?
1: Yes
2: No

Press 1 and execute to save a local copy of the OAuth access credentials.

Step 6 – Harvest tweets

Now it’s time to harvest the tweets for analysis. Note, if you’re setting behind a firewall this may not work. If so, tweak your firewall settings. Additionally, it might take a minute to harvest the tweets.

# harvest some tweets
some_tweets = searchTwitter("uclfinal", n=1500, lang="en")

# get the text
some_txt = sapply(some_tweets, function(x) x$getText())

Step 7 – Prepare text for sentiment analysis

# remove retweet entities
some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)

# remove at people
some_txt = gsub("@\\w+", "", some_txt)

# remove punctuation
some_txt = gsub("[[:punct:]]", "", some_txt)

# remove numbers
some_txt = gsub("[[:digit:]]", "", some_txt)

# remove html links
some_txt = gsub("http\\w+", "", some_txt)

# remove unnecessary spaces
some_txt = gsub("[ \t]{2,}", "", some_txt)
some_txt = gsub("^\\s+|\\s+$", "", some_txt)

# define "tolower error handling" function 
try.error = function(x)
{
   # create missing value
   y = NA
   # tryCatch error
   try_error = tryCatch(tolower(x), error=function(e) e)
   # if not an error
   if (!inherits(try_error, "error"))
   y = tolower(x)
   # result
   return(y)
}

# lower case using try.error with sapply 
some_txt = sapply(some_txt, try.error)

# remove NAs in some_txt
some_txt = some_txt[!is.na(some_txt)]
names(some_txt) = NULL

Step 8 – Perform sentiment analysis

Please note that the classifying the polarity and emotion of the tweets may take a few minutes

# classify emotion
class_emo = classify_emotion(some_txt, algorithm="bayes", prior=1.0)

# get emotion best fit
emotion = class_emo[,7]

# substitute NA's by "unknown"
emotion[is.na(emotion)] = "unknown"

# classify polarity
class_pol = classify_polarity(some_txt, algorithm="bayes")

# get polarity best fit
polarity = class_pol[,4]

Step 9 – Create a data frame in order plot the results

# data frame with results
sent_df = data.frame(text=some_txt, emotion=emotion,
polarity=polarity, stringsAsFactors=FALSE)

# sort data frame
sent_df = within(sent_df, emotion

 This is what the first 5 rows of data may look like for df_sent

sentiment analysis R - first 5 rows

Step 10 – plot the emotions and polarity of the tweets

# plot distribution of emotions
ggplot(sent_df, aes(x=emotion)) +
geom_bar(aes(y=..count.., fill=emotion)) +
scale_fill_brewer(palette=”Dark2″) +
labs(x=”emotion categories”, y=”number of comments”) +
labs(title = “Sentiment Analysis of Tweets about UCL Final\n(classification by emotion)”, plot.title = element_text(size=12))

Sentiment analysis in R - emotionality

# plot distribution of polarity

ggplot(sent_df, aes(x=polarity)) +
geom_bar(aes(y=..count.., fill=polarity)) +
scale_fill_brewer(palette=”RdGy”) +
labs(x=”polarity categories”, y=”number of tweets”) +
labs(title = “Sentiment Analysis of Tweets about UCL Final \n(classification by polarity)”,plot.title = element_text(size=12))

sentiment analysis of tweets about UCL final - polarity

# separating text by emotion

emos = levels(factor(sent_df$emotion))
nemo = length(emos)
emo.docs = rep(“”, nemo)

for (i in 1:nemo)
{
tmp = some_txt[emotion == emos[i]]
emo.docs[i] = paste(tmp, collapse=” “)
}

# remove stopwords
emo.docs = removeWords(emo.docs, stopwords(“english”))

# create corpus
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos

# comparison word cloud
comparison.cloud(tdm, colors = brewer.pal(nemo, “Dark2”), scale = c(3,.5), random.order = FALSE, title.size = 1.5)

Sentiment analysis in R - word cloud

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s