I was looking for a quick way to do sentiment analysis on comments from an employee survey, and I came across this post by Gaston Sanchez.
The guide is a little dated now (the "sentiment" package needs to be downloaded manually, ggplot2 has been updated, setting up a Twitter API has changed, etc.). Since I found Gaston's guide useful, I've included updated steps below that produce effectively the same output as the original.
This example looks for the sentiment of tweets about the #UCLfinal.
NOTE: This walkthrough uses R version 3.1.2 via RStudio.
Step 1 – Install packages
You will only be required to install these packages the first time.
# Required packages for the plots
install.packages(c("plyr","ggplot2","wordcloud","RColorBrewer","httr","slam","mime","R6"," Rcpp"))
# Required packages to connect to your Twitter API
install.packages(c("twitteR","bit","bit64","rjson","DBI"))
# Required packages for sentiment
install.packages(c("NLP","tm","Rstem"))
Step 2 – Install ‘sentiment’
The sentiment package is no longer available on CRAN, so you will need to install it manually. Download "sentiment_0.2.tar.gz" from http://cran.r-project.org/src/contrib/Archive/sentiment/
# Replace [directory] with the full path to "sentiment_0.2.tar.gz", including the file name
install.packages("[directory]", repos = NULL, type = "source")
Step 3 – Load all your packages
You will need to load these packages for each new session.
library(plyr)
library(ggplot2)
library(wordcloud)
library(RColorBrewer)
library(httr)
library(slam)
library(mime)
library(R6)
library(twitteR)
library(bit)
library(bit64)
library(rjson)
library(DBI)
library(tm)
library(Rstem)
library(NLP)
library(sentiment)
library(Rcpp)
Step 4 – Set up your API with Twitter
Go to https://apps.twitter.com/ and sign in (you'll need to create a Twitter account if you haven't already).
Click on ‘Create New App’.
Complete the compulsory fields, accept the Developer Agreement (note: you can enter a placeholder website if you don't have one), and click ‘Create your Twitter Application’.
After the application management page loads, click ‘Keys and Access Tokens’ and note your consumer key and secret.
Click ‘Create my access token’ and note your access token and token secret.
Step 5 – Connect to Twitter
Enter your authentication details below:
# Authenticate with Twitter
api_key <- "[your key]"
api_secret <- "[your secret]"
token <- "[your token]"
token_secret <- "[your token secret]"
setup_twitter_oauth(api_key,api_secret,token,token_secret)
If you get the following prompt:
[1] "Using direct authentication"
Use a local file to cache OAuth access credentials between R sessions?
1: Yes
2: No
Enter 1 to save a local copy of your OAuth credentials between sessions.
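If the handshake succeeds, you can sanity-check the connection with a simple lookup before moving on (any public account works; "twitter" is just an example):
# Quick test that authentication worked: this should return a user object
getUser("twitter")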
Step 6 – Harvest tweets
Now it's time to harvest the tweets for analysis. Note: if you're sitting behind a firewall this may not work; if so, tweak your firewall settings. Harvesting the tweets can also take a minute or so.
# harvest some tweets
some_tweets = searchTwitter("uclfinal", n=1500, lang="en")
# get the text
some_txt = sapply(some_tweets, function(x) x$getText())
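Before cleaning anything, it's worth a quick look at what came back (your counts and text will differ):
# Confirm how many tweets were returned
length(some_tweets)
# Peek at the raw text of the first couple
head(some_txt, 2)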
Step 7 – Prepare text for sentiment analysis
# remove retweet entities
some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)
# remove @ mentions
some_txt = gsub("@\\w+", "", some_txt)
# remove punctuation
some_txt = gsub("[[:punct:]]", "", some_txt)
# remove numbers
some_txt = gsub("[[:digit:]]", "", some_txt)
# remove html links
some_txt = gsub("http\\w+", "", some_txt)
# collapse repeated whitespace and trim leading/trailing spaces
some_txt = gsub("[ \t]{2,}", " ", some_txt)
some_txt = gsub("^\\s+|\\s+$", "", some_txt)
# define "tolower error handling" function
try.error = function(x)
{
  # create missing value
  y = NA
  # tryCatch error
  try_error = tryCatch(tolower(x), error=function(e) e)
  # if not an error, convert to lower case
  if (!inherits(try_error, "error"))
    y = tolower(x)
  # result
  return(y)
}
# lower case using try.error with sapply
some_txt = sapply(some_txt, try.error)
# remove NAs in some_txt
some_txt = some_txt[!is.na(some_txt)]
names(some_txt) = NULL
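Inspecting the same tweets again shows the effect of the cleaning steps:
# See what the text looks like after cleaning
head(some_txt, 2)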
Step 8 – Perform sentiment analysis
Please note that classifying the polarity and emotion of the tweets may take a few minutes.
# classify emotion
class_emo = classify_emotion(some_txt, algorithm="bayes", prior=1.0)
# get emotion best fit
emotion = class_emo[,7]
# substitute NA's by "unknown"
emotion[is.na(emotion)] = "unknown"
# classify polarity
class_pol = classify_polarity(some_txt, algorithm="bayes")
# get polarity best fit
polarity = class_pol[,4]
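Before building the data frame and plots, a quick tabulation gives a feel for how the classifier has distributed the tweets:
# Tabulate the best-fit emotion and polarity labels
table(emotion)
table(polarity)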
Step 9 – Create a data frame in order to plot the results
# data frame with results
sent_df = data.frame(text=some_txt, emotion=emotion,
polarity=polarity, stringsAsFactors=FALSE)
# sort the data frame so that emotions are ordered by frequency
sent_df = within(sent_df,
                 emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
This is what the first 5 rows of sent_df may look like.
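You can reproduce this view in your own session with head():
# Show the first five classified tweets
head(sent_df, 5)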
Step 10 – Plot the emotions and polarity of the tweets
# plot distribution of emotions
ggplot(sent_df, aes(x=emotion)) +
  geom_bar(aes(y=..count.., fill=emotion)) +
  scale_fill_brewer(palette="Dark2") +
  labs(x="emotion categories", y="number of tweets") +
  labs(title = "Sentiment Analysis of Tweets about UCL Final\n(classification by emotion)") +
  theme(plot.title = element_text(size=12))
# plot distribution of polarity
ggplot(sent_df, aes(x=polarity)) +
  geom_bar(aes(y=..count.., fill=polarity)) +
  scale_fill_brewer(palette="RdGy") +
  labs(x="polarity categories", y="number of tweets") +
  labs(title = "Sentiment Analysis of Tweets about UCL Final\n(classification by polarity)") +
  theme(plot.title = element_text(size=12))
# separating text by emotion
emos = levels(factor(sent_df$emotion))
nemo = length(emos)
emo.docs = rep("", nemo)
for (i in 1:nemo)
{
  tmp = some_txt[emotion == emos[i]]
  emo.docs[i] = paste(tmp, collapse=" ")
}
# remove stopwords
emo.docs = removeWords(emo.docs, stopwords("english"))
# create corpus
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos
# comparison word cloud
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"), scale = c(3,.5), random.order = FALSE, title.size = 1.5)
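If you'd like to keep the output, the bar charts can be saved with ggsave() straight after each one is drawn, and the word cloud can be written out by wrapping the call in a graphics device. The file names below are just examples:
# Save the most recently drawn ggplot chart (hypothetical file name)
ggsave("polarity_plot.png", width = 8, height = 6)
# Write the comparison cloud to a PNG (hypothetical file name)
png("emotion_cloud.png", width = 800, height = 800)
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"),
                 scale = c(3,.5), random.order = FALSE, title.size = 1.5)
dev.off()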