Twitter sentiment analysis with R
This article is originally published at https://www.analyzecore.com
We will study a dictionary-based approach for Twitter sentiment analysis. Recently I designed a relatively simple script in R for analyzing the content of Twitter posts by counting the number of positive, negative, and neutral words. The idea of processing tweets is based on this presentation: http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais.
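To make the idea concrete, here is a toy illustration of dictionary-based scoring (my own example with made-up mini-dictionaries; the full scoring function appears later in the script):

words <- c('i', 'love', 'this', 'ugly', 'slow', 'phone') #a tokenized toy tweet
pos.dict <- c('love', 'great') #made-up positive dictionary
neg.dict <- c('ugly', 'slow', 'wtf') #made-up negative dictionary
sum(words %in% pos.dict) - sum(words %in% neg.dict) #score: 1 - 2 = -1, i.e. mildly negative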
The words in each tweet are matched against the words in dictionaries that you can find on the internet or create yourself. It is also possible to edit these dictionaries. Really great work, but I discovered an issue.
The Twitter API has some limitations. Depending on the total number of tweets you access via the API, you can usually get tweets for the last 7-8 days only (not longer, and sometimes as few as 1-2 days). This time limit doesn't allow us to analyze historical trends.
My idea is to create a storage file in order to accumulate historical data and bypass the API's limitations. If you extract tweets regularly, you can analyze the dynamics of sentiment with a chart like the one the script below produces.
Furthermore, this algorithm includes a function that lets you extract tweets for as many keywords as you are interested in. The process can be repeated several times a day, and the data set for each keyword is saved separately. This can be helpful, for example, for competitor analysis.
Let's start. We need to create a Twitter Application (https://apps.twitter.com/) in order to get access to Twitter's API. From it, we obtain a Consumer Key and Consumer Secret.
#connect all libraries
library(twitteR)
library(ROAuth)
library(plyr)
library(dplyr)
library(stringr)
library(ggplot2)
#connect to API
download.file(url='http://curl.haxx.se/ca/cacert.pem', destfile='cacert.pem')
reqURL <- 'https://api.twitter.com/oauth/request_token'
accessURL <- 'https://api.twitter.com/oauth/access_token'
authURL <- 'https://api.twitter.com/oauth/authorize'
consumerKey <- '____________' #put the Consumer Key from the Twitter Application here
consumerSecret <- '______________' #put the Consumer Secret from the Twitter Application here
Cred <- OAuthFactory$new(consumerKey=consumerKey,
                         consumerSecret=consumerSecret,
                         requestURL=reqURL,
                         accessURL=accessURL,
                         authURL=authURL)
Cred$handshake(cainfo=system.file('CurlSSL', 'cacert.pem', package='RCurl'))
#a URL will appear in the console: open it in a browser, get the code, and enter it in the console
save(Cred, file='twitter authentication.Rdata')
load('twitter authentication.Rdata') #once you have run the code above for the first time, you can start from this line in future sessions (the libraries still need to be loaded)
registerTwitterOAuth(Cred)
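If you are curious how much of your search quota is left at any point, twitteR provides a rate-limit helper. A minimal sketch (an assumption on my side: the availability of this function and the exact resource names depend on your twitteR version):

#optional: inspect the remaining API quota (availability depends on the twitteR version)
rate <- getCurRateLimitInfo(c('search'))
rate #shows the limit, remaining calls, and reset time per resource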
#the function for extracting and analyzing tweets
search <- function(searchterm)
{
  #extract tweets and create a storage file
  list <- searchTwitter(searchterm, cainfo='cacert.pem', n=1500)
  df <- twListToDF(list)
  df <- df[, order(names(df))] #sort columns alphabetically so later extractions line up for rbind()
  df$created <- strftime(df$created, '%Y-%m-%d')
  if (file.exists(paste(searchterm, '_stack.csv'))==FALSE) write.csv(df, file=paste(searchterm, '_stack.csv'), row.names=F)
  #merge the last extraction with the storage file and remove duplicates
  stack <- read.csv(file=paste(searchterm, '_stack.csv'))
  stack <- rbind(stack, df)
  stack <- subset(stack, !duplicated(stack$text))
  write.csv(stack, file=paste(searchterm, '_stack.csv'), row.names=F)
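  #note (not in the original script): paste() separates its arguments with a space by
  #default, so the file is actually named e.g. 'keyword _stack.csv'; if you prefer
  #'keyword_stack.csv', paste0(searchterm, '_stack.csv') concatenates without the space,
  #but then change every file path in this function consistently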
  #tweets evaluation function
  score.sentiment <- function(sentences, pos.words, neg.words, .progress='none')
  {
    require(plyr)
    require(stringr)
    scores <- laply(sentences, function(sentence, pos.words, neg.words){
      sentence <- gsub('[[:punct:]]', '', sentence) #remove punctuation
      sentence <- gsub('[[:cntrl:]]', '', sentence) #remove control characters
      sentence <- gsub('\\d+', '', sentence) #remove digits
      sentence <- tolower(sentence)
      word.list <- str_split(sentence, '\\s+') #split the sentence into words
      words <- unlist(word.list)
      pos.matches <- match(words, pos.words)
      neg.matches <- match(words, neg.words)
      pos.matches <- !is.na(pos.matches) #TRUE for words found in the positive dictionary
      neg.matches <- !is.na(neg.matches) #TRUE for words found in the negative dictionary
      score <- sum(pos.matches) - sum(neg.matches)
      return(score)
    }, pos.words, neg.words, .progress=.progress)
    scores.df <- data.frame(score=scores, text=sentences)
    return(scores.df)
  }
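  #quick sanity check (my own toy example, not part of the original script): if you run
  #the score.sentiment() definition above on its own, you can test it interactively:
  # score.sentiment(c('I love this great product', 'this is an epic fail, wtf'),
  #                 c('love', 'great'), c('fail', 'wtf'))$score
  #should return 2 and -2 (positive matches minus negative matches)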
  pos <- scan('C:/___________/positive-words.txt', what='character', comment.char=';') #folder with positive dictionary
  neg <- scan('C:/___________/negative-words.txt', what='character', comment.char=';') #folder with negative dictionary
  pos.words <- c(pos, 'upgrade')
  neg.words <- c(neg, 'wtf', 'wait', 'waiting', 'epicfail')
  Dataset <- stack
  Dataset$text <- as.factor(Dataset$text)
  scores <- score.sentiment(Dataset$text, pos.words, neg.words, .progress='text')
  write.csv(scores, file=paste(searchterm, '_scores.csv'), row.names=TRUE) #save evaluation results
  #total score calculation: positive / negative / neutral
  stat <- scores
  stat$created <- stack$created
  stat$created <- as.Date(stat$created)
  stat <- mutate(stat, tweet=ifelse(stat$score > 0, 'positive', ifelse(stat$score < 0, 'negative', 'neutral')))
  by.tweet <- group_by(stat, tweet, created)
  by.tweet <- summarise(by.tweet, number=n())
  write.csv(by.tweet, file=paste(searchterm, '_opin.csv'), row.names=TRUE)
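  #the same aggregation written with the dplyr pipe (my addition; equivalent,
  #assuming a dplyr version that exports %>%):
  # by.tweet <- stat %>% group_by(tweet, created) %>% summarise(number=n())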
  #chart
  #inside a function the plot is not printed automatically, so keep it in a variable
  p <- ggplot(by.tweet, aes(created, number)) +
    geom_line(aes(group=tweet, color=tweet), size=2) +
    geom_point(aes(group=tweet, color=tweet), size=4) +
    theme(text=element_text(size=18),
          axis.text.x=element_text(angle=90, vjust=1)) +
    #stat_summary(fun.y='sum', fun.ymin='sum', fun.ymax='sum', colour='yellow', size=2, geom='line') +
    ggtitle(searchterm)
  print(p)

  ggsave(plot=p, file=paste(searchterm, '_plot.jpeg'))
}
search("______") #enter keyword
Finally, we will get four files:
- a storage file with the accumulated raw data,
- a file with the tweet ratings,
- a file with the number of tweets of each type (positive / negative / neutral) by date,
- and the chart, saved as a .jpeg file.
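Because everything is written to disk, you can revisit the accumulated history later without touching the API. A minimal sketch (my own addition) that reloads the aggregated counts for a hypothetical keyword 'brand_a' and redraws the trend; note the space that paste() puts in the file name:

#reload the aggregated counts saved by search() and redraw the trend chart
opin <- read.csv('brand_a _opin.csv')
opin$created <- as.Date(opin$created)
ggplot(opin, aes(created, number)) +
  geom_line(aes(group=tweet, color=tweet), size=2)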