R Language / R News

Twitter sentiment analysis based on affective lexicons with R

by Sergey · May 12, 2014

This article is originally published at https://www.analyzecore.com

We will study another dictionary-based approach that is based on affective lexicons for Twitter sentiment analysis Continue to dig tweets. After we reviewed how to count positive, negative and neutral tweets in the previous post, I discovered another great idea. Suppose positive or negative mark is not enough and we want to understand the rate of positivity or negativity.

For example, if word “good” has 4 points rating, but “perfect” has 6. In this way we can try to measure the rate of satisfaction or opinion in tweets and take a chart with the trend as the following:

We need another dictionary for managing this task, specifically the dictionary with a rating of words. We can create it or find results of great research of affective ratings (e.g. here).

And of course, our algorithm should bypass Twitter’s API limitation via accumulating historical data. This approach was described in the previous post.

Note, I will use average rating for evaluating tweets based on words rating it consists of. For example, if we’ve found “good” (4 points) and “perfect” (6 points) in the tweet, it would be evaluated as (4+6)/2=5. In this way, we will avoid the influence of several negative words that could have a higher total rating, e.g. one word “good” (4 points) should have a higher rating than three words “bad” (for 1,5 points each).

Let’s start. We need to create Twitter Application (https://apps.twitter.com/) in order to have an access to Twitter’s API. Then we will get Consumer Key and Consumer Secret. And finally, our code in R:

#connect all libraries
 library(twitteR)
 library(ROAuth)
 library(plyr)
 library(dplyr)
 library(stringr)
 library(ggplot2)

#connect to API
 download.file(url='http://curl.haxx.se/ca/cacert.pem', destfile='cacert.pem')
 reqURL <- 'https://api.twitter.com/oauth/request_token'
 accessURL <- 'https://api.twitter.com/oauth/access_token'
 authURL <- 'https://api.twitter.com/oauth/authorize'
 consumerKey <- '____________' #put the Consumer Key from Twitter Application
 consumerSecret <- '______________'  #put the Consumer Secret from Twitter Application
 Cred <- OAuthFactory$new(consumerKey=consumerKey,
                                                       consumerSecret=consumerSecret,
                                                       requestURL=reqURL,
                                                       accessURL=accessURL,
                                                       authURL=authURL)
 Cred$handshake(cainfo = system.file('CurlSSL', 'cacert.pem', package = 'RCurl')) #There is URL in Console. You need to go to, get code and enter it on Console

save(Cred, file='twitter authentication.Rdata')
 load('twitter authentication.Rdata') #Once you launched the code first time, you can start from this line in the future (libraries should be connected)
 registerTwitterOAuth(Cred)

#the function for extracting and analyzing tweets
 search <- function(searchterm)
 {
 #extract tweets and create storage file
 list <- searchTwitter(searchterm, cainfo='cacert.pem', n=1500)
 df <- twListToDF(list)
 df <- df[, order(names(df))]
 df$created <- strftime(df$created, '%Y-%m-%d')
 if (file.exists(paste(searchterm, '_stack_val.csv'))==FALSE) write.csv(df, file=paste(searchterm, '_stack_val.csv'), row.names=F)

#merge the last extraction with storage file and remove duplicates
 stack <- read.csv(file=paste(searchterm, '_stack_val.csv'))
 stack <- rbind(stack, df)
 stack <- subset(stack, !duplicated(stack$text))
 write.csv(stack, file=paste(searchterm, '_stack_val.csv'), row.names=F)

#tweets evaluation function
 score.sentiment <- function(sentences, valence, .progress='none')
 {
 require(plyr)
 require(stringr)
 scores <- laply(sentences, function(sentence, valence){
 sentence <- gsub('[[:punct:]]', '', sentence) #cleaning tweets
 sentence <- gsub('[[:cntrl:]]', '', sentence) #cleaning tweets
 sentence <- gsub('\\d+', '', sentence) #cleaning tweets
 sentence <- tolower(sentence) #cleaning tweets
 word.list <- str_split(sentence, '\\s+') #separating words
 words <- unlist(word.list)
 val.matches <- match(words, valence$Word) #find words from tweet in "Word" column of dictionary
 val.match <- valence$Rating[val.matches] #evaluating words which were found (suppose rating is in "Rating" column of dictionary).
 val.match <- na.omit(val.match)
 val.match <- as.numeric(val.match)
 score <- sum(val.match)/length(val.match) #rating of tweet (average value of evaluated words)
 return(score)
 }, valence, .progress=.progress)
 scores.df <- data.frame(score=scores, text=sentences) #save results to the data frame
 return(scores.df)
 }

valence <- read.csv('dictionary.csv', sep=',' , header=TRUE) #load dictionary from .csv file

Dataset <- stack
 Dataset$text <- as.factor(Dataset$text)
 scores <- score.sentiment(Dataset$text, valence, .progress='text') #start score function
 write.csv(scores, file=paste(searchterm, '_scores_val.csv'), row.names=TRUE) #save evaluation results into the file

#modify evaluation
 stat <- scores
 stat$created <- stack$created
 stat$created <- as.Date(stat$created)
 stat <- na.omit(stat) #delete unvalued tweets
 write.csv(stat, file=paste(searchterm, '_opin_val.csv'), row.names=TRUE)

#chart
 ggplot(stat, aes(created, score)) + geom_point(size=1) +
 stat_summary(fun.data = 'mean_cl_normal', mult = 1, geom = 'smooth') +
 ggtitle(searchterm)

ggsave(file=paste(searchterm, '_plot_val.jpeg'))
 }

search("______") #enter keyword

Finally, we will get 4 files:

storage file with initial data,
file with tweets rating,
cleaned (without unvalued tweets) file with tweets and dates,
the chart where we can see the density of tweet ratings and mean as a trend that looks like:

SaveSave

The post Twitter sentiment analysis based on affective lexicons with R appeared first on AnalyzeCore by Sergey Bryl' - data is beautiful, data is a story.

Thanks for visiting r-craft.org
This article is originally published at https://www.analyzecore.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Twitter sentiment analysis based on affective lexicons with R

You may also like...

Categories

Twitter sentiment analysis based on affective lexicons with R

You may also like...

Why You Should Write Clean Code

Playing with the US Population

Using Python Pandas dataframe to read and insert data to Microsoft SQL Server

Categories