This article is originally published at https://rcrastinate.blogspot.com/

As if the R world needed another example of Twitter visualizations, right? Well, here we go anyway.

At the beginning of 2013, Pablo Barberá released the first version of his R package 'streamR' (see its CRAN page). With this package, you can tap into the streaming capabilities of the Twitter API. I did so for 10 consecutive days. Luckily, one of those days was September 22nd, the day of the 'Bundestagswahl' (parliamentary elections) in Germany.

I've decided to put the code at the end of this post. If you want to try things out, you can find the code by scrolling to the bottom. Please note that you may not be able to replicate all of this, because you do not have the Twitter stream data available.

So, let's first visualize all tweets between September 17th and September 27th on a map.

We color all tweets containing "btw13" (the hashtag for the elections was #btw13), "wahl" or "wähl" in yellow, drawn on top of all tweets in blue. "Wahl" is the German noun for "election", and "wähl" is the stem of the German verb for "elect" or "choose".
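As a minimal sketch of this matching step, here is how case-insensitive pattern matching flags election tweets. The vector `texts` is made up for illustration and is not taken from the stream data; the pattern is the one used in the code at the end of the post.

```r
# Toy tweet texts (hypothetical, for illustration only)
texts <- c("Geht wählen! #BTW13",
           "Schönes Wetter heute",
           "Der Wahlkampf ist vorbei",
           "Grosse Auswahl im Laden")

# TRUE for every text containing one of the three patterns, ignoring case
is.election <- grepl("btw13|wahl|wähl", texts, ignore.case = TRUE)

print(data.frame(texts, is.election))
```

Note that the last text is flagged as well, because "Auswahl" (selection) contains "wahl". Plain substring matching will therefore pick up some tweets that are not about the elections.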

Some of you may know a certain webcomic by Randall Munroe. In his post about geographic profile maps, he made an important point: many of these maps are basically just population maps. And this also applies to our scenario. Let's look at another map with the sizes of all German cities with a population of over 100,000 people visualized as circles (the greater the radius of the circle, the greater the population of the city - proportions of circles are the same as the proportions of population sizes).

So, unsurprisingly, more people tweet more, and the more tweets there are, the more tweets are about the elections. The following picture shows only tweets about the elections and city sizes.

So, what if we try to partial out city size? To do so, I use a set of German cities (taken from the dataset world.cities in the 'maps' package) and iterate through it. For every city, I look for tweets that originated within a 'box' around this city. This 'box' extends 0.03 degrees of longitude and latitude in each direction from the city's coordinates (this value can be set via the 'tol' argument of the function 'find.tweets'). Then, I fit a linear model that predicts the number of tweets around each city from its population size. I then use the residuals of this model for visualization. These residuals can be interpreted as 'number of tweets with city size partialled out'. Here is the result for all tweets in the 10 days.
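The idea can be sketched on toy data. Everything below is hypothetical: the four cities, their populations, the tweet locations and the helper `count.in.box` are made up for illustration and are not the data or functions used for the maps in this post.

```r
# Hypothetical cities with coordinates and population (made-up numbers)
cities <- data.frame(name = c("A", "B", "C", "D"),
                     lat  = c(50, 51, 52, 53),
                     long = c(8, 9, 10, 11),
                     pop  = c(100000, 200000, 300000, 400000))

# Hypothetical tweets: scatter a fixed number of tweets closely around each city
set.seed(1)
n.per.city <- c(10, 20, 60, 40)
n.total <- sum(n.per.city)
tweets <- data.frame(
  lat = rep(cities$lat,  n.per.city) + runif(n.total, -0.02, 0.02),
  lon = rep(cities$long, n.per.city) + runif(n.total, -0.02, 0.02))

# Count tweets inside a box of +/- tol degrees around a city
count.in.box <- function(clat, clon, tw, tol = 0.03) {
  sum(abs(tw$lat - clat) <= tol & abs(tw$lon - clon) <= tol)
}
cities$n.tweets <- mapply(count.in.box, cities$lat, cities$long,
                          MoreArgs = list(tw = tweets))

# Regress tweet counts on population; the residuals are the
# tweet counts with city size partialled out
mod <- lm(n.tweets ~ pop, data = cities)
cities$res <- residuals(mod)
print(cities)
```

In this toy example, city D is the largest, but city C gets the largest positive residual: it tweets more than its population size would predict, which is exactly what the circles on the map are meant to show.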

This is quite interesting. Berlin, the capital of Germany, which is sometimes thought of as the country's 'center of innovation', does not seem to tweet more than would be expected given its population size. But what is going on with Greifswald in the north-east?! The answer is quite simple: There is a weather station in Greifswald tweeting weather stats every 5 minutes! So, let's exclude the weather station and do the residualization stuff again:

And now, let us do the same, but only with tweets concerning the elections.

Now, we will overlay the last two maps.

Please note that the sizes of orange circles (residual number of tweets about elections) and red circles (residual number of all tweets) cannot be compared with one another. I.e., the size of the orange circle within a red circle does not symbolize the number of tweets about the elections compared to all tweets. Red circles can be compared with red circles, orange circles can be compared with orange circles.

Now, in a last plot, I want to visualize the number of tweets on the day of the elections. Citizens were allowed to vote until 18:00 (marked with a black line). The number of all tweets per hour is visualized by blue bars, the number of tweets per hour mentioning 'btw13', 'wahl' or 'wähl' by yellow bars. The percentage of tweets about the elections can thus be seen from the ratio of yellow to blue bars. It's quite nice to see how tweets mentioning the critical patterns rise in number and percentage just before the polling stations close (I looked into some of them; many of them are calls to go and vote). As soon as polling stations closed, the number of all tweets rose while tweets about the elections went down again. Most people seemed to mind other business as soon as the elections were over. Another possibility is that after polling stations closed, tweets about the elections concentrated on commenting on the results and not on the process of voting itself.
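To put numbers on the ratio between the two sets of bars, the share of election tweets per hour can be computed directly. This is a minimal sketch on made-up data: the vectors `all.hours` and `is.election` are hypothetical stand-ins for the hour of each tweet and for the result of the pattern matching.

```r
# Hypothetical hour of the day for 14 tweets (made-up data)
all.hours <- c(16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18)

# TRUE where the (hypothetical) tweet matched the election pattern
is.election <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,
                 TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

# A factor with fixed levels makes both tables cover the same hours
hours <- factor(all.hours, levels = 16:18)
n.all <- table(hours)
n.election <- table(hours[is.election])

# Share of election tweets per hour, in percent
share <- round(100 * as.numeric(n.election) / as.numeric(n.all), 1)
names(share) <- levels(hours)
print(share)
```

The fixed factor levels are a deliberate choice: if an hour had no election tweets at all, a plain `table()` on the election tweets would drop that hour, and counts (or bars) for the two groups would no longer line up.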

So much for now. I will post the R code used for this post as soon as I have tidied it up a little :)

EDIT: Code now available.

load(<dataset with tweets>) # variable name 'twit'

library(rworldmap)

library(scales)

library(maps)

library(data.table)

# plot empty map

par(mar=c(0,0,0,0))

plot(getMap(resolution="low"), xlim = c(5,15), ylim = c(47,55))

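# extract day of month and hour from 'created_at' (format like "Sun Sep 22 16:05:12 +0000 2013")

twit$day <- substr(twit$created_at, 9, 10)

# shift the hour by +2 (presumably UTC to German summer time); 'day' is not shifted, so hours can exceed 23

twit$hour <- as.numeric(substr(twit$created_at, 12, 13))+2

twit$day.hour <- paste0(twit$day, "-", twit$hour)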

# extract election tweets

btw <- twit[grep("btw13|wahl|wähl", twit$text, ignore.case = TRUE),]

# plotting all tweets

points(x=twit$lon, y=twit$lat, pch = 19, col = alpha("blue", 0.05), cex = 0.6)

# plotting election tweets

points(x=btw$lon, y=btw$lat, pch = 19, col = alpha("yellow", 0.05), cex = 0.7)

# get German cities

all.ger.cit <- world.cities[world.cities$country.etc == "Germany",]

# find tweets for each city

# 'tol' parameter sets tolerance in lat and lon

# note: 'r.lat' and 'r.lon' must be columns of 'twit.dat'; they are not created in this script

find.tweets <- function (slat, slon, twit.dat, tol = 0.03) {

  subset(twit.dat, (r.lat >= slat-tol & r.lat <= slat+tol) & (r.lon >= slon-tol & r.lon <= slon+tol))

}

all.ger.cit <- as.data.frame(all.ger.cit)

all.ger.cit$n.tweets <- apply(all.ger.cit, 1, FUN = function (row) {

  row.lat <- as.numeric(row["lat"])

  row.lon <- as.numeric(row["long"])

  f.tweets <- find.tweets(slat=row.lat, slon=row.lon, twit.dat=twit, tol=0.03)

  nrow(f.tweets)

})

# predict number of tweets by population size of the city

mod <- lm(n.tweets ~ pop, data = all.ger.cit, na.action = "na.exclude")

all.ger.cit$res.n.tweets <- residuals(mod)

# plot residual number of tweets

all.ger.cit$plot.col <- ifelse(all.ger.cit$res.n.tweets > 0, "red", "blue")

par(mar=c(0,0,0,0))

plot(getMap(resolution="low"), xlim = c(5,15), ylim = c(47,55))

points(x=all.ger.cit$long, y=all.ger.cit$lat, cex = all.ger.cit$res.n.tweets/200,

pch = 19, col = alpha(all.ger.cit$plot.col, .7))

# same for only election tweets

all.ger.cit$n.tweets.btw <- apply(all.ger.cit, 1, FUN = function (row) {

  row.lat <- as.numeric(row["lat"])

  row.lon <- as.numeric(row["long"])

  f.tweets <- find.tweets(slat=row.lat, slon=row.lon, twit.dat=btw, tol=0.03)

  nrow(f.tweets)

})

mod2 <- lm(n.tweets.btw ~ pop, data = all.ger.cit, na.action = "na.exclude")

all.ger.cit$res.n.tweets.btw <- residuals(mod2)

all.ger.cit$plot.col.btw <- ifelse(all.ger.cit$res.n.tweets.btw > 0, "orange", "blue")

plot(getMap(resolution="low"), xlim = c(5,15), ylim = c(47,55))

points(x=all.ger.cit$long, y=all.ger.cit$lat, cex = all.ger.cit$res.n.tweets.btw/10,

pch = 19, col = alpha(all.ger.cit$plot.col.btw))

# get day of elections

critical.day <- btw[btw$day == 22,]

critical.day.all <- twit[twit$day == 22,]

# plot barplot

par(mar = c(3,3,2,0))

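# blue bars: all tweets per hour; 'bp' holds the x positions of the bar midpoints

bp <- barplot(table(as.numeric(critical.day.all$hour)), col = "blue", border = "blue", space = 0.2, cex.names = 0.6)

# yellow bars: election tweets per hour, drawn on top of the blue bars

barplot(table(as.numeric(critical.day$hour)), add = TRUE, col = "yellow", border = "yellow", names.arg = "", xaxt = "n",

        yaxt = "n")

legend(x = "topleft", fill = c("blue", "yellow"), legend = c("all", "btw13,wahl,wähl"), bty = "n", border = FALSE)

# black vertical line halfway between the 16th and 17th bar (closing of the polling stations)

abline(v = bp[17,1] - ((bp[17,1] - bp[16,1]) / 2), lwd = 5, lty = 1)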
