As if the R world needed another example of Twitter visualizations, right? Well, here we go anyway.
At the beginning of 2013, Pablo Barberá released the first version of his R package 'streamR' on CRAN. With this package, you can tap into the streaming capabilities of the Twitter API. I did so for 10 consecutive days. Luckily, one of those days was September 22nd, the day of the 'Bundestagswahl' (parliamentary elections) in Germany.
I've put the code at the end of this post, so if you want to try things out, just scroll to the bottom. Please note that you may not be able to replicate all of this, because you do not have the Twitter stream data.
So, let's first visualize all tweets between September 17th and September 27th on a map.
We color all tweets containing "btw" (the hashtag for the elections was #btw13), "wahl" or "wähl" in yellow. "Wahl" is German for "election", and "wähl" is the stem of the verb "wählen" ("to elect" or "to choose").
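The keyword rule above can be sketched as a case-insensitive regular expression. This is a minimal illustration, not the post's actual code: the data frame `tweets` with a `text` column is an assumption (streamR's `parseTweets()` does return a `text` column), and the example tweets are made up. Note that plain substring matching also catches words such as "Auswahl" ("selection"), so the counts may be slightly inflated.

```r
# Made-up example tweets; a real data frame would come from the stream
tweets <- data.frame(
  text = c("Heute ist #btw13!", "Sonniges Wetter in Berlin",
           "Geht w\u00e4hlen!", "Die Wahl ist vorbei", "Bitte warten"),
  stringsAsFactors = FALSE
)

# Match "btw", "wahl" or the stem "wähl" anywhere in the text, ignoring case
election_pattern <- "btw|wahl|w\u00e4hl"
tweets$election <- grepl(election_pattern, tweets$text, ignore.case = TRUE)

tweets
```

Tweets with `election == TRUE` are the ones drawn in yellow.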
Some of you may know xkcd, the webcomic by Randall Munroe. In one strip about geographic profile maps, he made an important point: many such maps are basically just population maps. This also applies here. Let's look at another map showing all German cities with a population of over 100,000 as circles (the greater the radius of a circle, the greater the population of the city; the proportions of the circles match the proportions of the population sizes).
So, unsurprisingly, places with more people produce more tweets, and the more tweets there are, the more of them are about the elections. The following picture shows only tweets about the elections, together with city sizes.
So, what if we try to partial out city size? To do so, I take a set of German cities (from the dataset world.cities) and iterate through it. For every city, I look for tweets that originated within a 'box' around it. This box is defined by 0.03 units of longitude and latitude (this value can be set in the function 'find.tweets', which finds the tweets around each city). Then I predict the number of tweets from each city's population size and use the residuals of this model for the visualization. These residuals can be interpreted as 'number of tweets with city size partialled out'. Here is the result for all tweets from the 10 days.
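Here is a minimal sketch of that counting-plus-residualization step, under several assumptions. `count_tweets()` and `residual_counts()` are hypothetical helpers, not the post's `find.tweets()`. The box is treated as a half-width of 0.03 degrees in each direction. A plain linear model `n_tweets ~ pop` is assumed as the prediction model. The tweets are assumed to have numeric `lon`/`lat` columns, and the cities are assumed to have `pop`, `long` and `lat` columns, as in `maps::world.cities` (for real data, something like `subset(maps::world.cities, country.etc == "Germany")`). The example coordinates and counts are made up.

```r
# Hypothetical helper: count tweets inside a square box around one city.
# Assumption: `box` is the half-width in degrees.
count_tweets <- function(lon, lat, tweets, box = 0.03) {
  sum(abs(tweets$lon - lon) <= box & abs(tweets$lat - lat) <= box, na.rm = TRUE)
}

# Count tweets per city, regress the counts on population, keep the residuals.
residual_counts <- function(cities, tweets, box = 0.03) {
  cities$n_tweets <- mapply(count_tweets, cities$long, cities$lat,
                            MoreArgs = list(tweets = tweets, box = box))
  fit <- lm(n_tweets ~ pop, data = cities)
  cities$resid <- unname(residuals(fit))
  cities
}

# Made-up example: four fictional cities and tweets placed near them
cities <- data.frame(name = c("A", "B", "C", "D"),
                     pop  = c(1000000, 500000, 250000, 120000),
                     long = c(10, 11, 12, 13),
                     lat  = c(50, 51, 52, 53))
tweets <- data.frame(lon = c(rep(10, 30), rep(11.01, 12), rep(12, 5), rep(13, 9), 0),
                     lat = c(rep(50, 30), rep(51, 12), rep(52.02, 5), rep(53, 9), 0))

res <- residual_counts(cities, tweets)
res[, c("name", "n_tweets", "resid")]
```

A positive residual means a city produced more tweets than its population alone would predict. This is the quantity drawn as circle size on the residual maps.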
This is quite interesting. Berlin, the capital of Germany and sometimes thought of as Germany's 'center of innovation', does not seem to tweet more than would be expected given its population size. But what is going on with Greifswald in the north-east?!? The answer is quite simple: there is a weather station in Greifswald tweeting weather stats every 5 minutes! So, let's exclude the weather station and do the residualization again:
And now, let us do the same, but only with tweets concerning the elections.
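The post doesn't say exactly how the weather station was excluded. One plausible way is to drop its account by name and then, for the election-only map, keep just the tweets matching the keyword pattern. The filtered data can then go through the same residualization as before. In this sketch, the account name `"wetter_station"` is a placeholder rather than the real account. The `screen_name` and `text` columns are assumed (streamR's `parseTweets()` returns both), and the example rows are made up.

```r
# Made-up example tweets, one row per tweet
tweets <- data.frame(
  screen_name = c("wetter_station", "alice", "bob", "wetter_station", "carol"),
  text = c("Temp 12.3C", "#btw13 heute!", "Mittagessen", "Temp 12.1C", "Die Wahl!"),
  stringsAsFactors = FALSE
)

# Step 1: drop the automated weather-station account (placeholder name);
# %in% avoids NA rows if some screen names are missing
station_account <- "wetter_station"
no_station <- tweets[!(tweets$screen_name %in% station_account), ]

# Step 2: keep only tweets mentioning the elections
election_only <- no_station[grepl("btw|wahl|w\u00e4hl", no_station$text,
                                  ignore.case = TRUE), ]

election_only
```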
Now, we will overlay the last two maps.
Please note that the sizes of the orange circles (residual number of tweets about the elections) and the red circles (residual number of all tweets) cannot be compared with one another. That is, the size of an orange circle within a red circle does not represent the number of tweets about the elections relative to all tweets. Red circles can only be compared with red circles, and orange circles with orange circles.
Now, in a last plot, I want to visualize the number of tweets on election day. Citizens could vote until 18:00 (marked with a black line). The number of all tweets per hour is shown as blue bars, and the number of tweets per hour mentioning 'btw13', 'wahl' or 'wähl' as yellow bars. The percentage of tweets about the elections can thus be read from the ratio of yellow to blue bars. It's quite nice to see how tweets matching these patterns rise in number and percentage just before the polls closed (I looked into some of them; many encourage people to go and vote). As soon as the polling stations closed, the total number of tweets rose while the tweets about the elections went down again. Most people seemed to turn to other things as soon as the elections were over. Another possibility is that after the polling stations closed, tweets about the elections focused on commenting on the results rather than on the voting itself.
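The hourly bars and the election share can be computed roughly like this. This is a minimal sketch, not the post's code. It assumes each tweet has a `POSIXct` timestamp `time` and a logical `election` flag; the timestamps and flags are made up. Using a factor with levels `0:23` keeps hours without any tweets as zeros instead of dropping them. Timestamps from the Twitter API are in UTC, so they are interpreted as German local time (`Europe/Berlin`) before the 18:00 closing time is compared.

```r
# Made-up tweets on election day, in German local time
tweets <- data.frame(
  time = as.POSIXct(c("2013-09-22 16:10:00", "2013-09-22 16:45:00",
                      "2013-09-22 17:05:00", "2013-09-22 17:30:00",
                      "2013-09-22 17:55:00", "2013-09-22 18:20:00"),
                    tz = "Europe/Berlin"),
  election = c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
)

# Hour of day as a factor with all 24 levels, so empty hours count as 0
hours <- factor(as.integer(format(tweets$time, "%H")), levels = 0:23)

all_per_hour      <- as.vector(table(hours))                    # blue bars
election_per_hour <- as.vector(table(hours[tweets$election]))   # yellow bars

# Share of election tweets per hour (NA where there were no tweets at all)
share <- ifelse(all_per_hour > 0, election_per_hour / all_per_hour, NA)

data.frame(hour = 0:23, all = all_per_hour,
           election = election_per_hour, share = share)[17:19, ]
```

The two count vectors could then be drawn on top of each other with `barplot()`, using its `add = TRUE` argument for the second layer.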
So much for now. I will post the R code used for this post as soon as I have tidied it up a little :)
EDIT: Code now available.
load(<dataset with tweets>) # variable name 'twit'