
Visualizing Arkansas traffic fatalities, part 4

by Nathan · October 25, 2016

This article is originally published at http://www.nathanchaney.com

This is the latest post in a series analyzing Arkansas traffic fatalities. Please take a look at parts 1 (a map of 2015 traffic deaths), 2 (heat maps of fatalities by day from 2000-2015), and 3 (heat maps of fatalities by day of week from 2000-2015) if you haven’t already.

Visualizations

Today’s post is probably my favorite of this series. It piggybacks off parts 2 and 3, in that we further explore the relationship of the time of day to traffic fatalities. The first set of visualizations maps the raw number of traffic fatalities in the US by the time of day. Each horizontal band represents a year between 2000 and 2015. Each row within the band is a day of the week, and each column represents an hour of the day. From left to right (or top to bottom on small devices), you have drunk driving fatalities, non-drunk driving fatalities, and total fatalities.

In this set of visualizations, we can clearly see two things. First, weekend evenings are very hazardous for drunk drivers. Second, we can see two distinct bands for morning and afternoon commutes for non-drunk-driving fatalities.

As I did in the earlier posts, I repeated the same analysis on Arkansas-specific wreck information. Again, the same trends appear to hold, although the bands aren’t as smoothly colored (that tells us the data is a little noisier due to fewer data points). Note that the color scale here is different from the one used for the nationwide set.

Code

We’ll be using the same FARS data we used in the previous two posts. Let’s set up our libraries, import the data into R, and get moving. For a more detailed explanation of what we’re doing here, please refer to part 2.

library(foreign)
library(ggplot2) # v2.1.0.9000
library(plyr)
library(zoo)

data.dir <- "/Path/to/my/data/dir/"

accidents_2015 <- read.dbf(paste(data.dir, "Data/FARS2015NationalDBF/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2014 <- read.dbf(paste(data.dir, "Data/FARS2014NationalDBF/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2013 <- read.dbf(paste(data.dir, "Data/FARS2013NationalDBF/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2012 <- read.dbf(paste(data.dir, "Data/FARS2012/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2011 <- read.dbf(paste(data.dir, "Data/FARS2011/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2010 <- read.dbf(paste(data.dir, "Data/FARS2010/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2009 <- read.dbf(paste(data.dir, "Data/FARS2009/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2008 <- read.dbf(paste(data.dir, "Data/FARS2008/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2007 <- read.dbf(paste(data.dir, "Data/FARS2007/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2006 <- read.dbf(paste(data.dir, "Data/FARS2006/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2005 <- read.dbf(paste(data.dir, "Data/FARS2005/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2004 <- read.dbf(paste(data.dir, "Data/FARS2004/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2003 <- read.dbf(paste(data.dir, "Data/FARS2003/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2002 <- read.dbf(paste(data.dir, "Data/FARS2002/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2001 <- read.dbf(paste(data.dir, "Data/FARS2001/accident.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]
accidents_2000 <- read.dbf(paste(data.dir, "Data/FARSDBF00/ACCIDENT.dbf", sep=""))[,c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")]

accidents <- rbind(accidents_2015, accidents_2014, accidents_2013, accidents_2012, accidents_2011, accidents_2010, accidents_2009, accidents_2008, accidents_2007, accidents_2006, accidents_2005, accidents_2004, accidents_2003, accidents_2002, accidents_2001, accidents_2000)

# Subset Arkansas wrecks
# Comment out this line for nationwide analysis
accidents <- subset(accidents, STATE == 5)
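
The sixteen near-identical read.dbf() calls above could also be written as a loop. The following is only a sketch, assuming the same directory layout as the calls above; the names keep_cols, fars_paths, and read_fars are introduced here for illustration and are not part of the original code.

```r
data.dir <- "/Path/to/my/data/dir/"

# Columns kept from each yearly accident file
keep_cols <- c("STATE", "COUNTY", "HOUR", "DAY", "MONTH", "YEAR", "FATALS", "DRUNK_DR")

# Relative paths copied from the calls above; the folder naming differs by year
fars_paths <- c(
  "Data/FARSDBF00/ACCIDENT.dbf",
  paste0("Data/FARS", 2001:2012, "/accident.dbf"),
  paste0("Data/FARS", 2013:2015, "NationalDBF/accident.dbf")
)

# Hypothetical helper: read one file and keep only the columns we need
read_fars <- function(rel_path) {
  foreign::read.dbf(paste0(data.dir, rel_path))[, keep_cols]
}

# Uncomment once the files are in place:
# accidents <- do.call(rbind, lapply(fars_paths, read_fars))
```

Listing the paths explicitly keeps the irregular folder names visible, which matters here because the 2000 and 2013-2015 files do not follow the same pattern as the rest.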

Now we need to clean the time-of-day data: the midnight hour was sometimes entered as 0 and sometimes as 24, and still other entries contained junk values like 99.

accidents <- subset(accidents, HOUR <= 24 & HOUR >= 0)
accidents$HOUR <- ifelse(accidents$HOUR == 24, 0, accidents$HOUR)
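
To see what these two lines do, here is the same cleaning applied to a small made-up data frame. The toy values are invented for illustration and are not FARS records.

```r
# Toy data: HOUR values as they might appear in the raw files
toy <- data.frame(HOUR = c(0, 7, 24, 99, 23), FATALS = c(1, 2, 1, 1, 3))

# Drop junk codes such as 99
toy <- subset(toy, HOUR <= 24 & HOUR >= 0)

# Fold the two midnight encodings (0 and 24) into a single value, 0
toy$HOUR <- ifelse(toy$HOUR == 24, 0, toy$HOUR)

print(toy$HOUR)
```

The row with HOUR 99 is dropped, and the row recorded as 24 ends up in the same bucket as the row recorded as 0.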

As we did with the other visualizations, we’ll need to add some date columns to determine the day of week and year.

# Add date column
accidents$date <- as.Date(paste(accidents$YEAR, accidents$MONTH, accidents$DAY, sep='-'), "%Y-%m-%d")

accidents <- transform(accidents,
  week = as.numeric(format(date, "%U")),
  day = as.numeric(format(date, "%d")),
  wday = as.numeric(format(date, "%w"))+1,
  month = as.POSIXlt(date)$mon + 1,
  year = as.POSIXlt(date)$year + 1900)
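
The wday column is the one the heat maps depend on. As a quick check of how it is numbered, here is the same expression applied to three known dates (chosen only for illustration):

```r
# 2016-10-23 was a Sunday, 2016-10-25 a Tuesday, 2016-10-29 a Saturday
d <- as.Date(c("2016-10-23", "2016-10-25", "2016-10-29"))

# %w counts from 0 (Sunday) to 6 (Saturday), so adding 1 gives 1 through 7
wday <- as.numeric(format(d, "%w")) + 1

print(wday)
```

Sunday comes out as 1 and Saturday as 7, which is the numbering the weekday labels later in the post rely on.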

Next, we’ll summarize the data by drunk/not drunk/all.

# Sum wrecks by drunk/not drunk/all
accidents_drunk <- accidents$DRUNK_DR > 0
accidents_not_drunk <- accidents$DRUNK_DR == 0
summary <- aggregate(FATALS ~ wday + HOUR + YEAR, accidents, sum)
summary_not_drunk <- aggregate(FATALS ~ wday + HOUR + YEAR, accidents, sum, subset=accidents_not_drunk)
summary_drunk <- aggregate(FATALS ~ wday + HOUR + YEAR, accidents, sum, subset=accidents_drunk)

data <- ddply(summary, .(wday, HOUR, YEAR), summarize, sum = sum(FATALS))
data_not_drunk <- ddply(summary_not_drunk, .(wday, HOUR, YEAR), summarize, sum = sum(FATALS))
data_drunk <- ddply(summary_drunk, .(wday, HOUR, YEAR), summarize, sum = sum(FATALS))
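
Here is a small sketch of the aggregation step on made-up rows, to show what the three summaries contain. The toy data frame is invented, and the subset is written as an expression on the DRUNK_DR column rather than as a precomputed logical vector; that is a stylistic variation, not the code used above.

```r
# Toy crash records: three on Sunday at 02:00, two on Saturday at 23:00
toy <- data.frame(
  wday     = c(1, 1, 1, 7, 7),
  HOUR     = c(2, 2, 2, 23, 23),
  YEAR     = c(2015, 2015, 2015, 2015, 2015),
  FATALS   = c(1, 2, 1, 1, 3),
  DRUNK_DR = c(1, 0, 2, 0, 0)
)

# One row per weekday/hour/year combination, with fatalities summed
toy_all       <- aggregate(FATALS ~ wday + HOUR + YEAR, toy, sum)
toy_drunk     <- aggregate(FATALS ~ wday + HOUR + YEAR, toy, sum, subset = DRUNK_DR > 0)
toy_not_drunk <- aggregate(FATALS ~ wday + HOUR + YEAR, toy, sum, subset = DRUNK_DR == 0)

print(toy_all)
print(toy_drunk)
print(toy_not_drunk)
```

Note that toy_drunk has no row at all for Saturday at 23:00. Combinations with no matching wrecks are simply absent from the summary, which is why the Arkansas heat maps can show empty tiles.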

Let’s set our max and min so that we can use the same scale across all three plots.

max <- max(c(max(data$sum), max(data_not_drunk$sum), max(data_drunk$sum)))
min <- min(c(min(data$sum), min(data_not_drunk$sum), min(data_drunk$sum)))
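
The same limits can be computed in one call with range(). This is an alternative sketch on invented numbers; the name fill_limits is hypothetical. Using a separate name also avoids creating variables called max and min alongside the base functions of the same name, which works in R but can be confusing to read.

```r
# Invented cell sums standing in for data$sum, data_not_drunk$sum, data_drunk$sum
sums_all       <- c(4, 9, 1)
sums_not_drunk <- c(3, 5, 1)
sums_drunk     <- c(1, 4)

# range() returns c(minimum, maximum) across everything it is given
fill_limits <- range(sums_all, sums_not_drunk, sums_drunk)

print(fill_limits)
```

A two-element vector like this could be passed directly as the limits argument of scale_fill_gradient() in place of c(min, max).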

Next, we’ll factor the days of week into human-readable format for each of the three data sets.

data$weekday<-factor(data$wday,levels=rev(1:7),labels=rev(c("S","M","T","W","Th","F","Sa")),ordered=TRUE)
data_not_drunk$weekday<-factor(data_not_drunk$wday,levels=rev(1:7),labels=rev(c("S","M","T","W","Th","F","Sa")),ordered=TRUE)
data_drunk$weekday<-factor(data_drunk$wday,levels=rev(1:7),labels=rev(c("S","M","T","W","Th","F","Sa")),ordered=TRUE)
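
The reversed levels are easier to follow on a tiny example. This sketch applies the same factor() call to three day numbers; the variable day_labels is introduced here only to keep the line short.

```r
wday <- c(1, 2, 7)  # Sunday, Monday, Saturday
day_labels <- c("S", "M", "T", "W", "Th", "F", "Sa")

weekday <- factor(wday, levels = rev(1:7), labels = rev(day_labels), ordered = TRUE)

print(as.character(weekday))  # the label attached to each value
print(levels(weekday))        # the level order, first to last
```

Each number still maps to the right label (1 is "S", 7 is "Sa"); only the level order is reversed. Since ggplot2 draws the first level of a discrete y axis at the bottom, putting Saturday first places Sunday at the top of each band.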

Finally, we’re done wrangling the data. Let’s define a theme for the plots that’s consistent with the previous two posts.

# Theme definitions
heat_map_theme <- theme(
  panel.grid.major.y = element_blank(),
  panel.grid.minor.y = element_blank(),
  panel.grid.minor.x = element_blank(),
  panel.grid.major.x = element_blank(),
  panel.spacing.x = unit(0, "points"),
  panel.spacing.y = unit(1, "points"),
  strip.placement = "outside",
  strip.switch.pad.grid = unit(2,"points"),
  strip.background = element_rect(fill="gray90", color=NA),
  strip.text = element_text(color="gray5"),
  axis.ticks = element_blank(),
  axis.text.x = element_text(color="gray5", size=8),
  axis.text.y = element_text(color="gray5", size=5),
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  legend.text = element_text(color="gray5"),
  legend.title = element_text(color="gray5"),
  plot.title = element_text(color="gray5", hjust=0.5),
  plot.subtitle = element_text(color="gray5", hjust=0.5),
  plot.caption = element_text(color="gray5", hjust=1, size=6),
  panel.background = element_rect(fill="transparent", color=NA),
  legend.background = element_rect(fill="transparent", color=NA),
  plot.background = element_rect(fill="transparent", color=NA),
  plot.margin = unit(c(0,0,0,0), "points"),
  legend.key = element_rect(fill=alpha("white", 0.33), color=NA)
)

Now, we’ll simply plot each of the three datasets and save the results.

imagedir <- "/PATH/TO/YOUR/DIRECTORY/"

# Plot and save drunk data
# The title matches the Arkansas subset above; edit it for a nationwide run
ggplot(data_drunk, aes(HOUR, weekday)) +
  geom_tile(aes(fill=sum), na.rm = TRUE) +
  facet_grid(YEAR ~ ., drop = FALSE, switch="y") +
  scale_fill_gradient(name="Fatalities", low="yellow", high="red", na.value = alpha("white", 0.25), limits=c(min,max)) +
  scale_x_continuous(limits=c(-0.5,24.5), breaks=c(2.5,5.5,8.5,11.5,14.5,17.5,20.5), labels=c("0300","0600","0900","Noon","1500","1800","2100"), expand = c(0,0)) +
  scale_y_discrete(position="left") +
  labs(title = "2000-2015 Traffic Fatalities, Arkansas", x="", y="", subtitle="by Time of Day (drunk driving only)", caption = "(based on data from NHTSA FARS: ftp://ftp.nhtsa.dot.gov/fars)") +
  heat_map_theme

filename <- paste(c(imagedir, "2000-2015_fatalities_calendar_TOD (AR, drunk).png"), collapse="")
ggsave(filename, bg = "transparent")

# Plot and save not drunk data
# The title matches the Arkansas subset above; edit it for a nationwide run
ggplot(data_not_drunk, aes(HOUR, weekday)) +
  geom_tile(aes(fill=sum), na.rm = TRUE) +
  facet_grid(YEAR ~ ., drop = FALSE, switch="y") +
  scale_fill_gradient(name="Fatalities", low="yellow", high="red", na.value = alpha("white", 0.25), limits=c(min,max)) +
  scale_x_continuous(limits=c(-0.5,24.5), breaks=c(2.5,5.5,8.5,11.5,14.5,17.5,20.5), labels=c("0300","0600","0900","Noon","1500","1800","2100"), expand = c(0,0)) +
  scale_y_discrete(position="left") +
  labs(title = "2000-2015 Traffic Fatalities, Arkansas", x="", y="", subtitle="by Time of Day (excludes drunk driving)", caption = "(based on data from NHTSA FARS: ftp://ftp.nhtsa.dot.gov/fars)") +
  heat_map_theme

# Save PNG file
filename <- paste(c(imagedir, "2000-2015_fatalities_calendar_TOD (AR, not drunk).png"), collapse="")
ggsave(filename, bg = "transparent")

# Plot and save all data
# The title matches the Arkansas subset above; edit it for a nationwide run
ggplot(data, aes(HOUR, weekday)) +
  geom_tile(aes(fill=sum), na.rm = TRUE) +
  facet_grid(YEAR ~ ., drop = FALSE, switch="y") +
  scale_fill_gradient(name="Fatalities", low="yellow", high="red", na.value = alpha("white", 0.25), limits=c(min,max)) +
  scale_x_continuous(limits=c(-0.5,24.5), breaks=c(2.5,5.5,8.5,11.5,14.5,17.5,20.5), labels=c("0300","0600","0900","Noon","1500","1800","2100"), expand = c(0,0)) +
  scale_y_discrete(position="left") +
  labs(title = "2000-2015 Traffic Fatalities, Arkansas", x="", y="", subtitle="by Time of Day", caption = "(based on data from NHTSA FARS: ftp://ftp.nhtsa.dot.gov/fars)") +
  heat_map_theme

# Save PNG file
filename <- paste(c(imagedir, "2000-2015_fatalities_calendar_TOD (AR, all).png"), collapse="")
ggsave(filename, bg = "transparent")

Conclusion

I said at the beginning that this was probably my favorite of the three sets of visualizations. Do you agree with me that this set of visualizations is the most informative about when traffic fatalities occur?

Edit 7/29/17: I noticed the title of the Arkansas images erroneously said Nationwide. I corrected this mistake.
