Improving R Data Visualisations Through Design
When I start an R class, one of my opening lines is nearly always that the software is now used by the likes of the New York Times graphics department or Facebook to manipulate their data and produce great visualisations. After saying this, however, I have always struggled to give tangible examples of how an R output blossoms into a stunning and informative graphic. That is until now…
I spent the past year working hard with an amazing designer – Oliver Uberti – to create a book of 100+ maps and graphics about London. The majority of the graphics we produced for London: The Information Capital required R code in some shape or form. This was used to do anything from simplifying millions of GPS tracks to creating bubble charts or simply drawing a load of straight lines. We had to produce a graphic every three days to hit the publication deadline, so without the efficiency of copying and pasting old R code, or the flexibility to do almost any kind of plot, the book would not have been possible. So for those of you out there interested in the process of creating great graphics with R, here are 5 graphics shown from the moment they came out of R to the moment they were printed.
This graphic shows the origin-destination flows of commuters in Southern England. In R I used the geom_segment() function from the brilliant ggplot2 package to draw slightly transparent white lines between the centroids of the origins and destinations. I thought my R export looked pretty good on black, but we then imported it into Adobe Illustrator and Oliver applied a series of additional transparency effects to the lines to make them glow against the dark blue background (a colour we use throughout the book).
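For readers who want to try the basic idea, here is a minimal sketch of drawing semi-transparent white flow lines with geom_segment(). It is not the code used for the book: the `flows` data frame, its column names and its coordinates are made up for illustration, and the alpha value is just a starting point to tweak.

```r
library(ggplot2)

# Hypothetical toy flows (made-up values, not the book's data):
# one row per origin-destination pair, with centroid coordinates.
flows <- data.frame(
  o_x = c(0, 1, 2, 3),
  o_y = c(0, 2, 1, 3),
  d_x = c(2, 2, 2, 2),
  d_y = c(2, 2, 2, 2)
)

flow_plot <- ggplot(flows) +
  # One slightly transparent white line from each origin to its destination
  geom_segment(
    aes(x = o_x, y = o_y, xend = d_x, yend = d_y),
    colour = "white", alpha = 0.3
  ) +
  coord_equal() +
  # Strip axes and grid, then put the lines on a black background
  theme_void() +
  theme(plot.background = element_rect(fill = "black", colour = NA))

print(flow_plot)
```

With many thousands of overlapping flows, a low alpha lets the busiest routes build up in brightness on their own, which is what makes this kind of plot readable.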
This is a crop from a graphic we produced to show the differences between the daytime and nighttime population of London (we are showing nighttime here). It reuses the code I wrote to produce my Population Lines print, but Oliver went to the effort of manually cleaning the edges of the plot (I couldn’t work out how to automatically clip the lines in ggplot2!) by following the red line I over-plotted. Colours were tweaked and labels added, all in Illustrator.
One of my favourite graphics in the book shows the number of pieces of work by each artist in the Tate galleries. We can only show a small section here, but full-sized it looks spectacular as it features a Turner painting at its centre. The graphic started life as a treemap that simply scaled the squares by the number of works by each artist. R has a very easy to use treemap() function in the treemap package. Oliver then painstakingly broke the exported graphic to bits, converted the squares to picture frames and arranged them on “the wall”.
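As a rough illustration of how little code a basic treemap needs, here is a sketch using treemap() from the treemap package. The `works` data frame, the artist names and the counts are invented placeholders, not the Tate data, and the title is only there to label the toy output.

```r
library(treemap)

# Hypothetical toy data (placeholder names and counts, not the Tate figures)
works <- data.frame(
  artist  = c("Artist A", "Artist B", "Artist C", "Artist D"),
  n_works = c(400, 150, 60, 25)
)

# One rectangle per artist, with area proportional to the number of works
tm <- treemap(
  works,
  index = "artist",    # column that defines the rectangles
  vSize = "n_works",   # column that sets each rectangle's area
  title = "Works per artist (toy data)"
)
```

The value returned by treemap() includes a data frame (`tm$tm`) holding the position and size of each rectangle, which could be handy if you want to rebuild the layout elsewhere.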
This map, showing cyclists in London by time of day, was created from code similar to this graphic. It is an example where very little needed to be done to the plot once exported – we only really needed to add the River Thames (this could have been done in R), add some labels and then optimise the colours for printing. Hundreds of thousands of line segments are plotted here and the graphic is an excellent illustration of R’s power to plot large volumes of data.
The graphic above (full size here) has been the most popular from the book so far. It takes 2011 Census data and maps people by marital status as well as showing the absolute numbers as a streamgraph. ggplot2 was used to create both the maps and the plot. We stuck to the exported colours for the maps and then manually edited the streamgraph colours. The streamgraph was created with the geom_ribbon() function in ggplot2.
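One way to get a streamgraph out of geom_ribbon() is to stack the groups yourself and centre each stack on zero, then hand the lower and upper edges to ggplot2. The sketch below does this under stated assumptions: the `counts` data frame, its columns (`x`, `group`, `n`) and its values are made up, and this is not necessarily how the book's version was built.

```r
library(ggplot2)

# Hypothetical toy counts (made-up values): 3 groups observed at 5 x positions
counts <- data.frame(
  x     = rep(1:5, times = 3),
  group = rep(c("A", "B", "C"), each = 5),
  n     = c(10, 12, 15, 13, 11,
            20, 22, 25, 24, 21,
             5,  6,  8,  7,  6)
)

# Sort so that groups are stacked in the same order at every x
counts <- counts[order(counts$x, counts$group), ]

# Stack the groups at each x, then shift the stack so it is centred on zero
totals      <- ave(counts$n, counts$x, FUN = sum)     # total height at each x
stacked_top <- ave(counts$n, counts$x, FUN = cumsum)  # running top of the stack
counts$ymax <- stacked_top - totals / 2
counts$ymin <- counts$ymax - counts$n

# One ribbon per group, drawn between its lower and upper edge
stream_plot <- ggplot(counts, aes(x = x, ymin = ymin, ymax = ymax, fill = group)) +
  geom_ribbon() +
  theme_minimal()

print(stream_plot)
```

Because each ribbon's thickness equals the raw count, the absolute numbers stay readable even though the baseline wiggles.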
All the graphics shown so far started life as databases containing, as a minimum, several thousand rows of data. In this final example we show a “small data” example – the lives of 100 Londoners who have earned a blue plaque on one of London’s buildings. The data were manually compiled, with each person having 3 attributes against their name: the age they lived to, the age at which they created their most significant work, and the period of their life they lived in London. Thanks to ggplot2 I was able to use the code below to generate the coarse-looking plot above. Oliver could then take this and flip it before restyling and adding labels in Illustrator. The key thing here was that a couple of lines of code in R saved a day of manually drawing lines.
# Rows are ordered by the age at which each person started living in London; this is the order field.
library(ggplot2)
ggplot(Data, aes(order, origin)) +
  geom_segment(aes(xend = order, yend = Age)) +
  geom_segment(aes(x = order, y = st_age, xend = order, yend = end_age), colour = "red") +
  geom_segment(aes(x = order, y = st_age2, xend = order, yend = end_age2), colour = "yellow") +
  coord_polar()