*Cross-posted from London: The Information Capital’s website*
Whenever I teach an R class, I usually start by saying how the software is now used by the likes of The New York Times graphics department or Facebook to manipulate their data and produce great visualisations. However, I have always struggled to give tangible examples of how an R output can blossom into a stunning and informative graphic. That is, until now.
I spent the past year working hard with an amazing designer – Oliver Uberti – to create this book of 100+ maps and graphics about London. The majority of graphics we produced for London: The Information Capital required R code in some shape or form. This was used to do anything from simplifying millions of GPS tracks, to creating bubble charts or simply drawing a load of straight lines. We had to produce a graphic every three days to hit the publication deadline, so without the efficiencies of copying and pasting old R code, or the flexibility to do almost any kind of plot, the book would not have been possible. For those of you out there interested in the process of creating great graphics with R, here are 6 graphics shown from the moment they came out of R to the moment they were printed.
This graphic shows the origin-destination flows of commuters in Southern England. In R I used the geom_segment() command from the brilliant ggplot2 package to draw slightly transparent white lines between the centroids of the origins and destinations. I thought my R export looked pretty good on black, but we then imported it into Adobe Illustrator and Oliver applied labels and a series of additional transparency effects to the lines to make them glow against the dark blue background (a colour we use throughout the book).
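For anyone wanting to try the technique, here is a minimal sketch of the R stage only. It is not the book's code: the `flows` table, its column names (`ox`, `oy`, `dx`, `dy`) and the alpha value are all assumptions, and random points stand in for the real centroids.

```r
library(ggplot2)

# Hypothetical origin-destination table: one row per flow, holding the
# centroid coordinates of each end. Random values stand in for real data.
set.seed(1)
flows <- data.frame(
  ox = runif(500), oy = runif(500),  # origin centroid
  dx = runif(500), dy = runif(500)   # destination centroid
)

# Slightly transparent white lines on a black background, so that
# overlapping flows build up into brighter corridors.
p_flows <- ggplot(flows) +
  geom_segment(aes(x = ox, y = oy, xend = dx, yend = dy),
               colour = "white", alpha = 0.1) +
  coord_equal() +
  theme_void() +
  theme(plot.background = element_rect(fill = "black", colour = NA))

# print(p_flows)  # draw to the active graphics device
```

Saving to a vector format, for example `ggsave("flows.pdf", p_flows)`, is one way to get something editable into Illustrator. The post does not say which export format was used, so treat that as a suggestion.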
This is a crop from a graphic we produced to show the differences between the daytime and nighttime population of London (we are showing nighttime here). It reuses the code behind my Population Lines print, but Oliver went to the effort of manually cleaning the edges of the plot (I couldn’t work out how to automatically clip the lines in ggplot2!) by following the red line I included. He then tweaked the colours and added labels in Illustrator.
One of my favourite graphics in the book shows the number of pieces of work by each artist in the Tate galleries. We can only show a small section here, but full-sized it looks spectacular as it features a J.M.W. Turner painting at its centre. The graphic started life as a basic treemap that simply scaled rectangles by the number of works each artist has in the Tate. The treemap package for R provides a very easy-to-use treemap() function. Oliver then painstakingly broke the exported graphic to bits, transformed the squares into picture frames and sculptures and arranged them salon-style in “the gallery”.
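The starting treemap can be sketched in a few lines. The data frame below is made up for illustration (placeholder artists and counts, not the Tate's figures), and the column names `artist` and `n_works` are my own assumptions.

```r
library(treemap)

# Made-up counts for illustration only; not the Tate's actual figures.
works <- data.frame(
  artist  = c("Artist A", "Artist B", "Artist C", "Artist D", "Artist E"),
  n_works = c(400, 120, 60, 30, 10)
)

# Each rectangle's area is proportional to the column named in vSize;
# index names the column that identifies each rectangle.
tm <- treemap(works, index = "artist", vSize = "n_works",
              title = "Works per artist (toy data)")
```

The call draws the plot and invisibly returns a list describing the layout, which is handy if you want the rectangle coordinates rather than the picture.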
This map, showing cyclists in London by time of day, was created with code similar to that behind this graphic. It is an example where very little needed to be done to the plot once exported – we only really needed to add the River Thames (this could have been done in R), some labels and then optimise the colours for printing. Hundreds of thousands of line segments are plotted here, making the graphic an excellent illustration of R’s power to plot large volumes of data.
The graphic above (full size here) has been the most popular from the book so far. It takes 2011 Census data and maps people by marital status as well as showing the absolute numbers as a streamgraph. ggplot2 was used to create both the maps and the plot. We kept the exported colours for the maps but manually edited the colours on the streamgraph. The streamgraph was created with the geom_ribbon() function in ggplot2.
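A streamgraph is essentially a stacked area chart centred on a baseline, so with geom_ribbon() the work is mostly in computing each band's lower and upper edge. The sketch below is one way to do that, not the code used for the book: the `counts` table, its column names and the numbers in it are invented, and centring each stack on zero is my assumption about the layout.

```r
library(ggplot2)

# Toy counts by age for three hypothetical groups (not Census figures).
counts <- expand.grid(age = 16:90,
                      status = c("married", "single", "widowed"),
                      stringsAsFactors = FALSE)
counts$n <- with(counts,
  ifelse(status == "single",  pmax(0, 100 - age),
  ifelse(status == "married", 60 - abs(age - 50),
                              pmax(0, age - 40))))

# Stack the groups within each age, in a fixed order.
counts <- counts[order(counts$age, counts$status), ]
counts$ymax <- ave(counts$n, counts$age, FUN = cumsum)
counts$ymin <- counts$ymax - counts$n

# Shift each stack down by half its total so it is centred on zero.
total <- ave(counts$n, counts$age, FUN = sum)
counts$ymin <- counts$ymin - total / 2
counts$ymax <- counts$ymax - total / 2

# One ribbon per group, filled by status.
p_stream <- ggplot(counts, aes(x = age, ymin = ymin, ymax = ymax,
                               fill = status)) +
  geom_ribbon()

# print(p_stream)  # draw to the active graphics device
```

Because the bands are plain ribbons, the fill colours can be set in R with scale_fill_manual() or, as we did for the book, edited by hand after export.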
All the graphics shown so far started life as databases containing, as a minimum, several thousand rows of data. In this final example we show a “small data” example: the lives of 100 Londoners who have earned a blue plaque on one of London’s buildings. Oliver manually compiled the data, including 3 attributes for each name: the age they lived to, the age when they created a defining work, and the period of their life commemorated by the blue plaque. Thanks to ggplot2, I was able to use the code below to generate the coarse-looking plot above. Oliver then took this and flipped it before restyling and adding labels in Illustrator.
    # Rows are ordered by the age at which each person started living in London;
    # that rank is the `order` field.
    library(ggplot2)

    ggplot(Data, aes(order, origin)) +
      # one line per person, drawn from `origin` to `Age`
      geom_segment(aes(xend = order, yend = Age)) +
      geom_segment(aes(x = order, y = st_age, xend = order, yend = end_age), colour = "red") +
      geom_segment(aes(x = order, y = st_age2, xend = order, yend = end_age2), colour = "yellow") +
      coord_polar()
As was the case for many graphics in our book, the key thing here was that a couple of lines of code in R saved a day of manually drawing lines.