
Posted on Jun 10, 2013 in R Spatial, Resources | 13 comments

Where is the R Activity?

[Map: R Activity Around the World]

R has become one of the world’s most widely used statistics and visualisation software packages, with an ever-growing user community. Thanks to the release of log files containing all hits to the http://cran.rstudio.com/ server, it is possible to make a map showing the parts of the world with the most active R users (specifically those mostly using the RStudio interface). The USA comes top with 3,045,960 requests to the server between October 2012 and June 2013. Japan is in 2nd place with a mere 756,177 requests, and Germany is 3rd. In all, 203 countries appear in the server logs. I have scaled the map according to the number of server requests made, and you can clearly see the dominance of Japan, Europe and North America compared with other parts of the world, especially Africa.

Of course, the map isn’t a perfect representation of the number of R users, as you could have one or two people making hundreds of server requests a day versus a large number of people only making a couple. This is why I have entitled the map “Activity” rather than “Users”. Either way, R hasn’t quite achieved global domination, but it is getting there…
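The logs are published as one gzipped CSV file per day. Below is a minimal sketch of building the daily URLs and fetching them with base R’s download.file(). The http://cran-logs.rstudio.com/<year>/<date>.csv.gz layout is an assumption (check it against the logs download page), and log_urls() and download_logs() are hypothetical helper names, not part of any package.

```r
# Hypothetical helpers for fetching the daily CRAN log files.
# Assumption: each day's log lives at
#   http://cran-logs.rstudio.com/<year>/<yyyy-mm-dd>.csv.gz

# Build one URL per day between start and end (inclusive)
log_urls <- function(start, end) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  year <- format(days, "%Y")
  paste0("http://cran-logs.rstudio.com/", year, "/",
         format(days, "%Y-%m-%d"), ".csv.gz")
}

# Download each file into dest_dir, skipping files that are already there
download_logs <- function(urls, dest_dir = ".") {
  for (url in urls) {
    dest <- file.path(dest_dir, basename(url))
    if (!file.exists(dest)) {
      # mode = "wb" keeps Windows from treating the .gz file as text
      download.file(url, destfile = dest, mode = "wb")
    }
  }
  invisible(urls)
}

urls <- log_urls("2012-10-01", "2012-10-03")
print(urls)
# download_logs(urls, "XXX")  # uncomment to fetch the files (needs a network connection)
```

Skipping files that already exist means the download can simply be re-run after an interrupted transfer.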

To create the map, I obtained the files following the instructions on the logs download page. I then combined them with the following code (taken from here):
setwd("XXX") # this needs to be the directory containing the downloaded files
# only pick up the daily log files (.csv, optionally gzipped)
file_list <- list.files(pattern = "\\.csv(\\.gz)?$")

for (file in file_list) {
  if (!exists("dataset")) {
    # if the merged dataset doesn't exist yet, create it from the first file
    dataset <- read.csv(file, header = TRUE)
  } else {
    # otherwise append the current file to it
    temp_dataset <- read.csv(file, header = TRUE)
    dataset <- rbind(dataset, temp_dataset)
    rm(temp_dataset)
  }
  print(file)
}
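Growing a data frame with rbind() inside a loop builds a new copy on every iteration, which gets slow with several months of daily files. As an alternative, the sketch below reads each file once with lapply() and binds them in a single do.call(rbind, ...) step. combine_logs() is a hypothetical helper; it assumes every file has the same columns, and the demo uses two tiny made-up files rather than real log data.

```r
# Hypothetical helper: combine every log file in a directory into one data frame.
# Assumes all files are CSVs (optionally gzipped) with identical columns.
combine_logs <- function(dir, pattern = "\\.csv(\\.gz)?$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  # read each file exactly once...
  pieces <- lapply(files, read.csv, header = TRUE)
  # ...then bind all the pieces in a single step
  do.call(rbind, pieces)
}

# Demo with two toy files (made-up rows, not real log data)
tmp <- file.path(tempdir(), "logs_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(package = c("ggplot2", "plyr"), country = c("US", "JP")),
          file.path(tmp, "day1.csv"), row.names = FALSE)
write.csv(data.frame(package = "maptools", country = "DE"),
          file.path(tmp, "day2.csv"), row.names = FALSE)

dataset <- combine_logs(tmp)
print(nrow(dataset))
```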

It is then possible to aggregate the data to get the number of requests per country.

# each row of the logs is one request, so flag every row with 1 and sum per country
dataset$flag <- 1
counts <- aggregate(dataset$flag, by = list(dataset$country), sum)
names(counts) <- c("country", "count")
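The same per-country totals can also be obtained with table(), which counts how often each country code occurs without needing the helper flag column. The sketch below uses a short made-up vector standing in for dataset$country and sorts the result so the most active countries come first.

```r
# Toy stand-in for dataset$country (made-up values, not real log data)
country <- c("US", "JP", "US", "DE", "US", "JP")

# table() counts how many requests (rows) each country code has
tab <- table(country)
counts <- data.frame(country = names(tab), count = as.vector(tab),
                     stringsAsFactors = FALSE)

# put the most active countries first
counts <- counts[order(-counts$count), ]
print(counts)
```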

The next step was to download a world shapefile (containing the country borders) from Natural Earth. Its attribute table contains the same two-letter country codes used in the log files (the dataset object above). We can open this file with the maptools package:

library(maptools)
# replace "yourworldshapefile" with the path to the downloaded Natural Earth shapefile
world <- readShapePoly("yourworldshapefile")

It is then possible to join our counts object to the world object to assign the log counts to each country based on the "iso_a2" and "country" fields respectively. The new shapefile is also saved.

# match() lines the counts up with the shapefile rows, so each polygon keeps its own count
world@data <- data.frame(world@data, counts[match(world@data[, "iso_a2"], counts[, "country"]), ])
writePolyShape(world, "world_r_use.shp")
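The join relies on match(): for each shapefile row it returns the position of that country’s code in counts, so the result stays in the shapefile’s original row order and keeps lining up with its polygons. By contrast, merge() sorts its output by the key by default, which would break that alignment. A small sketch with made-up attribute rows (world_data and the counts values are hypothetical) shows that a country with no requests simply comes out as NA:

```r
# Toy attribute table standing in for world@data (made-up rows)
world_data <- data.frame(name = c("United States", "Japan", "Chad"),
                         iso_a2 = c("US", "JP", "TD"),
                         stringsAsFactors = FALSE)
# Toy request counts keyed by the same two-letter codes
counts <- data.frame(country = c("JP", "US"), count = c(20, 75),
                     stringsAsFactors = FALSE)

# For each shapefile row, find the matching row in counts (NA if absent)
idx <- match(world_data$iso_a2, counts$country)
joined <- data.frame(world_data, count = counts$count[idx])
print(joined)
```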

This next step is a bit of a cheat, as I used the ScapeToad software to create the cartogram. A package exists to do this in R, but I find ScapeToad more powerful. You can download the shapefile I produced from here. I then reloaded the new shapefile into R and used the basic plot functions to produce the map.

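# reload the cartogram produced in ScapeToad (maptools is already loaded above)
cartogram <- readShapePoly("world_r_carto.shp")

# draw it with base graphics and add a title and subtitle
plot(cartogram)
title(main = "R Activity Around the World", sub = "Based on cran.rstudio.com Activity Logs October 2012-June 2013")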

This is my first stab at looking at the data, and there is a lot more that can be done with it!


13 Comments

  1. Nice. Can I use this on the graph gallery?

    • Sure. Glad you like it.

  2. Fascinating work

    Can you let me know the line(s) of code you used to download the files from the URLs given on the CRAN page?
    I’m getting errors with download.file.

    Tx

    • The error was because the RStudio code had
      ‘http//cran-logs.rstudio.com’

      instead of ‘http://cran-logs.rstudio.com’.

      Hadley has been alerted, so it should be corrected soon.

  3. There is more R activity in Alaska than in Europe? (Or in India/China?)

    • This is because the IP addresses are registered to countries, so Alaska gets distorted in the same way as the rest of the US. It’s not perfect, I know, just a fun first visualisation with these data.

  4. Hi James
    Nice map, but shouldn’t it be normalised in some way? As it stands it’s really just a choropleth based on absolute numbers. Informative enough but … Note that NZ, home of R, doesn’t seem to get a look in.

    Dave

    • I agree it always makes sense to normalise, but I am not sure what a good denominator would be. I think per head of population would be a bit odd – we need a rough estimate of statisticians in each country…

      • Yes, the denominator problem seems to me to be a very serious one in any area/value mapping, and one that most people, including otherwise sensible statisticians, don’t seem to ‘get’. As a first approximation one might argue that #statisticians is proportional to #population, so this would be vaguely appropriate?

        dave

    • Hey Dave, look at the relative sizes of NZ and Australia. NZ is not doing too badly, actually.

      • Many thanks, I see what you mean and in fact know the NZ stats/geog scene quite well having worked in Hamilton and in ChCh. Given the history of R, I suspect that as a proportion of the ‘at risk’ population NZ will have much more R based activity and even noting your point, the map doesn’t show this. Compare it with Danny’s worldmapper cartograms of country population. As James indicates the denominator problem isn’t trivial and I am prepared to argue that deriving a sensible one is probably the key to effective choropleth mapping. Many cartographers will go so far as to argue that non-normalised choropleths should never be drawn. But I guess you know that already. I suspect that adding a second variable (colour?) to these univariate cartograms is probably the best way to use them in geovis work.
        dave

  5. Can you give an example of how to use the download.file command?

Trackbacks/Pingbacks

  1. Answering “How many people use my R package?” | R-statistics blog - [...] out, and the R blogosphere started buzzing with action: James Cheshire created a beautiful world map which highlights the …
  2. Top 100 R packages for 2013 (Jan-May)! | R-statistics blog - [...] since we know that the countries which uses R the most have these days as rest days (see James Cheshire’s world …
  3. Top 100 R packages for 2013 | spider's space - [...] since we know that the countries which uses R the most have these days as rest days (see James Cheshire’s world …
  4. On the RStudio download logs | Omnia sunt Communia! - [...] Where is the R activity? [...]
  5. My package worldmap! | Research Side Effects - [...] Last week, a colleague draw my attention on this new log files from the Rstudio cloud CRAN mirror, through …
  6. Not only CRAN downloads and Shiny … but also .. rCharts | PremierSoccerStats - [...] several blog posts including a Top 100 of 2013 with some nice graphs from Tal Galili and maps from …
  7. What are the top 100 (most downloaded) R packages in 2013? (from simple statistics) | Baker Chen - […] since we know that the countries which uses R the most have these days as rest days (see James Cheshire’s world …
  8. Analyzing Rstudio CRAN server download logs, and a start towards package recommendation | Stat Of Mind - […] showed how to track R package downloads, Tal Galili looked for the most popular R packages, and James Cheshire also created a map …
