The Twitter Languages of London

Last year Eric Fischer produced a great map (see below) visualising the language communities of Twitter. The map, perhaps unsurprisingly, closely matches the geographic extents of the world’s major linguistic groups. On seeing these broad patterns I wondered how well they applied to the international communities living in London. The graphic above shows the spatial distribution of about 470,000 geo-located tweets (collected and georeferenced by Steven Gray) grouped by the language stated in each user’s profile information*. Unsurprisingly, English is by far the most popular. More surprising, perhaps, are the very similar distributions of most of the other languages, with higher densities in central areas and a gradual spreading to the outskirts (I expected greater concentrations in particular areas of the city). Arabic (and Farsi) tweets are much more concentrated around the Hyde Park, Marble Arch and Edgware Road areas, whilst the Russian tweeters tend to stick to the West End. Polish and Hungarian tweets appear the most evenly spread throughout London.

Even though the maps represent close to half a million tweets, they are still based on a selective sample: they only include people who have a good location (either through GPS or a specific address) and who are connected to the internet. I expect the latter requirement will exclude many short-term visitors to London, and may explain why there aren’t so many hotspots around London’s landmarks (as is the case with Flickr, where people can upload georeferenced images when they get home). In spite of this, I think the information in these maps is useful as a basis for comparison to other cities and it helps to reveal some of the finer patterns within the broad regions mapped by Fischer.

*This is slightly different to Eric Fischer’s method. He used Google’s translation tools to determine the language of each tweet, whereas I have taken the stated language of each user because I am more interested in what users feel their preferred language is. I often see English tweeters post in French, for example. Google also hasn’t quite mastered the slang or abbreviations that often crop up in Londoners’ tweets.
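
For anyone curious about the grouping step, here is a minimal sketch of how it could be done. It is not the code used to make the maps. It assumes each tweet is a dictionary shaped like the JSON the Twitter API returned at the time, with a GeoJSON point under `coordinates` (longitude first) and the profile language under `user.lang`. The function name and the sample points are made up for illustration.

```python
from collections import defaultdict


def group_by_profile_language(tweets):
    """Group (lon, lat) points by the language stated in each user's profile."""
    groups = defaultdict(list)
    for tweet in tweets:
        coords = tweet.get("coordinates")
        lang = (tweet.get("user") or {}).get("lang")
        if not coords or not lang:
            # Skip tweets with no point location or no profile language.
            continue
        lon, lat = coords["coordinates"]  # GeoJSON order is longitude, latitude
        groups[lang].append((lon, lat))
    return dict(groups)


# Illustrative sample only: invented points in central London.
tweets = [
    {"coordinates": {"type": "Point", "coordinates": [-0.1657, 51.5136]},
     "user": {"lang": "ar"}},
    {"coordinates": {"type": "Point", "coordinates": [-0.1337, 51.5101]},
     "user": {"lang": "ru"}},
    {"coordinates": None, "user": {"lang": "en"}},  # no location, so dropped
    {"coordinates": {"type": "Point", "coordinates": [-0.1276, 51.5072]},
     "user": {"lang": "en"}},
]

groups = group_by_profile_language(tweets)
counts = {lang: len(points) for lang, points in groups.items()}
```

Each language's list of points could then be plotted as its own layer. Note that the tweet text is never inspected, so an English-profile user tweeting in French still counts as English, which is the behaviour I wanted.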


  1. Ed Freyfogle

    very interesting analysis

    I’m very skeptical of the high volume of Dutch and Scandinavian tweets; I suspect that’s Google’s classifier misclassifying English. There’s no way more people are tweeting in Norwegian in London than French, it just doesn’t stand up to common sense. In fairness, language identification from such a short string isn’t easy.

    1. James Author

      Hi Ed,

      Thanks for the comment. I’m not using the Google identification; this is the language the Twitter API has provided for each user. It may be that the French tweeters are less keen to share their location than the Dutch/Scandinavians?

      I’m not an expert though, so I would be grateful for more comments about the reliability of these data. From an academic perspective there has not been a lot of consideration of data issues and I think the commercial world is way ahead of us on this. Or maybe not?


  2. Jakob

    The geography of a tweet is one thing – the geographic dispersion of it as in the location of its followers would also be interesting to study (if feasible?) – in order to distinguish the short term visitor from more sustained foreign language social networks in a place like London.

  3. This is a great read James but can I ask, how was the location of the tweets assigned? As far as I know, Twitter doesn’t have a setting function where you specify your precise location.

    In the case that location was assigned by IP address, this would not be 100% reliable as it is well known for IP addresses to vary wildly within a given region which could account for the similar dispersion of locations regardless of language.

    Apologies if the answer is implicit for those in the know, I just enjoy reading your posts and this one is quite interesting.

