Posted on Apr 13, 2012 in London, Visualisation | 5 comments

The Twitter Languages of London

Last year Eric Fischer produced a great map (see below) visualising the language communities of Twitter. The map, perhaps unsurprisingly, closely matches the geographic extents of the world’s major linguistic groups. On seeing these broad patterns I wondered how well they applied to the international communities living in London. The graphic above shows the spatial distribution of about 470,000 geo-located tweets (collected and georeferenced by Steven Gray) grouped by the language stated in their user’s profile information*. Unsurprisingly, English is by far the most popular. More surprising, perhaps, are the very similar distributions of most of the other languages, with higher densities in central areas and a gradual spreading to the outskirts (I expected greater concentrations in particular areas of the city). Arabic (and Farsi) tweets are much more concentrated around the Hyde Park, Marble Arch and Edgware Road areas, whilst the Russian tweeters tend to stick to the West End. Polish and Hungarian tweets appear the most evenly spread throughout London.

Even though the maps represent close to half a million tweets, they are still based on a selective sample: they only include people who have a good location (either through GPS or a specific address) and who are connected to the internet. I expect the latter requirement will exclude many short-term visitors to London, and may explain why there aren’t so many hotspots around London’s landmarks (as is the case with Flickr, where people can upload georeferenced images when they get home). In spite of this, I think the information in these maps is useful as a basis for comparison to other cities, and it helps to reveal some of the finer patterns within the broad regions mapped by Fischer.

*This is slightly different to Eric Fischer’s method. He used Google’s translation tools to determine the language of each tweet, whereas I have taken the stated language of each user because I am more interested in what users feel their preferred language is. I often see English tweeters post in French, for example. Google also hasn’t quite mastered the slang or abbreviations that often crop up in Londoners’ tweets.
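As a rough illustration of the grouping described above, here is a minimal Python sketch. It is not the code used to make the maps: the record layout, the field names (`user_lang`, `lat`, `lon`), the sample points and the 0.01 degree grid are all assumptions made for the example. It simply drops tweets without a usable location, then counts the rest per grid cell for each stated user language.

```python
import math
from collections import Counter, defaultdict

# Hypothetical records: field names and values are illustrative only.
tweets = [
    {"user_lang": "en", "lat": 51.5074, "lon": -0.1278},
    {"user_lang": "en", "lat": 51.5079, "lon": -0.1281},
    {"user_lang": "ar", "lat": 51.5136, "lon": -0.1586},
    {"user_lang": "ru", "lat": 51.5112, "lon": -0.1340},
    {"user_lang": "fr", "lat": None, "lon": None},  # no usable location
]

def grid_cell(lat, lon, size=0.01):
    """Snap a coordinate to the index of a square cell `size` degrees wide."""
    return (math.floor(lat / size), math.floor(lon / size))

def counts_by_language(records, size=0.01):
    """Return {language: Counter({cell: n_tweets})}, skipping unlocated tweets."""
    result = defaultdict(Counter)
    for t in records:
        if t["lat"] is None or t["lon"] is None:
            continue  # mirrors the "good location" requirement
        result[t["user_lang"]][grid_cell(t["lat"], t["lon"], size)] += 1
    return dict(result)

by_lang = counts_by_language(tweets)
totals = {lang: sum(cells.values()) for lang, cells in by_lang.items()}
```

Each per-language counter could then be drawn as its own density layer, which makes it easy to compare where, say, Arabic and Russian profiles cluster.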


5 Comments

  1. very interesting analysis

    I’m very skeptical about the high volume of Dutch and Scandinavian tweets; I suspect that’s Google’s classifier misclassifying English. There’s no way more people are tweeting in Norwegian in London than French; it just doesn’t stand up to common sense. In fairness, language identification from such a short string isn’t easy.

    • Hi Ed,

      Thanks for the comment. I’m not using the Google identification; this is the language the Twitter API has provided for each user. It may be that the French tweeters are less keen to share their location than the Dutch/Scandinavians?

      I’m not an expert though, so I would be grateful for more comments about the reliability of these data. From an academic perspective there has not been a lot of consideration of data issues, and I think the commercial world is way ahead of us on this. Or maybe not?

      James

  2. The geography of a tweet is one thing – the geographic dispersion of it as in the location of its followers would also be interesting to study (if feasible?) – in order to distinguish the short term visitor from more sustained foreign language social networks in a place like London.

  3. This is a great read James but can I ask, how was the location of the tweets assigned? As far as I know, Twitter doesn’t have a setting function where you specify your precise location.

    In the case that location was assigned by IP address, this would not be 100% reliable as it is well known for IP addresses to vary wildly within a given region which could account for the similar dispersion of locations regardless of language.

    Apologies if the answer is implicit for those in the know, I just enjoy reading your posts and this one is quite interesting.

    Thanks

Trackbacks/Pingbacks

  1. Tecnologia, densità, parole: le mappe della settimana « Webcartografie - [...] Words. Which languages are spoken in London? Not only English. A study of Twitter messages published by Spatialanalysis. [...]
  2. Visualising Social Media « Big Data Toolkit - [...] from Twitter data and also James Cheshire, over at spatialanalysis.co.uk, and I have looked at the language distribution of …
  3. Data-Driven Urban Citizenship — The Pop-Up City - [...] of using the same data to pre-empt congestion and resolving them by diversifying traffic, geographic spread of different languages …
  4. Questions of Data Sovereignty -The laws of the city ? « Urban Choreography - [...] of using the same data to pre-empt congestion and resolving them by diversifying traffic, geographic spread of different languages used to tweet across …
  5. Twitter Languages of London « Another Word For It - [...] Twitter Languages of London by James Cheshire. [...]
