Last week I attended a “Beyond 2011” Census event organised by Prof. Dave Martin and the Office for National Statistics (ONS). The attendees came from central and local government, private companies that utilise census data, and a few universities. The majority there (based on an approximate straw poll) believed that there would not be a census “in its current form” in 2021. This is because census data collection is a very costly exercise and its results go out of date ever more quickly as the pace of societal change (through migration, for example) increases.

So, we need to find an alternative, but from where? Census data collection has been taking place for centuries and has, until relatively recently, been our only source of “Big Data” concerning the population characteristics of the UK. Why then, when we have finally achieved the computational and analytical capabilities to analyse it efficiently, are we prepared to throw it all away? The reason is that we now have hundreds of big data sources available and, if used properly, they can potentially generate the same levels of insight (and more!) as those provided by a traditional census form.

We live in an era of "Big Data", as TechCrunch explains.

For example, the 2011 Census only accurately recorded where people live, not where they work. What good is this for emergency planning in central London? Why not use Oyster Card trip data or mobile phone usage instead, to give daily updates on population movement? Another issue is that we have a more mobile population, and we miss a lot of change by only taking decennial snapshots. Other government datasets, such as the NHS Patient Register or the Electoral Roll, are updated more often and can provide an indication of where we are living this year, rather than where we lived 5-10 years ago (I moved house 4 times between censuses). We also, and perhaps controversially, hand over loads of personal data every time we use a store card or credit card, log into Facebook, use our mobile phone or surf the web. This has contributed, perhaps for the first time, to many private companies having a much better idea about aspects of the population of the UK than the government. The question is: do we want companies to share it for the greater good (or evil, depending on what you think of the “big brother” state), or should we let them keep the data and have the government spend more to source it itself? We also have to be sure to count those who don’t feature on private company databases (put crudely, often because they aren’t worth anything to that company). It is these groups, often the most vulnerable in society, that we are most likely to miss with a non-census solution.

I think there is more than enough data to go around without having to fill in lengthy census forms; the issue is that we haven’t worked out how to join it all together yet. Once we solve that problem, we then need to work out who we have missed, and that is much harder to do without a compulsory census!

If you want to contribute to this debate please fill in the consultation documents. 




  1. It’s a really interesting issue about the future of the census. For my own field of transport geography and urban planning, the census is a fantastic resource on where people live and work and how they travel. No other dataset even comes close in terms of comprehensiveness, availability and data integration. For example, Oyster Card data covers a small section of the population in one city with no linked data, compared to national coverage of all journey-to-work trips and a host of linked socio-demographic measures in the census.

    So I’m definitely in the ‘keep the census’ camp, and hope the discussion will switch to how we can do a 2021 census on a much smaller budget (e.g. online only).

  2. I find it a shame that the people doing transport have so little data on how even single junctions are used, let alone how cities work, and the datasets they have measured manually are invariably out of date, because the cost of collection is so high.

    I have the Bath Bluetooth Experiment dataset (see “Instrumenting the City”), but can’t share it. I do plan to write up my findings at some point, as it is the best dataset collected on how people walk round a city. Two years of statistics from 9+ locations.

    From a traffic perspective, Census data is noise. Data mining the cities (ANPR, Oyster, instrumented taxis; see the NY datasets here) is the future.

  3. Jon

    Although I routinely advocate the use of ‘big data’ to address latency issues in the census, the census remains the ‘ground truth’ of life in the UK and it remains the gold standard because of its comprehensive coverage and unified, standard perspective on the population.

    That said, I don’t understand why we haven’t started moving to more flexible approaches based around sampling. Surely we know enough now about the spatial issues associated with generating stratified samples that it should be necessary neither to contact *every* household, nor to do so only *once* every ten years?

    And this *is* somewhere that I think the ‘big data’ approach can also be usefully brought to bear: linking back to the latency issue, we can continuously sample areas over time using indirect metrics (e.g. calling intensity to different countries or number of handsets) and then identify areas where there is significant change in these metrics as candidates for additional sampling.

    So we would interpret ‘no change’ to mean that the situation in a given area is relatively stable/static, and ‘change’ to mean that something has happened to change an area’s makeup (whether it’s in terms of population, employment, or whatever). We would target data collection resources on the latter group and save on the former.

    In an *ideal* world, you would assume that if you had enough sources of data (e.g. comms usage, power usage, water usage, etc.) then you’d be less likely to overlook an area with ‘informal’ activity (immigration, work, etc.) that is flying below the census radar.

