Private data in public ways

Last night, I attended a fascinating series of presentations on data visualization as part of London's Big Data Week. Put on by a number of tech evangelists and companies around the world, Big Data Week is a global series of talks and hackathons focusing on data-driven innovation.

In the final session, Dr. James Cheshire talked about mapping London, overlaying various kinds of data, from bike commuters to Tube riders, atop maps of the city. One map in particular intrigued me, because it shows how public and private information blur together and, as a result, how hard it is to legislate data usage.

His project to map the most common last names of Londoners atop the city's geography is revealing. It's innocent enough data, and surnames are easily available and open. But this is England, and there are limits to what we can do with information.

The UK is more of a nanny state than it was the last time I visited: under new legislation, mobile Internet users need to opt in to "mature" content (which, as I found out, includes otherwise innocent components of pages). One of my cousins, who works for the county, told me she's not allowed to ask prospective hires how many years' experience they have, because it's considered age-discriminatory.

I imagine that a survey of Londoners by racial profile would be similarly controversial and frowned upon; at the very least, collecting the data would raise a lot of eyebrows, and be subject to some laws. But an analysis of last names—Smith versus Singh, for example—is trivial. Look at the map above. Want to live with your fellow Irishmen? It's pretty clear where you should look for houses.

The lesson here is that it's hard to legislate and govern data. For every use we ban, and every genie we try to stuff back in the bottle, something else acts as a proxy. In this case, last names act as an otherwise-innocent lookup table for ethnicity. Sometimes big data isn't about the bigness; it's about the ease with which disparate data sets can be mined and linked. Nobody "owns" this information, but it can be put to all kinds of uses that people might not like.
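
To show how little effort the linking takes, here is a minimal Python sketch of the lookup-table idea. Everything in it is invented for illustration: the district names, the resident records, the surname-to-origin mapping, and the function names are all assumptions of mine, not data or code from Dr. Cheshire's project. A real surname is also a far noisier signal than this toy table suggests.

```python
from collections import Counter, defaultdict

# Hypothetical, made-up records of (district, surname). Not real data.
residents = [
    ("District A", "Smith"),
    ("District A", "Smith"),
    ("District A", "Murphy"),
    ("District B", "Singh"),
    ("District B", "Singh"),
    ("District B", "Smith"),
    ("District C", "Murphy"),
    ("District C", "Murphy"),
    ("District C", "Singh"),
]

# Hypothetical lookup table: surname -> assumed origin.
# This is the "innocent" data set that turns names into a proxy.
surname_origin = {
    "Smith": "English",
    "Singh": "Indian",
    "Murphy": "Irish",
}


def inferred_origin_by_district(residents, surname_origin):
    """Join the two data sets: count assumed origins per district."""
    counts = defaultdict(Counter)
    for district, surname in residents:
        origin = surname_origin.get(surname, "unknown")
        counts[district][origin] += 1
    return counts


def top_district_for(origin, counts):
    """Return the district where the given origin has the largest share."""
    return max(
        counts,
        key=lambda d: counts[d][origin] / sum(counts[d].values()),
    )


counts = inferred_origin_by_district(residents, surname_origin)
print(top_district_for("Irish", counts))
```

Neither input is sensitive on its own: one is a list of names by area, the other a table of where surnames tend to come from. The join is a dozen lines, and that is the part that's hard to legislate.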

(Sidenote: thanks to Stewart Townsend for setting up the week, and a horde of others for presenting and participating. Great to see the Big Data movement flourishing this side of the pond and to get a sense of the burgeoning community that crosses a wide range of disciplines.)