Horizon Nigh: Why does Google retain data? Because nonexistent laws tell it to

Monday, May 14, 2007

Why does Google retain data? Because nonexistent laws tell it to

Snarky headline, Mr Anderson! But with some non-sequiturs in the article.

Two months ago, Google announced a plan to anonymize its logs … [so that] it should be impossible to link search queries up with individual users. Of course, this is what AOL researchers thought … but queries often turn out to be … the sort of things that can eventually be used to identify individuals.

Okay, okay, good, okay, with you so far.

[Improving search results] Sounds good—though it’s not clear why this couldn’t be done just as well with anonymous data.

Wait, waiiit. I thought you just said they were going to anonymise data?

The real issue here is whether, as in AOL’s case, each user will be assigned an ‘anonymous’ ID number that is not by itself traceable to a particular user but that, combined with the content of the queries, can be used for identification; or whether the queries will simply be stored with no additional information, so that it would be impossible to determine whether any two given queries were performed by the same user. Only the latter is true anonymity. Which is Google using? Mr Anderson does not seem to know, or perhaps just does not care to explain the issue.
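
To make the distinction concrete, here is a minimal sketch in Python of the two storage schemes. Everything in it is hypothetical: the log entries are made up, and the function names (pseudonymize, strip_identity, group_by_id) are my own illustrations, not anything Google or AOL is known to use.

```python
from collections import defaultdict

# Hypothetical raw log of (user identifier, query) pairs. All values are made up.
RAW_LOG = [
    ("alice@example.com", "plumbers near elm street"),
    ("bob@example.com", "cheap flights"),
    ("alice@example.com", "symptoms of a rare condition"),
    ("alice@example.com", "house prices on elm street"),
]


def pseudonymize(log):
    """Scheme 1: swap each user identifier for an arbitrary but stable ID number."""
    ids = {}
    out = []
    for user, query in log:
        if user not in ids:
            ids[user] = len(ids) + 1  # same user always maps to the same number
        out.append((ids[user], query))
    return out


def strip_identity(log):
    """Scheme 2: keep only the query text; sorting also discards the original order."""
    return sorted(query for _, query in log)


def group_by_id(pseudo_log):
    """Rebuild a per-ID search history, which is what makes scheme 1 risky."""
    profiles = defaultdict(list)
    for pseudo_id, query in pseudo_log:
        profiles[pseudo_id].append(query)
    return dict(profiles)


if __name__ == "__main__":
    # Under scheme 1, one ID collects every query a single person made.
    for pseudo_id, queries in group_by_id(pseudonymize(RAW_LOG)).items():
        print(pseudo_id, queries)
    # Under scheme 2, there is nothing left to group the queries by.
    print(strip_identity(RAW_LOG))
```

Under the first scheme the ID number reveals nothing on its own, but grouping by it reassembles each person's whole search history, and that history is what can give them away. Under the second scheme no such grouping is possible, because no field survives that ties two queries together.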

I wrote several papers about related topics in college, so the subject is of some interest to me.

Mr Anderson’s end point–that Google is hiding behind the government while actually ‘complying’ more than it needs to–is valid, but the reporting is sloppy. I only point this out because Ars Technica is usually very good.