In an announcement early this morning, Microsoft stated it is changing the policy of its search engine Bing on retaining personal data collected from its users. Under the new policy, which the company says may take as long as 18 months to implement, Bing will anonymize its data logs after only six months of retention rather than the previous 18. The change follows a recommendation from European Union privacy regulators, making Microsoft perhaps the first major US-based search engine operator to comply.
In April 2008, the European Union’s advisory body on data protection, the Article 29 Working Party, made up of representatives of the member states’ data protection authorities, advised that search engines doing business in the EU should anonymize the personally identifiable data they collect on their users after six months. The reasoning is that after six months, search engines can make little use of such data to refine the results of current queries, which was the original justification Google, Yahoo, and Microsoft gave for keeping it longer.
Google responded the following September with a promise to anonymize its server logs after nine months rather than the 18 to 24 months it had previously observed. Though nine months is still longer than six, the Working Party’s advice isn’t law, so Google isn’t in violation. However, researchers, including some at Harvard University, soon found that “anonymize” did not mean “delete”: Google’s current plan is to remove only the rightmost eight bits of each logged IPv4 address.
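To make that truncation concrete, here is a minimal Python sketch, assuming logs store addresses as dotted-quad strings. The helper names `truncate_last_octet` and `candidates` are hypothetical, and this is not Google's code. Clearing the last octet keeps the first 24 bits, so each truncated entry still narrows a user down to one of 256 possible addresses.

```python
import ipaddress


def truncate_last_octet(ip: str) -> str:
    """Zero the rightmost 8 bits of an IPv4 address (hypothetical helper)."""
    addr = int(ipaddress.IPv4Address(ip))
    # 0xFFFFFF00 keeps the leftmost 24 bits and clears the last octet.
    return str(ipaddress.IPv4Address(addr & 0xFFFFFF00))


def candidates(truncated: str) -> list[str]:
    """List every full address a truncated /24 entry could have come from."""
    return [str(a) for a in ipaddress.ip_network(truncated + "/24")]


print(truncate_last_octet("203.0.113.77"))  # 203.0.113.0
print(len(candidates("203.0.113.0")))        # 256
```

The sample address comes from a range reserved for documentation, not from any real log.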
As Christopher Soghoian, a student fellow at Harvard’s Berkman Center for Internet and Society, observed at the time, Google (or anyone else with access to its logs) could re-create the deleted 8 bits simply by matching the cookies attached to the partly anonymized records against still-retained cookie data, which includes the full IP addresses.
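The re-identification Soghoian describes amounts to a join on the cookie ID. The sketch below is illustrative only: the record layout, the sample data (drawn from documentation address ranges), and the `reidentify` function are all hypothetical, assuming one log holds truncated IPs with cookie IDs and a separate, longer-retained log maps cookie IDs to full IPs.

```python
import ipaddress

# Hypothetical search log: IPs already truncated to /24, cookie IDs kept.
search_log = [
    {"cookie": "c1", "ip": "203.0.113.0", "query": "flu symptoms"},
    {"cookie": "c2", "ip": "198.51.100.0", "query": "cheap flights"},
    {"cookie": "c3", "ip": "192.0.2.0", "query": "weather"},
]

# Hypothetical cookie log, retained separately with full IP addresses.
cookie_log = {"c1": "203.0.113.77", "c2": "198.51.100.5"}


def reidentify(search_log, cookie_log):
    """Restore full IPs by joining truncated records to the cookie log on cookie ID."""
    restored = []
    for record in search_log:
        full_ip = cookie_log.get(record["cookie"])
        network = ipaddress.ip_network(record["ip"] + "/24")
        # Accept a match only if its first 24 bits agree with the truncated IP.
        if full_ip is not None and ipaddress.ip_address(full_ip) in network:
            restored.append({**record, "ip": full_ip})
        else:
            restored.append(dict(record))  # no usable cookie match: stays truncated
    return restored


for row in reidentify(search_log, cookie_log):
    print(row["cookie"], row["ip"], row["query"])
```

Only records whose cookie also appears in the fuller log are recovered; the third record stays truncated because its cookie has no match.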
As a Microsoft spokesperson confirmed to Betanews this afternoon, once the new policy takes effect, Bing will delete the entire IPv4 address (all 32 bits) after six months of retention. In addition, Bing plans to begin recording IP addresses associated with cookies using one-way encryption, so that a stored value cannot be turned back into the original address. The intention, the company says, is “to prevent the search data from being connected to personally identifying information.” This process should also prevent Bing itself from reconstructing an IP address from still-retained cookie data.
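Microsoft did not say how its one-way encryption works. As one plausible approach, here is a sketch assuming a keyed hash (HMAC-SHA256 from Python's standard library) and a hypothetical secret key held apart from the logs; the name `pseudonymize_ip` is illustrative, not Bing's. A key matters here because there are only 2**32 IPv4 addresses, so an unkeyed hash could be reversed by simply hashing every one of them.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key, generated once and stored apart from the logs.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed one-way digest (HMAC-SHA256, hex) of an IP address."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


print(pseudonymize_ip("203.0.113.77"))
```

Because the same address always yields the same digest under a given key, records can still be grouped by visitor without storing the address itself; destroying the key would make mapping a digest back to an address infeasible even for the log's owner.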
Two years ago, Google publicly argued with the Article 29 Working Party over the definition of “personally identifiable data.” Since a person and his computer may be parted from one another, Google contended, an IP address should not necessarily be construed as personally identifiable. The Working Party vehemently disagreed, producing a broad and unprecedented definition that seemed aimed squarely at Google: if data can be used to reconstruct other personally identifiable data, then it is effectively personally identifiable data itself, says Article 29.
By that broader definition, Google and most other search engines (including Bing, prior to the policy shift) are out of compliance with the Working Party’s guidance. Since September 2008, Google has argued that having less data on hand leaves it less able to protect its users: “While we’re glad that this will bring some additional improvement in privacy, we’re also concerned about the potential loss of security, quality, and innovation that may result from having less data,” wrote Google global privacy counsel Peter Fleischer at the time. “As the period prior to anonymization gets shorter, the added privacy benefits are less significant and the utility lost from the data grows.”
The Article 29 Working Party has yet to respond to Microsoft’s policy shift. Meanwhile, Ask.com is unique among search engines in letting users anonymize their own searches as they happen, through a depersonalization option called AskEraser. However, Ask.com states that it cannot be held responsible for data collected by third parties whose Web sites users found through Ask.com, whether or not AskEraser is turned on.