September 9th, 2008 | Published in Google Public Policy
(Cross-posted from the Official Google Blog)
Today, we're announcing a new logs retention policy: we'll anonymize IP addresses on our server logs after 9 months. We're significantly shortening our previous 18-month retention policy to address regulatory concerns and to take another step to improve privacy for our users.
Back in March 2007, Google became the first leading search engine to announce a policy to anonymize our search server logs in the interests of privacy. And many others in the industry quickly followed our lead. Although that was good for privacy, it was a difficult decision because the routine server log data we collect has always been a critical ingredient of innovation. We have published a series of blog posts explaining how we use logs data for the benefit of our users: to make improvements to search quality, improve security, fight fraud and reduce spam.
Over the last two years, policymakers and regulators -- especially in Europe and the U.S. -- have continued to ask us (and others in the industry) to explain and justify this shortened logs retention policy. We responded by open letter to explain how we were trying to strike the right balance between sometimes conflicting factors like privacy, security, and innovation. Some in the community of EU data protection regulators continued to be skeptical of the legitimacy of logs retention and demanded detailed justifications for this retention. Many of these privacy leaders also highlighted the risks of litigants using court-ordered discovery to gain access to logs, as in the recent Viacom suit.
Today, we are filing this response (PDF file) to the EU privacy regulators. Since we announced our original logs anonymization policy, we have had literally hundreds of discussions with data protection officials, government leaders and privacy advocates around the world to explain our privacy practices and to work together to develop ways to improve privacy. When we began anonymizing after 18 months, we knew it meant sacrifices in future innovations in all of these areas. We believed further reducing the period before anonymizing would degrade the utility of the data too much and outweigh the incremental privacy benefit for users.
We didn't stop working on this computer science problem, though. The problem is difficult to solve because the characteristics of the data that make it useful to prevent fraud, for example, are the very characteristics that also introduce some privacy risk. After months of work our engineers developed methods for preserving more of the data's utility while also anonymizing IP addresses sooner. We haven't sorted out all of the implementation details, and we may not be able to use precisely the same methods for anonymizing as we do after 18 months, but we are committed to making it work.
While we're glad that this will bring some additional improvement in privacy, we're also concerned about the potential loss of security, quality, and innovation that may result from having less data. As the period prior to anonymization gets shorter, the added privacy benefits are less significant and the utility lost from the data grows. So, it's difficult to find the perfect equilibrium between privacy on the one hand, and other factors, such as innovation and security, on the other. Technology will certainly evolve, and we will always be working on ways to improve privacy for our users, seeking new innovations, and also finding the right balance between the benefits of data and advancement of privacy.