
Google Database Leak Documents 6 Years of Privacy Breaches

An internal Google document not meant for public view has been obtained by 404 Media, and it catalogs a long string of privacy breaches previously unknown outside of the company. As the reporters note, each of the items in the database leak is small and deals with incidents that may have involved just one individual. But there are thousands in total, spanning six years of company history.

Google database leak reveals years of unreported privacy incidents

As a whole, the database leak does not disclose systemic large-scale wrongdoing by Google. But it does demonstrate how frequently individual privacy breaches seem to happen, and in a broad variety of the company’s operations ranging from Street View to YouTube. It also documents vulnerabilities that Google uncovered in assorted third-party vendors.

The database leak contains incidents flagged as potential security or privacy breaches from 2013 to 2018. Google has verified to media sources that the leak is legitimate. The company explains that the database contains incidents flagged internally by employees of assorted departments, which are then sent to a reviewer who determines how to proceed. There are thousands of incidents in total in the database leak, but Google notes that some of these were ultimately determined not to be actual issues, or were found to be a problem with a third-party vendor or partner.

The incidents thus range from mundane procedural mistakes to serious privacy breaches and security incidents. One of the most serious involved an unspecified government agency that is a Google cloud storage client, which accidentally had its sensitive data transferred to a consumer product. Another involves a bug in a particular filter used by a speech service, which caused about 1,000 hours of audio of various children speaking to be recorded and stored. Google notes that the team involved eventually spotted the bug and deleted the data.

Many of the database leak stories do seem to have happy endings after some sort of initial “oopsie” occurs, but there are some incident types that appear multiple times and indicate potential ongoing problem areas. In 2016, Google Street View went on a tear of recording and storing license plate numbers (which are supposed to be filtered as its vehicles roam about and take their sporadic pictures). That was eventually caught and deleted, but not before a “database of geolocated license plate numbers and license plate number fragments” had formed.

YouTube also seemed to have a consistent string of privacy breaches during this period. Some of these were accidents, such as a private video uploaded by Nintendo being made public ahead of a planned launch announcement. Another involved a third-party contractor with admin access, who abused it to swap their own affiliate tracking codes into video descriptions and channel information. One of the most concerning items for the general public is that the recommendation algorithm apparently retained access to video view history that users had opted to delete, and continued to use that information to recommend new videos to them.

Google says all listed privacy breaches were resolved

The database leak was not the result of hacking or misconfiguration; it was sent to 404 Media by an anonymous inside source. As Google notes, all of the privacy breaches it documents seem to have eventually been flagged and resolved.

The documents were revealed less than a week after another bombshell database leak from Google’s private files, that one involving how it ranks and evaluates websites in its search results. That revelation rankled the digital marketing and SEO industries, which noted some major discrepancies between Google’s longtime public policies in this area and how it has apparently been doing things internally all of this time. Google has also acknowledged the authenticity of that leak, but claims that it contains outdated and incomplete information and “out of context” measures that were phased out or not ultimately implemented.

The list of privacy breaches is not exactly damning, and much of it is internal activity to be expected from a company of the size and complexity of Google over a period of several years. But it is far from the only public relations issue the company is dealing with at the moment. Digital marketers are not the only ones upset with its search product; it has been steadily dropping in popularity for at least several years due to a combination of privacy concerns, a perception that it is too laden with ads and increasingly unhelpful, and struggles with buggy and unpopular rollouts of AI tools and additions.

Some of the privacy breaches also lasted for quite a long time before being addressed, and may not have been fully disclosed to the public before now. An AI-assisted learning app called Socratic, owned by parent company Alphabet, listed user email addresses in the home page source code for at least a year before the issue was discovered, and may have also leaked geolocation information and IP addresses. The app is targeted primarily at minors as a homework assistance tool.