The first quarter of 2022 has already been heavy with data privacy discussions. Security celebrities were among the first to celebrate the inaugural Data Privacy Week. The Belgian Data Protection Authority (DPA) ruled that the Interactive Advertising Bureau’s Transparency and Consent Framework (TCF) was non-compliant with the GDPR. And Google launched new differential privacy tools to help meet the growing demand for better consumer privacy, among other news.
All of these conversations – both good and bad – are constructive when it comes to the advancement of data privacy, because even when we get things “wrong,” it gives us a chance to figure out how to get it right.
Looking specifically at Google’s latest differential privacy tools: they take a step in the right direction but still have a few limitations when it comes to protecting consumer privacy.
First, differential privacy only addresses a subset of the use cases for which user data is collected
Google’s tools are designed for developers working at large entities that have already collected information on their users. As such, these tools can’t address how privacy is maintained while data is being captured, and they don’t account for data sharing during capture (e.g., a site sending a signal containing your personal data to Facebook while you browse).
Differential privacy needs to be layered on top of processes with strong data governance, which itself rests on foundational principles of data management like data minimization, access controls and data deletion. This means the tools work best with robust operational processes and large, disparate datasets – like a major ecommerce site with hundreds of thousands of users. With that much data, companies can gain insights through differential privacy without sacrificing either user privacy or the accuracy of those insights: the proverbial needle of personally identifiable information (PII) is easy to bury in the haystack of statistical “noise” the mechanism adds.
However, this approach still won’t address the needs of small- to mid-sized businesses (SMBs), whose datasets are much smaller. For the majority of ecommerce companies – those with customers in the hundreds, thousands or even tens of thousands – applying differential privacy would be more like trying to bury a tree branch in a haystack. It becomes much easier for bad actors to pick out a customer’s real PII from the fabricated “noise” because there is not enough data to hide or disguise it.
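The haystack intuition can be made concrete with the textbook Laplace mechanism – a minimal sketch, not Google’s actual library, and all function names here are illustrative. A counting query has sensitivity 1 (adding or removing one person changes the count by at most 1), so noise drawn from a Laplace distribution with scale 1/ε hides any single individual. The catch: the noise has a fixed absolute size, so it is negligible against an aggregate of 500,000 users but a visible fraction of an answer computed over 200 customers:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from a Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    # A counting query has sensitivity 1, so the Laplace mechanism
    # adds noise with scale 1 / epsilon to achieve epsilon-DP.
    return true_count + laplace_noise(1.0 / epsilon)

epsilon = 1.0
for n in (500_000, 200):  # big ecommerce site vs. small shop
    noisy = noisy_count(n, epsilon)
    rel_err = abs(noisy - n) / n
    print(f"true={n:>7}  noisy={noisy:>10.1f}  relative error={rel_err:.5%}")
```

At ε = 1 the noise has a standard deviation of about 1.4 regardless of dataset size, so its relative impact on the small dataset is thousands of times larger – and, symmetrically, the protection it offers each individual record is that much thinner.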
What’s more, even if these SMBs were somehow able to gather a dataset large enough for differential privacy to work, simply implementing this functionality would require more resources than most SMBs can spare. On top of that, it still won’t address the major sources of data breaches at small businesses – weak passwords, unencrypted data, lost devices, out-of-date software and lax access controls, to name a few. An average SMB spends $800,000 to $1.6 million a year on privacy management – far more than most are willing to pay, and 43% lack a cybersecurity defense plan at all. They’re essentially rolling the dice on securing their data – and that of their customers – against small business cyberattacks.
Second, Google’s differential privacy approach doesn’t account for who decides what level of “noise” will appropriately protect user privacy
While the engineers building a differential privacy library will include a “tester” – a feature that lets a developer check whether the privacy guarantee holds under a given set of circumstances – the onus is on the developer, Google and other big tech companies like Apple to keep the noise level high enough to protect user privacy. Even if these companies vow to maintain a level of noise that preserves privacy, we have no way of knowing to what standard they hold themselves.
Epsilon (ε), the parameter at the heart of any differential privacy package, provides one standard of measurement: lower values mean stronger guarantees, and researchers have argued that ε should be kept at 1 or below to provide meaningful anonymity. However, hard as these companies may try to maintain the appropriate level of noise, history has shown that while they boast a high level of user privacy, in reality they have not added enough noise to maintain it.
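Epsilon has a precise operational meaning: a mechanism is ε-differentially private when the probability of any given output changes by at most a factor of e^ε if one person’s data is added or removed. The sketch below (illustrative names, not any vendor’s API) checks that bound for a sensitivity-1 count query protected by Laplace noise, and shows why the choice of ε matters – at ε = 5, an adversary’s likelihood ratio can reach e^5 ≈ 148, a far weaker guarantee than at ε = 1:

```python
import math

def laplace_pdf(x: float, mu: float, scale: float) -> float:
    # Density of the Laplace distribution centered at mu.
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)

def worst_case_ratio(epsilon: float, output: float) -> float:
    # Neighboring datasets differ in one person, so the true counts
    # differ by at most 1 (sensitivity 1). With Laplace noise of scale
    # 1 / epsilon, the density ratio between the two neighboring
    # "worlds" is bounded by e^epsilon for every possible output.
    scale = 1.0 / epsilon
    return laplace_pdf(output, 0.0, scale) / laplace_pdf(output, 1.0, scale)

for eps in (0.5, 1.0, 5.0):
    ratio = worst_case_ratio(eps, 0.0)
    print(f"epsilon={eps}: likelihood ratio={ratio:.2f} (bound e^eps={math.exp(eps):.2f})")
```

The bound is tight at output 0, which is why the printed ratios hit e^ε exactly; the larger ε grows, the more confidently an observer can infer whether a specific person is in the dataset – which is why an unverifiable, vendor-chosen ε is a transparency problem.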
If we want differential privacy to work, we need to hold Google, Apple and other big tech companies accountable for the level of privacy they maintain. This will require greater transparency into who decides the right noise level and how it is maintained.
Lastly, Google’s motivation for pushing differential privacy may not be in the best interest of consumers
Differential privacy fits within Google’s broader Privacy Sandbox strategy to usher in the post-cookie world. That strategy will ultimately make it easier for browsers like Chrome to assign users to cohorts (via FLoC, Federated Learning of Cohorts) based on browsing behavior, then track and aggregate each cohort’s activity across the web.
Google does this by assigning each cohort a few thousand users whose web behavior is similar enough to be tracked collectively. This change, combined with technologies like device fingerprinting, would have the troubling effect of less user privacy, not more.
Even without taking Google’s FLoC into account, differential privacy fails to account for the social and political implications of treating its output as completely random, anonymized user data. At least one analysis suggests that differential privacy could penalize minority communities by undercounting areas that are racially and ethnically mixed. Harvard University researchers found that the method made it more difficult to create political districts of equal population and could result in fewer majority-minority districts.
Under the right circumstances – large datasets and tech companies that are transparent about their standards for measuring and maintaining privacy – differential privacy is another helpful tool for companies that want to safeguard consumer privacy. However, developers must walk a very fine line to protect individual privacy while preserving the utility and value of the data. Whether they are up to the challenge remains to be seen.