[Image: crowded street with blurred faces, signifying how data anonymization and pseudonymization should work under the EU GDPR]
Data Anonymization and Pseudonymization Under the GDPR

Companies that handle data face constraints imposed by data protection laws. The EU General Data Protection Regulation (GDPR) comes into effect in May 2018 and will introduce stricter rules and heavier penalties for failing to comply. Data protection regulations are necessary to protect the security and privacy of individuals' data. To satisfy these regulations and keep user data safe, companies may need to engage in data anonymization, which is the most effective way to simplify compliance with privacy regulations. Unlike pseudonymization, data anonymization allows companies to work with their data, stay compliant, and protect the privacy of individuals. Companies can use tools such as Aircloak’s solution to accomplish this.

What is data anonymization and pseudonymization?

Data protection laws exist to protect the personal identity of people whom the data describes. If the data subject is not identifiable in any way, data protection law does not apply. That is, when it becomes impossible to connect individuals to the data, people controlling and processing the sensitive data are not restricted in data use or sharing. On the other hand, an identifiable data subject can result in legal consequences including damage claims, loss of reputation, and fines or penalties.

The purpose of data anonymization is privacy protection. It involves modifying data sets so that no personally identifiable information remains. As a result, data can be used and transferred without individuals’ identities being disclosed unintentionally. Such anonymization is necessary before analytics can safely be performed on the data.

Pseudonymization differs in that personally identifying fields in the data are replaced with pseudonyms such as random numbers. It can also be done by coarse graining the data: for example, replacing someone’s full zip code with its first two digits, or replacing their city with their state. Simply replacing these fields, however, does not eliminate the possibility of identifying individuals in the data. In fact, a study of human mobility data revealed that “four spatio-temporal points are enough to uniquely identify 95% of the individuals” in a dataset, even if the granularity is low and no additional information about a user is available. For a mobile call records dataset like this, pseudonymization might involve replacing people’s names and phone numbers with random numbers. However, if an analyst knew of two or three calls that a friend made at certain times from certain places, they would be able to re-identify their friend’s records in the pseudonymized data.
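To make the distinction concrete, here is a minimal sketch of pseudonymization in Python. The field names and sample records are invented for illustration; real call records would contain far more detail.

```python
import secrets

# Hypothetical call records; names and numbers are invented examples.
records = [
    {"name": "Alice Martin", "phone": "+49 170 1234567", "zip": "10115",
     "cell": "A42", "time": "2017-06-01T09:13"},
    {"name": "Alice Martin", "phone": "+49 170 1234567", "zip": "10115",
     "cell": "B07", "time": "2017-06-01T18:40"},
    {"name": "Bob Keller", "phone": "+49 151 7654321", "zip": "80331",
     "cell": "C19", "time": "2017-06-02T08:02"},
]

pseudonyms = {}  # stable mapping: same person -> same random token

def pseudonymize(record):
    key = (record["name"], record["phone"])
    if key not in pseudonyms:
        pseudonyms[key] = secrets.token_hex(8)  # random pseudonym
    return {
        "id": pseudonyms[key],     # replaces name and phone number
        "zip": record["zip"][:2],  # coarse graining: first two digits only
        "cell": record["cell"],
        "time": record["time"],
    }

pseudo = [pseudonymize(r) for r in records]
# The spatio-temporal columns (cell, time) survive untouched, which is
# exactly what makes re-identification from a few known calls possible.
```

Note that the direct identifiers are gone, yet anyone who knows when and where a target made a couple of calls can still pick out that person’s pseudonym and, with it, their entire call history.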

Companies need stronger anonymization techniques

True data anonymization is difficult to achieve, and many data controllers fail to do so properly and completely. In fact, most companies use weaker pseudonymization techniques to protect personal data, which means many companies will remain constrained by data privacy laws and subject to penalties when the GDPR comes into effect.

The EU’s Article 29 Working Party (WP29) has set out criteria for anonymization in its Opinion on Anonymisation Techniques. First of all, the WP29 makes it clear that pseudonymization is not strong enough to protect personal data. Pseudonymization permits data controllers to handle their data more liberally, but it does not eliminate all risk, because re-identification remains possible. Crucially, as a result, pseudonymous data is still subject to privacy regulations under the GDPR.

Overall, the WP29 is intent on educating organizations about proper anonymization techniques and is calling for certification to ensure that companies follow its requirements. Data protection authorities (DPAs) can carry out this certification. Yet no clear guidance currently exists on how data anonymization techniques should be certified.

Companies face problems with unclear GDPR requirements

Starting May 2018, companies collecting data from EU citizens will need to comply with new regulations under the GDPR. However, they face significant challenges in preparing for it.

GDPR requirements on anonymization leave substantial room for interpretation. Because there are no specific guidelines on how to ensure anonymization, DPAs are certifying data anonymization practices on an ad hoc basis. Instead of following specific guidelines, organizations are making certifications based on their own interpretations of the GDPR’s loose criteria. There is no general anonymization certification program that companies can follow before being certified. For instance, it is not clear what happens legally when a certified anonymization process is later discovered to be weak or to have been performed by someone who is unqualified. It is difficult to tell who is legally liable in such situations, and non-compliance can result in steep penalties after the GDPR comes into effect.

Traditional data anonymization approaches are problematic

There are a number of traditional data anonymization techniques, such as k-anonymity and differential privacy, that are simple and provide strong anonymity but unfortunately destroy data utility. To retain utility, data controllers typically choose from a complex variety of mechanisms that provide some anonymity but in many cases do not protect against re-identification. These include rounding, cell swapping, outlier removal, aggregation, sampling, and other techniques. Getting this right requires substantial expertise on the part of both the data controller and the DPA.

Diffix: A new approach to data anonymization

Aircloak, in a research partnership with the Max Planck Institute for Software Systems, has spent many years researching Privacy-Enhancing Technologies (PETs) and has introduced a new approach to anonymizing data sets called Diffix. Diffix is the basis of Aircloak’s flagship product, Insights™, a solution to these privacy regulation problems. Aircloak Insights provides strong anonymity and good utility for a wide range of use cases, and requires no special expertise to set up and configure: it maintains the quality of the data set for analysis while also achieving a strong level of anonymity.

Because Aircloak Insights works “out-of-the-box” for a wide range of use cases, once a DPA certifies that Insights has been assessed, any new use case requires little to no additional evaluation. CNIL, the French national data protection authority, has evaluated Diffix against the GDPR anonymity criteria and has stated that Aircloak delivers GDPR-level anonymity. As a result, data anonymized by Aircloak’s solution will not be constrained by privacy laws or be subject to penalties when the GDPR comes into effect.

Re-identification is possible with weak anonymization. Can you have strong #privacy, anonymity & also data utility?

With the intent of proving its commitment to transparency and strong anonymization, Aircloak is exposing Insights to public scrutiny. The company is running a bug bounty program that incentivizes attackers to find weaknesses in its anonymization technology by attempting to single out users from data sets. This program will help establish Diffix, and Aircloak’s implementation of it, as a strong and reliable anonymization technique.

Reliable privacy protection and flexible data utility

In our data-driven world, the use of automation and technology to secure privacy and ensure compliance with regulations is still underutilized. The introduction of the GDPR should push companies to take advantage of useful, modern tools for anonymization. As a result, they will be able to reliably protect users, comply with regulations, and use their data with the utmost flexibility for high-quality analysis.