Data Anonymization and Pseudonymization Under the GDPR

Companies that handle data are increasingly constrained by data protection laws. The EU General Data Protection Regulation (GDPR), which will take effect on 25 May 2018, introduces stricter rules and heavier penalties for non-compliance. Data protection regulations are necessary to protect the security and privacy of individuals' data. To satisfy these regulations and keep user data safe, companies may need to anonymize their data, which greatly simplifies compliance with privacy regulations. Unlike pseudonymization, anonymization allows companies to work with their data, stay compliant, and protect the privacy of individuals. Companies can use tools such as Aircloak's solution to accomplish this.

What are data anonymization and pseudonymization?

Data protection laws exist to protect the identity of the people the data describes. If the data subject cannot be identified in any way, data protection law does not apply. In other words, once it becomes impossible to connect individuals to the data, those who control or process it are not restricted in how they use or share it. If, on the other hand, data subjects remain identifiable, the law applies, and violations can lead to damage claims, loss of reputation, and fines or other penalties.

The purpose of data anonymization is privacy protection. It involves modifying data sets so that no personally identifiable information remains. As a result, the data can be used and transferred without unintentionally disclosing anyone's identity. This step therefore has to be completed before analytics are performed on the data.
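To make this concrete, here is a deliberately simple sketch of one anonymizing transformation. Per-person rows are replaced by counts per city, and any group smaller than a hypothetical threshold (`MIN_GROUP_SIZE`) is suppressed, so no released figure describes a single person. The helper name, threshold, and sample data are assumptions for illustration. This is not Aircloak's method, and on its own it does not guarantee anonymity: an analyst who can run many overlapping queries may still infer individual values.

```python
from collections import Counter

# Hypothetical threshold chosen for illustration; not a value set by the GDPR or WP29.
MIN_GROUP_SIZE = 5


def aggregate_counts(records: list[dict], attribute: str,
                     min_group_size: int = MIN_GROUP_SIZE) -> dict:
    """Release only per-value counts, dropping groups smaller than min_group_size."""
    counts = Counter(record[attribute] for record in records)
    return {value: n for value, n in counts.items() if n >= min_group_size}


# Hypothetical per-user records: 6 in Berlin, 5 in Munich, 1 in Hamburg.
records = (
    [{"user": f"u{i}", "city": "Berlin"} for i in range(6)]
    + [{"user": f"u{i}", "city": "Munich"} for i in range(6, 11)]
    + [{"user": "u11", "city": "Hamburg"}]
)

released = aggregate_counts(records, "city")
print(released)  # {'Berlin': 6, 'Munich': 5}
```

The Hamburg group contains a single user, so it is withheld entirely rather than published as a count of one.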

Pseudonymization differs in that personally identifying fields in the data are replaced by pseudonyms, such as random numbers. It can also involve coarse-graining the data, for example replacing someone's full zip code with its first two characters, or replacing their city with their state. Replacing these fields, however, does not eliminate the possibility of identifying individuals in the data. In fact, a study of human mobility data found that "four spatio-temporal points are enough to uniquely identify 95% of the individuals" in a dataset, even when the data is coarse-grained and nothing else is known about a user. For a mobile call records dataset like the one in that study, pseudonymization might replace people's names and phone numbers with random numbers. Yet if an analyst knew of two or three calls that a friend made at certain times from certain places, they could (re)identify their friend's records in the pseudonymized data.
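The call-records scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Call` record, the `pseudonymize` and `reidentify` helpers, and the sample data are all hypothetical. Callers are replaced with random tokens and zip codes are cut to two characters. An analyst who knows two of a friend's calls, and coarse-grains them the same way, can still pick out the friend's pseudonym.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str    # direct identifier (name or phone number) in the raw data
    zip_code: str  # where the call was made
    hour: int      # hour of day when the call was made


def pseudonymize(calls: list[Call]) -> list[Call]:
    """Replace callers with random tokens and coarse-grain zip codes to two characters."""
    tokens: dict[str, str] = {}
    result = []
    for call in calls:
        if call.caller not in tokens:
            # One random token per caller, so a person's records stay linked together.
            tokens[call.caller] = secrets.token_hex(8)
        result.append(Call(tokens[call.caller], call.zip_code[:2], call.hour))
    return result


def reidentify(pseudonymized: list[Call], known: set[tuple[str, int]]) -> set[str]:
    """Return pseudonyms whose records contain every known (zip prefix, hour) point."""
    points: dict[str, set[tuple[str, int]]] = {}
    for call in pseudonymized:
        points.setdefault(call.caller, set()).add((call.zip_code, call.hour))
    return {token for token, seen in points.items() if known <= seen}


# Hypothetical raw call records.
calls = [
    Call("Alice", "10115", 9), Call("Alice", "80331", 13), Call("Alice", "10117", 20),
    Call("Bob", "10115", 9), Call("Bob", "20095", 13),
    Call("Carol", "80333", 13), Call("Carol", "10119", 18),
]

pseudo = pseudonymize(calls)
# The analyst knows two of Alice's calls and coarse-grains them the same way.
matches = reidentify(pseudo, {("10", 9), ("80", 13)})
print(len(matches))  # 1: only Alice's pseudonym fits both known calls
```

Each known point on its own matches more than one person: Bob also called from a "10" zip code at 9, and Carol from an "80" zip code at 13. Only the combination of two points narrows the candidates to a single pseudonym.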

Companies need stronger anonymization techniques

True data anonymization is difficult to achieve, and many data controllers fail to do it properly and completely. In fact, most companies rely on weaker pseudonymization techniques to protect personal data. Their data therefore remains subject to data protection law, and they risk penalties once the GDPR takes effect.

The Article 29 Working Party (WP29), the EU's advisory body on data protection, set out criteria for anonymization in its Opinion 05/2014 on Anonymisation Techniques. These criteria ask whether it is still possible to single out an individual, to link records relating to an individual, or to infer information about an individual. First, the WP29 makes clear that pseudonymization is not strong enough to protect personal data. Pseudonymization lets data controllers handle their data more freely, but it does not eliminate all risk, because re-identification remains possible. Crucially, pseudonymous data is therefore still subject to privacy regulations under the GDPR.
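One question in the WP29 opinion is whether an individual can still be singled out, meaning isolated within a dataset. As a rough illustration, the hypothetical helper below measures the share of records whose quasi-identifiers are unique. The quasi-identifiers used here (an assumed zip prefix, age, and sex) and the sample data are also hypothetical. This is a sketch, not a WP29-endorsed test. A low rate does not by itself show that data is anonymous, because it does not check linkability or inference.

```python
from collections import Counter


def singling_out_rate(records: list[dict], quasi_identifiers: list[str]) -> float:
    """Share of records whose combination of quasi-identifier values occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Hypothetical pseudonymized records: direct identifiers already replaced by tokens.
records = [
    {"pseudonym": "a1", "zip": "10", "age": 34, "sex": "F"},
    {"pseudonym": "b2", "zip": "10", "age": 34, "sex": "F"},
    {"pseudonym": "c3", "zip": "80", "age": 51, "sex": "M"},
    {"pseudonym": "d4", "zip": "20", "age": 29, "sex": "F"},
]

rate = singling_out_rate(records, ["zip", "age", "sex"])
print(rate)  # 0.5: two of the four records can be singled out
```

Replacing names with tokens did nothing to lower this rate. The records for `c3` and `d4` are still unique on zip prefix, age, and sex.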

Overall, the WP29 aims to educate organizations about proper anonymization techniques and is calling for certification to ensure that companies meet its requirements. Data protection authorities (DPAs) can carry out this certification. However, there is currently no clear guidance on how data anonymization techniques should be certified.
