If you want to maximize data utility but remain compliant with privacy regulations, you have to find the right balance between those two.
What can help you in your data operations?
Privacy-enhancing technologies, called PETs for short. Although PETs don’t exempt controllers from the scope of the GDPR, CCPA, and other regulations, they add safeguards to data processing. There are various types of PETs, and each technology serves a different use case.
In this article, you’ll learn:
- What are privacy-enhancing technologies (PETs)?
- Types of PETs.
- How to choose the right privacy-enhancing method for your use case.
What are PETs?
Privacy-enhancing technology is a term that covers various methods that help you protect privacy and data confidentiality. Use PETs to streamline sensitive data processing, while maximizing data utility in both internal and external company projects.
PETs respect fundamental data protection principles such as:
- Fairness and transparency
- Purpose limitation
- Data minimization
- Storage limitation
- Integrity and confidentiality
PETs have been around for decades but have regained popularity with rising awareness of security and privacy compliance. You might be familiar with some of them, including ad blockers, Tor networks, anonymization, pseudonymization, or synthetic data.
When to use PETs? Let’s say you want to run a big data project with a third party but you don’t want to expose your data. In this case, one or more PETs will help you make the collaboration happen as your data will have an additional level of protection.
Let’s dive into details.
Types of privacy-enhancing technologies
PETs paved the way for many organizations to better protect their data. But technology advances so rapidly that you’ll often need a combination of different PETs rather than a single, standalone solution.
Here is an extensive list of PETs that can turn out to be helpful in your data projects.
Among PET types, you’ll find:
- Encryption in transit and at rest
- De-identification techniques: tokenization or k-anonymity
- AI-generated synthetic data
- Encrypted analysis: homomorphic encryption
- Trusted execution environments (TEE)
- Anonymized computing: secure multi-party computation, federated analytics
- Differential privacy
1. Encryption in transit and at rest
Encryption means converting information into an encoded form that can’t be read without the decryption key.
Encryption in transit guarantees data confidentiality when data is on the move – across the internet or different networks. As data is usually less secure when in transit, you need encryption in transit to safeguard it.
Encryption at rest means you protect your data that sits on a third-party cloud or internal environment. This type of encryption ensures that your data is safeguarded against leaks, hacks, or exfiltration.
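To make the idea concrete, here is a toy sketch of symmetric encryption in Python using a one-time pad (XOR with a random key). This is for illustration only; production systems use vetted algorithms such as AES through a maintained library, and all names here are made up.

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # XOR each plaintext byte with the corresponding key byte.
    # A one-time pad is only secure if the key is truly random,
    # as long as the message, and never reused.
    return bytes(p ^ k for p, k in zip(plaintext, key))

def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so decryption is the same operation.
    return bytes(c ^ k for c, k in zip(ciphertext, key))

message = b"customer record"
key = secrets.token_bytes(len(message))  # random key, same length as message

ciphertext = encrypt(message, key)
assert decrypt(ciphertext, key) == message  # round-trips without loss
```

Whether the data is in transit or at rest, the principle is the same: without the key, the ciphertext is useless to an attacker.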
2. De-identification techniques
De-identification techniques involve modifying data so that a dataset contains less information about individuals. But according to the UK’s Centre for Data Ethics and Innovation, these techniques aren’t, on their own, mechanisms for protecting the confidentiality of data.
Among de-identification techniques, you’ll find:
- Tokenization – replacing specific values with random values
- K-anonymity – generalizing or masking data so that each individual “hides” among at least k similar records, which makes it hard to identify anyone
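A minimal tokenization sketch: each sensitive value is swapped for a random token, and the mapping is kept in a separate, secured vault so only authorized users can reverse it. The field names and values are hypothetical.

```python
import secrets

def tokenize(values):
    """Replace each value with a random token, keeping a lookup
    table (the "vault") so authorized users can reverse the mapping."""
    vault = {}   # token -> original value; stored in a secured system
    tokens = []
    for v in values:
        token = secrets.token_hex(8)  # random, carries no information
        vault[token] = v
        tokens.append(token)
    return tokens, vault

ssns = ["078-05-1120", "219-09-9999"]
tokens, vault = tokenize(ssns)
# The tokenized dataset can circulate; vault[tokens[0]]
# recovers the original value only for authorized access.
```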
3. Pseudonymization
Pseudonymization is a data masking technique that hides the identity of individuals in a dataset by replacing identifying fields with pseudonyms. Although pseudonymization is useful in many cases, a pseudonymized dataset still contains PII that can be re-identified, so such data remains subject to the GDPR.
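One common way to generate stable pseudonyms is keyed hashing: the same identifier always maps to the same pseudonym, but without the key the mapping can’t be recomputed. A minimal sketch, with a hypothetical key:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-and-store-securely"  # hypothetical; keep in a secrets manager

def pseudonymize(identifier: str) -> str:
    # HMAC-SHA256 gives a deterministic pseudonym per individual;
    # anyone without SECRET_KEY cannot reproduce or reverse it.
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

pseudonymize("alice@example.com")  # same input always yields the same pseudonym
```

Note that because the key holder can still link pseudonyms back to individuals, the output is pseudonymized, not anonymized.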
4. AI-generated synthetic data
Synthetic data is artificially generated data that mirrors the patterns, balance, and composition of the original dataset. This PET has great potential because privacy-preserving synthetic data doesn’t contain personal data and can therefore be used in any project that requires good-quality data in large amounts.
If you want to collaborate with third-party contributors, synthetic data unlocks data potential as it’s not subject to the GDPR, CCPA, or LGPD.
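A heavily simplified sketch of the idea: fit statistics on a real column, then sample new records from the fitted distribution. Real synthetic-data generators model joint distributions (often with deep generative models) rather than one column at a time; the numbers below are invented.

```python
import random
import statistics

real_ages = [34, 29, 41, 52, 38, 45, 31, 47]  # hypothetical real column

# Fit simple marginal statistics on the real data...
mu = statistics.mean(real_ages)
sigma = statistics.stdev(real_ages)

# ...then sample fresh records from that distribution. No synthetic
# row corresponds to a real individual, but aggregate patterns
# (mean, spread) are preserved for analytics or testing.
synthetic_ages = [round(random.gauss(mu, sigma)) for _ in range(1000)]
```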
5. Encrypted analysis
With homomorphic encryption, you allow a third party to process and analyze data in encrypted form. As a data controller, you send them encrypted data, they run computations directly on it, and when the results come back, you decrypt them.
Unlike encryption in transit or at rest, homomorphic encryption protects data in use. The main idea of this PET is to collaborate with multiple external third-party companies, or keep data in a cloud environment, in a safe and compliant way.
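The homomorphic property can be demonstrated with textbook RSA, which happens to be multiplicatively homomorphic: multiplying two ciphertexts yields the ciphertext of the product. This is a toy with tiny primes, not a usable scheme; real deployments use dedicated schemes (e.g. Paillier, BFV, CKKS) through audited libraries.

```python
# Toy key generation with tiny textbook-RSA parameters (illustration only).
p, q = 61, 53
n = p * q                            # public modulus
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 11
product_cipher = (encrypt(a) * encrypt(b)) % n  # multiply ciphertexts only...
assert decrypt(product_cipher) == a * b          # ...yet it decrypts to 7 * 11 = 77
```

The third party never sees 7 or 11, only their ciphertexts, yet it can still compute on them.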
6. Trusted execution environments (TEE)
In this method, a “safe” hardware partition is carved out of the computer’s processor and memory. Data held within the TEE can’t be accessed from the main processor, and communication between the two environments is encrypted.
As a data controller, you may let third parties operate on unencrypted data inside the TEE, but both sides need a certain level of mutual trust that the environment is set up correctly and securely. Because a TEE works on unencrypted data, it’s faster than homomorphic encryption; data stays encrypted outside the enclave and is only decrypted inside it.
If you don’t fully trust the external party with whom you’re cooperating, it’s better to lean towards using a homomorphic encryption method.
7. Anonymized computing
Secure multi-party computation
Secure multi-party computation (SMPC) lets multiple companies collaborate on one data project while no party can learn anything from the others’ inputs. This lets you preserve privacy and security while benefiting from the project’s value without revealing sensitive data.
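A classic building block of SMPC is additive secret sharing: each party splits its value into random shares that sum to the original, so any incomplete set of shares reveals nothing. A minimal sketch with invented salary figures:

```python
import secrets

PRIME = 2**61 - 1  # field modulus; all arithmetic is done mod PRIME

def share(value: int, n_parties: int) -> list[int]:
    """Split value into n random shares summing to it mod PRIME.
    Any n-1 shares together reveal nothing about the value."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three companies want the total of their salary costs
# without revealing individual inputs.
salaries = [70_000, 85_000, 92_000]
all_shares = [share(s, 3) for s in salaries]

# Each party locally sums the one share it received from every input...
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
# ...and only combining the partial sums reveals the total.
total = sum(partial_sums) % PRIME
assert total == sum(salaries)  # 247_000, with no input ever exposed
```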
Federated analytics
Federated analytics focuses on executing a program in each party’s local environment and communicating only the results back to the originating party. Federated learning, a subset of federated analytics, involves training machine learning models on distributed datasets. Strictly speaking, it isn’t a privacy-enhancing technology in the fullest sense – it’s more about not disclosing raw data during the ML training process.
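The pattern can be sketched in a few lines: each site computes a local aggregate, only those aggregates leave the site, and a coordinator combines them. The site names and numbers below are invented.

```python
# Raw records stay at each site; only (sum, count) pairs are shared.
site_data = {
    "hospital_a": [4.1, 3.8, 5.0],
    "hospital_b": [4.6, 4.9],
    "hospital_c": [3.9, 4.2, 4.4, 4.7],
}

# Each site computes its local aggregate in its own environment...
local_stats = [(sum(v), len(v)) for v in site_data.values()]

# ...and the coordinator combines only the aggregates.
total, count = map(sum, zip(*local_stats))
federated_mean = total / count  # global mean, no raw record ever shared
```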
8. Differential privacy
Differential privacy (DP) is a framework that allows you to share insights about a dataset while withholding information about individuals. In DP, you add a layer of privacy to the dataset so it can be shared publicly.
This layer of protection is the noise added to the input data (local differential privacy) or the output (global differential privacy) of an algorithm.
Adding this noise gives you an extra layer of privacy, but it costs accuracy at the same time. That’s why you should balance it carefully, so you don’t end up with results that are either too inaccurate or insufficiently obfuscated.
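A common mechanism for this is Laplace noise calibrated to a privacy budget epsilon: smaller epsilon means more noise and stronger privacy. A minimal sketch for releasing a noisy count (function names and figures are illustrative):

```python
import random

def dp_count(true_count: int, epsilon: float, sensitivity: int = 1) -> float:
    """Release a count with Laplace noise scaled to sensitivity/epsilon.
    Smaller epsilon = more noise = stronger privacy guarantee."""
    scale = sensitivity / epsilon
    # A Laplace(0, scale) sample is the difference of two exponentials.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

true_count = 1000                      # e.g. patients with a condition
noisy = dp_count(true_count, epsilon=0.5)
# The released value is close to 1000, but any single individual's
# presence or absence shifts the output distribution only slightly.
```

Tuning epsilon is exactly the accuracy-versus-privacy balance described above.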
Which PETs should you use?
To choose the right PET(s), you should consider several factors. Here are some tips that will help you choose a suitable data protection method:
- Identify the type of data you are using (structured or unstructured).
- Consider whether you will have to share information with third parties.
- Define whether you need access to the dataset or only to the output.
- Decide if the data will be used to train machine learning and artificial intelligence applications.
- Calculate your budget – some PETs are costlier than others.
- Know how much computation power you have available – some PETs require advanced infrastructure.
- Determine whether PII (personally identifiable information) needs to be kept in the dataset.
If you are not sure which PET fits your use case, take a look at the table below.
| Use case | Recommended PET |
| --- | --- |
Sources: PET Adoption Guide, Privacy Enhancing Technologies Decision Tree
Lastly, remember to discuss your use case with the right specialists. In most situations, you’ll need more than one PET. To limit risks, implement privacy by design so you have strong privacy fundamentals already.