What Are Privacy-Enhancing Technologies (PETs) And How You Can Choose the Right One(s)

If you want to maximize data utility while remaining compliant with privacy regulations, you have to strike the right balance between the two.

What can help you in your data operations?

Privacy-enhancing technologies, called PETs for short. Although PETs don’t exempt controllers from the scope of the GDPR, CCPA, and other regulations, they add a layer of safety to data processing. There are various types of PETs, and each technology serves a different use case.

In this article, you’ll learn:

  • What are privacy-enhancing technologies (PETs)?
  • Types of PETs.
  • How to choose the right privacy-enhancing method for your use case.

What are PETs?

Privacy-enhancing technology is a term that covers various methods that help you protect privacy and data confidentiality. Use PETs to streamline sensitive data processing, while maximizing data utility in both internal and external company projects.

PETs respect fundamental data protection principles such as:

  • Lawfulness
  • Fairness and transparency
  • Purpose limitation
  • Data minimization
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality
  • Accountability

PETs have been around for decades but have regained popularity with the rising awareness of security and privacy compliance. You might be familiar with some of them, including ad blockers, Tor networks, anonymization, pseudonymization, and synthetic data.

When should you use PETs? Let’s say you want to run a big data project with a third party, but you don’t want to expose your data. In this case, one or more PETs will help you make the collaboration happen, as your data will have an additional layer of protection.

Let’s dive into details.

Types of privacy-enhancing technologies

PETs paved the way for many organizations to better protect their data. But technology advances so rapidly that you’ll often need a combination of PETs rather than one standalone solution.

Here is an extensive list of PETs that can prove helpful in your data projects.

Among PET types, you’ll find:

  1. Encryption in transit and at rest
  2. De-identification techniques: tokenization or k-anonymity
  3. Pseudonymization
  4. AI-generated synthetic data
  5. Encrypted analysis: homomorphic encryption
  6. Trusted execution environments (TEE)
  7. Anonymized computing: secure multi-party computation, federated analytics
  8. Differential privacy

1. Encryption in transit and at rest

Encryption means converting information into a coded form that protects it. Without the decryption key, you can’t access it.

Encryption in transit guarantees data confidentiality when data is on the move – across the internet or different networks. As data is usually less secure when in transit, you need encryption in transit to safeguard it.

Encryption at rest means you protect your data that sits on a third-party cloud or internal environment. This type of encryption ensures that your data is safeguarded against leaks, hacks, or exfiltration.
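
To make this concrete, here’s a minimal sketch of encryption at rest in Python, assuming the open-source cryptography package; the record is made up. Encryption in transit, by contrast, is usually provided at the protocol level (e.g., TLS) rather than in application code.

```python
# A minimal sketch of encryption at rest, assuming the open-source
# "cryptography" package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # store the key in a secrets manager, never next to the data
fernet = Fernet(key)

record = b'{"name": "Jane Doe", "email": "jane@example.com"}'  # made-up record

token = fernet.encrypt(record)   # this ciphertext is what sits on disk or in the cloud
restored = fernet.decrypt(token)

assert restored == record
```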

2. De-identification techniques

De-identification techniques rely on modifying data so that a dataset reveals less information about individuals. However, according to the Centre for Data Ethics and Innovation, these techniques alone aren’t reliable mechanisms for protecting the confidentiality of data.

Among de-identification techniques, you’ll find:

  • Tokenization – replacing specific sensitive values with randomly generated tokens
  • K-anonymity – generalizing or masking data so that each record “hides” among at least k identical-looking records, making it hard to single out any individual (both techniques are sketched below)
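
Here’s a toy Python sketch of both techniques; the values, field names, and helper functions are made up for illustration.

```python
import secrets
from collections import Counter

# Tokenization: swap each sensitive value for a random token and keep
# the mapping in a separate, secured vault (all values here are made up).
vault: dict[str, str] = {}

def tokenize(value: str) -> str:
    if value not in vault:
        vault[value] = secrets.token_hex(8)
    return vault[value]

emails = ["jane@example.com", "john@example.com", "jane@example.com"]
print([tokenize(e) for e in emails])  # same input -> same random token

# K-anonymity check: every combination of quasi-identifiers must occur
# at least k times, so no single row can be singled out.
def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

rows = [
    {"age_band": "30-40", "zip": "902**"},
    {"age_band": "30-40", "zip": "902**"},
    {"age_band": "40-50", "zip": "941**"},
]
print(is_k_anonymous(rows, ["age_band", "zip"], k=2))  # False: last group has only 1 row
```
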
3. Pseudonymization

It’s a data masking technique that hides the identity of an individual in a dataset by replacing fields with pseudonyms. Although pseudonymization is useful in many cases, a pseudonymized dataset contains PII that can be re-identified, so such data is still subject to the GDPR.
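
As an illustration, here’s a minimal pseudonymization sketch using keyed hashing (HMAC) in Python. The secret key and field names are made up; in practice the key lives in a separate, secured store, and anyone holding it can reverse the pseudonyms.

```python
import hashlib
import hmac

# The key must live outside the dataset (e.g., in a secrets manager);
# whoever holds it can re-identify records, which is why GDPR still applies.
SECRET_KEY = b"keep-this-somewhere-safe"  # placeholder value

def pseudonymize(value: str) -> str:
    """Return a stable pseudonym: the same input always maps to the same output."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "purchase": "laptop"}
record["name"] = pseudonymize(record["name"])
print(record)  # the name is replaced by a pseudonym; other fields stay usable
```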

4. AI-generated synthetic data

Synthetic data is artificially generated data that mirrors the patterns, balance, and composition of the original dataset. This PET has great potential because privacy-preserving synthetic data doesn’t contain personal data and can therefore be used in any project that requires good-quality data in large amounts.

If you want to collaborate with third-party contributors, synthetic data unlocks data potential as it’s not subject to GDPR, CCPA, or LGPD.
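
To show the core idea only (not a production generator), here’s a toy Python sketch that fits a multivariate Gaussian to numeric data and samples new rows from it. Real synthetic data tools rely on far more sophisticated generative models and add explicit privacy guarantees; a naive fit like this can still leak information about outliers.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Original" numeric data: rows of (age, income) -- made up for illustration.
original = np.column_stack([
    rng.normal(40, 10, 1000),       # age
    rng.normal(55000, 12000, 1000), # income
])

# Fit a multivariate Gaussian to the real data, then sample from it, so the
# synthetic rows preserve the columns' means and correlations without
# reproducing any real individual's record.
mean = original.mean(axis=0)
cov = np.cov(original, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=1000)

print(synthetic[:3])
```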

5. Encrypted analysis

Homomorphic encryption

In homomorphic encryption, you allow a third party to process and analyze data in encrypted form. As a data controller, you send them the encrypted data and wait for the calculations to be performed on it. After receiving the results of the computations, you decrypt them.

Unlike encryption in transit or at rest, homomorphic encryption protects data while it is being processed. The main idea of this PET is to let you collaborate with multiple external, third-party companies, or keep data in a cloud environment, in a safe and compliant way.
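
Here’s a minimal sketch of that workflow, assuming the open-source python-paillier (phe) package, which implements the additively homomorphic Paillier scheme; the figures are made up.

```python
# A minimal sketch, assuming the third-party python-paillier ("phe") package.
from phe import paillier

# The controller generates the keypair and keeps the private key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Values are encrypted before leaving the controller (figures made up).
salaries = [52000, 61000, 58000]
ciphertexts = [public_key.encrypt(s) for s in salaries]

# The third party adds ciphertexts without ever seeing the plaintexts.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = encrypted_total + c

# Back at the controller, the private key reveals the result.
print(private_key.decrypt(encrypted_total))  # 171000
```

Note that computing on ciphertexts is orders of magnitude slower than computing on plaintext, which is why the decision table below recommends homomorphic encryption only when calculation complexity is low.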

6. Trusted execution environments (TEE)

In this method, you carve out a “safe” hardware partition of the computer’s main processor and memory. Data held within the TEE can’t be accessed from the main processor, and the communication between the two environments is encrypted.

As a data controller, you may let third parties operate on unencrypted data within the TEE, but you need a certain level of mutual trust that the environment is set up correctly and securely. Because you work on unencrypted data in a TEE, it’s faster than homomorphic encryption. Later on, you just need to reveal the data with a decryption key.

If you don’t fully trust the external party with whom you’re cooperating, it’s better to lean towards using a homomorphic encryption method.

7. Anonymized computing

Secure multi-party computation

It lets multiple companies collaborate on one data project while ensuring that no party can learn anything from the others’ inputs. This lets you preserve privacy and security while benefiting from the project’s value without revealing sensitive data.
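
One common building block is additive secret sharing, sketched below in plain Python: each party’s input is split into random shares that reveal the value only when all of them are combined. The inputs and party count are made up.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic happens modulo a large prime

def make_shares(value: int, n_parties: int) -> list[int]:
    """Split a value into additive shares; any subset of fewer than
    n_parties shares is statistically independent of the value."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three companies secret-share their private inputs (values made up).
inputs = [120, 450, 75]
all_shares = [make_shares(v, 3) for v in inputs]

# Party i adds up the i-th share of every input -- it sees only shares.
partial_sums = [sum(shares[i] for shares in all_shares) % PRIME for i in range(3)]

# Only the combination of all partial sums reveals the joint total.
print(sum(partial_sums) % PRIME)  # 645, with no party seeing another's input
```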

Federated analytics

It focuses on executing a computer program in a local environment and communicating only the results back to the originating party. Federated learning is a subset of federated analytics and involves training machine learning models on distributed datasets. This isn’t a privacy-enhancing technology in the fullest sense – it’s more about not disclosing data during the ML training process.
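
Here’s a toy Python sketch of the federated analytics idea, computing a global mean from local aggregates; real deployments add secure aggregation and other protections, and the datasets here are made up.

```python
# Each "device" holds its own raw data, which never leaves the device
# (datasets made up for illustration).
local_datasets = [
    [4.1, 3.8, 5.0],
    [2.9, 3.3],
    [4.7, 4.2, 3.9, 4.4],
]

# Devices run the computation locally and report only (sum, count).
local_reports = [(sum(data), len(data)) for data in local_datasets]

# The originating party combines the aggregates into a global mean.
total, count = (sum(values) for values in zip(*local_reports))
print(total / count)  # ~4.03, computed without collecting any raw records
```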

8. Differential privacy

Differential privacy (DP) is a framework that allows you to share insights about a dataset while withholding information about individuals. In DP, you add a layer of privacy to the dataset so it can be shared publicly.

This layer of protection is the noise added to the input data (local differential privacy) or the output (global differential privacy) of an algorithm.

By adding this noise to the data you gain an additional layer of privacy, but you lose some of the model’s accuracy at the same time. That’s why you have to balance it carefully, so you don’t end up with results that are either too inaccurate or not obfuscated enough.
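
As a small illustration, here’s the classic Laplace mechanism for a counting query in Python (a global DP example, since the noise is added to the output). The count and epsilon values are made up; a smaller epsilon means more noise and stronger privacy.

```python
import numpy as np

rng = np.random.default_rng()

def noisy_count(true_count: int, epsilon: float) -> float:
    """Global DP via the Laplace mechanism: noise is added to the output.

    A counting query changes by at most 1 when one person joins or leaves
    the dataset, so its sensitivity is 1 and the noise scale is 1/epsilon.
    """
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

true_count = 1289  # e.g., visitors to a location (made-up figure)

print(noisy_count(true_count, epsilon=0.1))  # more noise, more privacy
print(noisy_count(true_count, epsilon=5.0))  # less noise, closer to the truth
```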

Which PETs you should use

To choose the right PET(s), you should consider several factors. Here are some tips that will help you choose a suitable data protection method:

  1. Identify the type of data you are using (structured or unstructured).
  2. Consider whether you will have to share information with third parties.
  3. Define whether you need access to the dataset or only to the output.
  4. Decide if the data will be used to train machine learning and artificial intelligence applications.
  5. Calculate your budget – some PETs are costlier than others.
  6. Know how much computation power you have available – some PETs require advanced infrastructure.
  7. Determine whether PII (personally identifiable information) needs to be kept in the dataset.

If you are not sure which PET fits your use case, take a look at the table below.

| Use case | Recommended PET |
| --- | --- |
| You give access to personal data to a third-party entity; the data contains PII; you need to share the output (insights) from the dataset; the third party carries out processes for you, and the complexity of the calculations is very high or the latency has to be low. | Trusted execution environments (TEE) |
| The same case as above, but the complexity of the calculations is low and there is no strong latency requirement. | Homomorphic encryption |
| You want to publish the results of a data analysis. For example, Google wanted to release stats and visualizations about the population’s mobility habits in response to COVID-19 interventions, based on location data from Google users who opted into location history tracking. | Differential privacy (data aggregation protects individuals’ privacy) |
| All of the above, but you want to train a machine learning model using this data analysis. | AI-generated synthetic data |
| You want to collaborate with multiple parties and conduct analyses; processing will happen on remote devices; you want to train a machine learning model on this data but can’t share any personal data. | Federated learning combined with synthetic data |
| You want to collaborate with multiple parties and conduct analyses; processing will happen on remote devices; you want to perform statistical analysis based on this data; only you need to know the result of the computation. | Federated analytics |
| You are collaborating with third party(ies) that have access to the information and also contribute sensitive data to your project; you aim to make user-specific predictions; you need to share insights, not the dataset. | Secure multi-party computation (if you have the budget for higher communication costs); a data processing agreement plus encryption in transit and at rest (if you don’t). |
| Same as above, but you don’t require additional sensitive data from third parties to compute the output. | De-identification, a data processing agreement, and encryption in transit and at rest. |
| You are sharing information with a third party or a user; the dataset itself needs to be visible, not just insights; the dataset doesn’t contain PII. | For structured data (organized names, dates, geolocation, and more): combine data aggregation with differential privacy, risk-based data de-identification and/or synthetic data, and encryption in transit and at rest. For unstructured data (images, videos, speech): combine risk-based data de-identification and/or synthetic data with encryption in transit and at rest. |

Sources: PET Adoption Guide, Privacy Enhancing Technologies Decision Tree

Lastly, remember to discuss your use case with the right specialists. In most situations, you’ll need more than one PET. To limit risks, implement privacy by design so you have strong privacy fundamentals in place from the start.