How to Navigate the Privacy Minefield in 2019 by Dr. Maurice Coyle, Chief Data Scientist at Trūata


Most organisations want to be responsible and ethical and to respect the privacy of their customers, but they are unsure how to go about it. At the same time, they are hungry for the insights and business value to be gleaned from their customer data, yet wary of falling foul of the GDPR. It's a minefield that many businesses will have to navigate in 2019 and beyond.

Avoid the consent trap

One of the most important things a data-driven company can do to respect its customers' privacy when analysing data is to avoid the trap of thinking that obtaining customer consent is a panacea. It's not. As the recently announced fines under the GDPR show – likely among the first of many – valid consent is tricky to obtain. Easily accessible, specific and unambiguous opt-in consent must be obtained for every purpose, meaning that a blanket consent "for analytics" is not sufficient to allow a company to analyse its customer data for any given purpose. Rather than relying on consent for analysis involving personal data, companies would be better off not using personal data at all – especially when the real value can be obtained from anonymized data instead.

In-house vs independent focus

Data can be considered "anonymized" from a data protection perspective when data subjects are not identifiable, having regard to all methods reasonably likely to be used by the data controller or any other person to identify the data subject. However, if the original source data is retained for some other purpose (fraud detection, for example), then the anonymized data is still considered personal data, since it can be linked back to the original data to re-identify individuals.
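To make the idea concrete, here is a minimal sketch of one common anonymization step: dropping direct identifiers and generalising quasi-identifiers into coarse bands so that records become indistinguishable within a group (the intuition behind k-anonymity). The record fields and banding choices are illustrative assumptions, not a description of any particular product, and real anonymization also requires a formal assessment of re-identification risk.

```python
def generalize(record):
    """Drop the direct identifier and coarsen quasi-identifiers."""
    decade = (record["age"] // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",      # e.g. 34 -> "30-39"
        "region": record["postcode"][:2],           # coarse area, not full postcode
        "spend": round(record["spend"], -1),        # rounded to the nearest 10
    }

# Hypothetical customer records (names and values are made up).
customers = [
    {"name": "A. Smith", "age": 34, "postcode": "D04X123", "spend": 142.0},
    {"name": "B. Jones", "age": 37, "postcode": "D04Y456", "spend": 138.0},
]

anonymized = [generalize(c) for c in customers]
# After generalisation both records share identical quasi-identifier
# values, so neither can be singled out within the group.
```

Note that this transformation alone is not enough: as the paragraph above explains, if the original `customers` records are retained elsewhere, the output can still be linked back to them.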

By attempting to anonymize data internally, organisations will likely struggle with the complexity of compliance in the post-GDPR world. Privacy-enhancing measures like anonymization are not easy to achieve, requiring a combination of data science, data engineering and legal expertise – skills that are expensive, difficult to acquire and even tougher to retain.

Independent anonymization is the best way to truly break the link between the original data source and the anonymized data set on which analytics are performed. Attempting anonymization in-house retains the risk of accidental or deliberate re-identification of individuals, which would constitute a breach of the regulations.

Analysis with more insights and less bias

Fully aligning with the GDPR through independent anonymization isn't just about complying with the law and showing your customers that you respect their privacy. It quickly unlocks further benefits. One of the biggest of these – and one that is surprisingly underestimated – is that you get more accurate insights with less risk of bias.

Typically, only a small percentage of people will give opt-in consent for their data to be used for analytics, resulting in a small consented data set. Unfortunately, when this happens, there is a significant risk of drawing inaccurate conclusions: models built on small samples will not be robust enough to weed out false positives or detect subtle nuances in the data, increasing the likelihood of bias.

For example, if an organisation wanted to identify traits that defined an accountant, they could conduct a survey of random people on a street. If they surveyed 10 people and met 3 accountants, all of whom were wearing glasses while the other 7 respondents were not, they might conclude that glasses were a defining attribute of the profession.

This is overfitting: the analysis corresponds too closely to one small data set, with no additional data to confirm that the observations generalise. Building a model over a larger volume of people is the best way to avoid overfitting and bias. Had the surveyors stayed on the street longer and interviewed more people, it might have become evident that glasses are not a good indicator of accountancy – or of any other profession, for that matter.
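The street-survey scenario above can be simulated in a few lines. In this sketch the "population" is constructed so that wearing glasses is genuinely independent of being an accountant (both base rates are assumptions for illustration): a 10-person sample can still suggest a strong link by chance, while a large sample converges on the true, uninformative rate.

```python
import random

def sample_person(rng):
    """One hypothetical passer-by; glasses are independent of profession."""
    is_accountant = rng.random() < 0.3   # assumed base rate of accountants
    wears_glasses = rng.random() < 0.5   # true glasses rate, same for everyone
    return is_accountant, wears_glasses

def glasses_rate_among_accountants(n, rng):
    """Fraction of sampled accountants who wear glasses (None if none met)."""
    people = [sample_person(rng) for _ in range(n)]
    glasses = [g for a, g in people if a]
    return sum(glasses) / len(glasses) if glasses else None

rng = random.Random(1)  # seeded for reproducibility
small = glasses_rate_among_accountants(10, rng)       # the quick street survey
large = glasses_rate_among_accountants(100_000, rng)  # staying out much longer

# The small sample can land far from the true 50% rate purely by chance;
# the large sample sits very close to it, exposing glasses as a non-signal.
print(f"10 respondents:      glasses rate among accountants = {small}")
print(f"100,000 respondents: glasses rate among accountants = {large:.3f}")
```

With enough data, the spurious "accountants wear glasses" pattern washes out, which is exactly the protection a large anonymized data set provides over a small consented one.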

Longer-term view

Another benefit of independent anonymization is the ability to carry out longitudinal studies that deliver deeper business insights. Anonymization allows organisations to legally keep data for longer periods. This means historic data can be used to develop predictive models that are more robust and accurate and that can identify seasonality or other time-based effects.

Extending the example above, if I noticed that some of my survey respondents were carrying umbrellas, I might want to find out whether this was due to the weather conditions that day, the typical weather at that time of year or, in some cases, a cautious nature among certain cohorts. I would need to conduct my analysis over a longer period, covering all seasons and ideally a number of years, which would mean retaining the responses for the duration of my study. If my responses are anonymized, I can perform this longitudinal analysis without compromising the privacy of the survey participants.
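A longitudinal analysis over anonymized responses can be as simple as aggregating by time period. The sketch below assumes each retained record keeps only the survey month and the umbrella observation – no identifiers – so holding the records across seasons carries little privacy risk; the sample records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical anonymized survey records retained across the year.
records = [
    {"month": "Jan", "umbrella": True},
    {"month": "Jan", "umbrella": True},
    {"month": "Jan", "umbrella": False},
    {"month": "Jul", "umbrella": True},
    {"month": "Jul", "umbrella": False},
    {"month": "Jul", "umbrella": False},
]

def umbrella_rate_by_month(records):
    """Share of respondents carrying an umbrella, per survey month."""
    totals = defaultdict(lambda: [0, 0])  # month -> [carried, surveyed]
    for r in records:
        totals[r["month"]][0] += r["umbrella"]
        totals[r["month"]][1] += 1
    return {month: carried / n for month, (carried, n) in totals.items()}

rates = umbrella_rate_by_month(records)
# A seasonal gap (here, January vs July) only becomes visible because
# responses were retained across seasons.
```

A single-day survey could never distinguish "it rained that morning" from a genuine seasonal pattern; the per-month aggregation above requires exactly the longer retention that anonymization makes lawful.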

Empowering analytics by protecting customers’ data

Independent anonymization enables organisations to continue to analyse data in a GDPR-compliant manner while maintaining the depth and actionability of the resulting data insights. This enhanced capability is achievable across a wide range of analytical activities including customer retention and lifetime value modelling, recommendation engines, customer segmentation and pricing optimization.

Ultimately, independent anonymization of customer data empowers businesses to build out their analytics models, opening the door to business insights that drive growth and innovation in a legally compliant manner.