“Data is the new oil.” Not only is this an overused expression with over 1 billion hits on Google, it is also a rather inaccurate comparison. If you find a source of oil in your garden today, you could quickly make money out of it. The infrastructure and technology exist, and there are experts and customers for all processing steps and by-products. But imagine you instead find a “data pot” which has grown over the years in your company. You can’t simply rely on proven processing chains, because new data protection laws make extracting value from this data much harder. All too often, decision-makers in companies now face the question: Here I have valuable data, but what next?
Untapped data potential
Of course, some companies have already solved this problem. Google and Facebook, two of the world’s largest companies, reached their current position because of the sheer value of their customers’ data. So it is hardly surprising that both companies are trying to enrich their already immense data with information from other sources. For example, Google now buys data from credit card institutions to combine online advertising with offline purchases. Facebook goes one step further and offers its own customer profiles in exchange for transactional data.
Why are these two big players heading for shallow waters to get your bank account information? Surely Google already knows exactly where I will spend my next vacation? And Facebook can already accurately determine my personality structure because of a few likes? Sure, but my bank knows a lot more. Or at least they could know.
Transactional data can give you a huge range of interesting insights, even with simple methods. Shopping habits and preferences are easily identifiable, of course, but movement profiles can also be created from the addresses of the stores and restaurants you have visited. The amount of child allowance you receive gives a clue to the number of children you have. Fuel costs indicate the kilometers driven, and the estimate becomes even more accurate if you include the amount of vehicle tax paid.
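To make the “simple methods” concrete, here is a minimal Python sketch of such inferences. Everything in it is an illustrative assumption: the `Transaction` fields, the category labels, the sample bookings and the three constants (allowance per child, fuel price, fuel consumption) are invented for the example and do not reflect real tariffs or any bank’s data model.

```python
from dataclasses import dataclass

# Illustrative assumptions only, not real tariffs or prices.
ASSUMED_ALLOWANCE_PER_CHILD = 250.0   # monthly child allowance per child, EUR
ASSUMED_FUEL_PRICE_PER_LITRE = 1.80   # EUR per litre
ASSUMED_LITRES_PER_100_KM = 7.0       # average fuel consumption


@dataclass
class Transaction:
    counterparty: str   # name of the other party on the statement
    category: str       # hypothetical label, e.g. "fuel" or "restaurant"
    amount: float       # EUR; positive = incoming, negative = outgoing


def estimate_children(transactions):
    """Guess the number of children from the child allowance received."""
    allowance = sum(t.amount for t in transactions
                    if t.category == "child_allowance")
    return round(allowance / ASSUMED_ALLOWANCE_PER_CHILD)


def estimate_km_driven(transactions):
    """Guess the distance driven from the money spent on fuel."""
    fuel_spend = sum(-t.amount for t in transactions if t.category == "fuel")
    litres = fuel_spend / ASSUMED_FUEL_PRICE_PER_LITRE
    return litres / ASSUMED_LITRES_PER_100_KM * 100


def visited_places(transactions):
    """Collect the stores and restaurants that appear in the statement."""
    return sorted({t.counterparty for t in transactions
                   if t.category in {"store", "restaurant"}})


# One hypothetical month of bookings for a single account.
sample = [
    Transaction("Family benefits office", "child_allowance", 500.0),
    Transaction("Fuel station A", "fuel", -63.0),
    Transaction("Fuel station B", "fuel", -63.0),
    Transaction("Pizzeria Roma", "restaurant", -42.5),
    Transaction("Bookshop Central", "store", -19.9),
]

print(estimate_children(sample))
print(estimate_km_driven(sample))
print(visited_places(sample))
```

Under these assumed constants, the sample month points to two children and roughly 1,000 kilometers driven. None of this requires machine learning: a few sums and divisions over categorized bookings are enough.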
A complicated way out
In the finance sector, trust is vital, more so than in almost any other field of business. As a customer, I entrust my money to a bank because it is safe there. I implicitly expect the same with my data. But obviously, the abuse potential is enormous.
So, how can banks legally share your transactional data? Anonymization is one legally supported solution. Correctly(!) anonymized data can no longer be traced back to individuals without disproportionate effort, regardless of which additional information an analyst has. Thus, personal data can be safely and legally evaluated. The General Data Protection Regulation (GDPR) specifies in recital 26: “The principles of data protection should […] not apply to anonymous information”.
Well, that sounds like good news. Simply anonymize the data first, and you do not have to worry anymore … Surprisingly, this is not the case.
Typically, there are three problems with anonymization:
- Anonymization is very complex and time-consuming. It is not enough to simply remove obvious identifiers such as names or bank account numbers. For example, in the United States, 63% of citizens can be uniquely identified from just their date of birth, gender and postal code. Even such a simple data set requires sophisticated anonymization methods, almost always with high manual effort and complex considerations.
- If the anonymization is carried out incorrectly, those responsible face a great risk. Under the GDPR, fines can be up to 20 million euros or 4% of worldwide annual turnover (whichever is greater), as has been repeatedly emphasized in recent months.
- Even if the anonymization is done correctly, there is another serious problem: data quality usually suffers enormously. Take the aforementioned data on US citizens. Even if you only have the county instead of the zip code, about 18% of people can still be uniquely identified. So you would have to coarsen further fields in the data set, for example by giving only the month of birth instead of the full date of birth. The usefulness of the data set then suffers considerably [1]. Data privacy researcher Paul Ohm wrote in 2010: “Data can either be useful or perfectly anonymous – but never both”.
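The trade-off between re-identification risk and data quality described in the points above can be illustrated with a small Python sketch. It is only a toy under stated assumptions: the six records, the postal codes and the helper names `unique_fraction` and `generalize` are made up for the example, and counting unique combinations is just one simple way to gauge risk, not a complete anonymization procedure.

```python
from collections import Counter

# Hypothetical toy records: (date of birth, gender, postal code).
records = [
    ("1980-03-14", "F", "10115"),
    ("1980-03-02", "F", "10117"),
    ("1980-03-14", "M", "10115"),
    ("1975-11-30", "M", "10245"),
    ("1975-11-05", "M", "10247"),
    ("1992-07-21", "F", "10999"),
]


def unique_fraction(rows):
    """Share of rows whose attribute combination occurs exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


def generalize(row):
    """Coarsen a record: keep year and month of birth and 3 postal digits."""
    date_of_birth, gender, postal_code = row
    return (date_of_birth[:7], gender, postal_code[:3] + "**")


generalized = [generalize(row) for row in records]

print(unique_fraction(records))      # every toy record is unique
print(unique_fraction(generalized))  # fewer unique records after coarsening
```

In this toy data, every record is unique before coarsening and two of six still are afterwards. Reducing that share further would mean dropping even more detail, which is exactly the loss of usefulness the last point describes.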