The spring of 2016 marked the beginning of a very unpleasant summer for a hospital in the Pacific Northwest. Within a few months, it was implicated in a security breach involving massive HIPAA violations after taking part in an approved, collaborative genetic study. The study was conducted by an independent genetic research firm that brought data in from various institutions to a private cloud run by a private provider. As misfortune had it, the data from our particular hospital carried patient CPT codes, which it should not have.
CPT codes tell us which procedures a patient went through. Because any information that reveals something about the patient, including medical data, is subject to HIPAA, having these codes sitting alongside the genetic data became a problem once it was all gathered in the cloud. One of the staff researchers at the firm noticed this and kept it to himself, but only for a while: he chose to use it against the firm a few months later, after he was dismissed from his position. A security breach exposed the same error as well, placing the company, the hospital, and the cloud service provider in violation of HIPAA.
Apart from being a perfect storm of security breaches, human error, and abuse of trust, this tale of research gone awry sheds light on one of the main issues looming over cloud computing today: how do we protect our data while working on it in the cloud? After all, information is sent to the cloud so it can be handled better and processed faster, yet we also want it to remain private.
The key to a safe working environment is maintaining control over the data. If our hospital could have kept the actual raw data secure in one place and run the operations in the cloud without exposing the data itself, then neither the hospital nor the cloud service provider would have been liable, because no private information would have changed hands.
Machine learning (ML) and cryptography-based solutions are the best answers we have right now. Though each has its drawbacks, when implemented in a cloud environment they can help prevent situations like the one we just read about.
Let’s take a look at the traditional situation first. In the classic, centralized model of machine learning, information is gathered in the cloud from all connected devices and fed to a central model, where the data alters the algorithm and ‘trains’ it. Because actual information is gathered and stored, this model is a poor fit for legally regulated records: the data changes hands.
Distributed learning distributes data between multiple data owners and creates a central aggregator to coordinate between them. The owners and the aggregator collaborate to train a global model, which is then based on all the training data in the system. Despite being privacy friendly, distributed learning systems are exposed to attacks: data inversion, membership inference and property inference, poisoning, and backdoor attacks, particularly from systems that themselves feature underlying ML models and can train online using the distributed training data.
Federated learning hands segments of the learning process out to clients. Each client trains a local model on locally collected data. The local models are then sent to an aggregator, which combines them into a global model. Federated learning supports privacy because sensitive data remains with the client. It does not have to be shared or transmitted, not even over the network. The drawbacks are similar to those of distributed learning.
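To make the federated learning flow above concrete, here is a minimal sketch in plain Python. It assumes a very simple setting that is not in the original story: each client fits a one-feature linear model with gradient descent, and the aggregator combines local models by a sample-weighted average (the approach commonly called federated averaging). The function names, learning rate, and the three toy "hospital" datasets are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (feature, label) pairs held by one client


def local_train(w: float, b: float, data: Dataset,
                lr: float = 0.05, epochs: int = 20) -> Tuple[float, float]:
    """Run a few epochs of gradient descent on one client's private data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def aggregate(updates: List[Tuple[float, float]],
              sizes: List[int]) -> Tuple[float, float]:
    """Combine local models into a global one, weighted by sample count."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def federated_fit(clients: List[Dataset], rounds: int = 50) -> Tuple[float, float]:
    """Only model parameters travel to the aggregator; raw data never does."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        updates = [local_train(w, b, data) for data in clients]
        w, b = aggregate(updates, [len(d) for d in clients])
    return w, b


# Hypothetical data: three clients, each holding points on y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (1.5, 4.0)],
    [(0.5, 2.0), (2.5, 6.0)],
]
w, b = federated_fit(clients)
print(f"global model: y = {w:.3f}x + {b:.3f}")
```

Note that the aggregator only ever sees the pairs returned by `local_train`, never the datasets themselves. As the article points out, this does not make the scheme immune to attack: the parameters a client sends can still leak information about its data.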
Fully Homomorphic Encryption (FHE) offers the possibility of combining usability, functionality, and safety, but at a price. FHE lets us run ML algorithms on encrypted data without ever accessing the underlying data, which is protected under government regulations like HIPAA. Its drawbacks are that it is labor-intensive, computationally heavy, and complicated to implement.
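The idea of computing on data without decrypting it can be illustrated with a much simpler relative of FHE. The sketch below is a toy version of the Paillier cryptosystem, which is only additively homomorphic: it supports adding encrypted numbers, not running arbitrary ML algorithms. It is a stand-in for illustration, not FHE itself. The primes are tiny and insecure, and the scenario of two hospitals contributing procedure counts is hypothetical.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a Paillier key pair from two primes (toy sizes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                      # standard simplifying choice of generator
    mu = pow(lam, -1, n)           # modular inverse; needs Python 3.8+
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt 0 <= m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the plaintexts underneath."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen(1009, 1013)

# Two hypothetical hospitals encrypt their counts before upload.
c_a = encrypt(pub, 120)
c_b = encrypt(pub, 45)

# The cloud adds the counts without ever seeing 120 or 45.
c_sum = add_encrypted(pub, c_a, c_b)

total = decrypt(pub, priv, c_sum)
print(total)  # 165
```

Only the holder of the private key can read the result; the party that performs the addition works purely on ciphertexts. Full FHE extends this property to both addition and multiplication, which is what makes general computation possible and also what makes it so much heavier.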
This multitude of possible solutions tells us that the field is still rife with challenges waiting to be tackled, with no clear winner yet delivering speed, privacy, and scalability in the cloud. Simplicity has made software-based security solutions the main trend today, but we’ve all learned from experience that the ‘software only’ approach to security is no longer enough. This is why hardware-based secure enclaves, including those inside CPUs, have become a rising trend in the cloud.