What Privacy Professionals Need To Know Before Investing in Federated Learning

Only 1% of the world’s data is used to its full extent – let that sink in. Technological, regulatory, and trust barriers are often key drivers of this underutilization. Sometimes this is a good thing; sometimes it is not. The fact remains: there are large swaths of untapped data that could drive collaboration and innovation to solve some of humanity’s biggest challenges and power the future of inclusive economic growth.

As a privacy professional, you have the opportunity to lead the charge in changing our relationship with data. Emerging privacy-enhancing technologies (PETs), such as federated learning, are the key building blocks of this future. They can enable you to unlock new opportunities for your organization while protecting individual privacy, maintaining control of valuable data and IP, and simplifying compliance in an increasingly fragmented regulatory landscape.

Despite these opportunities, many organizations have been slow to adopt PETs due to a lack of awareness, the complexity of the technology, and the often difficult task of implementation. One of the most promising and mature PETs is federated learning, a method that enables organizations to train machine learning models on distributed datasets that cannot be centralized for technological, regulatory, or trust reasons. It is widely used by Apple and Google to power text prediction models on mobile devices and is increasingly used across industries where data collaboration is both valuable and challenging. Two important examples of this are:

Improving the accuracy of models, trained on scarce yet highly sensitive patient data, that help clinicians diagnose rare diseases faster, and
Developing new ML-powered features in B2B software products that learn from data held across customers.

Before leveraging this technology, privacy professionals must first understand what federated learning is, when it can drive value for their organization, and a few pitfalls to watch out for.

What exactly is federated learning?

In traditional machine learning, data must be centralized before training a model. In federated learning, models are trained on distributed datasets: the data resides in two or more separate locations and never needs to be moved. A copy of the model is trained where each dataset is located, and only the resulting model parameters, not the underlying records, are shared and combined to produce an improved global model. Since no raw data moves within the system, organizations can retain control over the data and how it is used, and avoid the pain and expense of moving it. Moreover, the distributed nature of model training in federated learning adds a layer of pseudonymization to the resultant models, which can further reduce risk and open up new processing possibilities.
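To make the mechanics concrete, here is a minimal sketch of one common way to combine locally trained parameters: a weighted average, often called federated averaging. It is an illustration under simplifying assumptions, not a production design. The one-variable linear model, the two "hospital" datasets, and all function names are hypothetical, and a real deployment would add secure transport, authentication, and further privacy controls.

```python
from typing import List, Tuple

Params = Tuple[float, float]          # (weight, bias) of a model y = w * x + b
Dataset = List[Tuple[float, float]]   # (x, y) records that never leave the client


def local_update(params: Params, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Params:
    """Run a few gradient descent steps on one client's private data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: List[Params], sizes: List[int]) -> Params:
    """Combine client parameters, weighting each client by its dataset size."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return w, b


def train_federated(clients: List[Dataset], rounds: int = 200) -> Params:
    """Alternate local training and averaging; only parameters are exchanged."""
    global_params: Params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_params, data) for data in clients]
        global_params = federated_average(updates, [len(d) for d in clients])
    return global_params


# Hypothetical datasets held by two separate organizations (both follow y = 2x + 1).
hospital_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
hospital_b = [(3.0, 7.0), (4.0, 9.0)]

w, b = train_federated([hospital_a, hospital_b])
print(f"global model: y = {w:.3f} * x + {b:.3f}")
```

Note that `train_federated` only ever sees the `(weight, bias)` pairs returned by each client. The records in `hospital_a` and `hospital_b` are read solely inside `local_update`, which in a real system would run on each organization's own infrastructure.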

For analytics and machine learning projects at risk of failure because they don’t have access to the right data, federated learning can be a lifeline, unlocking new data sources safely and with minimal friction for collaborators.

When can federated learning add value for my organization?

Privacy professionals can unblock innovation in the business by recognizing opportunities, championing the art of the possible with federated learning, and supporting business projects where it can accelerate value delivery and reduce risk. Privacy teams should look for these four criteria to determine whether a project will benefit from federated learning:

1. Your team needs machine learning or advanced analytics

Think about anywhere your team is already using machine learning or has considered using it. If your company is rooted in product or marketing, this could be for personalization; in financial services, fraud detection; or in healthcare, precision medicine.

2. You don’t have enough data from a single data source

In some cases, your team may not be able to build a model. In others, they may be able to build a model, but it is not accurate enough to meet your business objectives. For a personalization engine, this could mean that your personalizations are not improving customer conversions or it’s taking too long to reach your goals; in financial services you don’t have enough data alone to detect fraud early or accurately enough; and in healthcare, you haven’t reached your accuracy goal in your diagnostic model.

3. You know what data you would need to build a better model

Your data science or product team should have an idea about what data could improve the model. It may be that they need data about more individuals. Some teams also have biased datasets (where the data is not representative of the whole population) and are looking for more diverse data.

4. The data you need is siloed

Even though you know this data exists, it may not be possible for your team to easily access this data for machine learning. Data silos or data fragmentation can exist for several reasons:

Regulatory: Regulations like GDPR, PIPEDA, CCPA, and HIPAA restrict the processing of identifiable data for secondary purposes or its transfer outside of a given jurisdiction.
Contractual: Companies may already be sharing data (for example, when you integrate a database with a SaaS tool), but existing contracts limit how it can be used.
Trust: Many companies are unwilling to collaborate on data, due to the cost of structuring and enforcing data sharing agreements, the risk of leaking sensitive information, or the risk of exposing intellectual property.
Technical: Particularly in large organizations or those that have a history of mergers, legacy systems and distributed data architectures make it costly to centralize and maintain governance over data for machine learning.

If a project meets these criteria, then federated learning could mean new or better machine learning models, and new, impactful opportunities to drive value. With these factors in mind, privacy professionals can begin to consider the ways federated learning can help accelerate their organization.

What pitfalls should I watch out for?

Federated learning has enormous potential but it is not a panacea. As with any project involving personal data, your privacy impact analysis should define the appropriate legal basis for processing, assess the risks to individual data rights, and implement right-sized controls. As we have outlined, federated learning can make this a lot easier. Depending on how the resultant machine learning models are used, you should consider integrating additional PETs such as differential privacy for an added layer of privacy.
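As an illustration of how differential privacy is commonly layered on top of federated learning, the sketch below clips each client's parameter update to a maximum size and adds Gaussian noise before it is shared. This is a hedged example only: the function names, `max_norm`, and `noise_multiplier` are hypothetical, and the values shown are not calibrated to any formal privacy guarantee. Achieving a stated privacy budget requires careful accounting, typically with a vetted library and expert review.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    if norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [v * scale for v in update]


def privatize_update(update: List[float],
                     max_norm: float = 1.0,
                     noise_multiplier: float = 1.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Clip the update, then add Gaussian noise scaled to the clipping bound."""
    rng = rng or random.Random()
    clipped = clip_update(update, max_norm)
    std = noise_multiplier * max_norm  # illustrative scale, not a calibrated budget
    return [v + rng.gauss(0.0, std) for v in clipped]


# Hypothetical client update; a fixed seed keeps the demo repeatable.
raw_update = [3.0, 4.0]
noisy_update = privatize_update(raw_update, max_norm=1.0,
                                noise_multiplier=0.5, rng=random.Random(0))
print(noisy_update)
```

Clipping limits how much any single participant can influence the shared model, and the noise masks what remains of that influence. The trade-off is accuracy: more noise means stronger protection but a less precise model, which is why the appropriate setting depends on how the resulting model will be used.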

Privacy professionals hold the keys to enabling their teams to unlock previously untouchable data. Federated learning and other PETs open up the potential for your organization to enhance privacy protection and data control, while simplifying compliance. If machine learning models can be trained on more and better data, the opportunities that can be unlocked are limitless.