
Privacy and Analytics in the Digital Workplace: Nothing Personal, Just Business

Colombian novelist Gabriel García Márquez once wrote that all human beings have three lives: “public, private, and secret.” In the remote work era—where electronic devices have become a two-way portal between the workplace and our home—one might argue that the lines between Márquez’s “three lives” have started to blur.

For instance, the concept of privacy takes on new meaning as virtually all employee communication shifts to the digital realm. “Water cooler talk” now occurs via Zoom meetings (sometimes recorded) and Slack and Teams messages, while personal and candid information is intermixed with business records. In this strange digital landscape, the issue of what data to save and what data to delete is, at best, unclear. And when multiplied across billions of electronic communications, it presents an entirely new shade of “grey area.”

And to further complicate the picture, any or all of this data can potentially become part of legal discovery, a compliance matter, or one of the latest and most difficult Big Data challenges: a data subject access request under GDPR, CCPA, or any one of a handful of new privacy regulations.

Amidst this convergence of personal and business data, how do companies keep pace with new data governance requirements such as privacy? And perhaps most pressing to upper management, how can companies simultaneously leverage this data to gain insight into their workforce, so as to better understand what’s going on with the “human” side of the corporation?

Answers to such questions lie close to the heart of the C-suite, as articulated by Microsoft CEO Satya Nadella, who said, “The biggest or most strategic database in the company is the knowledge repository of all communications within the enterprise.” And indeed, tapping into this knowledge repository is the holy grail of business intelligence and analytics. Doing so can illuminate many of the human aspects of the enterprise, including the intent, sentiment, performance, engagement, and influence of the workforce.

People data: Unstructured and unknown

Now that we’ve established the purpose behind unlocking this people data, let’s take a look at how it might be approached. At the core of the “digital knowledge” challenge is the entire corpus of unstructured data, comprising all of the emails, instant messages, and documents shared by the workforce. For a large organization, the number of individual items has reached the tens of billions. Throw in individual instant messages, and the number may soon approach the trillions. The volume of data, paired with its inherent lack of structure, poses an extreme challenge, as searching, managing, or analyzing documents becomes far harder at these massive scales. For example, performing a search across the enterprise to analyze the workforce’s sentiment towards “work-from-home,” or to identify all personal data for a particular individual, has been outside the realm of feasibility until recently. Ensuring that personal data is protected from analytics adds another layer of complication.

To counter these scaling challenges, the common approach to processing data, whether for analytics, records management, or eDiscovery, is to create a “sandbox.” This entails selecting a subset of the total data and exporting it into a more manageable repository, where the data can be processed. The sandbox approach, however, has some fatal flaws. It depends on the assumption that the sandbox will contain all relevant data. The reality is, when it comes to analytics, most insights are out of sight. Sandboxes are only as good as the data they contain, and finding relevant data often takes many iterations, each requiring more time and resources. Finally, this approach often creates privacy issues, since personal data can end up in an analytics repository, leading to possible fines and other risks.

The path forward is in-place

A new approach to management and analysis of unstructured data instead aims to manage data in-place—in other words, without creating additional copies or sandboxes. In this new paradigm, information governance policies are applied to the original data sources, including the classification, retention, search, analysis, and deletion of documents. Only key documents need be archived, where they can be managed more closely. This approach serves a few key purposes.

One, by managing data in-place, without copying, the risks associated with creating unmanaged sandboxes, such as privacy breaches, are minimized. Storage costs are also dramatically reduced since unnecessary duplicates are never stored.

Two, because the whole “beach” of data is managed (rather than only sandboxes), the entire corpus of human data is readily available for analytics. Searches can be performed across the enterprise to find the most relevant dataset, and knowledge that was previously hidden from corporate view can now be leveraged for powerful insights.

Finally, by implementing management of data at the source, privacy enforcement is empowered in a completely new way. Searches for personal data, as per GDPR and CCPA, can be executed at an enterprise scale. The knowledge repository of all communications becomes a source of limitless insight while it is simultaneously treated for information governance and privacy requirements.
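To make the idea of an in-place search a little more concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular product: the `Item` and `Hit` types, the `find_personal_data` function, and the sample messages are all hypothetical, and a plain substring match stands in for whatever classification a real system would use. The point it shows is that the search walks each source where it lives and returns references to matching items rather than exporting copies into a sandbox.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Item:
    """Hypothetical view of one message in its original system."""
    source: str   # illustrative label, e.g. "email" or "chat"
    item_id: str
    text: str


@dataclass(frozen=True)
class Hit:
    """A reference to a matching item; the content itself is not copied."""
    source: str
    item_id: str
    matched: str


def find_personal_data(
    sources: Iterable[Iterable[Item]], subject_terms: List[str]
) -> Iterator[Hit]:
    """Scan every source in place and yield references to matching items."""
    terms = [t.lower() for t in subject_terms]
    for source in sources:
        for item in source:
            lowered = item.text.lower()
            for term in terms:
                if term in lowered:
                    # Record where the item lives, then move on to the next item.
                    yield Hit(item.source, item.item_id, term)
                    break


# Made-up sample data standing in for two live systems.
email = [
    Item("email", "e1", "Please send the report to Jane.Doe@example.com"),
    Item("email", "e2", "Quarterly numbers attached"),
]
chat = [Item("chat", "c1", "jane doe is out sick today")]

hits = list(find_personal_data([email, chat], ["jane.doe@example.com", "jane doe"]))
for hit in hits:
    print(hit.source, hit.item_id, hit.matched)
```

Because the function yields only source names and item identifiers, any follow-up action, such as retention or deletion, would still be applied to the original item rather than to a duplicate.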

Today, companies sit on an untapped goldmine of knowledge, and those that harness it will gain a strategic advantage that cannot be ignored. Ensuring that analytics, privacy, and information governance are applied synergistically has become one of the most critical business imperatives. This is perhaps why many of the most pioneering companies have formed information governance committees, which include stakeholders from Legal, Compliance, Privacy, Records Management, and Analytics. Forming such a committee is a critical step to ensure the company moves in unison towards an optimal outcome, maximizing the value of data while minimizing the risks.

Due to the accelerated pace of technology and the shift in working styles, privacy and analytics have converged more quickly than one might have imagined. As a result, we’ve surfaced some of the most pressing and impactful questions of our time.

The genie is out of the bottle, and the road we take next will make all the difference.