As privacy concerns drive the emergence of new laws and regulations, companies are seeking to comply with requirements to protect third-party personal data against loss of confidentiality, integrity, and availability.
To reach that goal, companies need to begin by identifying gaps; that is, finding any deficiencies in their information security controls. Those gaps can lead to significant incidents, with losses that go far beyond any fines and sanctions that may be applied.
It all starts with identifying information assets, which means surveying the personal data of data subjects across the most diverse environments and media.
As companies work toward compliance with Brazil’s General Data Protection Law (LGPD), demand for data discovery and data mapping tools has skyrocketed. But is this automated software technology effective?
Now that the term “Personal Data” (any information that identifies a natural person or makes that person identifiable) is defined in the LGPD, one can see that it covers two different contexts.
In the first, the data makes the person directly identifiable, in the sense that the information is clear about who is being referred to: for example, a name, an RG (identity card number), a CPF (individual taxpayer number), or other data that uniquely identifies the person. In the second, it is the cross-referencing of data that makes the person identifiable. This is data that, in isolation, cannot be considered personal, since it does not refer directly to the person.
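To make the second context concrete, here is a minimal Python sketch of how attributes that say little in isolation can single a person out once they are combined. The records, the field names (`job_title`, `city`), and the helper `singled_out` are hypothetical, chosen only for illustration; they do not come from the LGPD or from any specific tool.

```python
from collections import Counter

# Hypothetical records: none of them contains a name, RG, or CPF.
RECORDS = [
    {"job_title": "nurse", "city": "Recife"},
    {"job_title": "nurse", "city": "Curitiba"},
    {"job_title": "pilot", "city": "Recife"},
    {"job_title": "pilot", "city": "Curitiba"},
]


def singled_out(records, attributes):
    """Return the records whose combination of `attributes` is unique."""
    def key(record):
        return tuple(record[name] for name in attributes)

    # Count how many records share each combination of values.
    counts = Counter(key(record) for record in records)
    return [record for record in records if counts[key(record)] == 1]


# Each attribute alone is shared by two people, so nobody stands out.
print(len(singled_out(RECORDS, ["job_title"])))          # 0
print(len(singled_out(RECORDS, ["city"])))               # 0
# Cross-referenced, every record points to exactly one person.
print(len(singled_out(RECORDS, ["job_title", "city"])))  # 4
```

The counting itself is trivial. What the sketch takes for granted is that someone already knows which attributes to combine and that they all sit in the same table, which is precisely the knowledge a tool does not have on its own.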
Looking at these tools, one notices that they rely on searching for data that is indexed to individuals. In practice, such a search is rarely viable because, until recently, there was no LGPD regulating how this information was captured. The only workable approach is to search for standardized formats, such as RG and CPF numbers. The search is therefore limited to the first context mentioned above, that of directly identifying information. It cannot cover the case in which several data variables together make the person identifiable, that is, indirect data.
We should also consider that the main focus of data discovery and data mapping tools is the mapping of databases. The analysis needs to be extended to other environments that may hold data, such as file systems and content kept in email systems, which are not always covered by these solutions.
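A search based on a standard such as the CPF can be sketched in a few lines of Python. This is an illustration, not the implementation of any particular product: the function names `is_valid_cpf` and `find_cpfs` and the sample strings are assumptions made here, while the check-digit rule is the publicly documented modulo-11 scheme used for CPF numbers.

```python
import re

# Matches a CPF written as 000.000.000-00 or as 11 bare digits.
CPF_PATTERN = re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")


def cpf_check_digit(digits):
    """Compute the modulo-11 check digit for the first 9 or 10 CPF digits."""
    weight = len(digits) + 1  # weights run from len+1 down to 2
    total = sum(int(d) * (weight - i) for i, d in enumerate(digits))
    remainder = (total * 10) % 11
    return 0 if remainder == 10 else remainder


def is_valid_cpf(candidate):
    """Check the length, reject repeated digits, and verify both check digits."""
    digits = re.sub(r"\D", "", candidate)
    if len(digits) != 11 or digits == digits[0] * 11:
        return False
    return (cpf_check_digit(digits[:9]) == int(digits[9])
            and cpf_check_digit(digits[:10]) == int(digits[10]))


def find_cpfs(text):
    """Return every CPF-shaped string in `text` that passes validation."""
    return [m.group() for m in CPF_PATTERN.finditer(text)
            if is_valid_cpf(m.group())]


# Hypothetical sample texts.
direct = "Customer Maria Silva, CPF 529.982.247-25, order 123.456.789-00"
indirect = "The only cardiologist on the night shift at the Recife unit"

print(find_cpfs(direct))    # ['529.982.247-25']
print(find_cpfs(indirect))  # []
```

The second sample describes a single person just as precisely as the first, yet a format-based search returns nothing for it, because there is no standardized value to match.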
In the second context mentioned above, the human factor would be necessary.
These tools cannot perform that kind of analysis. Doing so would require a highly developed artificial intelligence environment capable of cross-referencing distinct scenarios and then determining whether the person is ‘identifiable.’ This technology does not exist yet. Without information that links the data to the person, these types of software cannot make the identification they propose in order to meet the LGPD requirements.
We should also consider that some information assets are kept on removable or physical media, whether magnetic, optical, or even paper.
Given that the scope for inventorying these assets goes far beyond the core capabilities of data discovery and data mapping tools, these tools meet business needs only to a certain extent. Beyond that point, specific actions must be defined in order to map this information properly. Special attention should be given to data kept in areas such as Human Resources, where the use of paper is still quite common.
We should also consider that personal devices may be used to store corporate or personal data, especially when they run applications that can scan documents or transfer them through systems not approved by the company. These include cloud storage solutions such as Dropbox, cloud file transfer services such as WeTransfer, and, as a further concern, mass instant messaging solutions like WhatsApp.
It is fair to say, however, that data discovery and data mapping tools have some value: even if only minimally, they can help identify this data and reduce some of the work. As far as the quality of the information is concerned, though, they will always be an auxiliary to the interviews and questionnaires used to determine and organize the databases.
Thus, we understand that, depending on the size and structure of the company, the best data mapping is always done through interviews and specific questionnaires conducted by an expert lawyer who understands all the possible ways of identifying a person, something that tools like these are unable to do.
It is recommended that company executives be aware of these concerns and asset identification considerations, so that they understand that these solutions may still require extra effort to map the personal data of data subjects.