There is a famous Dilbert cartoon from a few years ago in which Dogbert advises Dilbert’s company that they can generate revenue from the information they hold about their customers, but first they have to dehumanise the enemy by calling them data. This pretty accurately summarises one approach to data management that has pervaded the early years of this Information Revolution. However, as the tools and technologies for gathering, analysing, and acting on information become increasingly powerful, we find ourselves facing a tipping point in our love affair with these technologies. This tipping point is all the more pronounced as we consider the impact of data-driven technologies on democratic processes and human rights around the world.
The question of ethics in information management is often conflated with the challenges of managing data privacy, particularly in an increasingly interconnected information landscape. However, privacy is merely the entry point for any meaningful discussion of ethical issues in information management. When we begin to look at the various ethical issues that arise in the implementation of ‘big data’, we see that the real privacy issue is not simply the potential loss of privacy and individual agency in an age when we are transparent to the algorithms, but rather the issues that arise when we must trade privacy off against other interests or benefits. Recital 4 of the General Data Protection Regulation tells us that information should be processed to serve mankind; as we dig deeper, we find further questions of ethics and ethical conduct that bear on that fundamental principle of ethical information management.
Big data raises ethical questions
In Chapter 3 of our book, Ethical Data and Information Management, we look at examples such as the ethical questions raised when the tools for big data analytics can only run on technology that is affordable in the First World, a problem which has led one data scientist and blogger to explore the potential for what he calls “Cheap Data Science”. The ethical question here is simple. The future, to paraphrase the science fiction author William Gibson, “is here, but not yet evenly distributed”. Is it fair that the very people who might benefit most from improved data analytics of issues such as soil erosion or the spread of disease cannot do so, because developers living in affluent Western economies have baked their assumptions about system performance and network capability into the design of the very technologies that data analysts in developing countries would benefit from having? Is it ethically responsible or sustainable to design software and tools that only work reliably in wealthier developed nations?
We also look at the potential benefits and harms of granular tracking and microtargeting of students at university level, where the prevailing mindset of ‘more data is better’ has led to the development of technologies that analyse and predict student behaviour, performance, and likelihood of dropping out. However, there is every reason to believe that the headline success stories are simply describing correlation rather than causation. This raises a further ethical issue in the data-driven world: success stories are often not subjected to the rigorous scrutiny they deserve. In the case of the burgeoning EdTech sector, the unanswered question that needs to be addressed is whether the investment in technologies to track students, their performance, and their interactions with course work is the cause of the higher grades and better performance claimed, or whether students who would perform better and get higher grades anyway are attracted to courses that have these cutting-edge facilities available. Is the relationship causation or mere correlation?
Furthermore, even if there is a causal relationship, there has been limited research on the potential downsides of this type of invasive student tracking. The research that has been done raises concerns about the impact on pedagogic methods in universities, about student privacy, and about chilling effects on independent thought and expression among students, as well as on the choices that students or parents might make about course selection or academic performance.
Ethical concerns of algorithmic bias
The issue of algorithmic bias in artificial intelligence (AI) also gives rise to ethical concerns, particularly when the inherent bias in training data is taken into account. While these algorithmic processes are often hailed as beneficial to society through time and cost savings, they frequently come with a hidden cost. For example, in Chapter 4 of our book we look at the problems with systems like COMPAS, a sentencing-support system used in US courts, which journalists at ProPublica found to be ‘remarkably unreliable’ in its predictions. White defendants were nearly half as likely as African-American defendants to be flagged as being at risk of reoffending, and the sentences recommended for the latter tended to be longer.
The question of how we train AI systems is, in and of itself, an ethical choice. In many respects, when we develop AI systems we are acting as parents: imparting values and supporting the development of ways of thinking about issues and inferring facts from the available data. The quality of the models we develop is directly influenced by the quality of the data and examples we develop them from. In the example of COMPAS, a likely root cause of the inherent bias in the system is, for want of a better expression, the inherent bias in the system: the AI was trained on historical court rulings and case studies, and historically certain ethnic groups have fared better or worse in the US criminal justice system. Similarly, facial recognition machine learning inherits biases from the images used to train it.