Microsoft Deletes Massive Facial Recognition Database by Nicole Lindsey

Amid growing concerns about the potential misuse of facial recognition software, tech giant Microsoft recently deleted a massive facial recognition database, MS Celeb, which contained more than 10 million images of nearly 100,000 people. Since its debut in 2016, MS Celeb has become an important resource for testing and training computers to recognize images of people, and has grown into the largest publicly available facial recognition data set in the world. Some of the most powerful facial recognition algorithms in the world, in fact, have been trained using MS Celeb. For that reason, the move by Microsoft to pull the plug on its facial recognition database received plenty of attention from researchers and data scientists, even if mainstream media coverage of the event was limited.

Negative PR surrounding facial recognition databases

For the past two years, facial recognition software has come under increasing criticism from data privacy experts, human rights organizations and ethicists. As they see it, facial recognition databases have the potential for dangerous abuse by law enforcement agencies, corporations and government authorities. Case in point: the MS Celeb facial recognition database has been used by Chinese tech firms such as SenseTime and Megvii to aid Chinese government authorities in using facial recognition tools to track and oppress ethnic minorities, especially those in the region of Xinjiang. Military researchers have also used MS Celeb to train facial recognition software, leading to concerns that the original, everyday uses of facial recognition technology – such as unlocking a smartphone – might have been supplanted by other, more sinister uses.

For that reason, Microsoft has been increasingly vocal about the need to regulate facial recognition technology. Top Microsoft executive Brad Smith, for example, has warned about the potential for facial recognition databases to erode civil liberties. And Microsoft recently scored a PR win when it turned down a lucrative contract with California law enforcement authorities to use the facial recognition database for policing U.S. citizens. Viewed in that context, the decision by Microsoft to delete the facial recognition database might be seen as part of a broader strategy to de-emphasize the use of facial recognition technology in everyday life.

However, there might be another explanation for why Microsoft pulled the plug on MS Celeb. A damaging report in the Financial Times revealed that many of the individuals in the database were not “celebrities” at all – they were writers, journalists, authors and activists. In other words, exactly the types of people that an authoritarian regime might want to silence. Moreover, hardly anyone contacted about the database knew they were in it, let alone had given their consent to be included. According to Berlin-based artist and researcher Adam Harvey, who first publicized the MS Celeb database via his Megapixels project, it looks like Microsoft simply scraped the web for images with a Creative Commons license and added them to the database without obtaining any sort of consent.

Complicating matters further is the fact that Microsoft says the database was intended for purely academic purposes and was created by someone who is no longer with the company. Getting rid of the massive MS Celeb facial recognition database, then, was simply an internal procedure that would have been completed sooner or later, not part of a cleanup of a top secret project.

Measures to control the mass proliferation of facial recognition technology

Whatever the reason for Microsoft deleting the 10 million-image facial recognition database, the fact remains that both the data and the tools used to process and analyze it are widely available on the Internet. There’s no putting the genie back into the bottle. For example, on GitHub, it’s possible to access cleaned-up versions of the facial recognition database for any sort of use you might dream up. Moreover, some of the sophisticated tools used to work with and manipulate the images are freely available online, so there is no need to reinvent the wheel.

And, since the MS Celeb facial recognition database has been around for nearly three years, it’s a safe bet that the full database has been downloaded countless times by academic researchers all over the web. Whenever needed, these images could theoretically be accessed off the hard drives of researchers. The fact that researchers at Duke and Stanford have also deleted their facial recognition databases suggests that a number of large research institutions have similar sets of images available for academic use.

The problem with facial recognition technology

What makes the MS Celeb facial recognition database potentially so dangerous is the fact that people are now free to use it in ways that were possibly never even imagined. The original database was intended for academic purposes, as a way of training machines to add captions to images and to analyze news videos. But large firms like IBM and Panasonic, clearly seeing the appeal of the facial recognition database for other artificial intelligence purposes – such as spotting a single face in the crowd for law enforcement authorities – began to use MS Celeb in new ways. As AI has become more and more powerful, the potential applications continue to multiply. Just a few years ago, for example, could anyone have imagined that AI-powered algorithms would be helping social media companies like Facebook to analyze faces of people in images?

In 2019 and 2020, it’s easy to see why facial recognition technology is likely to be such a hot-button issue with privacy activists. San Francisco, for example, became the first city in the world to ban the use of facial recognition technology by government authorities. And some states are now working on legislation to curb the use of facial recognition technology – or, at least, to ensure that companies follow European General Data Protection Regulation (GDPR) guidelines when it comes to the use, storage, and processing of any facial recognition data.

Ultimately, the decision is up to the big tech companies – including Microsoft and Amazon – that stand to profit the most from the wide-scale acceptance of facial recognition technology in everyday life. If they decide that the price of using such technology is too high (in terms of the PR backlash and loss of stock market valuation), then that might lead to even stronger momentum to regulate facial recognition technology so that it can be used for benevolent purposes only.