The dataset was considered the largest of its kind ever published and was used to train neural networks.
Microsoft has removed the MS Celeb database, which contained 10 million face images. The pictures were taken from open sources on the Internet, but without the consent of the people shown in them. The Financial Times drew attention to the removal.
The MS Celeb database was considered the largest public dataset of faces. It was published in 2016, and since then many companies have used it to train neural networks and facial recognition systems.
Microsoft did not ask the people in the images for permission because it considered them celebrities and used the images under a free Creative Commons license. However, Berlin-based researcher Adam Harvey found that the dataset also contained pictures of private individuals, for example, journalists.
After that, the data disappeared from the Microsoft site. In a conversation with the FT, the company attributed the removal to corporate protocol. According to Microsoft, the site was intended “for academic purposes and was launched by an employee of the company who no longer works there.”
As Harvey notes, removing the database does not solve the problem. Many companies and ordinary users have already copied it and used it to train their own systems.
“You can’t make a dataset disappear. As soon as you publish it and people download it, it begins to exist on hard drives all over the world.”
Adam Harvey, cybersecurity researcher
In 2018, Microsoft President Brad Smith asked the US Congress to take steps to regulate facial recognition, a technology that “has a wide potential for abuse.” After that, the company stopped selling facial recognition systems to police in California.