IBM has released a new dataset of one million images of diverse human faces in an effort to reduce bias and encourage fairness and accuracy in the development and training of facial recognition technology.
One of the key criticisms levelled at AI is that it can be susceptible to human bias, and facial recognition software has come under fire for failing to recognize diverse faces accurately. In 2015, image recognition algorithms in Google Photos infamously classified black people as "gorillas".
To combat this, the legacy tech giant has released the "large and diverse" Diversity in Faces (DiF) dataset, which draws on publicly available images from the YFCC-100M Creative Commons dataset and includes images of individuals from a range of genders, ages and skin tones.
"The heart of the problem is not with the AI technology itself, per se, but with how the AI-powered facial recognition systems are trained," explained IBM fellow John R. Smith. "For the facial recognition systems to perform as desired – and the outcomes to become increasingly accurate – training data must be diverse and offer a breadth of coverage.
"For example, the training datasets must be large enough and different enough that the technology learns all the ways in which faces differ to accurately recognize those differences in a variety of situations. The images must reflect the distribution of features in faces we see in the world," Smith concluded.
DiF has 10 coding schemes, which include features such as age, facial ratio and nose length, and it is now available to researchers on request.