Just how extensive, accurate, and closely monitored is the New York City Police Department’s facial recognition technology? At the moment, no one really knows, but these are the key questions animating a freedom-of-information lawsuit against the NYPD by researchers from the Center on Privacy & Technology at Georgetown Law.
The lawsuit was filed about two weeks ago in response to repeated refusals by the nation’s largest police department to release virtually any documentation related to its use of facial recognition software, which ostensibly helps law enforcement agencies match imagery of suspected criminals (cellphone shots taken by witnesses, for example, or surveillance video, or even police lineups and live video feeds) against existing databases — be it their own, or those of other agencies (passports, for instance, or driver’s licenses).
The records were first requested in January 2016 as part of a broad series of Freedom of Information Act requests filed with state and local law enforcement agencies. More than 50 agencies eventually did respond with details on the extent of their usage of facial recognition technology. Those responses were included in the center’s yearlong study “The Perpetual Line-Up,” the most comprehensive analysis of law enforcement’s use of this technology to date.
CONVICTIONS: Where science & criminal justice meet.
Among other findings, the report concluded that half of all American adults — some 117 million people — are enrolled in unregulated facial recognition networks used by state and local law enforcement agencies. “At least 26 states (and potentially as many as 30) allow law enforcement to run or request searches against their databases of driver’s license and ID photos,” the report revealed.
The effect, the authors argue, is a virtual lineup, with algorithms replacing eyewitnesses. That might sound like a good thing, given that psychologists and other researchers have shown repeatedly that human eyewitnesses are notoriously unreliable. Surely a dispassionate algorithm would do a better job at singling out perpetrators?
Not always, it’s turning out — particularly if you happen to be nonwhite, and even more particularly if you are African-American. What’s worse, no one really knows why it happens, or how pervasive such algorithmic biases might be, because very few such systems are tested for internal biases, and little academic research has been undertaken.
Setting aside the invasiveness of facial recognition — the only one of the many biometric techniques currently used by law enforcement, from fingerprinting to DNA analysis, that does not require the explicit consent of the subject, or even a court-ordered warrant — studies have shown that the underlying algorithms are better at identifying some races than others. The Georgetown team offered a few explanations for why that might be so — programmers’ biases, for example, and a lack of diversity in the photos used to “train” the software.
But two troubling realities remain. First, the few studies that have been done on facial recognition software suggest a persistently lower accuracy rate for African-American faces — usually about 5 to 10 percent lower than for white faces, and sometimes even worse. This raises concerns, given that African-Americans are already overscrutinized by law enforcement. It suggests that facial recognition technology is likely to be “overused on the segment of the population on which it underperforms,” the Georgetown team warned. Not only does that mean that the software will more often fail to identify a black suspect, it means that innocent black suspects could more often come under police scrutiny — precisely because the software failed.
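The arithmetic behind that warning is simple to sketch. The figures below are purely hypothetical — a 95 versus 88 percent true-match rate, and a doubled search volume for the over-scrutinized group — but they show how a modest accuracy gap compounds with overuse:

```python
# Hypothetical numbers only, chosen to mirror the 5-10 point accuracy
# gap and the over-scrutiny described above; not real measurements.
def wrongful_leads(searches, true_match_rate):
    """Searches in which the real suspect is not correctly identified;
    each is an opportunity for an innocent face to top the candidate list."""
    return searches * (1 - true_match_rate)

group_a = wrongful_leads(searches=1000, true_match_rate=0.95)
group_b = wrongful_leads(searches=2000, true_match_rate=0.88)  # searched twice as often
print(f"group A: ~{group_a:.0f} failed searches")
print(f"group B: ~{group_b:.0f} failed searches")  # nearly 5x as many
```

Under these assumed numbers, the lower-accuracy group absorbs nearly five times as many failed searches — and therefore nearly five times as many chances for the software to point police at the wrong person.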
Secondly, the genesis and extent of this problem remain something of a mystery, because racial bias in facial recognition software is profoundly understudied, particularly considering that the technology has evolved over decades. The Georgetown analysis called research on this front “sparse,” identifying just two relevant studies as “some of the only lines of work to investigate this phenomenon.”
The report notes that the National Institute of Standards and Technology, or NIST, which has run a face recognition competition every three to four years since the mid-1990s, has tested for racial bias just once. The problem may be related to demand: Even jurisdictions like the San Francisco Police Department — which required prospective face recognition vendors to demonstrate a target accuracy [level], provide documentation of performance on all applicable accuracy tests, and submit to regular future accuracy tests — did not ask companies to test for racially biased error rates.
This state of affairs is not limited to the government or academia. In the spring of 2016, the Georgetown researchers interviewed two of the nation’s leading face recognition vendors for law enforcement, asking how they identify and seek to correct racially disparate error rates. At that time, engineers at neither company could point to tests that explicitly checked for racial bias. Instead, they explained that they use diverse training data and assume that this produces unbiased algorithms.
“My biggest complaint — and this is a failure of the academic community — is that there isn’t enough testing of these algorithms,” said Jonathan E. Frankle, a co-author of the Georgetown study and a doctoral candidate in computer science at the Massachusetts Institute of Technology. “Very few people have conducted these studies.”
The inaccuracies are troubling but nothing new. Many readers might recall that, back in 2010, consumer-grade facial recognition software was famously failing to detect that Asian users had their eyes open, or that black users were in the frame at all. And facial recognition software used in web services like Flickr and Google Photos has tagged African-Americans as primates.
The algorithmic bias has been described as “the coded gaze” by Joy Buolamwini, an MIT Media Lab graduate student, in a nod to the literary and sociological term “the white gaze,” which describes seeing the world from the perspective of a white person and assuming, always, that your audience is white.
Buolamwini, a Fulbright Fellow and two-time recipient of NASA’s Astronaut Scholarship, launched the Algorithmic Justice League last year to highlight biases in machine learning. “Why isn’t my face being detected? We have to look at how we give machines sight,” she said in a TED Talk late last year. “Computer vision uses machine-learning techniques to do facial recognition. You create a training set with examples of faces. However, if the training sets aren’t really that diverse, any face that deviates too much from the established norm will be harder to detect.”
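Buolamwini’s point can be sketched with a toy simulation — synthetic feature vectors standing in for face data, and a deliberately crude “detector” that accepts anything close enough to the norm it learned in training. Nothing here resembles a real face model; it only illustrates how a skewed training set produces skewed detection rates:

```python
# Toy illustration (synthetic data, not a real detector): a system
# calibrated on a training set dominated by one group accepts nearly
# all of that group's faces while rejecting many from the other.
import numpy as np

rng = np.random.default_rng(42)
DIM = 8

def faces(center, n):
    """Synthetic 'face embeddings': points scattered around a group center."""
    return center + rng.normal(scale=1.0, size=(n, DIM))

center_a = np.zeros(DIM)
center_b = np.full(DIM, 1.5)  # group B's faces occupy a different region

# Training set: 950 group-A faces, only 50 group-B faces.
train = np.vstack([faces(center_a, 950), faces(center_b, 50)])
norm_face = train.mean(axis=0)
# Accept anything within the radius covering 95% of the training set.
threshold = np.quantile(np.linalg.norm(train - norm_face, axis=1), 0.95)

def detection_rate(center, n=5000):
    test = faces(center, n)
    return (np.linalg.norm(test - norm_face, axis=1) <= threshold).mean()

print(f"group A detected: {detection_rate(center_a):.1%}")
print(f"group B detected: {detection_rate(center_b):.1%}")
```

Because the learned “norm” sits almost entirely inside group A’s region, group A is detected nearly every time while a large share of group B falls outside the threshold — exactly the “deviates too much from the established norm” failure Buolamwini describes.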
Frankle echoed these ideas. “It takes millions of images to run these tests effectively,” he said, pointing to a 2011 NIST study showing that algorithms designed in East Asia performed better on Asians, while algorithms designed in Western Europe and the United States performed much better on Caucasians. “Given the setting where the data was collected,” he was careful to add, “there wasn’t enough data on African-Americans or Latinos or other groups to make any statistical claims.”
Of course, that lack of data speaks for itself, and in this sense, algorithmic bias in machine learning mimics human cognitive bias. Psychologists and neuroscientists have identified this as the “cross-race effect,” or the tendency to more easily recognize faces of the race one is most familiar with. This well-documented phenomenon has been a critical and historic impediment to equal justice for African-Americans, Latinos, and other people of color.
The problem is a likely symptom of minorities’ ongoing underrepresentation in the STEM fields — science, technology, engineering, and mathematics. If African-Americans, Latinos, and Native Americans aren’t helping to write the code and engineer the systems that seek to identify faces, it makes sense that those systems will have inherent data gaps where black and brown faces are concerned.
“What can we do about it?” Buolamwini asked in her presentation. “We can start by thinking about how we create more inclusive code and employ inclusive coding practices. Are we creating full-spectrum teams with diverse individuals who can check each other’s blind spots? On the technical side, how we code matters. Are we factoring in fairness as we’re developing systems?”
These remain important questions. As it stands, NIST recently announced that it had started a new testing regime for facial recognition technology — a move prompted by the Georgetown study. Meanwhile, the NYPD is keeping a tight lid on its own facial recognition program, and to date it has not responded directly to the Center on Privacy & Technology lawsuit.
“We are not commenting,” a New York City Law Department spokesman, Nicholas Paolucci, told me, “while this litigation is pending.”
Rod McCullom reports on the intersection of science, medicine, race, sexuality, and poverty. He has written for The Atlantic, The Nation, and Scientific American, among other publications.
Even small reductions in accuracy harm a scheme in which thousands of random people are being compared against thousands more in a “lineup” database. This produces false positives for the same reason the “birthday paradox” makes shared birthdays easy to find: comparing N probe images against a gallery of M faces means N × M comparisons, and when both numbers are large, the absolute number of errors soars.
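Under the simplifying assumption of a flat, hypothetical false-match rate per comparison, that scaling is easy to make concrete:

```python
# Sketch of why many-to-many searches amplify tiny error rates.
# Assumes a hypothetical, uniform false-match rate per comparison.
def expected_false_matches(n_probes, gallery_size, false_match_rate):
    # Each probe image is compared against every face in the gallery.
    return n_probes * gallery_size * false_match_rate

small = expected_false_matches(n_probes=1, gallery_size=6, false_match_rate=1e-4)
large = expected_false_matches(n_probes=5_000, gallery_size=100_000, false_match_rate=1e-4)
print(f"one probe, six-person lineup: {small:.4f} expected false matches")
print(f"5,000 probes, 100,000-face gallery: {large:,.0f} expected false matches")
```

The per-comparison error rate never changes; only the number of comparisons does — and at database scale, rare errors become routine.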