In 2016, ProPublica, the New York-based nonprofit journalism organization, published an investigation of a computer-based prediction tool called COMPAS. The software uses a proprietary mix of variables to predict future re-arrests, and judges consult its scores in deciding whether defendants should be released or held in jail until their court hearings. The investigation argued that the tool was biased against black defendants, and thus unfair.
The company behind the software, originally called Northpointe but renamed Equivant last year, vigorously defended its algorithm in a follow-up report, and many independent researchers have since pointed out that the concept of “fairness,” in all of its legal and ethical complexity, may not be so easy to define. Arvind Narayanan, a computer scientist at Princeton, has detailed 21 different definitions of fairness, for example, and Richard Berk and colleagues at the University of Pennsylvania have described six.
But I would add one further conundrum to the COMPAS debate — one that many commentators have thus far neglected to mention: Achieving ProPublica’s concept of fairness would actually require treating people differently by race, a reality that raises its own set of ethical and even constitutional questions.
The core of the debate over COMPAS is fairly straightforward. The ProPublica reporting team looked at defendants whom COMPAS had deemed a “high risk” to be re-arrested but who in fact were not re-arrested over the following two years. Black defendants fell into this group at about twice the rate of white defendants (45 percent compared to 24 percent). On the flip side, the percentage of defendants who had been judged “low risk” and yet were later re-arrested was much higher for whites than for blacks (48 percent compared to 28 percent), suggesting that COMPAS was getting things wrong in part on the basis of race.
For its part, Northpointe argued that its analysis shows rough parity in the algorithm’s predictions across races. In other words, defendants labeled “high risk” had about the same chance of being re-arrested (roughly 60 percent), regardless of race. The company said it was focused on a different metric (which we could call “predictive parity”), and thus a different kind of fairness, than ProPublica.
A number of independent researchers have shown that, when the underlying re-arrest rates differ between groups, a single algorithm simply cannot achieve both ProPublica’s and Northpointe’s kinds of fairness at once. Developing a racially unbiased system in ProPublica’s sense would then almost certainly require doing what stakeholders on all sides are trying so desperately to avoid. “It is possible to modify an algorithm to equalize outcomes across racial groups,” the legal scholars Megan Stevenson and Sandra Mayson wrote in a research paper last fall, “but usually [that] requires treating defendants with the same observable risk profiles differently on the basis of race.”
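The arithmetic behind this impossibility can be sketched in a few lines. If an algorithm achieves predictive parity (equal positive predictive value, or PPV, across groups) and the groups have different base rates of re-arrest, then the false positive rates, the measure ProPublica focused on, must differ. The numbers below are purely illustrative, not ProPublica’s or Northpointe’s actual figures; only the 60 percent PPV echoes the rough parity Northpointe reported.

```python
def implied_fpr(base_rate, tpr, ppv):
    """False positive rate implied by fixing PPV and TPR.

    Rearranges PPV = p*TPR / (p*TPR + (1-p)*FPR), where p is the
    group's base rate of re-arrest, to solve for FPR.
    """
    return base_rate / (1 - base_rate) * tpr * (1 - ppv) / ppv

# Hypothetical scenario: both groups get the same PPV (0.6) and the
# same true positive rate (0.7), but their base rates differ.
fpr_high_base = implied_fpr(0.50, 0.7, 0.6)  # group with higher base rate
fpr_low_base = implied_fpr(0.40, 0.7, 0.6)   # group with lower base rate

print(round(fpr_high_base, 3))  # 0.467
print(round(fpr_low_base, 3))   # 0.311
```

Even with predictive parity enforced, the group with the higher base rate ends up with a substantially higher false positive rate, that is, more people wrongly flagged as high risk. Equalizing the false positive rates instead would break predictive parity, which is the trade-off the researchers describe.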
Why would it make sense to embrace ProPublica’s concept of fairness, if it required treating people differently by race? A crucial reason is that, as advocates and researchers have shown, criminal history data — a major input to pretrial risk assessment tools — is itself often racially skewed, the result of long-standing racist practices in policing and in the criminal justice system. For example, police arrest black people at twice the rate of whites for drug-related charges, though both use and sell drugs at comparable rates. By drawing on skewed data, a pretrial algorithm could not only perpetuate the racial asymmetries already in the system, but could also further exacerbate the disadvantage, leading to a vicious cycle. So, this side argues, decision-makers should adjust the underlying data or the algorithm to make up for the bias in the data. Rather than trying to rid algorithms of any influence of race, the tool would take race into account.
Of course, this raises potent questions, among them: Is it constitutional to treat people differently by race in designing an algorithm?
The question has a long history in other areas like college admissions, loan decisions, and hiring and promotion practices. But there haven’t yet been legal cases related to race-based affirmative action in criminal justice algorithms, and researchers are of mixed views on whether it would be constitutional to use race (or a proxy for race) to adjust algorithms to achieve equal outcomes.
On the one hand, some researchers, like legal scholar Anupam Chander, argue that failing to use race in algorithm design could lead to “viral discrimination” where “algorithms trained or operated on a real-world data set that necessarily reflects existing discrimination may well replicate that discrimination.” On the other hand, Sharad Goel, executive director of the Stanford Computational Policy Lab, and colleagues have written bluntly that treating people differently depending on their race would be “a violation of the fundamental tenet of equal treatment.” One of the outstanding questions is whether a particular way of adjusting an algorithm would hold up under the conditions of “strict scrutiny” that could be triggered by using race in an algorithm’s design.
Beyond the legal questions, there are even bigger ethical questions. Can philosophical distinctions and theories help guide this conversation? This was one recurring theme at a recent interdisciplinary Fairness, Accountability, and Transparency conference held at NYU, which brought together practitioners and researchers in computer science, law, and philosophy. As Princeton’s Narayanan put it, there’s some incoherence in only looking at technical definitions, without bringing in moral frameworks to help clarify and guide us. In order to make progress, he said, “It would be really helpful to have scholars from philosophy talk about these trade-offs [in algorithms] and give us guidelines about how to go about resolving them.” While there’s a long tradition of relevant work in moral and political philosophy on fairness and justice, it’s rare to see work that explicitly applies ethical theories to algorithms (though a few researchers, including Reuben Binns, are doing so).
You might think all of this is just a problem introduced by computer-based algorithms, and that refusing to adopt computer-assisted decision-making technology would avoid the issues. Yet this isn’t true. Whether humans are making decisions on their own, or drawing on computer-based prediction for assistance, the questions about fairness remain mostly the same. The use of computer-based prediction — as compared with human brain-based prediction — simply requires making the decision-making rules explicit, rather than hazy and undefined.
Some advocates, like the AI Now Institute, have suggested that algorithms used by public agencies should be made publicly available, so that they can be independently and transparently audited on their aims and impacts. Another approach might be to change or discontinue the use of algorithms if their use is found to worsen racial inequities or increase jail populations. And some researchers have urged that we design algorithms that predict outcomes less prone to racial disparities (arrests for violent crimes specifically, rather than for all crimes, for example). Finally, collecting and sharing information about how the algorithms are used in the real world is essential. Megan Stevenson, researching the effects of adopting a pretrial algorithm in Kentucky, found that judges often ignored the algorithm’s recommendation to release people, and that as a result, there was not a sustained reduction in pretrial jail populations over time. “Despite extensive and heated rhetoric,” she wrote, “there is virtually no evidence on how use of this ‘evidence-based’ tool affects key outcomes such as incarceration rates, crime, or racial disparities.” She has rightfully called for a much more extensive study of the impacts of algorithms.
The conversation about what fairness is and how to achieve it involves a tangle of factors — statistical, legal, and ethical — which makes it difficult to work through. Yet it is hugely important for the wider community to tackle. As Berk and his colleagues write, “it will fall to stakeholders — not criminologists, not statisticians and not computer scientists — to determine the tradeoffs.
“These are matters of values and law,” they add, “and ultimately, the political process.”
Stephanie Wykstra (@swykstr) is a freelance writer and researcher with a focus on transparency and criminal justice reform. Her work has recently appeared in Vox, Slate, and Stanford Social Innovation Review.