Monday, 30 January 2017

Cathy O'Neill on Social Justice Algorithms

Dr Cathy O’Neill, aka mathbabe, a former Algebraic Geometer turned Wall Street Analyst turned Data Scientist / Activist, has a best-seller and has just been appointed a Bloomberg columnist. Her target is the algorithms used in complex decision-making.

Here’s her conclusion:
The irony is that algorithms, typically introduced to make decisions cleaner and more consistent, end up obfuscating important moral aspects and embedding difficult issues of inequality, privacy and justice. As a result, they don’t bypass much hard work. If we as a society want to make the best use of them, we'll have to grapple with the tough questions before applying algorithms -- particularly in an area as sensitive as the lives of our children.
Well, actually the social justice bit is not the most important issue here. First, something about “algorithms”.

Human judgement was first replaced by algorithms in bank lending and insurance. It turned out (apocryphal source) that human bank managers got it right 82% of the time, and the credit algorithms got it right 85% of the time. For even a small unsecured loans book of a billion pounds, generating £250m of new business a year, that three per cent difference builds over a short time into a steady £7,500,000 a year of extra profit. More than enough to pay for a couple of dozen credit analysts, their computers and some SAS licenses.
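
As a rough sanity check on that arithmetic (the figures are the illustrative ones above, not real bank numbers), here is a quick sketch in Python:

```python
# Back-of-the-envelope check of the lending example above.
# All figures are the illustrative ones from the text, not real bank data.

new_business_per_year = 250_000_000   # £250m of new unsecured lending a year
human_accuracy = 0.82                 # bank managers get it right 82% of the time
algorithm_accuracy = 0.85             # credit algorithms get it right 85% of the time

# Treat the three-percentage-point gap as the slice of the new book that the
# algorithm gets right where human judgement would have got it wrong.
improvement = algorithm_accuracy - human_accuracy
extra_profit = new_business_per_year * improvement

print(f"Extra profit per year: £{extra_profit:,.0f}")   # £7,500,000
```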

Credit algorithms are brutal. No spare money at the end of the month after paying all your bills and feeding the family? Sorry, no loan. A couple of missed credit card, council tax or gas bills? Sorry, no loan either. A CCJ (County Court Judgement) issued to you at your address? Don’t let the door hit you on your way out. Before you mutter something about banks only lending to people who don’t need it, remember that the bank is lending your savings. You don’t want the bank to turn round and tell you your savings are gone because they were lent to people who needed the money so badly they couldn’t pay it back - do you? OK. That’s clear then.

Most people who can’t make their payments on time, or who don’t have spare cash at the end of the month, are low-paid rather than irresponsible. People are low-paid because they don’t have the technical skills, education, or professional persona to earn better salaries. They may also lack the neuroses, character and moral defects, dysfunctions and ability to live without much social life that characterise many of the people who do earn in the top decile of salaries. But let’s not go there, and stick with the lack of education and social skills. Those, in the Grand British Narrative of the Left, are class- and culture-biased behaviours, which fortunately cut across race, creed and colour. In the Grand American Narrative of the Good People, it’s all about race, gender, religion, and economic status - because there is no “class system” in America. Cathy O’Neill is one of the Good People, so she’s concerned that the algorithms may have social injustice embedded in them.

Nobody gets too worked up about bank lending decisions because they are based on past financial behaviour and indicators. Those have an obvious relevance to a lending decision. However, what if the bank refused you because it picked up friends on your Facebook feed who were bad risks? Big Data says that in all sorts of ways we tend to act as our friends do, so it might seem relevant to see if we hang out with financial losers. Everyone lurves Big Data because smart and cool and computers. But how is this not the same as the local gossip saying that we shouldn’t lend to someone because she hangs out with losers? Did the banks hire all those PhDs just to have them behave like the village busybody? (That’s my objection, not Dr O’Neill’s.)

When the decisions are about sentencing, parole, or taking children into the Social Services system, we would like the algorithms to be a lot better than the local gossip. And Good People want the algorithms to be socially-just as well. Here are the points O’Neill makes about a system called Approach to Understanding Risk Assessment (AURA), introduced in Los Angeles to help identify children at risk.
The conclusions that algorithms draw depend crucially on the choice of target variable. Deaths are too rare to create discernible patterns, so modelers tend to depend on other indicators such as neighbor complaints or hospital records of multiple broken bones, which are much more common and hence easier to use. Yet these could produce very different scores for the same family, making otherwise safe environments look dangerous.

The quality and availability of data also matter. A community where members are reluctant to report child abuse, imagining it as a stigma or as a personal matter, might look much safer than it is. By contrast, a community that is consistently monitored by the state -- say, one whose inhabitants must provide information to obtain government benefits -- might display a lot more “risk factors.”

AURA, for example, uses contextual information like mental health records and age of parents to predict a child's vulnerability. It’s not hard to imagine that such factors are correlated to race and class, meaning that younger, poorer, and minority parents are more likely to get scored as higher-risk than older, richer parents, even if they’re treating their children similarly.
Her concern is that AURA will have too many false negatives, as the sneaky White People With Jobs stay off the radar. The result will be “unfair” treatment of the people who are correctly modelled. There’s a much bigger elephant in the room. AURA is an appalling model, as O’Neill describes:
In a test run on historical data, AURA correctly identified 171 children at the highest risk while giving the highest score to 3,829 relatively safe families. That’s a false positive rate of 95.6 percent. It doesn’t mean that all those families would have lost custody of their kids, but such scrutiny inevitably carries a human price -- one that would probably be unevenly distributed.
In other words, the next prediction from AURA is overwhelmingly likely to be wrong. Why? Do these people not know what they are doing? Well, I have tried using propensity modelling on a rare event, and got the same result: a horrible level of false positives. After checking my work and berating myself for a lack of creativity, I thought the issue through, and realised that this was caused by the rarity of the event and the nature of the facts I had to use. There is no hope of ever getting a decent predictor for an event as rare as child abuse. First, because it’s rare, and second, because it’s kept private, which is O’Neill’s second point in the quote. By contrast, defaulting on bank loans is a lot more common amongst borrowers than you might believe, and happens within a much smaller chunk of the population than “all parents”.
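
The arithmetic in the quote checks out: treating the 171 as true positives among the 4,000 highest-scoring families gives 3,829/4,000, roughly 96 per cent false positives. A minimal sketch, with invented prevalence and accuracy figures rather than anything from AURA, shows why any model of a rare event ends up in the same place:

```python
# Why rare events wreck precision: a toy calculation with invented numbers,
# not the AURA figures. Precision here means P(real case | flagged by the model).

def precision(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Share of flagged cases that are genuine, given the base rate of the event."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# A model that catches 80% of real cases and clears 90% of safe ones looks
# respectable in isolation. Watch what the base rate does to it.
for prevalence in (0.20, 0.05, 0.01):
    p = precision(prevalence, sensitivity=0.80, specificity=0.90)
    print(f"base rate {prevalence:4.0%}: {p:5.1%} of flagged cases are real, "
          f"{1 - p:5.1%} are false positives")

# base rate  20%: 66.7% of flagged cases are real, 33.3% are false positives
# base rate   5%: 29.6% of flagged cases are real, 70.4% are false positives
# base rate   1%:  7.5% of flagged cases are real, 92.5% are false positives
```

With a one-in-a-hundred event, even that flattering made-up model flags more than nine bad calls for every good one, and the real prevalence and data quality behind something like AURA are presumably worse.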

Propensity modelling started in direct marketing, and even models with much worse false positive rates can help improve profits by cutting down the number of mail shots. What’s good for junk mail is not acceptable for families. Propensity models are wholly unsuitable for sensitive decisions about rare events, not because they “obfuscate important moral aspects and embed difficult issues of inequality, privacy and justice”, but because the model will inevitably be awful.
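
To see why junk-mail economics can live with that, here is a toy campaign calculation, with made-up costs and response rates rather than anything from a real mailing:

```python
# Toy mail-shot economics with invented figures: the propensity model only has
# to beat the cost of a stamp, not be right about any individual household.

cost_per_mailshot = 0.50      # £ per letter, assumed
value_per_response = 40.00    # £ profit per responding customer, assumed
population = 1_000_000
base_response_rate = 0.01     # 1% respond to an untargeted mailing, assumed

def campaign_profit(mailed: int, response_rate: float) -> float:
    """Profit from mailing `mailed` people at a given response rate."""
    return mailed * (response_rate * value_per_response - cost_per_mailshot)

# Untargeted: mail everyone and lose money.
blanket = campaign_profit(population, base_response_rate)

# Targeted: the model picks the top 30% of the list, within which the response
# rate merely doubles to 2%. That is still a 98% "false positive" rate.
targeted = campaign_profit(int(population * 0.30), 0.02)

print(f"Blanket mailing profit:  £{blanket:,.0f}")    # prints £-100,000
print(f"Targeted mailing profit: £{targeted:,.0f}")   # prints £90,000
```

Being wrong about 98 households in 100 is fine when the cost of an error is a wasted stamp; it is not fine when the cost is an investigation of a family.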

That doesn’t mean Dr O’Neill needs to find a new line of work. Big Data research exercises are not expensive, and in these kinds of cases a negative result can be valuable. Knowing that there is no group of reliable, accurate markers for child abuse can help dispel prejudice and old wives’ tales, challenge professional folklore and force policy-makers to think about what they can and cannot achieve. Helping children who are found to be abused is something a caring society should try to do. Claiming that you can prevent child abuse, when you know it can’t be reliably identified or predicted from publicly-available facts, is just irresponsible.

And all the Big Data in the world won’t overcome the cowardice that allowed child prostitution rings run by members of minority groups to operate for years, even though the police and social services knew about it. Which doesn’t mean someone shouldn’t do Big Data research, but it does mean that its issues need to be put in context and proportion.
