||[Nov. 21st, 2017|04:10 pm]
Last weekend was spent at OryCon #39 in Portland. Had a great time, got to meet people, had interesting conversations, the usual.
Also, Panels! At which I took copious notes. So, let's start there.
First panel of the weekend: Predictive Modeling.
Description from the schedule:
Predictive modeling as an element of surveillance, perhaps predicting future crimes - are we likely to get a "minority report" scenario and if so will society trade that additional oversight for enhanced security?
The answers to those questions are "No" and "They already have."
Courts apparently are already using computer models for both sentencing and setting bail, which I hadn't heard about before. One of the problems with this is that the algorithms used tend to be proprietary and there isn't a lot of (or any) transparency.
It doesn't seem to be any worse, though, than leaving it up to a judge, who makes decisions based on a combination of their own criteria, legal guidelines, and their own feelings about the individual merits of the case. A study was mentioned that showed that, of all the factors examined, including the type of crime and the race of the defendant, lawyers, and/or victims, among many others, the largest influence on bail amounts was what time of day the hearing was held. Defendants brought before the judge shortly after lunch, when the judge is relaxed and happy, tend to get lower bail. This is also obviously problematic.
They've also tried to apply predictive modeling to parole hearings, which are based largely on recidivism risk. The problem there turned out to be that by far the largest factors indicating whether a convict will re-offend are:
# Whether there was another immediate family member in prison
# The presence of a father figure when they were young
# The income of their parents when they were younger
In other words, things that the convict could do nothing about. One could construct a model in this case that could accurately predict recidivism, but one shouldn't. That is, these are all major factors that shouldn't be taken into consideration when granting parole despite the fact that they will greatly influence the outcome.
The TSA and its various attempts were brought up. One panel member, who had worked with the TSA on developing its processes, said that the biggest thing that makes airport security work is a slow-moving S-shaped line. People moving through the line day after day create a pattern, and anyone watching it for all those days will quickly learn to recognize it and spot deviations from it. I have no idea how valid that is, but it seemed to be accepted by most of the panelists.
Which brings me to their next major point, which is an issue I've expressed before with predictive modeling: How do you validate it? Predictive algorithms, it turns out, are very good at confirming prior biases.
A few years ago, I worked for a company that made a tool that was used for, among other things, intelligence analysis. While it was extraordinarily good at organizing, correlating, and finding patterns from diverse inputs, I noticed it wasn't that great at providing any kind of counterbalance to test the conclusions drawn from it. The tool could have been a conspiracy theorist's wet dream. I'm pretty sure that using it, I could have put together a case for invading Iraq far better than what the Bush administration did, one that would probably have even convinced me of the necessity of doing so.
And even when testing, it's good to know whether your tests are actually testing what you think they're testing. One recent fad that's become popular among especially racist government officials, because there was data that at first glance seemed to bear it out, is the concept of "Predictive Policing". It sounds good in theory: you send the police in greater force to where more crimes are occurring. And here's an example of where subtle bias gets magnified. One way to measure the crime rate of an area is by the arrests made there. So those are the areas they send more police patrols to. Which, of course, inevitably leads to more arrests. So they're finding more criminals now, which proves the value of the model, right? Well, no. Of course not. But the data's there, so people still use the concept, though not, as far as I can tell, in good faith.
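To make that feedback loop concrete, here's a toy simulation I put together myself (nothing from the panel, and the numbers are invented): two districts with identical real crime rates, where the "model" simply sends every patrol to whichever district has more past arrests on the books.

```python
import random

random.seed(0)

TRUE_CRIME_RATE = 0.10   # both districts have identical real crime
arrests = [5, 4]         # a small historical fluke favors district 0
PATROLS_PER_DAY = 10

for _ in range(200):
    # the "model": send every patrol to the district with more past arrests
    hot = 0 if arrests[0] >= arrests[1] else 1
    for _ in range(PATROLS_PER_DAY):
        # each patrol makes an arrest at the (identical) true crime rate
        if random.random() < TRUE_CRIME_RATE:
            arrests[hot] += 1

print(arrests)  # district 0 racks up arrests; district 1 never gets another
```

The district that started one arrest ahead ends up with all the patrols, and therefore all the new arrests, and the resulting data then "confirms" that the model picked the right district.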
What does work, someone asked. The beat cop, was the unanimous answer. 3/4 of the panel happened to be from Brooklyn and all mentioned the friendly neighborhood policeman of their youth. A cop walking a beat got to know the people in the neighborhood, and it humanized police and civilians mutually.
I have my doubts as to how well it really worked, how free of the taint of nostalgia those memories really are, and whether, even if it did once work, it would still work today, when neighborhoods tend to be much more fluid and dynamic than they once were. Either way, with modern weapons and the incredible resources available to street gangs these days through illicit drug sales, I don't know if we can really count on Officer Krupke to keep the Jets and the Sharks in line anymore.
Moving away from policing, I have a couple of notes that I'm not sure what they're about:
A quote: "Nothing is as much a barrier as an anonymous piece of software." Unfortunately, I don't remember a barrier to what.
"Arbitrary Strangeness" System/Machine Learning -- Not sure what this one's about, either. Possibly a note on something to google later. I haven't gotten around to it yet, though.
An audience member asked about the methods portrayed by Jack Reacher. The panelist with the intelligence background said no, there's not a whole lot of accuracy there. It's not like the Tom Clancy novels that freaked out everyone in all our spy agencies because they thought there'd been a leak. On investigation, it turned out there hadn't been; he just made a lot of smart guesses and ran them through several very smart beta readers to refine them and find holes, which he then patched. (Proving the value of beta readers, I guess, so another heartfelt thank you to mine!)
They also mentioned the Go-playing computer. For years, Go was thought to be out of reach of AI, as too complex, but there is now a computer that routinely beats the best human players. Previous AIs were trained by playing against human opponents. The new one, though (sorry, I don't remember its name), was trained by pitting it against itself, which let it learn much more quickly: it could move faster than any human and play orders of magnitude more games.
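For a feel of how self-play training works, here's a tiny version I sketched myself, vastly simpler than anything Go-sized: a tabular learner for the take-away game Nim (21 stones, take 1-3 per turn, whoever takes the last stone wins). Both "players" share one value table and update it from the outcomes of games against themselves; the game choice, learning rate, and exploration rate are all my own arbitrary picks.

```python
import random

random.seed(1)
N = 21
# value[n] = estimated probability that the player to move at n stones wins
value = [0.5] * (N + 1)
value[0] = 0.0  # no stones left: the player to move has already lost

def best_move(n, explore):
    moves = [m for m in (1, 2, 3) if m <= n]
    if explore and random.random() < 0.2:
        return random.choice(moves)  # occasionally try something random
    # leaving the opponent in the worst position is best for us
    return min(moves, key=lambda m: value[n - m])

for _ in range(20000):
    n, history = N, []
    while n > 0:
        history.append(n)
        n -= best_move(n, explore=True)
    # whoever moved last won; walk backward, alternating win/loss credit
    result = 1.0
    for pos in reversed(history):
        value[pos] += 0.1 * (result - value[pos])
        result = 1.0 - result

# optimal play: when n % 4 != 0, the mover wins by taking n % 4 stones
print(best_move(7, explore=False))
```

After enough self-play games the table converges on the known optimal strategy (leave the opponent a multiple of four), without ever seeing a human opponent, which is the basic appeal: the learner generates its own unlimited supply of training games.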
Back to verification - predictive modeling could perhaps eventually be used as *one* of the data points in an investigation, but certainly not the whole thing. Just like fingerprints. It's possible, and not even that rare, for two people to have fingerprints that match on all the points of data examiners look at (they only really consider a dozen or so points, not the whole print, when making a match). Even DNA is not a perfect solution, although again it's a very strong piece of evidence when used in conjunction with other evidence pointing to the same thing. Corroboration of different proxies is necessary, although it can be very tempting to want to find a single easy method of separating fact from non-fact.
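Here's a back-of-the-envelope sketch of why corroboration works, in the odds form of Bayes' rule. The numbers are entirely made up for illustration, and the big caveat is baked into the math: multiplying likelihood ratios like this assumes the pieces of evidence are independent, which real proxies often aren't.

```python
def posterior(prior, likelihood_ratios):
    """Odds-form Bayes update: multiply prior odds by each proxy's
    likelihood ratio, then convert back to a probability."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

PRIOR = 0.001  # say, one suspect in a thousand before any evidence

one_proxy = posterior(PRIOR, [50])            # a single imperfect match
three_proxies = posterior(PRIOR, [50, 50, 50])  # three independent ones

print(one_proxy)     # about 0.05 - still probably the wrong person
print(three_proxies)  # about 0.99 - corroboration does the heavy lifting
```

A single proxy, even a fairly strong one, leaves you at around five percent; it's the agreement of several independent lines of evidence that gets you anywhere near certainty, which is exactly why a lone model output (or a lone partial fingerprint) shouldn't close a case.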
There's also the problem of the "subtly corrupted" training set. Aside from deliberate biases, or even common unacknowledged prejudices, there's the problem of simply lacking correlation. One panelist gave the example of sorting pictures into categories for real estate. Two of the categories were "overhead image" and "map". The problem there is that these two categories are not as neatly defined as one might think. What about an aerial photograph with the streets labeled? Or a Google Maps cutout with satellite imagery? It turned out that, in one case, some members of the team were calling anything with text on it a map and others were calling anything with a photograph in it an overhead image, so the AI could never know when its guesses were right, couldn't improve, and never figured out how to tell the difference.
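To show why inconsistent labels put a hard ceiling on the model, here's a little simulation of my own (the labeling conventions and proportions are invented, just echoing the panel's map-vs-overhead story): two labelers agree on clear-cut images but split on ambiguous ones, and even a classifier that perfectly reproduces one labeler's convention can't score well against the mixed labels.

```python
import random

random.seed(7)

def label(item, labeler):
    kind, has_text, has_photo = item
    if kind == "clear":
        return has_text            # unambiguous: everyone agrees
    # ambiguous items (labeled streets on photos, satellite-view maps):
    # labeler A keys on the text, labeler B keys on the photography
    return has_text if labeler == "A" else not has_photo

items = []
for _ in range(1000):
    if random.random() < 0.7:
        t = random.random() < 0.5
        items.append(("clear", t, not t))          # clearly one or the other
    else:
        items.append(("ambiguous", True, True))    # has text AND a photo

# a "perfect" classifier that always reproduces labeler A's convention,
# scored against labels drawn from a random labeler (the mixed training set)
correct = sum(label(it, "A") == label(it, random.choice("AB")) for it in items)
print(correct / len(items))
```

The score gets stuck well short of 1.0, because on roughly a third of the images half the "correct answers" contradict the other half. No amount of extra training fixes that; the ceiling is in the labels, not the model.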
There's another, apparently famous, case where an AI was trained to recognize cats in images. Which it learned to do pretty well. But it also learned to recognize something that could only be described as "something like a cross between a goat and an ottoman".
Next up: Realities of Leaving Earth.