[conspire] (forw) Automated facial recognition (was: Test your facial recognition skills)

Rick Moen rick at linuxmafia.com
Wed Feb 17 03:42:25 PST 2016


To be clear, DHS's FAST program uses facial recognition as only a minor
component.  It aspires to be a 'pre-crime' automated screening method 
to determine criminal intent from a broad spectrum of biometric data
believed to correlate with behaviour.  

It is also, last I heard, an experimental program, not deployed as
production.

Also:  What DHS is experimenting with and telling the public about, you
can bet corporations (casinos, Disneyworld, who else?) are deploying
aggressively and not telling the public about.  More here:
https://vimeo.com/6408425


----- Forwarded message from Rick Moen <rick at linuxmafia.com> -----

Date: Wed, 17 Feb 2016 01:49:51 -0800
From: Rick Moen <rick at linuxmafia.com>
To: skeptic at lists.johnshopkins.edu
Subject: Automated facial recognition (was: Test your facial recognition
	skills)
Organization: If you lived here, you'd be $HOME already.

I'm just starting to catch up on old threads, having needed to keep 
Internet usage sparse during our ocean crossing.  A week ago, seeing
just the Subject header, I'd wondered if this thread were about the
accuracy of surveillance-type facial-recognition by machines.  I see it
wasn't, but expect it's OK if I digress onto that.

So:  Facebook, Google, Twitter, and such companies with huge collections
of other people's tagged digital photos are monetising them.
(Facebook's collection comprises something like 13 _trillion_ photos.)
The FBI has a database of 52 million faces, and describes its integration of
facial recognition software with that database as 'fully operational'.
The agency's director claims its database wouldn't include photos of
ordinary citizens, though this is demonstrably contradicted by its own
documents
(https://www.eff.org/deeplinks/2014/04/fbi-plans-have-52-million-photos-its-ngi-face-recognition-database-next-year) .

Everyone appears to be rah-rah about how successful this is going to be
in every possible application, if not today in year n, then surely in
year n+1 -- and indeed in some applications it works well enough.
However, when I heard that DHS seriously expected to use automated
facial recognition as the reason to detain Bad People in airports and
elsewhere (the 'FAST program' - Future Attribute Screening Technology,
started in 2012), I thought 'Guys, you've never heard of the base rate
fallacy, have you?'

Or, to put it another way, DHS is yet another institution needing to
learn Bayes's Theorem.

The base rate fallacy is the fallacy of ignoring the probability-skewing
effect of a low base rate.  I will explain:

For the terrorists-in-airports example, that would be the probability
that any random person walking through an airport is actually a
terrorist.  Let's say an example airport has 1 million persons walking
through it in a year (it's a small regional), and it's very popular with
terrorists, such that we expect 100 terrorists to walk its halls in that
year.  So, the base rate of being a terrorist in the scenario is 0.0001.
The base rate of being a non-terrorist in the scenario is 0.9999.

DHS gets the 'FAST program' going at the airport, and stocks its
database with super-studly spook-approved photos.  And DHS claims the
software is really, really good!  1% error rate!  Specifically, it says:

o  Actual terrorists fail to trigger the klaxon 1% of the time (false
   negative).  And...

o  Non-terrorists trigger the klaxon 1% of the time (false positive).

(These are invented example numbers of mine, but I think within a
realistic ballpark.)

DHS sends out a press release reporting glowingly positive results,
because the system is '99% accurate'.

But what does '99% accurate' really mean in this context?  It merely
means a low error rate, not high accuracy.  The accuracy is actually
piss-poor, because, observe:

9,999 non-terrorist travelers during the studied year got slammed up
against the wall by the brute squad -- along with 99 terrorists, for a
total of 10,098 klaxon soundings.  So, the probability that a person
triggering the alarm actually is a terrorist is only about 99 in
10,098, which is 0.98% accuracy.
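
In code, the same tally looks like this (a minimal Python sketch, using
only the invented figures above):

    # Invented example numbers from above -- not real DHS figures.
    travelers = 1_000_000        # people through the airport in a year
    terrorists = 100             # terrorists among them
    false_negative_rate = 0.01   # terrorists the klaxon misses
    false_positive_rate = 0.01   # non-terrorists who trigger the klaxon

    non_terrorists = travelers - terrorists

    true_positives = terrorists * (1 - false_negative_rate)   # 99 terrorists caught
    false_positives = non_terrorists * false_positive_rate    # 9,999 innocents flagged
    total_alarms = true_positives + false_positives           # 10,098 klaxon soundings

    print(f"p(terrorist | klaxon) = {true_positives / total_alarms:.4%}")  # 0.9804%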

I call _accuracy_, here, the probability of terrorist given klaxon, 
which we'll call 'p(terrorist|K)', where p() means probability of, and
the | character means 'given'.

Bayes's theorem says:

p(terrorist|K) =  p(K|terrorist) times p(terrorist) divided by p(K).

p(K|terrorist) is 99 / 100 = .99000000 (1% false negative)
p(terrorist) is 100 / 1000000 = .00010000
p(K) = 10098 / 1000000 = .01009800

Probability of terrorist given klaxon is thus .00980392 or only 0.98%
accuracy -- less than 1% accurate, though I have little doubt DHS would
call it '99% accurate' (ignoring the low base rate).
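
Or, as a quick Python check of the same arithmetic (my invented numbers
again):

    # Bayes's theorem: p(terrorist|K) = p(K|terrorist) * p(terrorist) / p(K)
    p_K_given_terrorist = 99 / 100       # .99000000 (1% false negative)
    p_terrorist = 100 / 1_000_000        # .00010000 (the base rate)
    p_K = 10_098 / 1_000_000             # .01009800 (overall klaxon rate)

    p_terrorist_given_K = p_K_given_terrorist * p_terrorist / p_K
    print(f"{p_terrorist_given_K:.8f}")  # 0.00980392 -- under 1%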


And the point is, this sort of fallacy occurs _all the time_ when people
talk about probabilities and rates of success for infrequent events and
large amounts of data.
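
To see just how hard the low base rate drives the result, here's one more
short sketch (same invented 1% error rates) showing how p(terrorist|K)
moves as the base rate changes:

    # Holding the 1% false-positive and false-negative rates fixed,
    # vary only the base rate of terrorists in the population.
    fpr = fnr = 0.01
    for base_rate in (0.0001, 0.001, 0.01, 0.1):
        p_K = (1 - fnr) * base_rate + fpr * (1 - base_rate)
        p_terrorist_given_K = (1 - fnr) * base_rate / p_K
        print(f"base rate {base_rate:<7} -> p(terrorist|K) = {p_terrorist_given_K:.4f}")

Only when the thing you're screening for stops being rare does a 1% error
rate start to mean what the press release implies.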

----- End forwarded message -----



