[sf-lug] HaveIBeenPwned.com (was: Safer Browsing)

Rick Moen rick at linuxmafia.com
Mon Mar 11 00:23:41 PDT 2019


A bit over a week ago, I asked:

>> One of the "safety links" is https://monitor.firefox.com/
>>
>> When I entered one of my email addresses at the above page, the report
>> indicated my email address and data were exposed in two events:
>               ^^^^^^^^^^^^^^^^^^^^^^
> 
> Which means what specifically?  Anyone?  Anyone?  Bueller?


Well, today I got some relevant raw data.  As we've covered, the Firefox
Monitor service appears to be (so far) just a gratuitous intermediary
between HaveIBeenPwned.com and members of the public.  That site is a 
well-intentioned public service from Australian security expert Troy Hunt 
( https://en.wikipedia.org/wiki/Have_I_Been_Pwned? ) that notifies
people if their e-mail addresses and 'personal data' -- a term to be
further discussed below -- are known to have been revealed in major data
breaches.

As it happens, over the years, HaveIBeenPwned.com had told me my
rick at linuxmafia.com mailbox and 'personal data' had been revealed in six
data breaches:  

o  sales engagement startup Apollo (July 2018)
o  marketing company Data & Leads (November 2018)
o  marketing firm Exactis (June 2018)
o  Onliner Spambot (August 2017)
o  crowdfunding site Patreon (October 2015)
o  spamhaus River City Media (January 2017)

Today, they notified me about a seventh, an 'email address validation
service' named Verifications.io (breach dated February 2019).  Let's
have a look-see, and see what can be learned:


  From: Have I Been Pwned <noreply at haveibeenpwned.com>
  To: rick at linuxmafia.com
  Subject: You're one of 763,117,241 people pwned in the Verifications.io data breach

  You signed up for notifications when your account was pwned in a data
  breach and unfortunately, it's happened.

HaveIBeenPwned.com needs a drama hook to get people's attention, so this
is the 'hook'.  I have to admit, it's nicely done.

  You're one of 763,117,241 people who've had an account compromised in
  the Verifications.io hack of Feb 2019, the details of which you can read
  about here: https://haveibeenpwned.com/PwnedWebsites#VerificationsIO

If you've been around the block a few times, your hype alarm should 
start ringing when you encounter the words 'had an account compromised'.
What does 'an account' mean in this context?  You should stop to think,
wait, am I a customer of some business operating Verifications.io?  Who
on God's green earth is Verifications.io?  Good questions.  Hold that
thought.


  The data disclosed in the breach includes: Dates of birth, Email
  addresses, Employers, Genders, Geographic locations, IP addresses, 
  Job titles, Names, Phone numbers, Physical addresses

Your hype alarm should now be tolling an ongoing bass line, with
voiceover saying 'All of that?  Sounds like quite a laundry list.  Might
this perhaps involve a lazy handwave about data that _might_ in some
cases be present?  Isn't the claim weirdly devoid of specifics?'

But, first things first:  What's Verifications.io?  HaveIBeenPwned.com
says:

'Verifications.io: In February 2019, the email address validation
service verifications.io suffered a data breach.[link] Discovered by Bob
Diachenko and Vinny Troia, the breach was due to the data being stored
in a MongoDB instance left publicly facing without a password and
resulted in 763 million unique email addresses being exposed. Many
records within the data also included additional personal attributes
such as names, phone numbers, IP addresses, dates of birth and genders.
No passwords were included in the data. The Verifications.io website
went offline during the disclosure process, although an archived copy
remains viewable.[link]'

Archived copy link is at Internet Archive,
https://web.archive.org/web/20190227230352/https://verifications.io/ .

'Enterprise Email Validation.  Remove harmful data and bounces from your
list before you send.'  Euphemism, amirite?  Poke a little more, under
Services, Email Validation:  'Since we specialize in email hygiene, a
large portion of our validation service also involves removing any
threats from your email database. We remove invalid emails and spam
traps from your email list, as well as duplicate emails, litigators, and
consumers prone to complain about commercial email. [...]it is essential
you validate and verify your email database on a consistent basis. If
not you will begin to see a negative effect on your email marketing
campaigns.  Your senderscore will drop, and delivery along with it.  Add
spamtraps or honeypots to it and your IP address is likely to become
blacklisted.  In addition to helping you remove invalid email addresses
we also remove threats and litigators, a real problem lately in the
volatile email marketing arena.'

Aha, so that's the game.  This is or was a service bureau helping
spammers fly under the radar, pruning both invalid/undeliverable
addresses _and_ antispam people/spamtraps/honeypot mechanisms that
create problems for spammers, like greatly hastening the day their
spamhaus IP addresses get widely blocklisted or getting their entire
operations thrown off hosting providers for terms-of-service violation.

This confirms the already-obvious suspicion that, no, I have never had
any form of business relationship with the scumbuckets operating
Verifications.io.  Thus, HaveIBeenPwned.com's term 'had an account
compromised' turns out to mean a MongoDB database record including
'rick at linuxmafia.com' and some other related data, probably the name
'Rick Moen', part of Verifications.io's large dataset about 763 million
e-mail addresses that fueled their business of helping spammers prune
their mail-out address lists of troublemaking targets.

Here's a pretty good article about the data breach:
https://www.wired.com/story/email-marketing-company-809-million-records-exposed-online/

Excerpt:  

  In general, the 809 million total records in the Verifications.io
  trove include standard information like names, email addresses, phone
  numbers, and physical addresses. But many also include things like
  gender, date of birth, personal mortgage amount, interest rate,
  Facebook, LinkedIn, and Instagram accounts associated with email
  addresses, and characterizations of people's credit scores (like
  average, above average, and so on). Meanwhile, other records in the
  collection seem related to generating sales leads at businesses,
  including company names, annual revenue figures, fax numbers, company
  websites, and industry identifiers for categorizing companies called
  "SIC" and "NAIC" codes.

  The data doesn't contain Social Security numbers or credit card numbers,
  and the only passwords in the database are for Verifications.io's own
  infrastructure. Overall, most of the data is publicly available from
  various sources, but when criminals can get their hands on troves of
  aggregated data, it makes it much easier for them to run new social
  engineering scams, or expand their target pool.

According to
https://securitydiscovery.com/800-million-emails-leaked-online-by-email-verification-service/,
the discovered MongoDB instance was about 150GB, and the e-mail records
portion of the database associated each e-mail address with zip / phone
/ address / sex / user IP / DOB where known.  And FYI, the way
Verifications.io 'validates' customers' lists of e-mail addresses as
deliverable is, very logically, just sending each address an innocuously
bland, pointless spam that says something like 'Hello'.  If the spam
gets accepted, that validates the target address.  So, sending spam to
help spammers.  Whee.

Nobody is offering access to a copy of the 150GB database for public
inspection, so I cannot say of a certainty what the record about
rick at linuxmafia.com said, but I can confidently predict that it's very
little data and likely as not includes inaccuracies carefully planted by
me -- for the simple reason that I've been actively managing my data
'shadow' for about 35 years.

Try to research the birth date of Rick Moen, resident of Menlo Park,
California.  Unless you pay $50 to a private detective, I'm betting your
outcome after a week of trying will be one of three things:  (1) no
result, (2) February 30 of some year, or (3) February 29 of a
non-leapyear.

A few businesses / government offices have a legal or contractual
entitlement to correct DOB data, but I make a point of messing with the
others, and Feb. 30th (which doesn't exist in any year) is the first
thing I try, to see how badly their data-validation sucks.  If that's
rejected, I'll try Feb. 29th of a non-leapyear.  I might try Feb. 29,
1900, which was a non-leapyear despite being divisible by 4.  If none
of those works, slightly disappointed, I'll pick some other date such as
New Year's Day.
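
For comparison, rejecting those dates takes very little code.  What
follows is only a minimal sketch in Python, not anything a particular
site is known to run; the function name is_real_date is made up for
illustration.  It leans on the standard library's own calendar rules
rather than reimplementing them.

```python
import calendar
from datetime import date


def is_real_date(year: int, month: int, day: int) -> bool:
    """Return True only if the date exists in the Gregorian calendar."""
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


# Leap-year rule: divisible by 4, except century years not divisible by 400.
print(is_real_date(1960, 2, 30))  # False: Feb. 30 never exists
print(is_real_date(1961, 2, 29))  # False: 1961 is not a leap year
print(is_real_date(1900, 2, 29))  # False: 1900 is a century year, not divisible by 400
print(is_real_date(2000, 2, 29))  # True: 2000 is divisible by 400
print(calendar.isleap(1900), calendar.isleap(2000))  # False True
```

A form that accepts Feb. 30 is most likely treating day, month, and
year as three unrelated fields and never assembling them into a date.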

Seeding selectively false Personally Identifying Information (PII) about
myself always seemed a good thing on general principle, but particularly
beneficial if/when distinct dataset-owners compared and tried to
_correlate_ records, so that records purportedly about me would mismatch
and be deliberately corrupted.  (It also helps that my formal legal name
isn't Rick Moen.  And no, not a Richard.)  Some other details I
make public as consistent data are deliberately misleading, too.

Certain totally correct information is among what I deliberately seed to
the public, such as name (but not formal legal name), cellular number, 
residence street address.  Just to lampshade the fact that I'm
doing so, my personal Web page also includes my 'ICBM address', which is
my household's precise longitude, latitude, and altitude.  As planned,
sometimes deranged Internet people have set out to 'doxx' me and been
utterly spooked by the fact that they not only don't have to work to
find me, but I ostentatiously don't hide, and they think whoa, maybe
disclosing this guy's PII is not a wise game plan.  And they're right.
;->


So, anyway, in contemplating all of these 'data breach' matters, you
should start with 'What can I reasonably expect these marketing/spammer
datasets to have about me, do I care, and what measures should I take?'

It's notable that, out of the seven database breaches HaveIBeenPwned.com
notified me about, six are or were marketing/spam outfits.  The seventh
is Patreon (and I've never dealt with Patreon, but somehow my e-mail
address got in there, without anything being entered by me).
Marketing/spam outfits collect gobs of free-of-charge data from all
over, and here's an interesting thing:  Quality information tends to
cost money.  Free or accurate:  Pick any one.  (This is obviously more
true of information about those of us who are security-aware and
actively manage our digital 'shadows'.)

So, how do you find out what specific data about you is in one of the
'data breaches' that include your e-mail address?  Answer:  In general,
you cannot.  For example, some security researchers maybe have copies of
_some_ data from the 150GB Verifications.io database discovered to have
been accidentally exposed to public access, but they're not going to go
around waving it to the public.

And, IMO, if you take some obvious security measures, data breaches from
marketing/spam companies will be not a serious worry.  What measures?

o  Use good passwords that you never use in multiple places.
   I will carefully avoid, here, launching a discussion of what
   is a good password, because that inevitably results in a 
   painful display of ridiculous advice from technogeeks with zero
   sense of perspective.  If you're really unclear on what is a 
   good password, check some you're considering using against the
   Pwned Passwords database, here:  https://haveibeenpwned.com/Passwords

   The real point is less the password being _good_ than it being 
   unique, i.e., NOT reused by you for multiple things.  People suck
   at doing this on account of biological limitation and need
   technological aids, a point I'll return to below.

o  Mess around with data aggregators.  My 'place of birth' might
   be New Crobuzon for one online site that requests that datum, and
   Atlantis for another.  Of course, this means I store away which 
   site got which birthplace, part of the 'technological aids' subject
   to be covered below.

o  Don't outsource so goddamned much.  All of you people who make 
   everything you use online accessible using your GMail account, 
   so that anyone who cracks your GMail credentials can break into
   just about everything you do, you _do_ know you're being dumb, 
   right?  Webmail is almost as bad a security Typhoid Mary as PHP is.
   In general, if you think the solution to security challenges is
   finding the right set of strangers to handle your security for
   you, then you have totally failed to understand the problem.

o  Be properly skeptical.  E.g., do you have a 'Gravatar'?
   (https://en.wikipedia.org/wiki/Gravatar)  Clever primate!
   You've just signed up to be tracked by a commercial company 
   all around the Web.
https://meta.stackexchange.com/questions/44717/is-gravatar-a-privacy-risk
https://meta.stackexchange.com/questions/4553/can-we-use-non-gravatar-avatars/5658#5658
   Did you open a GitHub account?  Clever primate!  It automatically 
   makes a Gravatar for you, and see foregoing.
https://arstechnica.com/information-technology/2013/07/got-an-account-on-a-site-like-github-hackers-may-know-your-e-mail-address/
   (GitHub, now a wholly owned subsidiary of Microsoft Corporation,
   _does_ provide a way to remove the generated Gravatar from 
   your GitHub account, as noted in the comments to that link.)

o  Distrust 'convenience'.  E.g., if your browser suddenly displays a
   login screen for your webmail or online banking, etc. that you didn't
   yourself initiate from a bookmark or typing in the URL, should you 
   use it?  Hell no.  Assume it's fake.  That's how phishing works.
   For similar reasons, distrust autologin settings.
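
On the first bullet's suggestion of checking a candidate password
against Pwned Passwords:  Besides the Web form, that service offers a
'range' lookup in which only the first five hex characters of the
password's SHA-1 hash are sent, and the matching is done locally.  The
following Python is a rough sketch under that assumption, not an
official client.  The function names and the User-Agent string are
invented for illustration, and the api.pwnedpasswords.com endpoint does
not appear in the text above.

```python
import hashlib
import urllib.request

RANGE_URL = "https://api.pwnedpasswords.com/range/"  # assumed range endpoint


def split_sha1(password: str) -> tuple:
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, body: str) -> int:
    """Find our hash suffix in a response body of 'SUFFIX:COUNT' lines."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count.strip())
    return 0  # suffix absent from the returned range


def pwned_count(password: str) -> int:
    """Ask the service how often this password appears in known breaches."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "sf-lug-example"},  # arbitrary illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range(suffix, body)
```

Calling pwned_count() with a candidate password makes one HTTPS
request; neither the password nor its full hash leaves your machine.
A result of 0 means only that the password is absent from that
particular corpus, not that it is a good one.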


What's _not_ in the above list is the two suggestions pushed by every 
single communication I've received from HaveIBeenPwned.com, so I want to
copy/paste them from today's advisory and talk about why:

  Step 1: Protect yourself with strong, unique passwords for each website
  with the 1Password password manager: https://1password.com/

  Step 2: Enable 2 factor authentication and store the codes inside your
  1Password account.

Both of these suggestions leverage 1Password.  I don't hate it, and it's
a lot better than nothing.  But what is it?  It's an example of password
manager software.
https://en.wikipedia.org/wiki/1Password

You download a piece of proprietary 1Password software from commercial
firm AgileBits, Inc. for MS-Windows, Mac OSX, Android, iOS, and/or Chrome OS.  
Running it, you can enter passwords and other sensitive data tidbits
(e.g., password reminder questions and answers, your per-Web-site
imaginary date and place of birth, etc.) into it, it stores them in a local
encrypted cache (or 'wallet' or 'vault') locked with a master password,
and the program occasionally reaches across the Internet and stores the
'wallet' data on AgileBits, Inc.'s servers, whence you can share them
with your other Internet-connected devices on which you've also
installed 1Password.  You must pay a monthly fee (or a year in
advance) for all of this.  Optionally, you can have 1Password supply your 
credentials via a proprietary Web browser plugin (see 'distrust
convenience', above).   They say keeping the master copy outsourced onto their
Internet servers gives you _backup_, and they're not wrong, but, you
might be thinking, shouldn't you be permitted to do your own backup
where _you_ choose?

They have an answer, although they're disturbingly coy about having
clear information about this on the 1Password.com Web site:  You can
purchase a 'standalone license' for a one-time $65 fee (per OS
platform), and may then store your 1Password 'vault' in the place of
your choosing.
https://discussions.agilebits.com/discussion/101987/where-do-i-find-the-information-needed-to-compare-a-stand-alone-license-vs-the-subscription

Why a password manager?  Because basically nobody (with only rare freak
exceptions) can reliably remember a significant number of good
passwords.  Therefore, most people try to 'cheat' in various ways, e.g.,
only three strong passwords shared among all the places that matter, or
easily guessable password patterns.  Those people are fooling
themselves, and are easy victims.  It doesn't work.  You need a
technological aid to human memory.
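
To make the 'unique per site' idea concrete, here is a minimal Python
sketch of the generating half of what such an aid does.  It is an
illustration only:  the names new_password and fill_wallet, the
alphabet, the 20-character length, and the example site names are all
assumptions, and a real password manager would additionally encrypt
the result under a master password rather than hold it in a plain
dictionary.

```python
import secrets
import string

# Character set is an arbitrary choice for this sketch.
ALPHABET = string.ascii_letters + string.digits + "-_.!@#%"


def new_password(length: int = 20) -> str:
    """Return a random password drawn from ALPHABET via the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def fill_wallet(sites: list) -> dict:
    """Map each site name to its own freshly generated password."""
    return {site: new_password() for site in sites}


# Hypothetical site names; every entry gets an unrelated password.
wallet = fill_wallet(["bank.example", "mail.example", "forum.example"])
for site, password in wallet.items():
    print(site, password)
```

Because each password is generated independently, a breach that
exposes one entry gives an attacker nothing useful about the others.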

But:  Why _1Password_ at all? 

There are plenty of other ways to store passwords in a 'wallet' file
with a master password.  Some don't require that you sacrifice your
privacy to outsourcing, and are free of charge, and are open source.

Consumer Reports even covered some, here:
https://www.consumerreports.org/digital-security/everything-you-need-to-know-about-password-managers/
That survey article, among other things, links to this site with
information about open-source option KeePassX, maintained for Linux,
MS-Windows, and Mac OSX.  https://securityinabox.org/en/ 
And there are plenty of other simple ways to deal with this problem. 
Worried about backup?  Periodically copy the password 'wallet' file onto
a USB flash drive and store it in a filing cabinet.  Sheesh.


Short version:  Lots of ways to solve the problem.  If you have money to
throw away and trust AgileBits, Inc., sure, 1Password is way better than
nothing.  Open source offers other options, too.


A word about 'two-factor authentication' aka 2FA
(https://en.wikipedia.org/wiki/Multi-factor_authentication):  Big
companies are pushing the concept as a cure-all, e.g., Google wants
GMail users to enable their version of it, which involves sending an
auth code via SMS to a known cell-phone number when you want to log in.  

There's certainly nothing wrong in the abstract with 2FA, though I really
would not want the second-nosiest company in the world to have my
cellphone number.  But the point I wanted to make is that outfits like
GMail and Faceplant (er, Facebook, the nosiest company in the world) 
push 2FA mostly as a bandaid to compensate for abysmal password
management by users at their services.  IMO, you would get a lot more
mileage out of concentrating on basic security practices such as my
bullet points, above.




