[sf-lug] Malware on PyPI repository

Sun Dec 5 16:01:34 PST 2021

Quoting Akkana Peck (akkana at shallowsky.com):

[About https://arstechnica.com/information-technology/2021/11/malware-downloaded-from-pypi-41000-times-was-surprisingly-stealthy/ :]

> It always amazes me how bad articles about malware are. In this case,
> not bothering to mention the names of the packages except for two.

IMO, one significant if not dominant reason for this prevailing badness
is that most articles about malware are slightly adapted press releases
from antimalware companies.  The latter have zero incentive to help
end-users correctly understand security.  To the contrary, their best
interest is served by stirring up users about alleged and real threats
_without_ the users understanding them, thereby motivating those users
to buy antimalware companies' goods and services.  And, with some honourable
exceptions, that's the way they express themselves, which is then
reflected very strongly in the IT press coverage that cribs from them
(which amounts to almost all IT press coverage).

> If you want to see the list of dangerous packages without sifting
> through all the comments to find it, it's at
> https://jfrog.com/blog/python-malware-imitates-signed-pypi-traffic-in-novel-exfiltration-technique/

Basically:  The JFrog post details the stealthy measures that 11 Python
code offerings took to do mischief if one made the error of installing
them on a "what could possibly go wrong?" basis from the third-party
"PyPI" code repository.

IMO, though, that isn't very interesting.  Anyone who installs and
executes untrustworthy code knows -- or learns really quickly -- that
the code can carry out any action its user authority permits.

In my view, it's more interesting to back away and consider larger
context:  What/who is PyPI?  What makes its operation trustworthy or
not?  Is (or was) there meaningful vetting of code contributors and of
what they submit?  If there wasn't (and the smart money is on "there
wasn't"), have they now learned anything?  And why would members of the
general public be sourcing their code from PyPI in the first place?

PyPI Project's self-description:

  The Python Package Index (PyPI) is a repository of software for the
  Python programming language.  PyPI helps you find and install software 
  developed and shared by the Python community.

Looking at the FAQ, I see that this is a means for people on any OS to
circumvent their distro protections (if any) to grab and install Python
interpreted code from a large number of Python coders and add it to a
real system using Python's "pip" installer tool.  Ergo, among other
things, your system package regime (deb, rpm, whatever) won't know
anything at all about what you grab from PyPI.
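
To make that concrete, here is a minimal sketch, using only the
standard library, that lists the Python distributions visible to the
current interpreter along with whatever each one's INSTALLER metadata
file says.  The function name is my own, and I'm assuming that pip
records "pip" in that file while other installers may write something
else or nothing at all, so treat the output as a hint, not proof.

```python
from importlib import metadata


def installers_by_distribution():
    """Map each visible distribution name to its recorded installer.

    Assumption: pip writes "pip" into the INSTALLER metadata file.
    Distributions without that file are reported as "unknown".
    """
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name", "unknown")
        raw = dist.read_text("INSTALLER")  # None if the file is absent
        result[name] = raw.strip() if raw else "unknown"
    return result


if __name__ == "__main__":
    for name, installer in sorted(installers_by_distribution().items()):
        print(f"{name}: {installer}")
```

Nothing in that output will appear in dpkg's or rpm's database unless
the distro itself shipped the module, which is the point.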

Some means are furnished to limit the inherent harm this does to the
target system, notably the option to install code from PyPI into
one of Python3's "venv" lightweight virtual environments, isolating it
somewhat from the system.  You can also decline to run any PyPI code
that wants to talk you into giving it elevated privilege via sudo or
otherwise.
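
As a rough illustration of the venv option, the sketch below creates
one with the standard library's "venv" module.  The helper name and the
directory are hypothetical, and with_pip=False is my choice to keep the
example quick and offline.  Note that a venv only separates Python
packages from the system's; it is not a sandbox, and anything run
inside it still has your full user authority.

```python
import os
import tempfile
import venv
from pathlib import Path


def make_venv(target):
    """Create a virtual environment at target; return its interpreter path."""
    # with_pip=False skips bootstrapping pip; pass True to get pip inside.
    builder = venv.EnvBuilder(with_pip=False, clear=True)
    builder.create(target)
    bindir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(target) / bindir / exe


if __name__ == "__main__":
    scratch = Path(tempfile.mkdtemp()) / "pypi-scratch"  # hypothetical path
    print("venv interpreter:", make_venv(scratch))
```

Packages installed with that environment's interpreter land under the
venv directory and can be discarded by deleting it.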

Getting back to my questions:  Who are these guys?  Well, it's an
offshoot of the Python Software Foundation.  It's like what CPAN is for
Perl, Gems for Ruby, npm for JavaScript, Composer and PEAR for PHP,
NuGet for .NET.  Is (or was) there meaningful vetting of code
contributors and of what they submit?  Well, kind of no.  Judging by the
FAQ, you just contact them and say "Hi, I write cool stuff in Python and
wish to be a project owner on PyPI", and they make you one.  Dan Goodin
at Ars Technica says:

  Use of open source repositories to push malware dates back to at least
  2016, when a college student uploaded malicious packages to PyPI,
  RubyGems, and npm.  He gave the packages names that were similar to
  widely used packages already submitted by other users.
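
The lookalike-name trick described there can be caught, crudely, by
comparing a requested name against names you already trust.  This is
only a sketch: the short list of trusted names and the 0.85 similarity
cutoff are illustrative assumptions of mine, not anything PyPI or the
article provides.

```python
import difflib

# Illustrative allowlist only; a real one would be much longer.
WELL_KNOWN = ["requests", "numpy", "urllib3", "setuptools", "django"]


def lookalike_of(name, known=WELL_KNOWN, cutoff=0.85):
    """Return the trusted name that `name` closely imitates, else None."""
    normalized = name.lower().replace("_", "-")
    if normalized in known:
        return None  # exact match: it is the trusted package itself
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ["requests", "reqeusts", "flask"]:
        print(candidate, "->", lookalike_of(candidate))
```

A check like this flags near-miss spellings before installation; it
does nothing about a malicious package with an original name.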

So, very weak, functionally nil, vetting of new code maintainers and
also of what they submit.  And this really should not be even a tiny bit
surprising.  We've seen this sort of thing over and over and over, on
effectively uncurated (or loosely curated) "bazaar" code hosting sites,
e.g., the older instantiation of addons.mozilla.org (before Mozilla,
Inc. cracked down on the dangerous chaos there), Gnome-look.org, and
dozens of such places, where the "Grab code[1] from here and trust it"
model was ripe for abuse and got it in spades.

Have they learned anything?  Well, not as one might wish.  Or rather,
looking at it much less harshly, they're apparently fine with being what
they are, as to the deliberately limited scope of what they provide.
Which is to say, they're willing to accept and act on security reports
that PyPI codebase [X] hax0red your system, by investigating and
removing [X] and mildly swatting its submitter, but that's the limit of
it -- a matter discussed in greater detail here:
https://security.stackexchange.com/questions/79326/which-security-measures-does-pypi-and-similar-third-party-software-repositories

So, takeaway lesson:  If you disregard the gatekeeping protection of
your distro package regime, and go nonchalantly grabbing things from
the likes of CPAN, RubyGems, npm, Composer, PEAR, PyPI, NuGet,
addons.mozilla.org (even now), gnome-look.org, Ubuntu PPAs, or upstream
maintainer sites, you are playing with fire and may get burned.  Unix
has provided the rope with which you can efficiently hang yourself, and
will not protect your neck from the harm you are imposing on it.

[1] In the case of Gnome-look.org, the site wasn't supposed to be
hosting code, only GNOME/GTK themes, artwork, icons, splash screens, and
screen savers, but the bad guy discovered the site didn't prevent him
uploading executable trojan code (set to go off by installing a .deb
package), and he relied on GNOME users who downloaded his "screen saver"
being too clueless to notice they were being asked to do something
reckless.  (GNOME screen savers aren't provided in .deb packages.)
https://lwn.net/Articles/367874/
https://www.linux-magazine.com/Online/News/Malicious-Screensaver-Malware-on-Gnome-Look.org

(Heh, notice that in the LWN story's reader comments, I gave my take on
the problem then -- in 2009.)