[conspire] The end of PerlHoo?

Rick Moen rick at linuxmafia.com
Fri Mar 20 15:43:43 PDT 2015


tl;dr:  Any Perlista want to fix an XSS problem in a simple 100-line
Perl CGI?


This is a story about how the world changes.

Once Upon a Time 
----------------

In the halcyon dot-com year 1999, there was a Perl teaching project
called PerlHoo, presented in three articles by Jonathan Eisenzopf.
Which I didn't notice until four years later.

  In our continuing effort to save the world in less than one hundred
  lines of Perl code, we will now embark on a quest to build a complete
  Yahoo-like Web directory. The evolution of PerlHoo will occur over the
  next few issues of Mother of Perl. In this issue, we will build a simple
  implementation in (you guessed it) less than 100 lines of code.

http://www.webreference.com/perl/tutorial/2/
http://www.webreference.com/perl/tutorial/3/
http://www.webreference.com/perl/tutorial/5/

It was a nice little project for 1999.  I found it in 2003, it
being exactly what I needed to organise my appalling sprawl of public
ASCII files.  I also found and expanded a Python script to HTMLise 
the ASCII information files I most cared about.

Setting up PerlHoo is easy.  A little Apache HTTPd logic lets the
PerlHoo CGI present a virtual webspace directory tree at URL
http://linuxmafia.com/kb/ ('kb' for knowledgebase):  for any directory
in the underlying physical tree (/var/www/faq/, on my server), the CGI
parses the comma-separated file perlhoo.csv and constructs/displays an
index.html page from the CSV values.  Each line (entry) of the CSV file
can point to any URL, either local or remote.
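
For illustration only, here's the core idea compressed into a few lines
of Perl.  This is a sketch from memory, not the real perlhoo.pl (which
also handles CSV quoting, sub-categories, and the submission form), and
the column order shown is just my rough recollection:

  # Hypothetical distillation of what PerlHoo does for one directory.
  use strict;
  use warnings;

  open my $csv, '<', '/var/www/faq/Admin/perlhoo.csv' or die $!;
  print "<ul>\n";
  while (my $line = <$csv>) {
      chomp $line;
      # Each entry is (roughly) a URL, a title, and a description.
      my ($url, $title, $desc) = split /\s*,\s*/, $line, 3;
      $desc //= '';    # tolerate entries without a description
      print qq{<li><a href="$url">$title</a> -- $desc</li>\n};
  }
  print "</ul>\n";
  close $csv;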

You can see the guts of the tree at http://linuxmafia.com/faq/, where
the perlhoo.csv file in each folder gets parsed to make index.html for
the corresponding virtual webspace folder in http://linuxmafia.com/kb/ .

PerlHoo was exactly what I needed -- as opposed to all the hideously
overengineered CMSes and wikis people suggested when I described the
problem.  The problem was:  'I have HTMLised local files, plus
interesting remote pages, and I'd like to organise them on my Web site.
You know, like the old Yahoo hierarchical catalogue.'  All of the dozens
of suggestions from LUG people were -- sorry -- inane and Didn't Get It.
But I saw PerlHoo and said 'Yes.  Exactly like that.'

PerlHoo had one other function as well, little-used on my site, and this
is where things started to go wrong in early days.  The virtual webspace 
allowed the public to submit candidate URLs to add.  Just like the old
Yahoo Web directory.  Anyone spotting the first snake in this Garden of
Eden?  Anyone?  Bueller?


Comment Spam
------------

Anyone running any site that accepts _any_ kind of HTTP POST or GET
submissions knows this one:  Spammers and scammers blanket the Internet
with automated bots probing all advertised services (including of course
Web servers and their pages), looking for places to spamvertise.
PerlHoo's submission feature is completely devoid of any attempt to
block this.  Results are predictable -- but not a big problem.  Example
from the Knowledgebase's Admin folder:

linuxmafia:/var/www/faq/Admin# ls -l perlhoo*
-rw-r--r-- 1 rick     rick     6601 Jul  8  2013 perlhoo.csv
-rw-r--r-- 1 rick     rick     6486 Feb 23  2012 perlhoo.csv~
-rw-r--r-- 1 www-data www-data 1748 Mar  2  2014 perlhoo_new.csv
linuxmafia:/var/www/faq/Admin# 

perlhoo_new.csv holds the submissions from the public.  (perlhoo.csv is
the curated and displayed Web directory for the Admin category.)  Once
in a long while I've looked through the perlhoo_new.csv files, and maybe
a dozen times over 12 years or so has there been a human-submitted
entry.  All the rest is inane comment spam.  (The dozen or so exceptions
tend to be people who didn't quite get what each folder was for, or were
trying to promote their Web sites -- the usual random noise.  Maybe
three were ever submissions I liked and moved to perlhoo.csv.)

So, that part of PerlHoo was a failure for lack of spam control.  But
it can simply be ignored.  In some folders, I just chowned
perlhoo_new.csv so the Apache user could no longer write to it, because
the feature was effectively useless.
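
Something like this, per folder (the target ownership shown is just
illustrative; anything the www-data user can't write to does the job):

linuxmafia:/var/www/faq/Admin# chown root:root perlhoo_new.csv
linuxmafia:/var/www/faq/Admin# 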

But I modestly updated Jonathan's CGI to make it serve valid HTML 4.01
Transitional, and otherwise just loved it for what it was: a simple,
elegant solution to a simple problem.  (Jonathan never touched it after
his 1999 teaching article.  In software-engineering terms, it's orphaned
code.  Or, if you're more of a glass-half-full person, it's a finished
project.)


You Have to Sanitise Public Data
--------------------------------

I don't want to belittle Jonathan Eisenzopf.  I love PerlHoo.  But the
second thing he completely failed to do was sanitise input data.
PerlHoo's CGI takes an incoming URL from the user's Web browser and says
'Oh, you want the virtual webspace index for _this_ directory.'  But 
what if what's submitted is not just the intended URL?  Did Jonathan
make sure contrived data sent to the CGI couldn't trick it into doing
something stupid?  No, he did not.

 Date: Thu, 19 Mar 2015 22:11:22 +0000
 From: Ayoub Tabout <ayoubtabout at gmail.com>
 To: bofh at linuxmafia.com
 Subject: XSS Vuln. in your website

 Hi,

 i've discovred The XSS Vuln. in a subdomain on your website that may
 enables attackers to inject client-side script into Web pages viewed by
 other users. A cross-site scripting vulnerability may be used by
 attackers to bypass access controls such as the same origin policy.

 Here's the url :
 http://linuxmafia.com/kb/Kernel%27%22%3E%3C/title%3E%3Cscript%3Ealert%280%29%3C/script%3E%27%22%3E%3Cmarquee%3E%3Ch1%3EXSS%20found%3C/h1%3E%3C/marquee%3E

Aw, crud.  That's a design flaw in PerlHoo that I probably _shouldn't_
just ignore, because it's a security hole.  Ironically, it's not usable
to attack my site.  It's usable to attack other sites via reflecting
attacks through mine.
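
Decoded, the path portion of that URL is:

  /kb/Kernel'"></title><script>alert(0)</script>'"><marquee><h1>XSS found</h1></marquee>

In other words, the request smuggles markup and script into the path,
and PerlHoo reflects it, unescaped, into the page it generates.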

Ayoub was kind enough to tell me that PerlHoo completely punts -- fails
-- on one task that every Web application must do:  check input data to
make sure it cannot be used to smuggle in, say, a second attack URL that
the serving HTTPd process then gets tricked into serving up to the user,
making the user's browser carry out targeted attacks against the user
him/herself or against third-party Web sites.  This is called a
'cross-site scripting' (abbreviated XSS) vulnerability.

http://en.wikipedia.org/wiki/Cross-site_scripting

XSS is a subtle concept, and a little difficult to wrap your brain
around.  The threat model involves a deliberate violation of the 'same
origin policy', where the contrived URL causes content from two places
to get served up as a goulash, so that untrustworthy content (from, say,
evilsite.com) gets mixed into content the user trusts (from, say, the
Linuxmafia.com Knowledgebase).

Someone could put up, somewhere in webspace, a link to what's _claimed_
to be a Linuxmafia.com Knowledgebase entry.  The link's URL would indeed
point to linuxmafia.com's PerlHoo CGI, but it would also include encoded
links to 'malicious' content offsite.  PerlHoo would then fail to notice
the chicanery, and fail to disable the active content being passed
through it to the user.

Ayoub proved that PerlHoo provided no protection against it being fooled
in this fashion -- just as Jonathan failed to include any protection
against comment spam.

The threat model isn't an attack on linuxmafia.com.  It's an attack on 
users of linuxmafia.com.  The reason I'm obliged to care is that I want
you to get linuxmafia.com content and not hidden redirects to
evilsite.com when you are seeing my Knowledgebase on your screen.


A Simple Matter of Programming
------------------------------

OK, you are saying.  Retrofit input sanitising into PerlHoo.  Sure.
I'll get right on that.

Except I'm (1) not good at Perl, and (2) so backlogged I can't
reasonably take this on.  Or rather, I shouldn't.  Way too much on my
plate, especially since I moderately suck as a Perlista.  (Never claimed
to be one.)

Here's PerlHoo (updated by me to serve valid HTML):
http://linuxmafia.com/pub/linux/apps/perlhoo-linuxmafia-1.21.tar.gz

I will say:  Jonathan writes nice, clean Perl that's a pleasure to read.
I'm just a bit stumped about how to add some lines to sanitise the
submitted URL, so that PerlHoo defangs (e.g., escapes or puts inside
comment tags) any URI or markup that's not supposed to be there.

Any actual Perlista willing to try to fix it?  Maybe splicing in a call
to the extra CPAN module Filter::Handle::Tainted, the way one of the
Perlmonks suggests here?  http://www.perlmonks.org/?node_id=224782
I really don't know the sanest way to code an adequate fix.  Maybe you,
the reader, do.
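
To make concrete what I mean by 'defang', here's a minimal sketch of
the general approach:  whitelist, then HTML-escape, anything derived
from PATH_INFO before it is echoed back into generated HTML.  The
variable names are hypothetical -- I haven't traced where perlhoo.pl
actually builds its output -- so treat this as the idea, not a drop-in
patch:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use HTML::Entities qw(encode_entities);

  my $path = $ENV{PATH_INFO} || '';

  # Whitelist:  keep only characters a legitimate category path uses.
  $path =~ s{[^A-Za-z0-9_/.+ -]}{}g;

  # Belt and braces:  escape whatever survives before printing it.
  my $safe_path = encode_entities($path);

  print "Content-type: text/html\n\n";
  print qq{<h1>Index of $safe_path</h1>\n};

Where exactly those few lines belong inside perlhoo.pl is the part I'd
want a real Perlista's eyes on.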

Any Perl coder want to fix this, I'll buy you dinner, or a six-pack of
$GOODBEER, or a nice bottle of wine.


Say Goodbye to PerlHoo?
-----------------------

If I can't fix it in maybe a week, I'll probably just convert the
CGI-generated index file for each folder into a static HTML file and 
remove PerlHoo.  Honestly, I've never gotten any mileage out of
PerlHoo's theoretical dynamic features, so it might as well be flat
HTML, and the content will be exactly the same.
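
If it comes to that, the conversion itself is trivial.  Something along
these lines would do it (a sketch, assuming the physical tree under
/var/www/faq/ maps one-to-one onto the virtual /kb/ tree, which on my
server it does):

  # Fetch the CGI-rendered index for each folder and freeze it as a
  # static index.html next to the perlhoo.csv it was generated from.
  use strict;
  use warnings;
  use File::Find;
  use LWP::Simple qw(getstore);

  my $docroot = '/var/www/faq';
  my $base    = 'http://linuxmafia.com/kb';

  find(sub {
      return unless -d $File::Find::name;
      (my $rel = $File::Find::name) =~ s{^\Q$docroot\E}{};
      getstore("$base$rel/", "$File::Find::name/index.html");
  }, $docroot);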




