[sf-lug] [BALUG-Talk] BayPIGgies meeting Thursday January 8, 2009: Scrape the Web by Asheesh Laroia

Jesse Zbikowski embeddedlinuxguy at gmail.com
Mon Jan 5 19:05:21 PST 2009


On Mon, Jan 5, 2009 at 6:23 PM, jim <jim at well.com> wrote:
> Scrape the Web: Strategies for programming websites that don't expect it
> by Asheesh Laroia

I regret that I can't attend. As I've been doing a bit of scraping myself
lately, I wonder whether people have considered using proxies to avoid
being banned.  I have never been banned by a server, so I don't know
if proxying is worthwhile, but I use libwww-perl, which lets me
proxy my requests.  The drawback is dealing with a lot of extra timeouts
and retries.
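
For anyone who wants to try the same idea from Python rather than
libwww-perl, here is a rough sketch of proxying with retries using only
the standard library's urllib.  The function names, the proxy addresses,
and the attempts_per_proxy and timeout defaults are made up for
illustration; this is not what I actually run.

```python
import socket
import urllib.error
import urllib.request


def build_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxies(url, proxies, attempts_per_proxy=2, timeout=10.0,
                      opener_factory=build_opener):
    """Try each proxy in turn, retrying each one a few times before moving on.

    opener_factory is a hypothetical hook so the network layer can be
    swapped out (for example, in tests); by default it builds a real
    urllib opener for the given proxy.
    """
    last_error = None
    for proxy_url in proxies:
        opener = opener_factory(proxy_url)
        for _ in range(attempts_per_proxy):
            try:
                with opener.open(url, timeout=timeout) as response:
                    return response.read()
            except (urllib.error.URLError, socket.timeout, ConnectionError) as exc:
                # Timeouts, refused connections, and HTTP error statuses all
                # land here; remember the failure and try again.
                last_error = exc
    raise RuntimeError(f"all proxies failed for {url}") from last_error
```

Calling fetch_via_proxies("http://example.com/", ["http://10.0.0.1:3128",
"http://10.0.0.2:3128"]) would try the first (made-up) proxy twice and
then fall back to the second.  Every extra proxy multiplies the number
of possible timeouts, which is where the extra waiting comes from.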

Agenda for the rest of the talk looks excellent; I would especially be
curious about interacting with JavaScript (I've considered using Rhino
for this).  It would be great if a repeat in SF could be arranged.



