[sf-lug] [BALUG-Talk] BayPIGgies meeting Thursday January 8, 2009: Scrape the Web by Asheesh Laroia

Asheesh Laroia asheesh at asheesh.org
Mon Jan 5 20:52:56 PST 2009


On Mon, 5 Jan 2009, Jesse Zbikowski wrote:

> I regret that I can't attend but, as I've been scraping a bit myself
> lately, I wonder if people have considered using proxies to prevent
> being banned?  I have never gotten banned by a server so I don't know
> if proxying is worthwhile, but I use libwww-perl which allows me to
> proxy my requests.  The drawback is dealing with lots of extra
> timeouts and retries.

Oh, heck yes. Proxies are good. In my work at Creative Commons, we make a 
bunch of Yahoo! API queries, more than the daily limit allows, so we just 
proxy them through Tor.
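Here is a minimal Python sketch of the proxy-plus-retries approach discussed above. It is not the Creative Commons setup. It assumes a local HTTP proxy such as Privoxy is listening on 127.0.0.1:8118 and forwarding to Tor. Tor itself exposes a SOCKS port (9050 by default), and the standard library's urllib cannot use that directly. The names build_proxied_opener and fetch_with_retries are hypothetical helpers made up for illustration, not part of any library.

```python
import socket
import time
import urllib.error
import urllib.request

# Assumption: an HTTP proxy such as Privoxy listens locally and forwards to Tor.
# Tor's own SOCKS port (9050 by default) is not usable from urllib directly.
PROXY_URL = "http://127.0.0.1:8118"


def build_proxied_opener(proxy_url=PROXY_URL):
    """Return an opener that routes http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_with_retries(fetch, url, attempts=3, delay=2.0):
    """Call fetch(url), retrying after URL errors or socket timeouts.

    URLError also covers HTTP error statuses (HTTPError is a subclass).
    Proxied requests, especially over Tor, time out more often, so each
    failure is followed by a fixed pause before the next attempt. The last
    error is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

To use it, wrap opener.open(url, timeout=30) in a small function that reads and returns the response body, then pass that function to fetch_with_retries. Keeping the fetch function separate from the retry loop lets you swap in a different proxy (or none) without touching the retry logic.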

> Agenda for the rest of the talk looks excellent; I would especially be 
> curious about interacting with JavaScript (I've considered using Rhino 
> for this).  It would be great if a repeat in SF could be arranged.

I'll see what I can do. (Or you can come to PyCon! ;-)  We do have two 
months, so that could well happen.

-- Asheesh.

-- 
Just because the message may never be received does not mean it is
not worth sending.
