[sf-lug] [BALUG-Talk] BayPIGgies meeting Thursday January 8, 2009: Scrape the Web by Asheesh Laroia
Asheesh Laroia
asheesh at asheesh.org
Mon Jan 5 20:52:56 PST 2009
On Mon, 5 Jan 2009, Jesse Zbikowski wrote:
> I regret that I can't attend but, as I've been scraping a bit myself
> lately, I wonder if people have considered using proxies to prevent
> being banned? I have never gotten banned by a server so I don't know
> if proxying is worthwhile, but I use libwww-perl which allows me to
> proxy my requests. Drawback is dealing with lots of extra timeouts
> and retries.
Oh, heck yes. Proxies are good. In my work at Creative Commons, we make a
bunch of Yahoo! API queries, and we make more than the daily limit. So we
just proxy them through Tor.
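
For anyone who wants to try the proxy-plus-retries idea from this thread, here is a minimal sketch in Python using only the standard library. It is an illustration under assumptions, not the setup used at Creative Commons: the helper names (build_proxied_opener, fetch_with_retries) are hypothetical, and the proxy address assumes a local HTTP proxy such as Privoxy listening on 127.0.0.1:8118 and forwarding to Tor, since Tor itself speaks SOCKS and urllib has no built-in SOCKS support.

```python
import time
import urllib.request


def build_proxied_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_with_retries(fetch, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on network errors.

    URLError (including HTTPError) and socket timeouts are all OSError
    subclasses, so catching OSError covers the usual proxy failures.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: let the last error propagate
            sleep(delay * (2 ** attempt))  # wait 1x, 2x, 4x, ... the delay


# Example use (needs a proxy actually running at the assumed address):
#   opener = build_proxied_opener("http://127.0.0.1:8118")
#   page = fetch_with_retries(
#       lambda u: opener.open(u, timeout=30).read(),
#       "http://example.com/",
#   )
```

Passing the fetch function in as an argument keeps the retry loop independent of how the request is made, so the same loop works with a different proxy or HTTP library.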
> Agenda for the rest of the talk looks excellent; I would especially be
> curious about interacting with JavaScript (I've considered using Rhino
> for this). It would be great if a repeat in SF could be arranged.
I'll see what I can do. (Or you can come to PyCon! ;-) We do have two
months, so that could well happen.
-- Asheesh.
--
Just because the message may never be received does not mean it is
not worth sending.