[sf-lug] BayPIGgies meeting Thursday January 8, 2009: Scrape the Web by Asheesh Laroia

jim jim at well.com
Mon Jan 5 15:31:05 PST 2009

BayPIGgies meeting Thursday January 8, 2009

NOTE: BayPIGgies is NOT meeting at Google this month (or next) but
at the Tech Museum on Market Street in San Jose (details below).

Tonight's talk is
Scrape the Web: Strategies for programming websites that don't expect it
by Asheesh Laroia

Do you find yourself faced with websites that have data you need to
extract? Would your life be simpler if you could programmatically
input data into web applications, even those tuned to resist
interaction by bots?

This talk presents the basics of web scraping and then dives into
the details of different methods and where they are most applicable.
You'll leave with an understanding of when to apply different tools
and learn about a "heavy hammer" for screen scraping used by a
project for the Electronic Frontier Foundation.

Attendees should bring a laptop, if possible, to try the examples
and optionally take notes.


Meetings start with a Newbie Nugget, a short discussion of an
essential Python feature, especially for those new to Python.
Tonight's Newbie Nugget: Code Coverage Basics
by Benjamin Sargeant

The Tech Museum of Innovation
201 South Market Street
San Jose, CA 95113
ROOM: "Large Group Meeting Room", which is on the right
as one enters the building from the Park Street entrance.
NOTE: doors close to stragglers at 8 PM
No badges or registration are required.
You may come early.
Those interested in seeing the Leonardo exhibit may get
discount coupons; contact Rob Stephenson (rstephenson at thetech.org).

Parking and transport information can be found at

BayPIGgies meeting information is available at

------------------------ Agenda ------------------------

..... 7:30 PM ...........................
General hubbub, inventory end-of-meeting announcements,
any first-minute announcements.

..... 7:35 PM to 7:45 PM ................

Newbie Nugget Code Coverage by Benjamin Sargeant

Note that the Tech Museum doors will not admit latecomers after 8:00 PM.
..... 7:45 PM to 8:45 PM ................

Scrape the Web: Strategies for programming websites that don't expect it
by Asheesh Laroia

Note that the meeting ends promptly at 9 PM.
..... 8:45 PM to 9:00 PM  ................
Mapping and Random Access

Mapping is a rapid-fire audience announcement of topics
the announcers are interested in.

Random Access follows immediately to allow individual follow-up
on the announcements and other topics of interest.

