[conspire] July schedule posted

Rick Moen rick at linuxmafia.com
Mon Jul 6 18:58:59 PDT 2009


Quoting Tony Godshall (tony at of.net):

> OK, so here's a better start (stole some time)
> 
> So far I'm generating stuff like this with the python attached (and
> yes, before anybody starts on the style criticism, this is being
> written under the assumption that it's a one-time-use one-way hack
> with better code being used to regularly build html and hCalendar from
> that)...

Getting close.  Have a look at 
http://linuxmafia.com/calendar/week.php?cal=bale&getdate=20090723

That is displaying http://linuxmafia.com/calendar/calendars/bale.ics
within an old version of PHP iCalendar I've had online since forever.
bale.ics is a test run of your t.py (with small fixes), arrived at as
follows:

Step 1:

$ wget -O- http://linuxmafia.com/bale/ > /tmp/t
$ cd /tmp
$ python t.py > bale.ics
$ mv bale.ics /var/www/calendar/calendars

Step 2:

Load into PHP iCalendar page... and the software chokes on it, because
it's not a valid iCal file.  Examine file, note absence of headers and
footers.  Supply mockup-grade faked headers:

BEGIN:VCALENDAR
VERSION
 :2.0
PRODID
 :-//Mozilla.org/NONSGML Mozilla Calendar V1.0//EN

and the necessary footer:

END:VCALENDAR
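
A minimal sketch of that wrapping step, in Python 3 syntax.  The function
name wrap_calendar is hypothetical (it is not part of t.py), and the
header is written unfolded, which should be equivalent to the folded form
above.  CRLF line endings are used because that is what the iCalendar spec
calls for:

```python
# Hypothetical helper, not part of t.py: wrap bare VEVENT text in the
# minimal VCALENDAR header and footer shown above.
HEADER_LINES = [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//Mozilla.org/NONSGML Mozilla Calendar V1.0//EN",
]
FOOTER_LINE = "END:VCALENDAR"


def wrap_calendar(events_text):
    """Return events_text enclosed in a VCALENDAR envelope, CRLF-terminated."""
    # Drop blank lines so the output contains only content lines.
    body = [line for line in events_text.splitlines() if line.strip()]
    return "\r\n".join(HEADER_LINES + body + [FOOTER_LINE]) + "\r\n"
```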

Load into PHP iCalendar page... and note that there are now valid
iCal events, despite all the "bad start: 0" placeholder lines where your
script was not able to parse "-" timestamps for events with no
particular start/stop times (holidays, things that span multiple days,
things that run all day long).

I put the obvious print statements into t.py to emit that header and
footer.

However, problem:  There is null text shown for all of the events.

Step 3:

Compare draft "bale.ics" against other iCal files.  Do 
":%s/DESCRIPTION: /SUMMARY:/g" on the bale.ics output file, since it
appears that "SUMMARY:", not "DESCRIPTION:" is what gets shown (and to
get rid of the spurious space character after the colon).  OK, no longer
null.
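
The same substitution could be done in Python rather than in vi.  This is
only a sketch: description_to_summary is a made-up name, and unlike the vi
command it anchors the match at the start of a line, on the assumption
that the property name never appears mid-line.

```python
import re


def description_to_summary(ical_text):
    """Rename DESCRIPTION properties to SUMMARY, dropping the stray space."""
    # (?m) makes ^ match at the start of every line, not just the first.
    return re.sub(r"(?m)^DESCRIPTION: ?", "SUMMARY:", ical_text)
```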

And so, we arrive at what looks like a halfway decent iCal file --
albeit one with a badly organised SUMMARY field (which really ought to
have the group's name first, not city first).

OK, I've now fixed _that_ as well:  The line that used to return
DESCRIPTION now has:

print "SUMMARY:%s: %s" % (items[3],items[2])

I attach the trivially-improved script as ical.py.  Thanks for your
efforts so far!



Readers, please note:  That was just a test run, to see if we can get
valid iCal at all.  That is not production, just test data.  In
particular, I am _not_ currently committing to keeping that bale.ics
file maintained.


Obviously, in actual use, "ASSUMEYEAR=2009" is no-go and would have to
be filled in with something that works.  A call to "date +%Y" is sort-of
an answer, but you have edge cases with calendars that wrap around the
year end, that need more-intelligent handling.
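
One possible shape for that more-intelligent handling, as a sketch only.
It assumes the calendar lists events from at most a couple of months back
through the months ahead; guess_year and lookback_months are hypothetical
names, not anything in t.py.

```python
import datetime


def guess_year(month, today=None, lookback_months=2):
    """Guess the year of an event when only its month (1-12) is known."""
    today = today or datetime.date.today()
    # How many months before the current month the event month falls,
    # counting backwards around the year end.
    months_behind = (today.month - month) % 12
    if months_behind <= lookback_months:
        # Current month or recent past; step back a year if we crossed January.
        return today.year if month <= today.month else today.year - 1
    # Otherwise treat the event as upcoming; step forward if it wraps past December.
    return today.year if month >= today.month else today.year + 1
```

For example, a January event seen on 15 December 2009 would be assigned
2010, and a December event seen on 5 January 2010 would be assigned 2009.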


Also, we'd need to work around that problem of not parsing "-" and just
punting on non-numeric timestamps, by substituting something reasonable.
Like [date]T000100 for a DTSTART just after midnight, and [date]T235900
for a DTEND just before midnight, if I'm reading the spec right.
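
A sketch of that substitution, with hypothetical helper names and the
assumption that the placeholder arrives as a bare "-" or an empty string:

```python
import datetime


def is_placeholder(time_text):
    """True if the timestamp cell carries no usable time (e.g. "-")."""
    return time_text.strip() in ("", "-")


def allday_bounds(day):
    """Return (DTSTART, DTEND) lines spanning nearly the whole of one day."""
    stamp = day.strftime("%Y%m%d")
    # Just after midnight to just before the next midnight, as suggested above.
    return ("DTSTART:%sT000100" % stamp, "DTEND:%sT235900" % stamp)
```

The spec also has a DATE value type (DTSTART;VALUE=DATE:20090723) intended
for all-day events, which might turn out to be the cleaner route.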


Also, although I admire the pragmatic way you grab HTML to parse using
wget for prototyping purposes, in production wget'ing input might be a bit
whacked.  You're correct that the HTML on the CABAL page's calendar 
table is sort-of the same as BALE's -- but that's a matter of
protohistory:  I created the CABAL page's calendar by, at one point,
taking the _output_ of BALE (which is PHP) and pulling just the CABAL
entries into a text editor.  And then maintaining _that_ flat HTML,
going forward, in a text editor.


There's also something unpleasant with what happens to the SUMMARY
field.  Note this field in the "t" file for a CABAL day:

   <a href="#cabal">CABAL</a> meeting</strong> at <a
href="../cabal/#directions">Rick & Deirdre's House</a> <a
href="http://linuxmafia.com/~rick/map-1105Altschul.gif"><em>(map)</em></a>

The "#cabal" and "../cabal/#directions" links are fine _on_ the BALE page,
because they resolve relative to the page base URL, which is
http://linuxmafia.com/bale/ , for which request the Web server sends
http://linuxmafia.com/bale/index.php .  However, out of context in
(e.g.) PHP iCalendar, they become:

  http://linuxmafia.com/calendar/includes/event.php?event=Menlo+Park%3A+%3Ca+href%3D%22%23cabal%22%3ECABAL%3C%2Fa%3E+meeting+at+%3Ca+href%3D%22..%2Fcabal%2F%23directions%22%3ERick+%26amp%3B+Deirdre%27s+House%3C%2Fa%3E+%3Ca+href%3D%22http%3A%2F%2Flinuxmafia.com%2F%7Erick%2Fmap-1105Altschul.gif%22%3E%28map%29%3C%2Fa%3E&cal=bale&start=1:00%20PM&end=9:00%20PM&description=&status=&location=&organizer=a:0:{}&attendee=a:0:{}#cabal

and

  http://linuxmafia.com/calendar/cabal/#directions

...both of which are 404.

So, ideally, relative URLs within the SUMMARY field for BALE data would
get dereferenced to http://linuxmafia.com/bale/[relURL], and ones for
CABAL data to http://linuxmafia.com/cabal/[relURL].  Or something.

The ".." is a real headache, but I'm pretty sure it's unique to CABAL 
events on the BALE page -- and is an artifact of my knowing that the
BALE and CABAL page files will reliably have a fixed directory
relationship.  (If I needed to, I could stop using ".." relative
paths -- but I'm reluctant to stop using the others, as they are useful
within the BALE page.)
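
The dereferencing described above could be sketched with the standard
library's urljoin, which also collapses ".." segments.  The BASES table
and the absolutize_links name are assumptions for illustration, and the
regex only handles double-quoted href attributes:

```python
import re
from urllib.parse import urljoin

# Assumed base URL for each data source.
BASES = {
    "bale": "http://linuxmafia.com/bale/",
    "cabal": "http://linuxmafia.com/cabal/",
}

HREF = re.compile(r'href="([^"]*)"')


def absolutize_links(html, source):
    """Rewrite every href in html as an absolute URL for the given source."""
    base = BASES[source]
    # urljoin leaves already-absolute URLs untouched.
    return HREF.sub(lambda m: 'href="%s"' % urljoin(base, m.group(1)), html)
```

Run against the BALE data, "#cabal" would become
http://linuxmafia.com/bale/#cabal and "../cabal/#directions" would become
http://linuxmafia.com/cabal/#directions .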


But anyhow, that problem plus the "-" timestamp one highlights a broader
(non-showstopper, but worth noting) obstacle, which I alluded to before:
BALE's/CABAL's event dataset was simply not created with iCal in mind.
iCal favours a particular form of data organisation (e.g., the strict
type enforcement on fields of type DATE-TIME) and assumes a particular
degree of brevity, which is why the converted SUMMARY fields aren't
working all that well: They're written with the generous screen space of
a BALE event line in mind, not the small-screen-box space typical in
iCal-handling apps.



Hell, maybe I should just make a daily cronjob to run that Python script
as-is, with a local parse on /var/www/cabal/index.html, and a wget fetch
of the output of http://linuxmafia.com/bale/index.php .  Buggy and a
little odd, but arguably better than nothing.




