[conspire] Perl's lovely WWW::Mechanize and, e.g., snapshot.debian.org

Sat Oct 26 00:27:53 PDT 2024

Ah, Perl's lovely WWW::Mechanize is great for automating web HTTP/HTTPS
stuff.  E.g., when you want to automate some client-side interaction
with a web server.

So, for my latest mini-project using it, I've thus far named the
program:
find_deb_in_snapshot.debian.org
What it essentially does: it takes one or more .deb binary package files
as arguments, gets each file's size in bytes and calculates its
(currently) SHA-1 digest, then checks whether it can locate the file on
the snapshot.debian.org web site, and reports the results.
If successful, it gives the filename and the URL from which the file can
be retrieved.
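
The size-and-digest step is the easy part.  Here's a minimal Python
sketch of just that step (the actual program is Perl; size_and_sha1 is a
hypothetical name, not taken from it).  It reads each file in chunks so
large .deb files needn't fit in memory:

```python
import hashlib
import sys


def size_and_sha1(path, chunk_size=1 << 16):
    """Return (size_in_bytes, sha1_hex_digest) for the file at path."""
    digest = hashlib.sha1()
    size = 0
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large .deb files aren't loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


if __name__ == "__main__":
    # One line per file argument: name, size in bytes, SHA-1 (tab-separated).
    for name in sys.argv[1:]:
        size, sha1 = size_and_sha1(name)
        print(f"{name}\t{size}\t{sha1}")
```

If saved as, say, size_sha1.py, then
python3 size_sha1.py /var/cache/apt/archives/*.deb
prints one tab-separated line per file.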

snapshot.debian.org is a great Debian service.  It has files from the
Debian project, notably .deb binary packages and the corresponding source
files, going way back: "everything" back to nearly 2006-06-30, and I
believe all sources going back much further than that (all the way to the
beginning?  I know Debian has those; I'm not 100% sure they're all on
snapshot.debian.org, but they may well be).

Anyway, I'd been hanging onto my downloaded .deb files ... going way
back, in case I ever needed 'em again, or whatever.  Well, given how long
snapshot.debian.org has been around, how stable it is, etc., it's really
time for me to trim some of my older archives of .deb files, most notably
where they're redundant with what's in snapshot.debian.org.

Ah, but how to check?  Well, sure, one can check manually on the web
site: put the package name in the binary search box, click Submit, find
the relevant version, follow that link, see that it's there for the
relevant architecture, and check that the SHA-1 matches.  Not hugely
difficult, but for tens of thousands of such files ... automation.
That's where Perl's WWW::Mechanize comes in.
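
The manual steps above start from the package name, version and
architecture, all of which can be read off a .deb filename.  Here's a
minimal Python sketch of that parsing (not the author's Perl code;
parse_deb_filename is a hypothetical helper).  It assumes the usual
<package>_<version>_<arch>.deb naming, and that apt's cache writes an
epoch's colon as %3a in filenames:

```python
from urllib.parse import unquote


def parse_deb_filename(filename):
    """Split a .deb filename into (package, version, arch).

    Assumes the conventional <package>_<version>_<arch>.deb layout.
    The version is percent-decoded, so an epoch stored as "1%3a2.0-1"
    comes back as "1:2.0-1"; a literal "+" is left untouched.
    """
    base = filename.rsplit("/", 1)[-1]  # drop any leading directory
    if not base.endswith(".deb"):
        raise ValueError(f"not a .deb filename: {filename!r}")
    parts = base[: -len(".deb")].split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected .deb filename layout: {filename!r}")
    package, version, arch = parts
    return package, unquote(version), arch
```

E.g. parse_deb_filename("tftp-hpa_5.2+20150808-1.4_amd64.deb") gives the
name to type into the binary search box, plus the version and
architecture to pick out of the results.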

E.g., to demonstrate, here are a couple of the newer files I still have
sitting around (two picked at random):
$ (cd /var/cache/apt/archives && f="$(ls -f | grep '\.deb$' | shuf |
head -n 2 | sort)" && [ -n "$f" ] && echo "$f" &&
find_deb_in_snapshot.debian.org $f)
libedataserverui-1.2-4_3.46.4-2_amd64.deb
tftp-hpa_5.2+20150808-1.4_amd64.deb
libedataserverui-1.2-4_3.46.4-2_amd64.deb
https://snapshot.debian.org/archive/debian/20230318T085419Z/pool/main/e/evolution-data-server/libedataserverui-1.2-4_3.46.4-2_amd64.deb
tftp-hpa_5.2+20150808-1.4_amd64.deb
https://snapshot.debian.org/archive/debian/20221026T030313Z/pool/main/t/tftp-hpa/tftp-hpa_5.2%2B20150808-1.4_amd64.deb
$
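
The URLs in that output follow the pattern
https://snapshot.debian.org/archive/<archive>/<timestamp>/pool/<component>/<prefix>/<source>/<filename>
with the filename percent-encoded.  Note the + in tftp-hpa's version
showing up as %2B.  Here's a Python sketch that reassembles such URLs.
It assumes the usual Debian pool layout, where <prefix> is the source
package's first letter, or its first four characters for source packages
starting with "lib".  The source package name (evolution-data-server) can
differ from the binary package name (libedataserverui-1.2-4), so it has
to come from the site rather than the filename.  pool_prefix and
snapshot_url are hypothetical helpers:

```python
from urllib.parse import quote

BASE = "https://snapshot.debian.org/archive"


def pool_prefix(source):
    """Pool subdirectory: 4 chars for 'lib*' sources (e.g. 'libe'), else 1."""
    return source[:4] if source.startswith("lib") and len(source) > 3 else source[0]


def snapshot_url(timestamp, source, filename, archive="debian", component="main"):
    """Build a snapshot.debian.org pool URL for a file at a given timestamp."""
    # quote(..., safe="") turns "+" into "%2B", matching the URLs above.
    return "/".join([
        BASE, archive, timestamp, "pool", component,
        pool_prefix(source), source, quote(filename, safe=""),
    ])
```

Given the timestamp and source name, this reproduces both URLs shown
above.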

If you want to peek further, or download it:
https://www.mpaoli.net/~michael/bin/find_deb_in_snapshot.debian.org.txt
The .txt link above is a browsable copy of the link below.  If I get the
web server configuration tweaked so the link below doesn't tend to force
clients to download, I may well get rid of the .txt link, but at least in
the meantime it's there as a work-around.
https://www.mpaoli.net/~michael/bin/find_deb_in_snapshot.debian.org

Future directions?  I haven't done so yet, but I'm at least somewhat
inclined to expand the functionality so one can give it filenames without
the files even being present.  It would then likewise search, and in that
case also report the information that would otherwise have come from the
files themselves, notably the size in bytes and especially the
(currently) SHA-1 digest.
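
For that filename-only mode, here's one possible approach, not
necessarily what the program will do.  snapshot.debian.org also offers a
machine-readable JSON interface under /mr/.  This Python sketch assumes
two of its endpoints.  /mr/binary/<binary>/<version>/binfiles returns
per-architecture SHA-1 hashes.  /mr/file/<sha1>/info returns archive
name, first-seen timestamp, path, name and size.  The JSON field names
and the URL assembly are assumptions to check against the service's API
documentation, and all function names are hypothetical:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

MR = "https://snapshot.debian.org/mr"


def binfiles_url(binary, version):
    # Assumed endpoint: per-architecture file hashes for one binary version.
    return f"{MR}/binary/{quote(binary, safe='')}/{quote(version, safe='')}/binfiles"


def file_info_url(sha1):
    # Assumed endpoint: where (and how big) the file with this SHA-1 is.
    return f"{MR}/file/{sha1}/info"


def get_json(url):
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)


def lookup(binary, version, arch):
    """Yield (sha1, size, url) for each archived copy; needs network access.

    Assumes JSON shaped like {"result": [...]} with "architecture"/"hash"
    in binfiles and "archive_name"/"first_seen"/"path"/"name"/"size" in
    file info, where "path" looks like "/pool/main/t/tftp-hpa".
    """
    for entry in get_json(binfiles_url(binary, version))["result"]:
        if entry["architecture"] != arch:
            continue
        sha1 = entry["hash"]
        for info in get_json(file_info_url(sha1))["result"]:
            url = "https://snapshot.debian.org/archive/{}/{}{}/{}".format(
                info["archive_name"], info["first_seen"], info["path"],
                quote(info["name"], safe=""))
            yield sha1, info["size"], url
```

E.g., with network access:
for sha1, size, url in lookup("tftp-hpa", "5.2+20150808-1.4", "amd64"):
    print(size, sha1, url)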