A small update on my previous post about scraping the unstoppable Terminal Escape mp3 blog. I've settled into a monthly-ish schedule of scraping the newest posts and then uploading them to the collection on archive.org. I have thought about making the process fully automated, which would be a fun exercise, but I don't want to accidentally mass-upload a bunch of garbage to archive.org. For now, I keep an eye on the process, make sure everything is scraped and uploaded properly, and avoid any runaway automated mishaps. It's a fairly easy task to leave running for an afternoon while I'm home working on other things.
The Terminal Escape collection now holds over 3,732 items!
I also scraped and uploaded Terminal Escape's sibling blog, Escape is Terminal, which specializes in live bootleg tapes. It's not a daily blog, and a substantial number of the older links were dead, but I still pulled down nearly 450 tapes and uploaded them to archive.org. You can see them here. Perhaps I should request a collection for that batch as well.
I have also scraped a few other mp3 blogs out there with a similar archival bent. The script works fairly well against blogs similar to Escape is Terminal. I'll upload those soon...