DRE - An RSS-to-Email Aggregator
DRE is a Perl script that fetches articles from a list of RSS
feeds and sends them on as plain-text emails.
Configuration
Feed URLs go in ~/.dre/feeds, one URL per line. Blank lines are OK, and
lines beginning with a hash (#) are ignored as comments.
The recipient of the emails goes in ~/.dre/recipient: a single line
containing a plain email address, e.g. dave@biff.org.uk
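As a rough sketch of the layout described above, the following shell commands create both files. The feed URLs and the address you@example.org are placeholders, not real feeds, and the commands overwrite any existing ~/.dre/feeds and ~/.dre/recipient. The final grep only illustrates which lines count as feed URLs under the rules above; it is not how DRE itself parses the file.

```shell
# Create the configuration directory if it does not exist yet.
mkdir -p "$HOME/.dre"

# One URL per line; blank lines and "#" comment lines are allowed.
# (Placeholder URLs: substitute your own feeds.)
cat > "$HOME/.dre/feeds" <<'EOF'
# News
http://example.org/news/rss.xml

# Podcasts
http://example.org/podcast/rss.xml
EOF

# A single line holding a plain email address (placeholder).
echo 'you@example.org' > "$HOME/.dre/recipient"

# Show the lines that are neither comments nor blank, i.e. the feed URLs.
grep -v -e '^#' -e '^[[:space:]]*$' "$HOME/.dre/feeds"
```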
Command-line options
Just run the script: dre
-d - print chatty debug information
-q - be quiet (don't complain about bad feeds) - for use with cron
-n - don't send any email (for repopulating the "seen" file)
-a - send email for all entries, even those previously seen
-f - download (to ~/.dre/downloads) an mp3 file listed as an enclosure
-h - show brief help text
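Since -q is intended for use with cron, a crontab entry along these lines is one way to run DRE unattended. This is only a sketch: the hourly schedule is an arbitrary choice, and it assumes the script is installed as dre somewhere on cron's PATH (use the full path otherwise).

```shell
# Hypothetical crontab line: run DRE quietly at the top of every hour.
CRON_LINE='0 * * * * dre -q'

# Print the line for review.
echo "$CRON_LINE"

# To install it, append it to the current user's crontab (left commented
# out here so that nothing is changed by accident):
# ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```

On a fresh setup it may be worth running "dre -n" once first, so that the existing backlog of entries is recorded as seen rather than mailed.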
Download
Download the current version. The licence is GPL.
Required Perl modules: XML::RSS, HTML::Entities, LWP::UserAgent,
LWP::Simple, DB_File, HTML::FormatText::WithLinks, Data::Dumper, Getopt::Std.
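To see which of these modules are already installed, a small loop like the one below can help. It is a sketch that assumes perl is on the PATH; the function name check_dre_modules is made up for this example and is not part of DRE.

```shell
# Report which of the required Perl modules can be loaded.
check_dre_modules() {
    for mod in XML::RSS HTML::Entities LWP::UserAgent LWP::Simple \
               DB_File HTML::FormatText::WithLinks Data::Dumper Getopt::Std
    do
        # "perl -MModule -e 1" exits non-zero if the module fails to load.
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "ok       $mod"
        else
            echo "MISSING  $mod"
        fi
    done
}

check_dre_modules
```

Anything reported as MISSING needs installing from CPAN or your distribution's packages before DRE will run.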
History
11th April 2005:
- we are more defensive about missing links in feed items
- added -q switch for quiet (don't whinge on bad feeds)
- XML errors no longer cause us to bail out
- I *think* high-bit characters are handled OK
15th June 2005:
- in the absence of a "link" element, use the "guid" element
instead (to handle the BBC's podcasting trial)
- added "-n" flag for "don't send any mail" - useful for
repopulating the "seen" file without getting spammed
- added "-a" flag for "send all mail", even resending previously
seen entries
- added "-h" flag for brief help
31st August 2005:
- use HTML::FormatText::WithLinks instead of the home-grown parser
based on HTML::PullParser. This should make the output look nicer and
reduce the number of error messages from dodgy feeds.
- added a small number of workarounds for "unusual" formatting
23rd September 2005:
- there's a brief nod of the head to better UTF-8 support
11th November 2005:
- Be more tolerant of feed items with no title.
6th July 2006:
- Use the guid rather than the link to track "seen" status.
- Optionally download an mp3 file referenced as an enclosure.
- (...using LWP::Simple to avoid sucking it into memory en route...)
4th August 2006:
- Cope with non-permalink guids (argh!)
- Handle UTF-8 better in debug output.
23rd March 2007:
- In the absence of a guid or link, use an enclosure URL if it's
present. This was prompted by the Classic FM Jane Austen series. Gosh,
there are some badly-formed RSS feeds out there.
The name sort-of stands for "Dave's RSS to Email convertor".
Troubleshooting / Q&A
- DRE keeps sending me the same feed entries over and over again.
- Maybe you upgraded your DB library, e.g. by upgrading Debian Woody
to Sarge, and the "seen" file can't be read or updated:
dave@eyas:~/.dre$ file seen*
seen: Berkeley DB (Hash, version 8, native byte-order)
seen.old: Berkeley DB (Hash, version 5, native byte-order)
Fix: delete the "seen" file and run "dre -n" to repopulate it.
- DRE doesn't understand Atom feeds.
- I know... it's on my to-do list so I can read Google Blog.
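For the repeated-entries problem above, the fix can be sketched as a small shell function. Two things here are assumptions rather than part of the documented fix: the function name reset_seen is made up, and the old file is moved aside under a dated name instead of being deleted, purely as a precaution. It also assumes the script is installed as dre on the PATH.

```shell
# Move an unreadable "seen" file out of the way, then rebuild it.
reset_seen() {
    # Keep a dated copy rather than deleting outright (extra caution,
    # not part of the documented fix).
    if [ -e "$HOME/.dre/seen" ]; then
        mv "$HOME/.dre/seen" "$HOME/.dre/seen.broken.$(date +%Y%m%d)"
    fi

    # -n repopulates the "seen" file without sending any email.
    if command -v dre >/dev/null 2>&1; then
        dre -n
    else
        echo 'dre not found on PATH; run it by hand with -n' >&2
    fi
}
```

Run reset_seen once after the library upgrade. When DRE is behaving again, the seen.broken.* copy can be deleted.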
Dave Holland
<dave@biff.org.uk>
$Id: dre.html,v 1.13 2007-03-23 13:24:08 dave Exp $