By John Gruber
Doxie: scan smarter, not harder. Use Amazon promo code “FIREBALL” for 35% off.
Given that (a) Perl is optimized for text processing and Unix-y administrative tasks; and (b) the mbox email storage format is both plain text and deeply rooted in Unix culture — you’d think Perl would be a terrific scripting language for mbox parsing.
But I’ve always rather disliked the entire
Mail:: hierarchy of
CPAN modules, several of which provide ways to parse mbox files
(along with other mail storage formats). They reek of
over-engineering — complicated APIs and slow performance. One of
Larry Wall’s guiding precepts for Perl is, “Easy things should be
easy, and hard things should be possible.” The modules in the
Mail:: hierarchy fail the “easy things should be easy” test.
In response, Simon Cozens and several collaborators have spearheaded
the Perl Email Project, the product of which is the relatively new
Email:: hierarchy of CPAN modules. These modules are simple, fast,
and easy to use. I replaced
in a script I wrote to process mbox files full of T-shirt orders,
and it ran over 10 times faster. Even better, I found the syntax
more intuitive — more, well, Perlish.
For example, in my original script using
Mail::MboxParser, given a
$msg, to get the body of
$msg as a string, I
needed to do this:
my $body = $msg->body($msg->find_body); my $text = $body->as_string;
Email::Simple), I can just
my $text = $msg->body;
A few weeks ago Cozens wrote an article for Perl.com, “The
Evolution of Perl Email Handling”, wherein he makes a
compelling case for the new
Email:: modules while providing a good
introduction to their usage. Highly recommended for anyone who uses
Perl to read email files.