RTF to Plain Text Translator

About a year ago I looked around for a way to convert RTF (Rich Text Format) files to plain text. I forget why I wanted to do this, but I ended up not finding a decent solution. Most surprisingly, there don’t seem to be any Perl modules on CPAN that can do this; there are a bunch that have something to do with RTF (e.g. creating RTF files from scratch), but none that will simply take an RTF file as input and spit out a plain text translation.

Last week I was puttering around with FileMerge, the text file comparison app included with Apple’s Developer Tools. In FileMerge’s preferences, you can create command-line filters to pre-process certain files before comparing them. For example, the default prefs have a filter to send Project Builder “.pbproj” files through the /Developer/Tools/pbprojectdump tool before FileMerge compares them.

Most interesting is the default setting for RTF files — they get sent through a filter with the path $(APP)/convertRichTextToAscii. The “$(APP)” part is apparently a special reference to the Resources folder in the FileMerge application package. If you open a Terminal window and cd inside the FileMerge.app package, you’ll find the convertRichTextToAscii tool here:

/Developer/Applications/FileMerge.app/Contents/Resources/convertRichTextToAscii

It works great. I haven’t tested it thoroughly, but it handled RTF files generated by both Word and TextEdit perfectly. It’s a little unwieldy to invoke, what with the long path and long name, but you can either make a copy and rename it, or set up a symlink. Something like “rtf2text” would be a better name.
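The symlink approach might look something like the sketch below. The function name make_rtf2text and the idea of linking into a directory of your choosing are my own illustration, not anything that ships with FileMerge; it assumes the directory you pass is (or will be) on your PATH.

```shell
#!/bin/sh
# make_rtf2text TOOL BIN_DIR
# Hypothetical helper: symlink the converter at TOOL into BIN_DIR as "rtf2text".
make_rtf2text() {
    tool=$1
    bin_dir=$2
    # Create the destination directory if it does not exist yet.
    mkdir -p "$bin_dir" || return 1
    # -s makes a symbolic link; -f replaces an older link with the same name.
    ln -sf "$tool" "$bin_dir/rtf2text"
}
```

You would call it with the path shown earlier and whatever directory you keep your own tools in, e.g. make_rtf2text /Developer/Applications/FileMerge.app/Contents/Resources/convertRichTextToAscii "$HOME/bin" (the "$HOME/bin" part is just a guess at a sensible location). A symlink has the advantage over a copy that it keeps pointing at the tool if Apple updates it in place.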

To use it within BBEdit, I created the following shell filter, which I named “RTF to Text” and saved in the “Unix Filters” folder in my “BBEdit Support” folder:


#!/bin/sh
# BBEdit hands the filter the text to convert as a file path in "$1".
/Developer/Applications/FileMerge.app/Contents/Resources/convertRichTextToAscii "$1"

Note: Despite its name, the convertRichTextToAscii tool doesn’t actually convert RTF to ASCII; it converts to UTF-8. This is a good thing, since the ASCII character set technically includes only 128 characters, which leaves no room for things like curly quotes and accented letters. However, if you’re using BBEdit 7, make sure you turn on the “Send UTF-8 to Interpreter for Perl and Unix Scripts” checkbox in BBEdit’s Unix Scripting prefs panel.
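If you’re curious whether a given conversion actually contains anything beyond plain ASCII, a rough check is to count the bytes outside the 7-bit range. This is only a sketch; count_non_ascii is a name I made up, and it counts bytes rather than characters, so a single accented letter encoded as UTF-8 shows up as two.

```shell
#!/bin/sh
# count_non_ascii: read stdin and print the number of bytes above 0x7F.
count_non_ascii() {
    # In the C locale tr works byte by byte: delete every ASCII byte
    # (octal 000-177) and count whatever is left. The final tr strips
    # the padding some versions of wc put in front of the number.
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}
```

Piping the converter’s output through it (for instance, the tool run on some RTF file, followed by | count_non_ascii) prints 0 when the result happens to be pure ASCII and a positive number otherwise.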