Interview: Michael Tsai

Wednesday, 24 September 2003

Michael Tsai is a Macintosh software developer who has authored several useful utilities, including DropDMG and BBAutoComplete. Michael is also the editor and publisher of the monthly Mac e-zine About This Particular Macintosh, and he writes a Mac-oriented weblog.

His latest software release is SpamSieve 2, a major update to his US$25 spam filtering utility. SpamSieve includes several types of filtering strategies, most especially a Bayesian statistical filter. I’ve been using SpamSieve 2 for over a month (including beta releases), and its efficacy has bordered on the incredible: flagging all but 10 of 1981 spams. SpamSieve is easy to set up, runs quickly, and works with every major Mac OS X email client except Apple Mail (it even works with Claris Emailer). Needless to say, I highly recommend SpamSieve.

I interviewed Michael via email.


Gruber: When did you get the idea to write SpamSieve? Was it something you were thinking about before Paul Graham published “A Plan for Spam”?

Tsai: I was thinking about spam a lot before Graham published his paper, because I was getting more than a thousand spams a day.

Gruber: Wow. That’s a shitload of spam. How much spam are you getting now?

Tsai: About the same. It rose much higher (5000/day) in the interim, and so I had to make the ATPM mail server bounce all messages that weren’t sent to known addresses. (I used to look through all those so that we didn’t lose mail when people mistyped addresses.)

I had made a huge filter in Mailsmith, but I had to keep modifying it, and it started to become unmaintainable. So I had been thinking about what to do about my personal spam problem. Apple had demonstrated Jaguar’s Mail, and that convinced me that it was possible to do better than my Mailsmith filter and the other solutions that were available at the time. But I hadn’t actually started writing a new spam filter because I was busy with other projects at the time.

Paul Graham’s paper was what got me to write SpamSieve. First, it said that it was absolutely possible to write a better spam filter. Second, it demonstrated that doing so was actually pretty easy. The core of Graham’s algorithm is only about ten lines of Lisp code.

When I read “A Plan for Spam” around August 20, 2002, I thought every e-mail client would add a Graham-style filter in its next update. Pretty soon I wouldn’t have to worry about spam anymore. About a week later I got impatient, and it occurred to me that it would be possible to hook an external filter up to Mailsmith via AppleScript. (Previously, I didn’t really like the idea of an external filter, but AppleScript could make it work together with the e-mail client nicely.) I’d already written some Cocoa-AppleScript code for BBAutoComplete and DropDMG, and Graham had shown how to do the spam filter part. Suddenly this looked like a problem I could solve quickly, and so I took a few hours one evening and wrote a prototype. I say that not to brag that I’m such a fast programmer, but rather to illustrate how important it is to have a good framework like Cocoa. It really lowers the activation energy for starting a project, and that can make a lot of interesting things happen.

Gruber: In other words, you could get started writing code to filter spam — which is the code you wanted to write — rather than first needing to write a lot of code just to set up an application environment in which to execute the spam filtering code.

Tsai: Right. And, also, the framework has good general purpose tools—like arrays, hashes, and string routines—that are useful in writing the spam filtering code itself.

Gruber: How influential has Graham been in your approach with SpamSieve?

Tsai: The approach in SpamSieve 1.0 was almost exactly Graham’s. I think the only difference was that I tokenized words a little differently. In 2.0, there’s little that came directly from Graham, although of course he’s indirectly responsible for much of what’s in SpamSieve, having popularized the general approach.

Gruber: Where else have you drawn from, tactically?

Tsai: The new math in SpamSieve is due to Gary Robinson. It was refined by Tim Peters, and later by me. John Graham-Cumming wrote an influential document that describes a lot of tricks that spammers use to obscure their messages. The SpamBayes and POPFile projects have generated a lot of ideas about how to tokenize messages, and some of them show up in SpamSieve. Whitelists and blocklists are an old idea, of course, but I don’t know of any filters that automatically train them the way SpamSieve does.

Gruber: Part of what makes Bayesian spam filtering so interesting is that the concept is so simple. How would you describe it, for the non-programmer/non-mathematician?

Tsai: It looks at examples of your spam messages and good messages and figures out how often each word occurs in each type of message. When you get a new message, it looks at where the words in that message previously occurred. In other words, does the new message look more like your spam or more like your regular mail?
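As a rough illustration of the idea Tsai describes, here is a minimal Python sketch of word-count filtering. It is not SpamSieve’s code. The whitespace tokenizer, the class name `WordCountFilter`, the 0.01/0.99 clamping, and the neutral 0.5 for unseen words are all assumptions made for the example, and the combining rule loosely follows the simple one in Graham’s “A Plan for Spam” rather than the newer math SpamSieve 2 uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return text.lower().split()


class WordCountFilter:
    """Counts how many spam and good messages each word appears in."""

    def __init__(self):
        self.spam_counts = Counter()
        self.good_counts = Counter()
        self.spam_messages = 0
        self.good_messages = 0

    def train(self, text, is_spam):
        words = set(tokenize(text))  # count each word once per message
        if is_spam:
            self.spam_counts.update(words)
            self.spam_messages += 1
        else:
            self.good_counts.update(words)
            self.good_messages += 1

    def word_probability(self, word):
        # Share of spam messages vs. share of good messages containing the word.
        spam_rate = self.spam_counts[word] / max(self.spam_messages, 1)
        good_rate = self.good_counts[word] / max(self.good_messages, 1)
        if spam_rate + good_rate == 0:
            return 0.5  # never seen before: no evidence either way
        p = spam_rate / (spam_rate + good_rate)
        return min(0.99, max(0.01, p))  # clamp so no single word is absolute

    def score(self, text):
        # Combine per-word probabilities; closer to 1.0 means "looks like spam".
        probs = [self.word_probability(w) for w in set(tokenize(text))]
        if not probs:
            return 0.5
        spammy = math.prod(probs)
        hammy = math.prod(1.0 - p for p in probs)
        return spammy / (spammy + hammy)


f = WordCountFilter()
f.train("buy cheap pills now", is_spam=True)
f.train("cheap pills online", is_spam=True)
f.train("lunch meeting tomorrow", is_spam=False)
f.train("notes from the meeting", is_spam=False)
```

With this tiny training set, `f.score("cheap pills")` comes out close to 1 and `f.score("meeting notes")` close to 0, which is the “does it look more like your spam or your regular mail?” question in numeric form.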

Gruber: Part of what makes SpamSieve — or at least version 2.0 — so effective is that it also looks at where the words occur. E.g., the word “money” in the Subject is counted separately from “money” in the message body. And also that these “words” aren’t just words in the English language sense, but rather any of the distinct tokens of an email message.

For example, according to my SpamSieve corpus, the token “U:remove” has occurred 1009 times in spam messages, but not once in a good message. I presume this means “remove” in a URL?

Tsai: Yes. Spam messages tend to include links to CGIs and tracking images, so that’s where a lot of the really spammy tokens come from.

Gruber: Given those stats, that’s obviously a strong indication that a message containing a URL with the word “remove” is spam. Compare and contrast with the word “remove” in the body of a message (and not in a URL), which has occurred 927 times in my spam, but also 75 times in my good messages. Probable spam, but far from tell-tale.

It’s kind of fun to look through the corpus, no?

Tsai: I think so. :-) The corpus window was originally intended to be just for my own use in testing and debugging the program, but it turned out that a lot of users liked it.
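The location-tagged tokens discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the “U:” prefix comes from the corpus entry quoted in the interview, but the “S:” prefix for subject words, the regular expressions, and both function names are hypothetical, and `spam_share` is just the raw share of occurrences, not the probability SpamSieve actually computes.

```python
import re

URL_RE = re.compile(r"https?://\S+")
WORD_RE = re.compile(r"[A-Za-z0-9$']+")


def tokens_with_location(subject, body):
    """Tag each token with where it was found in the message."""
    tokens = []
    # Subject words get an "S:" prefix (hypothetical naming).
    tokens += ["S:" + w.lower() for w in WORD_RE.findall(subject)]
    # Words inside URLs get the "U:" prefix seen in the corpus example.
    for url in URL_RE.findall(body):
        tokens += ["U:" + w.lower() for w in WORD_RE.findall(url)]
    # Remaining body words are left unprefixed.
    body_without_urls = URL_RE.sub(" ", body)
    tokens += [w.lower() for w in WORD_RE.findall(body_without_urls)]
    return tokens


def spam_share(spam_count, good_count):
    """Raw fraction of a token's occurrences that were in spam."""
    total = spam_count + good_count
    return spam_count / total if total else 0.5


tokens = tokens_with_location(
    "Make money", "Click http://example.com/remove to remove"
)
url_remove = spam_share(1009, 0)    # counts quoted for "U:remove"
body_remove = spam_share(927, 75)   # counts quoted for "remove" in the body
```

On the counts quoted in the interview, every recorded occurrence of “U:remove” was in spam, while “remove” in the body was in spam about 92.5 percent of the time: strong evidence, but not conclusive on its own.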

Gruber: SpamSieve’s whitelist and blocklist work very well. How do you defend against spammers who use forged headers to spoof the From: address such that spam appears to come from someone you know?

Tsai: Well, first of all you can decide whether you want to use the whitelist and blocklist. If you get a lot of forged spams “from” your friends, maybe you don’t want to. But otherwise, the whitelist works in an optimistic fashion. It trusts an address on the whitelist until you get a spoofed spam from that address, and then the address is disabled and it falls back on using the Bayesian classifier for messages from that address. I have about 400 addresses on my whitelist right now, and about 25 of them are disabled for that reason; the others are trusted until proven otherwise. So a spammer can fool it at most once per address.
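The “optimistic” whitelist behavior Tsai describes can be sketched roughly as follows. This is a guess at the logic from his description, not SpamSieve’s implementation: the `Whitelist` class, its method names, and the `bayes_is_spam` callback are all hypothetical.

```python
class Whitelist:
    """Trusts an address until a spam is reported from it."""

    def __init__(self):
        self.enabled = {}  # address -> True (trusted) or False (disabled)

    def add(self, address):
        self.enabled.setdefault(address.lower(), True)

    def is_trusted(self, address):
        return self.enabled.get(address.lower(), False)

    def report_spam_from(self, address):
        # A spoofed spam disables the entry instead of deleting it,
        # so the address is never silently trusted again.
        key = address.lower()
        if key in self.enabled:
            self.enabled[key] = False


def is_spam(sender, text, whitelist, bayes_is_spam):
    # Trusted senders bypass the statistical filter entirely.
    if whitelist.is_trusted(sender):
        return False
    # Unknown or disabled senders fall back to the Bayesian classifier.
    return bayes_is_spam(text)


wl = Whitelist()
wl.add("friend@example.com")
always_spam = lambda text: True  # stand-in classifier for the demo

first = is_spam("friend@example.com", "forged spam", wl, always_spam)
wl.report_spam_from("friend@example.com")  # user corrects the mistake
second = is_spam("friend@example.com", "forged spam", wl, always_spam)
```

Here `first` is `False` (the forged message gets through once) and `second` is `True`, matching the “at most once per address” guarantee.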

Gruber: Version 2.0 is a significant improvement over SpamSieve 1.x. What kind of accuracy were you getting with SpamSieve 1.3.1, and what are you getting with 2.0?

Tsai: I was getting between 97 and 98% with 1.3.1. That’s probably a few percent higher than most users were seeing. With 2.0 I’m getting 99.5% and up. Nearly all the false negatives are virus e-mails with senders that are in my address book. So, because of the way I’ve set the preferences, those will always get through.

The accuracy matches what I’ve been hearing from the beta testers, so I think 2.0 will be a big improvement for the general user base. Perhaps more important than the peak accuracy is that 2.0 learns much faster. In testing, I’ve gotten up to the 1.3.1 accuracy range with a corpus of only 300 or so messages, whereas before it took around 2000.

Gruber: So far in September, SpamSieve has been 99.7 percent accurate for me. About 3300 messages total, 2000 of which were spam. I’ve had 10 false negatives, and zero false positives. In fact, I haven’t had a single false positive, ever, with any of the SpamSieve 2.0 betas.

This compares very favorably to SpamAssassin. Over the same period, SpamAssassin had over 90 false negatives — all of which SpamSieve caught. And conversely, SpamAssassin flagged 9 of the spams that SpamSieve missed. So, combining the two, I’ve had one spam, out of about 2000 total, slip through in September.

Frankly, I’m surprised SpamAssassin still works as well as it does. Why any spammer doesn’t run their messages through it first, and edit them until they get through, is beyond me.

Tsai: It’s a mystery.
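The September figures quoted above can be checked with a little arithmetic. The sketch below assumes “accuracy” means correctly classified messages divided by all messages, which is the reading that reproduces the 99.7 percent figure. The message total is the approximate “about 3300” from the interview, and the function names are hypothetical.

```python
def accuracy(total, false_negatives, false_positives):
    """Fraction of all messages that were classified correctly."""
    return (total - false_negatives - false_positives) / total


def missed_by_both(missed_by_first, caught_by_second):
    """Spams that slip past two filters run in combination."""
    return missed_by_first - caught_by_second


# About 3300 messages, 10 false negatives, no false positives.
september = accuracy(3300, 10, 0) * 100

# SpamSieve missed 10 spams; SpamAssassin flagged 9 of those.
slipped_through = missed_by_both(10, 9)
```

Rounded to one decimal place, `september` is 99.7, and `slipped_through` is 1, which agrees with the single spam that got past both filters.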

Gruber: In theory, the appeal of server-side spam filtering is obvious — if it works reliably, you can avoid downloading spam in the first place. But the key to Bayesian filtering is that it’s based on your personal email, which is harder to implement on the server than on the client.

Have you considered a server-side version of SpamSieve?

Tsai: Yeah, that’s something I’m going to look at more closely now that 2.0 is out. If anyone’s interested in such a product, please e-mail me and tell me what you want.

Gruber: Are there any particular types of spam messages that SpamSieve has trouble identifying?

Tsai: In the past, it had a lot of trouble with really short HTML messages containing, say, just an image and a link. With 2.0, those don’t seem to be much of a problem once it’s well trained. So far I haven’t really seen a pattern to the ones that 2.0 is missing, except that the language is more subdued.

Gruber: Last month, Paul Graham published “Filters that Fight Back”, wherein he advocates that when spam filters identify suspected spam, they should spider any URLs contained in the message. The idea is that even spammers have to pay for bandwidth, and so collectively, if we follow Graham’s advice, we can actually punish spammers.

The appeal of Graham’s proposal is undeniable. It would only take a few thousand users of such “fight back” spam filters to produce a noticeable and costly spike in the spammers’ web hosting bandwidth costs. Have you given any thought to this?

Tsai: If enough people did that, it might work. I don’t know. But I’m kind of skeptical about fighting back in a way that uses more bandwidth. He’s essentially proposing a distributed DoS attack, and I don’t know what the unintended consequences of that would be.

Gruber: I’d be concerned about false positives. SpamSieve, at least in my experience, produces extraordinarily few false positives (as stated earlier, none for me thus far). But other spam filters aren’t so accurate, and I’d hate to see the publisher of a legitimate email newsletter get their web server pounded by the users of an inaccurate spam filter. There’s no denying that Graham’s proposal is the digital equivalent of vigilante justice.

Tsai: Right, that’s a good example of an unintended consequence, but I wouldn’t be surprised if there were others that are worse. For example, if a digital terrorist knew that such filters were widely deployed, he could send out a spammy message including links to a target site he wanted to take down. In fairness, Graham wants to only apply this technique to sites that are already on a human-inspected blacklist, and I guess that would address these two scenarios.

Gruber: You mentioned that after reading Graham’s “A Plan for Spam”, you expected to see built-in Bayesian filtering in most email clients. Eudora 6 recently shipped with Bayesian-style filtering via its new SpamWatch plug-in. Have you compared it to SpamSieve?

Tsai: It’s kind of hard to do a direct comparison because Eudora doesn’t have a way of forcing its filter to look at particular messages. It simply runs the filter on all incoming messages, and since headers are important I can’t test it by sending myself mail.

I’ve looked at the corpus that Eudora uses, and it appears that SpamSieve extracts more information from the messages, hopefully leading to better accuracy. SpamSieve also has some non-Bayesian filters, as you can see in its preferences, and it’s more configurable, so I expect that it will interest a lot of Eudora users. It’s also half the price of the paid Eudora mode that’s required for using SpamWatch.

Gruber: Apple added “adaptive latent semantic analysis” to Mail in Jaguar, which sounds like a very fancy way of saying “Bayesian-style filtering”.

Tsai: Apple’s filter predated Graham’s paper; it was announced at WWDC in May, as I recall. As I understand it, Latent Semantic Analysis is actually quite different (and more ambitious) than Graham-inspired Bayesian filters. Someone (maybe Geoff Duncan) told me that Apple’s filter is “Bayesian,” but that doesn’t necessarily mean it’s similar to other Bayesian filters. Bayes was a huge figure in the field of statistics, and his work is widely applicable.

Gruber: Have you compared SpamSieve’s accuracy to Apple Mail’s?

Tsai: I did last fall, and found Mail’s accuracy to be good, but not as good as SpamSieve’s. Macworld found that also. Since then I’ve heard conflicting reports from Mail users. Some find that its filter is nearly perfect. Others find it almost worthless and have switched to other e-mail clients so that they can use SpamSieve. I get a steady stream of requests to make SpamSieve work with Mail.

Gruber: And so why doesn’t it work with Mail? Because Mail’s meager AppleScript support is insufficient?

Tsai: Right, at least for POP accounts. I’m investigating whether it will work for IMAP.

Scripting

Gruber: And so the least scriptable of the major Mac email clients is the one made by the company that invented AppleScript. What’s sad is that it isn’t surprising — very few of Apple’s recent applications offer decent scriptability.

SpamSieve is a perfect example of why good AppleScript support is a winning proposition for developers. Adding good scripting support certainly isn’t easy or quick, but once an app supports a rich scripting interface, it creates the potential for all sorts of unforeseen future benefits.

BBAutoComplete is another great example: it’s a free utility that works with any Mac OS X text editing application that fully supports the standard text suite of Apple events, in the same way that SpamSieve works with any scriptable email client.

Which of Apple’s current applications do you think would most benefit from better AppleScript support? (My vote would go for Safari.)

Tsai: Probably Mail or iChat.

Gruber: Admittedly, most Mac users don’t write AppleScript themselves. No matter how easy a language it is, it’s still programming, and most people have no interest in learning to program. But what I think many people overlook is that you don’t have to write AppleScripts to benefit from scriptable applications. E.g., PowerMail users don’t have to write their own scripts to connect SpamSieve to PowerMail — you’ve already written the scripts for them. But if PowerMail didn’t offer decent scripting support, they’d be missing out on a terrific spam filtering utility.

Tsai: I think those are good points. AppleScript is very popular among a small subset of Mac users, and it’s too bad it isn’t more widely used. More people using AppleScript would put more pressure on developers to support it better. I’ve come to think that the root problem may be AppleScript itself. People joke that Perl is a write-only language, but AppleScript often seems like a read-only language. It’s extremely readable, but a lot of people—even programmers—get stuck when they try to write it. It looks enough like English that it can be hard to see what the rules behind the language are. I think there may be more Mac users now who are comfortable with PHP or Perl or Python or JavaScript, than with AppleScript, even though these are ostensibly harder languages. The very aspect of AppleScript that was meant to make it accessible, may have held it back. Python and AppleScript are almost exactly the same age, incidentally.

Gruber: I agree about AppleScript’s easy-to-read, hard-to-write nature. Brent Simmons and I talked about this in my interview with him; I called AppleScript “the most successful unpopular language ever”. Brent wrote:

The biggest problem for me is that it seems like English, but it isn’t. There’s a dissonance there — like playing a C and D note together on the piano. I prefer my programming languages to be at least a minor third away from English: a fourth or fifth is even better. (And I think UserTalk, C, PHP, etc. are fifths.)

Tsai: Yes, I enjoyed reading that interview.

Gruber: I think it’s been a grand and rather successful experiment in creating an English-like programming language, but I think the result of the experiment shows that it isn’t a good idea. Somehow there’s this misconception that the goal of AppleScript is to be so “easy” that all Mac users could be out there writing scripts. That’s absurd. It’s definitely an easy language to get started with, but it’s still programming. Most people either don’t want to program, or have no aptitude for it, and so it doesn’t matter how easy any particular language might be.

Tsai: Right. But why do you think that’s a misconception? Why make AppleScript the way it is, if that wasn’t the goal?

Gruber: You’re right — “misconception” isn’t quite the right word. I think that it probably was Apple’s original intention to make AppleScript so easy that nearly anyone could write it. But now, 10 years later, I think we can safely say that making a programming language “English-like” doesn’t mean everyone can write their own scripts.

Tsai: Agreed. I know English pretty well, but I could never write a screenplay.

Gruber: I think it would be really hard to argue that AppleScript is an easier language to learn than, say, Python. That said, of course, I personally happen to like AppleScript.

What’s intriguing is that scriptable apps don’t support AppleScript directly — they support the Open Scripting Architecture (OSA). It just so happens that AppleScript is the only widely-used OSA scripting language. But there could be others, and they should “just work” with existing apps. Late Night Software’s freeware JavaScript OSA is one example, and I’m surprised it isn’t more popular.

Tsai: JavaScript OSA is a really neat idea that I confess I haven’t gotten around to trying. I’d be interested to know how it works in practice. Do you have to write glue tables like with Frontier? Does it work with AppleScript Studio?

Gruber: It doesn’t work with Studio, which is hard-wired only to work with AppleScript. But for regular scripting with JavaScript OSA, there’s no need to write Frontier-style glue tables; it “just works” as an alternate scripting syntax. There are oddities and deficiencies, however, which I think are mostly the result of JavaScript not being designed specifically as an OSA language. Notably, whose clauses don’t work in JavaScript.

Tsai: That would seem to be a major problem, efficiency-wise, but I imagine there are also many advantages (aside from syntax) that come from it going beyond OSA.

Gruber: Right. Like the fact that JavaScript has built-in support for regular expressions. It’s a much better language for string manipulation than AppleScript.

Another syntactical oddity is that AppleScript allows for tokens that contain spaces. For example, “display dialog” is a single command in AppleScript. In JavaScript, in-token spaces are translated to underscores, e.g. “display_dialog”.

Here’s a brief example I wrote a while back. BBEdit has a hidden preference to show a smiley face when its HTML syntax checker doesn’t find any errors in a document. You can toggle this setting with this AppleScript:


tell application "BBEdit"
   set p to get html preferences
   set smiley face enabled of p to not (smiley face enabled of p)
   set html preferences to p
end tell

The equivalent in JavaScript OSA:


var bbedit = MacOS.appBySignature("R*ch");
var p = bbedit.get_html_preferences();
p.smiley_face_enabled = !(p.smiley_face_enabled);
bbedit.set_html_preferences_to(p);
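The naming rule in the two snippets above is mechanical, and a tiny Python sketch makes it explicit. The function name `osa_js_name` is hypothetical, and the sketch covers only the space-to-underscore mapping for a single term. It does not attempt to model how JavaScript OSA assembles full command calls such as `set_html_preferences_to`.

```python
def osa_js_name(applescript_term):
    """Turn a multi-word AppleScript term into a single identifier."""
    # Split on any run of whitespace, then rejoin with underscores.
    return "_".join(applescript_term.split())


command = osa_js_name("display dialog")
prop = osa_js_name("smiley face enabled")
```

Here `command` is `display_dialog` and `prop` is `smiley_face_enabled`, the same identifiers that appear in the JavaScript version of the script.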

ATPM

Gruber: In addition to writing and publishing Mac OS X software, you’re also the publisher and editor of ATPM. How long have you been involved with ATPM, and how did you get started?

Tsai: I joined ATPM in March of 1996 as the reviews editor. My job was to write reviews and try to get others to write them too. My first article was about Netscape and IE 2.0. That was issue 2.04, and we just finished 9.09.

Gruber: That’s a long time. How old were you in 1996? When did you take over as publisher and editor?

Tsai: I was 16 then. Robert Paul Leitao and I took over as co-publishers in June of 1996. I became the editor, and he became the managing editor. Unfortunately, Rob later had to scale back his involvement, but he still writes the Welcome column at the beginning of each issue.

Gruber: I’m not sure what impresses me more — that you’ve been editing ATPM for seven years, or that you were only 16 at the time.

Where do you see ATPM compared to other Mac publications?

Tsai: The biggest difference in the content, I think, is that we don’t cover news. We try to write more in-depth articles that will be interesting to people a year or two after they’re written. And we do multiple editing passes and accuracy checks, which hopefully set us apart from the average Web site in terms of quality.

In reviews, we tend to write about products that we use every day. That’s the only way to really go beyond the spec sheet and press kit, and get at what it’s like to actually use the product. In general, we write about what interests us and the topics where we have something to add, rather than feeling an obligation to completeness. For example, Eudora is an important product, but we haven’t reviewed it since 1997 because I don’t think anyone on staff normally uses it.

When I was an ATPM reader, I liked the down-to-earth, personal writing style, and I hope some of that still remains. Compared to other Mac publications, I’d like to think that ATPM is most like TidBITS — only with graphics.

Gruber: The reviews have always been my favorite segment in ATPM — it really shows that they’re written by people who actually use the software, which is very much in contrast to the reviews in most web publications.

One thing that I find distinctive about ATPM isn’t contextual, but structural — ATPM publishes monthly issues, more like a print magazine than a web site. In fact, when ATPM was established, it didn’t have an HTML version, just DOCMaker, right?

Tsai: Right. ATPM was DOCMaker-only for about the first year, and then we added a Web version. PDF and various other offline editions came later.

I like the idea of a monthly mix of articles that’s delivered to your mailbox, and that you can print out and read like a magazine. At one time there were several monthly e-zines like ATPM, but the Web is a more timely medium and much better for advertising. From time to time it’s suggested that we should update the Web site more often and “sync” with the offline editions at the beginning of each month, but as a practical matter one deadline a month is plenty.

Gruber: How did you get started using the Mac? You must have been quite young if you were editing ATPM by the time you were 16.

Tsai: We had an Apple II, but some friends had Macs that I used around 1987 or 1988, mostly for games. I didn’t switch to the Mac until System 7 came out in 1991. I read Bob LeVitus’s Dr. Macintosh and Stupid Mac Tricks and was hooked.

Gruber: When did you start programming?

Tsai: I did a bit of Logo in second grade, although it was really typing commands interactively rather than combining them into programs. After fourth grade, I did a summer program where we learned BASIC. That’s when I really started, I guess. I wrote some simple games and figured out how to alphabetize lists of words.

Mac Development

Gruber: When did you start writing Mac software?

Tsai: I learned Pascal and started playing around with the Mac toolbox in 1991, but that didn’t really go anywhere. I took another stab at it about three years later in C, and made it through several tutorial/exercise books, but I didn’t have any ideas for my own projects. Over the next few years, I did several projects with PowerPlant, but none of them were intended for distribution. Most of the programming I was doing wasn’t Mac software. I didn’t actually ship a Mac application until DropDMG in 2002.

Gruber: Where did you attend college? Did you study computer science?

Tsai: I went to MIT where I majored in computer science and minored in linguistics.

Gruber: Larry Wall, who designed and created Perl, is a linguist by training, and according to him, many of the decisions he made regarding Perl’s syntax stem more from linguistics than computer science. I assume that you see quite a bit of overlap between the two fields, as well?

Tsai: Yeah. Denotational semantics for programming languages is similar to the ways people are doing semantics of natural languages, and there are similarities in syntactic formalisms, as well. Both fields like to draw upside-down trees. :-) Looking at it from a higher level, the holy grail of AI—the Turing test—is really about figuring out how natural language works.

Perl is an amazing experiment in how a programming language should work, and it’s interesting to see where the ideas are coming from. Another language that relates to what we’ve been talking about is Smalltalk (a progenitor of Objective-C). Smalltalk was designed to be accessible to children, and Alan Kay has said that its messaging system drew on his experience from cellular biology.

Gruber: Objective-C is clearly the premier language for Cocoa programming, but there are other options, including Java and AppleScript Studio. There are also efforts to bind scripting languages like Ruby, Perl, and Python to Cocoa. Are you particularly intrigued by any of these other languages for Cocoa?

Tsai: Java is kind of a sidestep from Objective-C, and so I don’t see a lot of reason to use it as the primary language in writing a Cocoa application. However, Java has a ton of libraries, and the Java bridge lets Objective-C programmers take advantage of them.

AppleScript Studio doesn’t really interest me because it’s kind of a heavyweight solution. If I’m writing a script, I want something like 24U’s Appearance OSAX—“display dialog” on steroids. And if I’m writing a larger application, I wouldn’t choose to do it in AppleScript. But that’s just me. It’s certainly an interesting technology.

Perl, Python, and Ruby really intrigue me because they let you be so much more productive than with Objective-C (or Java). No matter how much you hear about how Cocoa’s memory management is good (it is) and easy to learn (it is), the fact of the matter is that Objective-C applications still leak memory and crash. These are less of a concern with the scripting languages.

The scripting languages are much more expressive, so you can translate your ideas into code more succinctly. That probably means less time coding and fewer bugs. They let you develop interactively, which is great for experimenting and testing. And they also have lots of libraries, especially for Internet stuff.

Python intrigues me the most, because I really like the readability of the language, and because it seems to have the best Cocoa bridge.

Gruber: I agree. While my favorite language, by far, is Perl, I must admit that the Python bridge to Cocoa seems the most robust of the various scripting languages.

Assuming Mac OS X has a long life ahead, do you think Objective-C does too?

Tsai: I don’t think Objective-C is going away. There will always be a need for a lower-level language, particularly for people working on the OS. It looks like Apple is starting to get serious about enhancing it, which of course I’m very happy about.

Gruber: But would you agree with me that at some point in the not-too-distant future, we’ll start seeing more Cocoa applications written in scripting languages than in Objective-C?

Tsai: I think the scripting languages will become very popular among a growing subset of Cocoa developers. People will realize their advantages, and new Mac developers will already know these languages and perhaps not want to use Objective-C. But I don’t see them overtaking Objective-C. They’re non-standard (w.r.t. Cocoa) and not blessed by Apple; a lot of people like to stay on the beaten path. Also, developers may not want to expose their source code and algorithms.

Gruber: I think what would be most interesting would be if Apple were to bless Cocoa scripting bridges from these languages. I think Cocoa programming with Python could end up being much more popular than AppleScript Studio.

Tsai: Yeah.

Gruber: It’s also the case that you can, in theory, mix and match the languages used to write components of a Cocoa application. You could thus write most of an app in your favorite scripting language. But you could use Objective-C to write the parts that are performance-sensitive (as well as any other parts for which you don’t want to make the source code visible, like, say, the licensing and registration routines).

Tsai: Right, in practice I think most of these applications will be hybrids. By the way, two other potential drawbacks I thought of are difficulty in debugging and threading. Those get more complicated when you combine two different systems. Also, threading performance would not be as good. Ruby doesn’t have native threads, so it can’t take advantage of multiple processors. Python supports native threads, but the interpreter can only run in one thread at a time (I/O can happen in the background).

Gruber: SpamSieve is supported on Mac OS X 10.1.5 or later (although 10.2.6 is recommended). How much work is it for you to stay compatible with 10.1.5? Do you really think it’s worthwhile?

Tsai: There are a few things built into Jaguar that I re-implemented last year so that SpamSieve could run on Puma. Those workarounds continue to work. The only feature that absolutely requires Jaguar is the Address Book integration, which I shipped in February, and it was only a little extra work to add that while maintaining Puma compatibility. At the time, some people hadn’t upgraded to Jaguar yet. For future versions, I’m not going to purposely remove support for Puma, but I’ll drop it if doing so is a win for Jaguar and Panther users.

Gruber: When SpamSieve 1.0 shipped last year, it cost only $10. Since then, the price has crept upward a bit — when I purchased my license a few weeks ago, it was $20, and SpamSieve 2.0 now costs $25. I’m not complaining — in fact, quite the contrary, I think $25 is a perfect price for SpamSieve. A good value for users, and good money for you.

Tsai: I have a tendency to price things too low, unfortunately. And that’s a problem in terms of perception, as well as money. As you say, I think the new price is fair all around. I think it’s final.

Gruber: Do you view your Mac software development as a career or a hobby? Or somewhere in-between?

Tsai: A career.

Gruber: That’s good to hear. I think you’re off to a great start — SpamSieve is useful, accurate, easy, and most importantly, best-of-breed. That it works with so many different mail clients gives you a large potential audience.

The Internet has profoundly changed the market for small software developers. In the old days, you really needed a box and installation disks to get software into your customers’ hands. That meant a lot of distribution infrastructure, because boxed software was sold through catalogs (MacConnection, MacWarehouse, etc.) or through retail stores.

Today, small- and mid-sized developers sell directly to their customers. Selling software as a direct download isn’t just cheaper for developers, but it’s also faster and more convenient for customers.

Tsai: Yup, it’s great all around.

Gruber: It’s a more direct, personal relationship between developer and customer. But it’s also more likely to be a complete do-it-yourself effort. You don’t just design and develop your software; you also write the documentation and answer the tech support email. Do you enjoy this, or would you prefer to be able to concentrate on software development?

Tsai: I enjoy playing those different roles, and I think being in contact with the customers helps me make a better product. That said, it would be frustrating if I were spending a large percentage of my time doing technical support. Software development should be the main focus.

Gruber: How strong do you think the market is for small Mac developers?

Tsai: I suppose it depends on the product, but in general I think this is a great time for small Mac developers. As you mentioned, there’s now a good infrastructure for selling directly to customers. We have better developer tools, which lower the barriers to entry. And it’s much easier to get noticed than in the Windows market.

Gruber: How much time do you spend dealing with tech support?

Tsai: It depends. Last week, when SpamSieve 2.0 was released, I did nothing but e-mail for several days straight. It’s still pouring in. In general, it’s probably around half an hour a day. Although I think it’s getting better, SpamSieve is relatively difficult for people to install and set up. A lot of people have questions about how to make it work best in their situation, and how to customize it. There’s also an unpredictability to it because each user has a different corpus and gets different e-mails. After adjusting for the different numbers of users, it probably still generates ten times as many e-mails as DropDMG.

One of the interesting things is that I’ve gotten to see Moore’s chasm in action. In the beginning, I got a few tech support questions, but most of the e-mails were feature requests and suggestions. People would show me how they’d customized the AppleScripts and make sure I was aware of various spammer tricks.

After SpamSieve was reviewed in Macworld, I suddenly got a lot of feedback that the documentation wasn’t clear. People didn’t know what AppleScripts were, or what the Scripts menu was (there’s no menu called “Scripts”), or how the SpamSieve rule in their e-mail program would interact with the other rules, or the difference between Entourage’s address book and Apple’s, or whether the messages would still be in SpamSieve’s corpus if they were deleted from the e-mail program. These people weren’t tinkerers like us; they wanted to double-click the icon and forget about it. Of course, I had tried to make the program as easy to use as possible, and I had spent a lot of time on the documentation, but I just wasn’t prepared for the kinds of questions people would have.

My favorite example is that one version of the documentation said something like “To tell SpamSieve that a message is spam, select the message and run the ‘SpamSieve - Add Spam’ AppleScript.” In retrospect that was a stupid way to phrase it, because it’s not clear what an AppleScript is or how you run one. And you shouldn’t have to know. Naturally, people looked for where in Entourage’s user interface it said “Run AppleScript.” There’s such a checkbox in the dialog for Microsoft’s Junk Mail Filter, and so people would enable that and then set it to run the SpamSieve AppleScript, thinking that that would mark the selected messages as spam. Now the documentation says to choose the “SpamSieve - Add Spam” item from the Scripts menu, and it shows a picture of the menu so there’s no misunderstanding.
