Should Email Errors Keep Customers From Buying?

Somewhere inside every commerce site, there's a bit of code sending emails out to customers. Email campaigning might have been in the requirements and that email code stands tall at the brightly-lit service counter. On the other hand, it might have been added as an afterthought, languishing in some dark corner with the "lost and found" department. Either way, there's a good chance it's putting your site at risk.

The simplest way to code an email sending routine looks something like this:

Get a javax.mail.Session instance
Get a javax.mail.Transport instance from the Session
Construct a javax.mail.internet.MimeMessage instance
Set some fields on the message: from, subject, body. (Setting the body may involve reading a template from a file and interpolating values.)
Set the recipients' Addresses on the message
Ask the Transport to send the message
Close the Transport
Discard the Session

This goes into a servlet, a controller, or a stateless session bean, depending on which MVC framework or JEE architecture blueprint you're using.

There are two big problems here. (Actually, there are three, but I'm not going to deal with the "one connection per message" issue.)

Request-Handling Threads at Risk

As written, all the work of sending the email happens on the request-handling thread that's also responsible for generating the response page. Even on a sunny day, that means you're spending some precious request-response cycles on work that doesn't help build the page.

You should always look at a call out to an external server with suspicion. Many of them can execute asynchronously to page generation. Anything that you can offload to a background thread, you should offload so the request-handler can get back in the pool sooner. The user's experience will be better, and your site's capacity will be better, if you do.

Also, keep in mind that SMTP servers aren't always 100% reliable. Neither are the DNS servers that point you to them. That goes double if you're connecting to some external service. (And please, please don't even tell me you're looking up the recipient's MX record and contacting the receiving MTA directly!)

If the MTA is slow to accept your connection, or to process the email, then the request-handling thread could be blocked for a long time: seconds or even minutes. Will the user wait around for the response? Not likely. He'll probably just hit "reload" and double-post the form that triggered the email in the first place.

Poor Error Recovery

The second problem is the complete lack of error recovery. Yes, you can log an exception when your connection to the MTA fails. But that only lets the administrator know that some amount of mail failed. It doesn't say what the mail was! There's no way to contact the users who didn't get their messages. Depending on what the messages said, that could be a very big deal.

At a minimum, you'd like to be able to detect and recovery from interruptions at the MTA---scheduled maintenance, Windows patching, unscheduled index rebulids, and the like. Even if "recovery" means someone takes the users' info from the log file and types in a new message on their desktops, that's better than nothing.

A Better Way

The good news is that there's a handy way to address both of these problems at once. Better still, it works whether you're dealing with internal SMTP based servers or external XML-over-HTTP bulk mailers.

Whenever a controller decides it's time to reach out and touch a user through email, it should drop a message on a JMS queue. This lets the request-handling thread continue with page generation immediately, while leaving the email for asynchronous processing.

You can either go down the road of message-driven beans (MDB) or you can just set up a pool of background threads to consume messages from the queue. On receipt of a message, the subscriber just executes the same email generation and transmission as before, with one exception. If the message fails due to a system error, such as a broken socket connection, the message can just go right back onto the message queue for later retry. (You'll probably want to update the "next retry time" to avoid livelock.)

Better Still

If you have a cluster of application servers that can all generate outbound email, why not take the next step? Move the MDBs out into their own app server and have the message queues from all the app servers terminate there? (If you're using pub-sub instead of point-to-point, this will be pretty much transparent.) This application will resemble a message broker... for good reason. It's essentially just pulling messages in from one protocol, transforming them, then sending them out over another protocol.

The best part? You don't even have to write the message broker yourself. There are plenty of open-source and commercial alternatives.

Summary

Sending email directly from the request-handling thread performs poorly, creates unpredictable page latency for users and risks dropping their emails right on the floor. It's better to drop a message in a queue for asynchronous transformation by a message broker: it's faster, more reliable, and there's less code for you to write.