Friday, September 5, 2008

mail loops haunting me again

This week, a serious mail loop slowed down our company's mail servers. The cause was one of my servers which, in 12 hours, managed to send over 125K e-mails. Counting all the bounces that came back, the total number of e-mails must have been around 250K.

Here were the ingredients:

1. All our servers have a local sendmail daemon active. This is a requirement for our applications that speak SMTP to localhost:25. For security, I had IP Filter blocking port 25, so I left the default sendmail configuration mostly untouched, as I wanted it to remain as standard as possible.

2. After a few months, I forgot about point #1, of course. For a long time, I was under the impression that we had no sendmail daemons listening at all.

3. Last week, we stopped IP Filter on one of the servers, which was having some networking problems, and since it's a mission-critical one, I didn't have the guts to restart IP Filter. This left the server's SMTP port reachable from the outside world.

The 3 ingredients were in place for a mail loop. Here's how it happened:

1. Thursday, I killed a process on the server, and an e-mail was generated with a missing process alert. The e-mail was sent to root.

2. All mail destined for root is redirected, through /etc/mail/aliases, to an MS Exchange mailing list that includes all the system administrators.

3. One of our administrators, let's say John Doe, had been on leave for a while, and his mailbox was full.

4. The mail bounced back with a message stating that John Doe's mailbox was full. The bounce was addressed to root@server.

5. Since the server had sendmail, and its port was unfiltered, it picked up the mail and tried to deliver it to root.

6. Back to step #2, over 125,000 times.
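
To make the loop condition concrete, here is a toy Python model of the steps above. It is only a sketch: the function name, its parameters and the cap on rounds are made up for illustration, and it models nothing of sendmail or Exchange internals. It just shows that the loop needs both a bouncing mailbox and a reachable port 25, and that removing either one stops it after the first message.

```python
def count_forwarded(smtp_reachable: bool, mailbox_full: bool,
                    max_rounds: int = 1000) -> int:
    """Count how many times the server forwards a mail for root to the list.

    smtp_reachable: port 25 on the server accepts mail from outside
                    (IP Filter stopped, sendmail listening on all interfaces).
    mailbox_full:   one member of the list bounces every message.
    max_rounds:     hypothetical safety cap so the model always terminates.
    """
    forwarded = 0
    pending = 1  # the initial alert addressed to root

    while pending and forwarded < max_rounds:
        pending -= 1
        forwarded += 1  # the root alias sends the mail to the mailing list

        if not mailbox_full:
            continue  # everybody received it, no bounce is generated

        # The full mailbox produces a bounce addressed to root@server.
        if smtp_reachable:
            pending += 1  # the server accepts the bounce: back to step 2
        # Otherwise the bounce is refused and never reaches the root alias.

    return forwarded


if __name__ == "__main__":
    print(count_forwarded(smtp_reachable=True, mailbox_full=True))
    print(count_forwarded(smtp_reachable=False, mailbox_full=True))
    print(count_forwarded(smtp_reachable=True, mailbox_full=False))
```

With both conditions true, the model only stops because of the cap; with either one false, exactly one message is forwarded.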

The loop lasted for a while, until I got back to work.

To prevent this in the future:

1. I spent some time making sendmail "send only". The HP-UX sendmail.cf generator, gen_cf, sucks big time, but I found out that by setting send_only and modifying /etc/rc.config.d/mailservs, it adds the correct DaemonOptions to restrict sendmail to listening on 127.0.0.1. So even if IP Filter is stopped, at least any bounce will be refused by the server.

2. IP Filter should also be restarted ASAP.

3. I also redirected postmaster and MAILER-DAEMON to /dev/null (they are sent to root by default) so that if steps 1 and 2 are not followed, at least these addresses won't participate in the loop.

4. I checked how sendmail could be throttled to limit the number of e-mails sent in a given time period. There are macros available for this, but I'd rather not deviate too much from the default settings.

5. I also think the Exchange administrators should reconsider the "let's send a bounced mail each time a mailbox is full" strategy. I know nothing of Exchange but I strongly believe this can be throttled. If an account has, say, 10 bounces a second, this feature should be automatically deactivated.
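
The kind of safeguard I have in mind in point 5 could look like the Python sketch below. This is not an Exchange feature or API; the class name, the threshold of 10 bounces and the one-second window are assumptions taken from my example above. It keeps the timestamps of recent bounces for one account and, once too many arrive within the window, switches bounce generation off for good until someone resets it by hand.

```python
from collections import deque


class BounceBreaker:
    """Hypothetical per-account guard that disables bounces during a flood."""

    def __init__(self, max_bounces: int = 10, window_seconds: float = 1.0):
        self.max_bounces = max_bounces
        self.window_seconds = window_seconds
        self._recent = deque()  # timestamps of bounces inside the window
        self.tripped = False    # True once bounces have been deactivated

    def allow_bounce(self, now: float) -> bool:
        """Return True if a bounce may be sent at time `now` (in seconds)."""
        if self.tripped:
            return False

        # Forget bounces that have fallen out of the sliding window.
        while self._recent and now - self._recent[0] >= self.window_seconds:
            self._recent.popleft()

        if len(self._recent) >= self.max_bounces:
            self.tripped = True  # too many bounces: deactivate the feature
            return False

        self._recent.append(now)
        return True

    def reset(self) -> None:
        """Manual re-activation by an administrator."""
        self._recent.clear()
        self.tripped = False


if __name__ == "__main__":
    breaker = BounceBreaker()
    # Simulate 15 bounces arriving 10 ms apart for the same account.
    results = [breaker.allow_bounce(i * 0.01) for i in range(15)]
    print(results.count(True), "bounces sent before the breaker tripped")
```

Staying tripped until a manual reset is a deliberate choice here: in a loop like mine, a breaker that re-armed itself after the window would just let the flood resume at a slightly lower rate.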

As a side note, having support from the manufacturer is important to me. So don't tell me to install postfix or qmail. I don't want to. If I die, quit or go on a hell of a long vacation, I expect any less experienced admin to be able to call HP directly and be supported. That's why I'm relying on the subsystems that are included with HP-UX (sendmail, apache, tomcat, wu-ftpd, etc.) and not the open-source ones. Yes, they're outdated and yes, they're not necessarily the best of breed, but they work. Furthermore, any security patch is issued by HP, so I don't need to take care of that either.
