The mail.cs server provides both an SMTP service for clients to send messages and IMAP/POP services for clients to fetch their email.
Once in a while, mail.cs gets overwhelmed with IMAP processes and stops accepting messages. This is an undesirable state, which should be corrected when it is noticed. A suggested procedure for documenting and correcting the problem is:
[...] fuser) accessing files under /var/mail)
[...] /etc/inetd.conf to comment out the four imap/pop3d lines, then HUP the inetd process, e.g. sudo killall -s HUP inetd or sudo kill -HUP `ps -C inetd -o pid=`)
[...] /etc/inetd.conf from the backup copy made earlier, then HUP the inetd process)
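The disable and restore steps could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names and the .bak suffix are hypothetical, and it assumes the four imap/pop3d entries are the only lines in the file that begin with "imap" or "pop". Check the real entries in /etc/inetd.conf before relying on anything like it.

```shell
# Sketch only. Assumes the service lines to disable start with "imap" or "pop".

# disable_fetch_services FILE: back up FILE, then comment out imap/pop lines.
disable_fetch_services() {
    conf="$1"
    # Refuse to overwrite an existing backup; it may be the only good copy.
    if [ -e "$conf.bak" ]; then
        echo "backup $conf.bak already exists; not touching $conf" >&2
        return 1
    fi
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite in place from the backup so the file keeps its owner and mode.
    sed -e 's/^imap/#imap/' -e 's/^pop/#pop/' "$conf.bak" > "$conf"
}

# restore_fetch_services FILE: put back the copy saved by disable_fetch_services.
restore_fetch_services() {
    conf="$1"
    [ -e "$conf.bak" ] || { echo "no backup $conf.bak" >&2; return 1; }
    cp -p "$conf.bak" "$conf" && rm "$conf.bak"
}
```

After either function has been run (as root) against /etc/inetd.conf, inetd still has to be sent a HUP, for example with sudo killall -s HUP inetd as described above, before the change takes effect.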
As a follow-up, check with the people who had active connections at the time of the SMTP outage to see whether they need help managing their inbox or with the issue described in ThunderbirdConnectionTimeout, particularly those individuals who had a large inbox (/var/mail/userid) at the time of the problem.
Note: Outbound mail is logged on mail.cs; inbound mail is logged on mx100.cs.
To look for email to/from a particular userid, e.g. jdoe, use grep or zgrep:
# ssh mail.cs (or mx100.cs for inbound email)
# cd /var/log
# grep jdoe syslog*
# zgrep jdoe *.gz
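The two searches can be folded into one helper, since zgrep reads plain files as well as gzip-compressed ones. The sketch below is illustrative only: the function name is made up, and it assumes the current and rotated logs all match syslog* in the given directory, as in the commands above.

```shell
# mail_log_for_user USER [DIR]: print log lines mentioning USER.
# DIR defaults to /var/log. Each line is prefixed with the log file it
# came from, which helps tell current logs from rotated ones.
mail_log_for_user() {
    user="$1"
    dir="${2:-/var/log}"
    # zgrep handles both uncompressed and .gz files in a single call.
    zgrep "$user" "$dir"/syslog*
}
```

For example, run mail_log_for_user jdoe on mail.cs for outbound mail, and the same on mx100.cs for inbound mail.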
mail.cs% ps -adelf | awk '{print $3}' | sort | uniq -c | sort -nr
  71 root
   6 mgandalf
   5 ptook
   4 fbaggins
   4 mbrandyb
   4 sgamgee
[...]
In a crisis situation, you will sometimes see tens of processes running for a single user.
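To pick out only the heavy users from a per-user count like the one above, a small filter can be added. This is a sketch: the function name and the default threshold of 10 are arbitrary choices, not part of the documented procedure.

```shell
# busy_users [MIN]: read one user name per line on stdin and print
# "count user" for users with at least MIN processes, busiest first.
# MIN defaults to 10, an arbitrary illustrative threshold.
busy_users() {
    min="${1:-10}"
    sort | uniq -c | awk -v min="$min" '$1 + 0 >= min + 0' | sort -nr
}
```

It would be fed from the same pipeline, for example: ps -adelf | awk '{print $3}' | busy_users 10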
For users with active processes only, run the "fuser" command on each of their mailboxes and show only the mailboxes that more than one process has open.
mail.cs% cd /var/mail
mail.cs% ps -adelf | awk '{print $3}' | sort | uniq | \
    sh -c 'xargs /usr/sbin/fuser 2>&1' | \
    grep -v ': No such file or directory' | \
    grep '  *[^ ][^ ]*  *[^ ][^ ]*'
mgandalf:    10632o    1808o
fbaggins:    11225o   28655o
In a crisis situation, you will nearly always see very many processes reading a single mailbox.
arpepper: 12160o 12159o 12158o 12157o 12156o 12155o 12154o 12153o 12152o 12151o 12150o 12149o 12148o 12147o 12146o 12145o 12144o 12143o 12142o 12141o 12140o 12139o 12138o 12137o 12136o 12135o 12134o 12133o 12132o 12131o 12130o 12129o 12128o 12127o 12126o 12125o 12124o 12123o 12122o 12121o 12120o 12119o 12118o 12117o 12116o 12115o 12114o 12113o 12112o 12111o
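Long PID lists like this are easier to compare as counts. The sketch below summarizes fuser output of the form shown above ("name: pid pid ..."); the function name is hypothetical, and it assumes one mailbox per line with whitespace-separated PIDs.

```shell
# holders_per_mailbox [MIN]: read fuser-style lines on stdin and print
# "count name" for each mailbox held open by more than MIN processes
# (MIN defaults to 1), largest count first.
holders_per_mailbox() {
    min="${1:-1}"
    awk -v min="$min" '
        /: No such file or directory/ { next }    # user has no mailbox
        $1 ~ /:$/ {
            name = $1
            sub(/:$/, "", name)                   # strip the trailing colon
            if (NF - 1 > min) print NF - 1, name  # NF - 1 = number of PIDs
        }' | sort -nr
}
```

It can replace the two grep stages of the fuser pipeline above: pipe the output of sh -c 'xargs /usr/sbin/fuser 2>&1' into holders_per_mailbox.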
It can be reassuring to watch the list decline.
cscf.cs% cd /var/mail
cscf.cs% fuser arpepper
arpepper: 12160o 12159o 12158o 12157o 12156o 12155o 12154o 12153o 12152o 12151o 12150o 12149o 12148o 12147o 12146o 12145o 12144o 12143o 12142o 12141o 12140o 12139o 12138o 12137o 12136o 12135o 12134o 12133o 12132o 12131o
cscf.cs% fuser arpepper
arpepper: 12160o 12159o 12158o 12157o 12156o 12155o 12154o 12153o 12152o 12151o 12150o 12149o 12148o 12147o 12146o 12145o 12144o 12143o 12142o 12141o 12140o 12139o 12138o 12137o
cscf.cs%
In some cases, it can take a few minutes for each process to disappear.
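Instead of rerunning fuser by hand, a polling loop can report the count until the mailbox is idle. This is a sketch with hypothetical function names and arbitrary defaults (10 seconds between checks, 30 checks); it assumes fuser is on the PATH and, as POSIX specifies, writes only the PIDs to standard output.

```shell
# count_holders FILE: number of processes that fuser reports for FILE.
# The file name and access-type letters go to stderr and are discarded.
count_holders() {
    fuser "$1" 2>/dev/null | wc -w | tr -d ' '
}

# wait_until_idle FILE [INTERVAL] [TRIES]: print a timestamped count every
# INTERVAL seconds; return 0 once no process has FILE open, or 1 after
# TRIES checks.
wait_until_idle() {
    file="$1"
    interval="${2:-10}"
    tries="${3:-30}"
    while [ "$tries" -gt 0 ]; do
        n=$(count_holders "$file")
        echo "$(date '+%H:%M:%S') $file: $n process(es)"
        [ "$n" -eq 0 ] && return 0
        sleep "$interval"
        tries=$((tries - 1))
    done
    return 1
}
```

For example: wait_until_idle /var/mail/arpepper 15 40. Note that a missing fuser binary would also produce a count of 0, so confirm the command works on the host first.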
When you re-enable imap, watch to see whether the problem recurs for the same mailboxes.
cscf.cs>2# ssh -x mail.cs
Last login: Fri Jun 17 11:32:20 2011 from cscf.cs.uwaterl
services116.cs# cd /var/log
services116.cs# zgrep user=imauser syslog* | egrep 'ipop3d|imapd' | wc
      0       0       0
services116.cs# zgrep imauser syslog* | wc
      0       0       0