THIS PAGE IS NOW OBSOLETE
- It was too specific to the old Solaris 8 environment running an old version of University of Washington IMAP.
Resolving a type of crisis on mail.cs.uwaterloo.ca
The mail.cs server provides both an SMTP service for clients to send messages and IMAP/POP services for clients to fetch their email.
Once in a while, mail.cs gets overwhelmed with IMAP processes and stops accepting messages. This state should be corrected as soon as it is noticed. A suggested procedure for documenting and correcting the problem is:
- record current load average and process listing in ST (see Tools below for a good procedure)
  - of particular interest are large numbers of processes attributed to a single userid, and multiple processes (determined by fuser) accessing files under /var/mail
- disable IMAP and POP services (after making a backup copy, edit /etc/inetd.conf to comment out the four imap/pop3d lines, then HUP the inetd process, e.g. sudo killall -s HUP inetd or sudo kill -HUP `ps -C inetd -o pid=`)
- wait for the system to quiesce (return to low single digit load average)
- kill off IMAP and POP processes only if the system does not recover in a few minutes
- enable IMAP and POP services (restore /etc/inetd.conf from the backup copy made earlier, then HUP the inetd process)
- monitor the system for several minutes, looking for a recurrence of the problem
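The file-editing part of the disable and enable steps above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes the relevant inetd.conf entries begin at column 1 with a service name starting with "imap" or "pop". It does not signal inetd; that step stays manual, as described in the procedure.

```shell
# Hypothetical helpers, not part of the original procedure.
# Assumption: imap/pop entries start at column 1 with "imap..." or "pop...".

# Back up CONF to CONF.bak, then comment out the imap/pop entries in CONF.
disable_mail_fetch_services() {
    conf=$1
    cp -p "$conf" "$conf.bak" || return 1
    # Write to a temp file first; in-place editing flags differ between seds.
    sed -e 's/^imap/#imap/' -e 's/^pop/#pop/' "$conf.bak" > "$conf.tmp" || return 1
    # cat into the original so its owner and permissions are left alone.
    cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
}

# Put the backup made by disable_mail_fetch_services back in place.
restore_mail_fetch_services() {
    conf=$1
    cp -p "$conf.bak" "$conf"
}
```

After either function, HUP the inetd process as in the steps above so that the change takes effect.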
As a follow-up, check with the people who had active connections at the time of the SMTP outage to see whether they need help managing their inbox or ThunderbirdConnectionTimeout, particularly those individuals with a large inbox (/var/mail/userid) at the time of the problem.
Looking for what happened to a particular piece of mail
Log files
On most systems, logs are kept in /var/log/syslog. Older copies of the logs are usually kept as compressed files in the same directory, e.g. syslog.2.gz.
Note: Outbound mail is logged on mail.cs; inbound email is logged on mx100.cs.
To look for email to or from a particular userid, e.g. jdoe, use grep or zgrep:
# ssh mail.cs (or mx100.cs for inbound email)
# cd /var/log
# grep jdoe syslog*
# zgrep jdoe *.gz
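The two searches above can be folded into one helper that handles both the current log and the compressed older copies. This is a minimal sketch; the function name and its optional directory argument are hypothetical, and it assumes the rotated files are named syslog*.gz as in the example above.

```shell
# Hypothetical helper: print every line mentioning USERID from the current
# and rotated syslog files in LOGDIR (default /var/log).
# Usage: find_mail_log_entries USERID [LOGDIR]
find_mail_log_entries() {
    userid=$1
    logdir=${2:-/var/log}
    for f in "$logdir"/syslog*; do
        [ -f "$f" ] || continue
        case $f in
            # Compressed older copies: decompress to stdout, then search.
            *.gz) gzip -dc "$f" | grep -F -- "$userid" ;;
            # Current, uncompressed log.
            *)    grep -F -- "$userid" "$f" ;;
        esac
    done
}
```

For example, `find_mail_log_entries jdoe` run on mail.cs would cover outbound mail, and the same command on mx100.cs would cover inbound mail.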
Tools
Sorted count of processes per user
mail.cs% ps -adelf | awk '{print $3}' | sort | uniq -c | sort -nr
71 root
6 mgandalf
5 ptook
4 fbaggins
4 mbrandyb
4 sgamgee
[...]
In a crisis situation, you will sometimes see tens of processes running
for a single user.
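To pick out only the users worth a closer look, the same pipeline can be given a threshold. The sketch below is an illustration rather than part of the original procedure: the function name and the default of 10 are assumptions, and it reads ps output on standard input so that a saved listing can be checked as well.

```shell
# Hypothetical helper: read "ps -adelf"-style output on stdin and print
# "COUNT USER" for every user with at least MIN processes (default 10).
# Assumption: the userid is in column 3, as in the pipeline above.
users_over_threshold() {
    min=${1:-10}
    # NR > 1 skips the ps header line.
    awk 'NR > 1 { print $3 }' | sort | uniq -c | sort -nr |
        awk -v min="$min" '$1 >= min { print $1, $2 }'
}

# Example use on a live system:
#   ps -adelf | users_over_threshold 20
```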
Sorted count of multiple accesses per mailbox
For users with active processes only, run the fuser command on their mailbox
and show the output only where more than one process has the mailbox open.
mail.cs% cd /var/mail
mail.cs% ps -adelf | awk '{print $3}' | sort | uniq | \
sh -c 'xargs /usr/sbin/fuser 2>&1' | \
grep -v ': No such file or directory' | \
grep ' *[^ ][^ ]* *[^ ][^ ]*'
mgandalf: 10632o 1808o
fbaggins: 11225o 28655o
In a crisis situation, you will nearly always see very many processes
reading a single mailbox.
arpepper: 12160o 12159o 12158o 12157o 12156o 12155o 12154o 12153o
 12152o 12151o 12150o 12149o 12148o 12147o 12146o 12145o
 12144o 12143o 12142o 12141o 12140o 12139o 12138o 12137o
 12136o 12135o 12134o 12133o 12132o 12131o 12130o 12129o
 12128o 12127o 12126o 12125o 12124o 12123o 12122o 12121o
 12120o 12119o 12118o 12117o 12116o 12115o 12114o 12113o
 12112o 12111o
It can be reassuring to watch the list decline.
cscf.cs% cd /var/mail
cscf.cs% fuser arpepper
arpepper: 12160o 12159o 12158o 12157o 12156o 12155o 12154o 12153o
 12152o 12151o 12150o 12149o 12148o 12147o 12146o 12145o
 12144o 12143o 12142o 12141o 12140o 12139o 12138o 12137o
 12136o 12135o 12134o 12133o 12132o 12131o
cscf.cs% fuser arpepper
arpepper: 12160o 12159o 12158o 12157o 12156o 12155o 12154o 12153o
 12152o 12151o 12150o 12149o 12148o 12147o 12146o 12145o
 12144o 12143o 12142o 12141o 12140o 12139o 12138o 12137o
cscf.cs%
In some cases, it can take a few minutes for each process to disappear.
When you re-enable IMAP, watch to see whether the problem recurs for the same mailboxes.
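Rather than reading the PID list by eye, the fuser output can be reduced to a count per mailbox. The helper below is a sketch with a hypothetical name; it assumes the combined output (stdout and stderr) has the form "name: PIDo PIDo ..." on a single line per mailbox, as in the examples above.

```shell
# Hypothetical helper: read combined fuser output on stdin, one mailbox per
# line in the form "name: PIDo PIDo ...", and print "name COUNT".
count_fuser_pids() {
    awk '{
        name = $1
        sub(/:$/, "", name)   # drop the trailing colon from the mailbox name
        print name, NF - 1    # every remaining field is one process
    }'
}

# Example use while waiting for the system to quiesce (run in /var/mail):
#   while sleep 30; do fuser arpepper 2>&1 | count_fuser_pids; done
```

A count that keeps falling means the processes are draining; a count that climbs again after IMAP is re-enabled points to the same mailbox causing trouble.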
How to find out if a user has been using imap
E.g., on mail.cs, looking for user imauser:
cscf.cs>2# ssh -x mail.cs
Last login: Fri Jun 17 11:32:20 2011 from cscf.cs.uwaterl
services116.cs# cd /var/log
services116.cs# zgrep user=imauser syslog* | egrep 'ipop3d|imapd' | wc
0 0 0
services116.cs# zgrep imauser syslog* | wc
0 0 0
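The same check can be split by daemon, which shows whether a user has been fetching mail over IMAP, POP, or neither. This is a sketch under assumptions: the function name is hypothetical, and it assumes each login line contains the daemon name (imapd or ipop3d) and a whitespace-delimited user=USERID field, which is the pattern the zgrep above searches for. Matching the whole field keeps imauser from also matching imauser2.

```shell
# Hypothetical helper: read syslog lines on stdin and count, per daemon,
# the lines that carry an exact "user=USERID" field.
# Assumption: login lines name the daemon (imapd or ipop3d) and contain
# user=USERID as a separate whitespace-delimited field.
count_mail_fetch_logins() {
    awk -v want="user=$1" '
        {
            hit = 0
            for (i = 1; i <= NF; i++) if ($i == want) hit = 1
            if (!hit) next
            if ($0 ~ /imapd/) imap++
            else if ($0 ~ /ipop3d/) pop++
        }
        END { printf "imapd %d\nipop3d %d\n", imap, pop }
    '
}

# Example use on the compressed older logs:
#   gzip -dc /var/log/syslog.*.gz | count_mail_fetch_logins imauser
```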