Problem Diagnosis
The MySQL cluster can fail in several ways; this section helps identify which recovery method to use. The scenarios below are listed in decreasing order of urgency.
Nagios reports Master is down
The automated Nagios check may report that the master is down. This is likely an emergency: all operations that rely on our MySQL database will fail until this is fixed.
See MySQLHAMasterFailure for recovery steps.
Applications are failing to connect to mysql.cs
Production applications could fail with messages such as "Cannot connect to database mysql.cs.uwaterloo.ca". This is likely an emergency: unless the problem is a network issue local to those applications, you can assume all operations that rely on our MySQL database will fail until this is fixed.
See MySQLHAMasterFailure for recovery steps.
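To help rule out a network issue local to the application host before treating this as a master failure, a quick TCP reachability check can be run from that host. The following is a minimal sketch, not part of our tooling: the function name `can_reach` is hypothetical, and port 3306 is assumed because it is the MySQL default; adjust it if our server listens elsewhere.

```python
import socket


def can_reach(host: str, port: int = 3306, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Port 3306 is an assumption (the MySQL default port).
    A successful connection only shows that something is listening;
    it does not prove that MySQL is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example use from the affected application host:
#   can_reach("mysql.cs.uwaterloo.ca")
```

If the check succeeds from another host but fails from the application host, the problem is more likely local networking than the master itself.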
Nagios reports Master is out of sync with both slaves
The automated Nagios check may report that the master is out of sync with both slaves. This is an intermediate-level emergency; recovery can wait until the next business day.
You may treat this situation as if both of the slaves are down. See MySQLHASlaveFailure for recovery steps.
Nagios reports Master is out of sync with a slave
The automated Nagios check may report that the master is out of sync with one slave. This is not a time-critical emergency, and recovery can wait until the next business day.
You may treat this situation as if the slave is down. See MySQLHASlaveFailure for recovery steps.
Nagios reports Slave is down
The automated Nagios check may report that a slave is down. This is not a time-critical emergency, and recovery can wait until the next business day.
See MySQLHASlaveFailure for recovery steps.
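The scenarios above amount to a lookup from symptom to urgency and recovery page. As a rough sketch, that mapping could be written down as a small table; the symptom keys and the `triage` helper are hypothetical names chosen for illustration, while the urgency levels and page names are taken from this section.

```python
from typing import NamedTuple


class Triage(NamedTuple):
    urgency: str   # how quickly recovery is needed, per this page
    runbook: str   # wiki topic holding the recovery steps


# Symptom keys are hypothetical labels, listed in decreasing order of urgency.
TRIAGE = {
    "master_down": Triage("emergency", "MySQLHAMasterFailure"),
    "apps_cannot_connect": Triage("emergency", "MySQLHAMasterFailure"),
    "master_out_of_sync_both_slaves": Triage("intermediate", "MySQLHASlaveFailure"),
    "master_out_of_sync_one_slave": Triage("next business day", "MySQLHASlaveFailure"),
    "slave_down": Triage("next business day", "MySQLHASlaveFailure"),
}


def triage(symptom: str) -> Triage:
    """Look up the urgency and recovery page for a symptom key."""
    return TRIAGE[symptom]
```

For example, `triage("slave_down").runbook` gives the name of the page to follow for a single slave failure.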