THIS PAGE HAS BEEN MIGRATED TO CONFLUENCE: https://uwaterloo.atlassian.net/wiki/x/nwENUAo

CS MySQL Database Backup and Recovery for System Administrators

Public Alias Current Hostname Local Backup Remote Backup Hot Spare
mysql.cs mysql-10[246].cs /var/backups/mysql/ linux.cscf /mysql/backups/ See MySQLHAOperationsManual
mysql.student.cs mysql070.student.cs /var/backups/mysql/ linux.cscf /mysql/backups/ mysql-270.cscf

Backups

mysql.cs

mysql.student.cs:

  • We use a combination of backup techniques to balance their advantages and disadvantages. The logical backups are useful (e.g. for verification), but not necessary.
    • logical backup once per day using mysqldump --all-databases
    • binary dump once per day using Percona innobackupex
  • Local backups of database and configuration (/etc/mysql/) are stored in /var/backups/mysql/ .
  • The binary logs are in /var/log/mysql/binlog.* .
  • We rsync backups and binlogs to linux.cscf /mysql/backups/ once per hour. (The remote backups were on backup.cs /jetstor_p1/mysql/ in the past. They may move back after disk space on backup.cs is re-arranged.)

Backup and Restore Validation

See MySQLBackupValidation (mysql.student.cs) and MySQLHAOperationsManual (mysql.cs)

Overview of the Fail-Over Procedure

mysql.cs

mysql.student.cs

  • Daily cron jobs on production CS MySQL servers currently take an innobackupex backup (and a mysqldump logical backup at different times) to a local filesystem.
  • An hourly cron job on the CS MySQL backup server (see "Remote Backup" in the table above) uses rsync to copy the local backups and the binlogs.

  • As soon as it is known that a fail-over may be needed, reduce the TTL. Well known public CS MySQL servers currently use a DNS CNAME alias. The alias may have a default TTL of 3 hours. Use Infoblox to reduce the TTL to about 5 minutes.
  • If the failed mysql service has replication, fail-over to the slave. This procedure is not documented here.
  • If the failed mysqld is still running, attempt a clean shutdown.
  • If the failed mysql service host is still accessible, attempt to rsync the binlogs to the CS MySQL backup server. See the hourly rsync cronjob on the backup server.

  • A spare MySQL server may already exist. If not, create one using the "Creation" procedure in the MySQL page.
  • Copy the backup files from the backup server to the spare MySQL server. If the files are small, simply copy them all. Otherwise, copy only the latest innobackupex, the latest etcmysql, and all the binlogs starting about one day before the latest innobackupex.
  • On the spare mysql server, stop mysqld. It may be running independent of the service command for testing. Make sure both are stopped. mysqld may take several seconds to shut down.
  • Assuming the spare has no useful mysql data, configuration or logs, just delete them before doing the restore.
  • Restore the mysql configuration from the backups we just copied here. Fix the bind-address to listen only on localhost for testing.
  • Use Percona innobackupex to restore the latest full backup. First, the backup must be "prepared" before it can be restored. This modifies the backup. So, use a temporary copy. Check for "innobackupex: completed OK!".
  • Now, restore from the prepared backup. Check for "innobackupex: completed OK!".
  • After the --copy-back, fix permissions on everything in /var/lib/mysql/.
  • Check the total data size of /var/lib/mysql/ against the innobackupex.
  • The data is now restored to the state of the last full backup.

  • Start mysqld to facilitate further recovery and testing. Use --skip-networking to make sure that users and applications can't connect to the database yet by accident. This assumes there are no unexpected connections from the local host, e.g. cron jobs or local unprivileged users. Additionally, enforce read-only (except for users with the SUPER privilege).
  • Check mysqld is running, not listening for TCP/IP connections, is accepting local connections, and has reasonable status.
  • If the database content was damaged, do a binlog point-in-time restore up to just before the damage was done. Otherwise, do a binlog restore to the end.
  • Optionally, check that the last transaction was applied. Transactions that update date fields or autoincrement fields are especially useful.
  • Optionally, perform other checks that the database is correct.
  • The data is now fully restored (either to the selected point in time, or to the last transaction in the binlogs).

  • Stop mysqld.
  • Modify the mysql configuration to its production-ready state. Usually, the only change is bind-address in /etc/mysql/my.cnf. Usually, it is sufficient to simply comment it out, meaning mysqld will listen for TCP/IP connections on all server host IPv4 interfaces.
  • Start the mysql service. service mysql start
  • On the mysql application host, check that mysql clients can connect to the new server using the application mysql username and password.
  • Optionally, perform other checks that operation is correct.

  • Use Infoblox to change the mysql service alias from the failed server to the new server. Client hosts do not see the change until their copy of the TTL expires.
  • Test that the mysql applications now work correctly.

  • Start backups of the new server.
  • Start monitoring the new server.

Example restore of mysql.student.cs to hot spare LXC container mysql-270.cscf

  • Copy the remote backup to the hot spare. First, create a directory (if it doesn't already exist) to hold the backup.

mysql-270.cscf# mkdir           /var/backups/mysql
mysql-270.cscf# chown root:root /var/backups/mysql
mysql-270.cscf# chmod 700       /var/backups/mysql

  • If the database is small, simply copy the entire mysql backup for that host.

linux.cscf# du -sh /mysql/backups/mysql.student.cs
400M    /mysql/backups/mysql.student.cs 

linux.cscf# time rsync -a /mysql/backups/mysql.student.cs mysql-270.cscf:/var/backups/mysql
real 0m31.161s
user 0m8.341s
sys 0m1.897s 

  • Go to backups on the hot spare and clean-up old unneeded data.

mysql-270.cscf# cd /var/backups/mysql/mysql.student.cs

mysql-270.cscf# find . -maxdepth 2 \( -name "mysqldump-*.sql" -o -name "*-*-innobackupex" -o -name "*-*-etcmysql.tar" \) -mtime +6 -exec rm -rf {} \; 

  • Ensure that mysqld is not running. If it was started from boottime or the service command, that will look like this.

mysql-270.cscf# service mysql status
mysql start/running, process 11404

mysql-270.cscf# service mysql stop
mysql stop/waiting

mysql-270.cscf# ps -elf | grep '[m]ysql'
[nothing] 

  • Otherwise, it could look like this.

mysql-270.cscf# service mysql status
mysql stop/waiting

mysql-270.cscf# ps -elf | grep '[m]ysql'
4 S root      6820     1  0  80   0 -  1100 wait   Jul14 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --skip-networking --read-only
4 S mysql     7255  6820  0  80   0 - 121115 poll_s Jul14 ?       00:05:53 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --skip-networking --read-only --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306

mysql-270.cscf# kill 7255

mysql-270.cscf# ps -elf | grep '[m]ysql' 
[nothing]

  • Save and replace the mysql configuration. Until we're ready to go live, change it so mysqld listens only on localhost.

mysql-270.cscf# mv /etc/mysql /etc/mysql.`date +%s`

  • Pick the latest configuration backup to restore from.

mysql-270.cscf# ls -l /var/backups/mysql/mysql.student.cs/*-etcmysql.tar | tail -1
-r-------- 1 root root 20480 Aug  6 05:47 /var/backups/mysql/mysql.student.cs/mysql070-1438854421-etcmysql.tar

mysql-270.cscf# tar --directory / -xvf `ls -1d /var/backups/mysql/mysql.student.cs/*-etcmysql.tar | tail -1`
etc/mysql/
etc/mysql/conf.d/
etc/mysql/conf.d/mysqld_safe_syslog.cnf
etc/mysql/conf.d/student_cs_uwaterloo_ca.cnf
etc/mysql/debian-start
etc/mysql/debian.cnf
etc/mysql/my.cnf
etc/mysql/nagios_check.cnf

  • Compare the previous configuration with what we just installed, to remind us to uncomment the bind-address.

mysql-270# diff -r `ls -d /etc/mysql* | tail -1` /etc/mysql
diff -r /etc/mysql.1438887704/my.cnf /etc/mysql/my.cnf
48c48
< bind-address           = 127.0.0.1
---
> #bind-address           = 127.0.0.1

  • Uncomment the bind-address.

mysql-270.cscf# vi /etc/mysql/my.cnf

mysql-270.cscf# diff -r `ls -d /etc/mysql* | tail -1` /etc/mysql
[nothing] 

Note: There may be additional configuration differences. Check to make sure that they look reasonable.

  • Save the mysql data directory and create an empty one.

mysql-270.cscf# mv /var/lib/mysql /var/lib/mysql.`date +%s`
mysql-270.cscf# mkdir             /var/lib/mysql
mysql-270.cscf# chown mysql:mysql /var/lib/mysql
mysql-270.cscf# chmod 700         /var/lib/mysql 

  • Use Percona innobackupex to restore the latest full backup. First, the backup must be "prepared" before it can be restored. This modifies the backup. So, use a temporary copy. Check for "innobackupex: completed OK!".

mysql-270.cscf# ls -ldtr /var/backups/mysql/mysql.student.cs/mysql070-*-innobackupex | tail -1
dr-x------ 26 root root 4096 Aug  6 05:47 /var/backups/mysql/mysql.student.cs/mysql070-1438854421-innobackupex

mysql-270.cscf# innobackupex --use-memory=2G --apply-log `ls -1d /var/backups/mysql/mysql.student.cs/mysql070-*-innobackupex | tail -1`
...
InnoDB: Shutdown completed; log sequence number 3004546
150806 15:10:52  innobackupex: completed OK!

  • Now, restore from the prepared backup. Check for "innobackupex: completed OK!".

mysql-270.cscf# innobackupex --copy-back `ls -1d /var/backups/mysql/mysql.student.cs/mysql070-*-innobackupex | tail -1`
...
150806 15:13:01  innobackupex: completed OK!

mysql-270.cscf# chown -R mysql:mysql /var/lib/mysql/. 

  • The data is now restored to the state of the last full backup. We next use a point-in-time recovery to bring it up to the latest transactions recorded in the backups of the mysql "binary logs". First, start mysqld.

mysql-270.cscf# mysqld_safe --skip-networking --read-only &
[1] 11768
150806 15:15:45 mysqld_safe Can't log to error log and syslog at the same time.  Remove all --log-error configuration options for --syslog to take effect.
150806 15:15:45 mysqld_safe Logging to '/var/log/mysql/error.log'.
150806 15:15:45 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

mysql-270.cscf# ps -elf | grep '[m]ysql'
4 S root     11768 11519  0  80   0 -  1100 wait   15:15 pts/0    00:00:00 /bin/sh /usr/bin/mysqld_safe --skip-networking --read-only
4 S mysql    12215 11768  0  80   0 - 104696 poll_s 15:15 pts/0   00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --skip-networking --read-only --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 

  • Next, examine the binary logs and database state. Determine which binary logs to use and the start position.

mysql-270.cscf# cd /var/backups/mysql/mysql.student.cs

mysql-270.cscf# cat `ls -1d /var/backups/mysql/mysql.student.cs/mysql070-*-innobackupex | tail -1`/xtrabackup_binlog_info
binlog.000137 24735

mysql-270.cscf# ls binlog.[0-9]* | awk '$0 >= "binlog.000137"'
binlog.000137
binlog.000138

mysql-270.cscf# mysqlbinlog `ls binlog.[0-9]* | awk '$0 >= "binlog.000137"'` --start-position=24735 | wc
  473  2105 17691

mysql-270.cscf# mysqladmin --defaults-file=/etc/mysql/debian.cnf status
Uptime: 521  Threads: 1  Questions: 2  Slow queries: 0  Opens: 33  Flush tables: 1  Open tables: 26  Queries per second avg: 0.003

mysql-270.cscf# mysql --defaults-file=/etc/mysql/debian.cnf
...
mysql> SHOW BINARY LOGS;
+---------------+-----------+
| Log_name      | File_size |
+---------------+-----------+
| binlog.000050 |       147 |
| binlog.000051 |       147 |
| binlog.000052 |       147 |
| binlog.000053 |       147 |
| binlog.000054 |       147 |
| binlog.000055 |       147 |
| binlog.000056 |       147 |
| binlog.000057 |       147 |
| binlog.000058 |       147 |
| binlog.000059 |       147 |
| binlog.000060 |       126 |
| binlog.000061 |       107 |
+---------------+-----------+
12 rows in set (0.00 sec)

mysql> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| binlog.000061 |      107 |              |                  |
+---------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> select * from nagios_check.last_update;
+-----------------+---------------------+
| client_hostname | server_time         |
+-----------------+---------------------+
| ubuntu1404-202  | 2015-08-06 05:44:01 |
+-----------------+---------------------+
1 row in set (0.00 sec)

mysql> Bye

  • Now, apply all the transactions recorded in the binary logs since the full backup.

mysql-270.cscf# mysqlbinlog `ls binlog.[0-9]* | awk '$0 >= "binlog.000137"'` --start-position=24735 | mysql --defaults-file=/etc/mysql/debian.cnf 

  • Check the updated database state. Note nagios_check.last_update server_time matches the latest binlog file modification time.

mysql-270.cscf# mysql --defaults-file=/etc/mysql/debian.cnf
...
mysql> SHOW BINARY LOGS;
+---------------+-----------+
| Log_name      | File_size |
+---------------+-----------+
| binlog.000050 |       147 |
| binlog.000051 |       147 |
| binlog.000052 |       147 |
| binlog.000053 |       147 |
| binlog.000054 |       147 |
| binlog.000055 |       147 |
| binlog.000056 |       147 |
| binlog.000057 |       147 |
| binlog.000058 |       147 |
| binlog.000059 |       147 |
| binlog.000060 |       126 |
| binlog.000061 |      8489 |
+---------------+-----------+
12 rows in set (0.00 sec)

mysql> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+---------------+----------+--------------+------------------+
| binlog.000061 |     8489 |              |                  |
+---------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> select * from nagios_check.last_update;
+-----------------+---------------------+
| client_hostname | server_time         |
+-----------------+---------------------+
| ubuntu1404-202  | 2015-08-06 13:59:02 |
+-----------------+---------------------+
1 row in set (0.00 sec)

mysql> Bye

mysql-270.cscf# ls -l /var/backups/mysql/mysql.student.cs/binlog.0* | tail -1
-rw-rw---- 1 mysql adm    7967 Aug  6 13:59 /var/backups/mysql/mysql.student.cs/binlog.000138 

  • The database is now restored to the latest information we have in the remote backups. Next, we should do some sanity checks on the data and mysqld.

To be continued ...

Old restore notes

  • Stop mysqld. service mysql stop
  • Empty mysql data directory. mv /var/lib/mysql /var/lib/mysql.old && mkdir /var/lib/mysql && chown mysql:mysql /var/lib/mysql && chmod 700 /var/lib/mysql
  • The backup must be "prepared" before it can be restored. This modifies the backup. So, use a temporary copy. Check for "innobackupex: completed OK!" E.g.: innobackupex --use-memory=2G --apply-log /copy/of/backup/hostname-timestamp-innobackupex
  • Restore the backup. Check for "innobackupex: completed OK!" E.g.: innobackupex --copy-back /copy/of/backup/hostname-timestamp-innobackupex
Edit | Attach | Watch | Print version | History: r12 < r11 < r10 < r9 < r8 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r12 - 2024-10-28 - MariHassanzada
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback