MySQLHAMovingDatabase < CF

CF Web>MySQLHAOperationsManual>MySQLHAMovingDatabase (2016-04-08, DanielAllen)

Move Some (but not all) Databases from Old MySQL server to New HA Cluster

This section assumes:

mysql 5.5
the high-availability architecture is one master with two slaves using asynchronous replication
102 denotes mysql-102.cs, the first slave of 104
104 denotes mysql-104.cs, the master
106 denotes mysql-106.cs, the second slave of 104
172 denotes mysql-172.cs, the old server
mysql.cs is a CNAME or extra A record for mysql-172.cs
mysqltest.cs is a CNAME or extra A record for mysql-104.cs

First, copy the selected databases from the old server to the new (and prepare for replication). This is not time sensitive.

On the old mysql server:
- In general, you can use mysqldump to dump the databases you want to move. For the March 2016 move of mysql.cs, use mysqlbackupgrants to dump all the grants and mysqldumpexcept to dump all the databases except a few we need to ignore. Run the dump twice, separated by enough time to get an unambiguous binlog position.
  - PATH="/usr/local/lib/mysql/bin:$PATH"
  - mysqlbackupgrants ; mysqldumpexcept ; echo ; sleep 5 ; mysql --defaults-file=/etc/mysql/debian.cnf -e 'show master status' ; echo ; sleep 5 ; mysqldumpexcept ; mysqlbackupgrants
- Compare the dumps.
  - diff `ls -tr /var/backups/mysql/*-show-grants.sql | tail -2`
  - diff -rs `ls -dtr /var/backups/mysql/*-mysqldumpexcept | tail -2`
- Repeat the above dump and compare until there is no difference in the dumps. For the current load on mysql.cs, this almost always succeeds the first time.
- Note the binlog file name and position, for use in the CHANGE MASTER command later.
- If it doesn't already exist, grant replication slave for the new master.
  - slavepassword=`dd if=/dev/random status=none bs=24 count=1 | base64 | tr -d /+`
  - ```
  mysql --defaults-file=/etc/mysql/debian.cnf -e "GRANT REPLICATION SLAVE ON *.* TO 'mysql-104_slave'@'mysql-104.cs.uwaterloo.ca' IDENTIFIED BY '$slavepassword' "
```
- if there is already a replication slave, note the password recorded on each host in /etc/mysql/{$host}_slave.cnf (the passwords are the same for all three hosts).
On asgard:
- Wait for the hourly rsync cron job to copy your dumps. Or rsync it now.
  - rsync -a mysql.cs:/var/backups/mysql/ /mysql/backups/mysql.cs
- Then, copy one of the identical dumps to the new master.
  - thegrants=`ls -tr /mysql/backups/mysql.cs | grep .-show-grants.sql | tail -1`
  - echo thegrants=$thegrants ; ls -l /mysql/backups/mysql.cs/$thegrants # found the correct grants?
  - thedump=`ls -tr /mysql/backups/mysql.cs | grep .-mysqldumpexcept | tail -1`
  - echo thedump=$thedump ; ls -ld /mysql/backups/mysql.cs/$thedump # found the correct dump?
  - tmpdir=`ssh mysqltest.cs mktemp -d`
  - echo tmpdir=$tmpdir # got a good tmpdir?
  - rsync -a /mysql/backups/mysql.cs/$thegrants /mysql/backups/mysql.cs/$thedump mysqltest.cs:$tmpdir
On the new master:
- Init.
  - thegrants=value-from-above
  - thedump=value-from-above
  - tmpdir=value-from-above
  - cd $tmpdir
- Load the grants needed by the selected databases(s). E.g.:
  - (Details to be added later...)
- Load the selected database(s). E.g.:
  - ls -l $thedump/equipment.sql # looks OK?
  - mysql <$thedump/equipment.sql

Second, start replication of the copied database(s) from the old server to the new.

On the new master:
- Start replication from the old mysql server.
  - CHANGE MASTER TO MASTER_HOST='mysql-172.cs.uwaterloo.ca', MASTER_USER='mysql-104_slave', MASTER_PASSWORD='slavepassword-from-above', MASTER_LOG_FILE='binlog-File-from-above', MASTER_LOG_POS=Position-from-above;
  - Create a temporary config file.

mysql-104# cat /etc/mysql/conf.d/temp-slave.cnf
# Configuration used only while we are moving the indicated database to this host.
[mysqld]
replicate-do-db	= 'equipment'

- - Restart mysqld service mysql restart
  - Confirm the slave has caught up mysql -e 'show slave status\G'
On the old mysql server (172), set up grants (if they don't already exist) for pt-table-checksum.
- GRANT PROCESS, REPLICATION SLAVE, SELECT, SUPER ON . TO 'pt_checksum'@'localhost' IDENTIFIED BY PASSWORD '*33...9E';
- GRANT ALL PRIVILEGES ON `percona`.* TO 'pt_checksum'@'localhost';
- GRANT PROCESS, REPLICATION SLAVE, SELECT, SUPER ON . TO 'pt_checksum'@'mysql-%.cs.uwaterloo.ca' IDENTIFIED BY PASSWORD '*33...9E';
- GRANT ALL PRIVILEGES ON `percona`.* TO 'pt_checksum'@'mysql-%.cs.uwaterloo.ca';
On the old server, check that old and new servers have identical copies of the database being moved.
- 172# pt-table-checksum --defaults-file=/etc/mysql/pt_checksum.cnf --replicate=percona.checksums --databases equipment --no-check-binlog-format --no-check-replication-filters
On the new server, check slave is still OK, because with this temporary setup, some types of pt-table-checksum errors can break replication.
- 104# mysql -e 'show slave status\G'

At this point, the old and new servers have identical copies of the database, clients are using the old server, and any changes on the old server are replicated to the new.

Optionally, check the stability of the temporary slave setup by leaving it running overnight.

The following preparation should be done shortly before client visible downtime starts.

If telling the clients to use a different server is done via DNS changes:
- Find the current TTL for the server hostname clients use. It should already be low (5 minutes is typical), not the default of 3 hours.
- Decrease the TTL to 1 minute. This is a major part of client visible downtime.
- Wait for the previous TTL to expire.
Take one more backup in case of disaster in the switch-over to follow.
- 172# /usr/local/lib/mysql/bin/mysqlbackupgrants ; /usr/local/lib/mysql/bin/mysqldumpexcept
Immediately before moving the clients from the old server to the new, check that they are still synchronized.
- 104# mysql -e 'show slave status\G' Did replication break overnight?
- 172# pt-table-checksum --defaults-file=/etc/mysql/pt_checksum.cnf --replicate=percona.checksums --databases equipment --no-check-binlog-format --no-check-replication-filters Did the two copies of the database diverge overnight?
- 104# mysql -e 'show slave status\G' Did pt-table-checksum break replication now?

Caution: Client visible downtime starts here. For most of the downtime, clients of all the mysql databases on the old and new servers can read, but not write the database. This section should be done quickly, but carefully. Any mistakes may cause permanent data loss and extended down time.

You will need at least 2 windows. Each of the old and new server will need its own mysql session.

Make the old server read-only! (except for threads with SUPER or REPLICATION SLAVE privilege). This prevents clients from losing data by using the old server after replication stops.
- 172# mysql> SET GLOBAL read_only = ON;
Make the new server read-only! (except for threads with SUPER or REPLICATION SLAVE privilege). This prevents clients from losing data by using the new server before it is ready.
- 104# mysql> SET GLOBAL read_only = ON;
Show binlog position on old server.
- 172# mysql> SHOW MASTER STATUS;
Check that the new server is caught up (at least check: binlog file and position match above; 0 seconds behind master).
- 104# mysql> show slave status\G
Check that old and new servers have identical copies of the database being moved.
- 172# pt-table-checksum --defaults-file=/etc/mysql/pt_checksum.cnf --replicate=percona.checksums --databases equipment --no-check-binlog-format --no-check-replication-filters
Tell clients to stop using the old server and start using the new. Depending on the database(s) being moved, use the correct one of the following methods:
- Change the server hostname in all appropriate client config.
- Or, use infoblox to delete and recreate the appropriate CNAME pointing to the new server.
- Or, use infoblox to delete and recreate the appropriate virtual host (extra A record) with the IP address of the new server. Fix the PTR if it is missing or incorrect.
  - Infoblox operations on A records may be performed via the Infoblox Command Line Interface. Read the instructions first; then as root on www152.cs you can su cs-inv and run: ~/infoblox-scripts/ibclispeedy -e 'conf zone "" modify host {hostname.cs.uwaterloo.ca} {new IP address}'
At this point (if using DNS changes), some clients are using the new server and some are still using the old until their TTL expires. Both servers are still read-only.

IMPORTANT: Stop replication on the new server! This prevents the database from being dropped on the new server in the next step.
- 104# mysql> STOP SLAVE; RESET SLAVE ALL;
Verify that replication has stopped on the new server! The following output should be empty.
- 104# mysql> SHOW SLAVE STATUS\G
On the old server, drop the database being moved! This prevents clients from losing data (or seeing incorrect old data) by accidentally using the old server. After this point, any client still trying to use the moved database on the old server will fail.
- 172# mysql> DROP DATABASE equipment;

On the old server, return to normal operation with the remaining databases.
- 172# mysql> SET GLOBAL read_only = OFF;
On the new server, return to normal operation with addition of the moved database(s).
- 104# mysql> SET GLOBAL read_only = OFF;
Delete the temporary config file on the new server and restart mysqld.
- 104 rm /etc/mysql/conf.d/temp-slave.cnf
- 104 service mysql restart

At this point, client visible downtime is over and all operations should be back to normal (assuming the 1 minute TTL has expired for all clients). Some essential clean-up work remains.

Perform tests to ensure old and new server are working normally.
- Are all the databases present that should be?
- Are all applications working correctly?
- Does the data look correct, e.g. most recent transactions in a history table.
If telling the clients to use a different server is done via DNS changes:
- Change TTL back to normal for the service. 5 minutes is typical.
- If you are using the extra A record method, check that the PTR is correct.

At this point, slaves 102 and 106 probably will have replication holes in the moved database(s).

First, you must run pt-table-checksum on the master to find and record any differences between master and slaves.
- 104# pt-table-checksum --defaults-file=/etc/mysql/pt_checksum.cnf --replicate=percona.checksums --no-check-binlog-format --quiet
Then, if there are diffs, run pt-table-sync to fix them.
- 104# pt-table-sync --defaults-file=/root/.my.cnf --verbose --execute --replicate=percona.checksums h=localhost

Topic revision: r1 - 2016-04-08 - DanielAllen

Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.

Other Webs

My links
- People
- CERAS
- WatForm
- Tetherless lab
- Ubuntu Main.HowTo
- eDocs
- RGG NE notes
- RGG
- CS infrastructure
- Grad images

Edit