Meeting 3 March 2016, 2pm
Attended: drallen a2brenna fhgunn ldpaniak
- Progress on Milestones
- Timeline
- Brief summary since the last meeting
Progress on Milestones:
- Backups set up and tested (due date reset to today). Remaining: testing restore.
- Fraser now also takes a redundant backup of the data, for ease of use.
- Moving inventory: likely not finished until Friday (11th), a slip of a few more days.
- Schedule shows wrapup April 4 (!) where:
- Mon 14 Mar - Fri 18 Mar is moving the rest of apps (Fraser and Daniel);
- Fri 18 Mar - Tue 22 Mar is benchmarking apps (Lori and Daniel);
- Tue 22 Mar - Tue 29 Mar is finalizing monitoring, tuning, maintenance documentation (Anthony, Fraser, Daniel). Fraser is only available until the 24th (he is here the following week, but the realtime lab may interfere).
- Tue 29 Mar - Mon 4 April is wrapup: fixing remaining issues, writing up lessons learned, proposing plans for the student cluster, and handoff to Ken.
- Can we gain time back?
Brief summary since the last meeting, and upcoming week ("this week")
Fraser: pt-table-checksum is tested (see ST#103822 Fri, Mar 4 2016 12:39). It is not currently run under cron. Should it be?
- pt-table-checksum can break replication, but Fraser can include checks to avoid that. He will work on this. (monitoring item - ST#104108 )
- inventory is again copied and running; needs testing
- Fraser will try slaving another db from mysql-172 to have a user-transparent copy. (This is not critical for this project but will be helpful for phase two. Is there more critical work?)
- Yes: Fraser to merge configuration from old to new, which Anthony then needs to move to salt. (new item for this)
- and fixing pt-table-checksum
- config work with Anthony
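
A minimal sketch of the kind of check discussed above, not Fraser's actual implementation: before a cron job starts pt-table-checksum, look at one row of SHOW SLAVE STATUS from each replica and skip the run if replication is stopped or lagging. The threshold, the function names, and the example host are assumptions for illustration; only the SHOW SLAVE STATUS column names and the --max-lag option come from MySQL and Percona Toolkit.

```python
from typing import List, Mapping, Optional

# Hypothetical threshold; the meeting did not agree on a value.
MAX_LAG_SECONDS = 10


def replica_is_healthy(status: Mapping[str, Optional[str]],
                       max_lag: int = MAX_LAG_SECONDS) -> bool:
    """Decide whether a replica is fit for a checksum run.

    `status` is one row of SHOW SLAVE STATUS as a column -> value mapping.
    """
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # NULL lag means replication is not running, so lag is unknown.
        return False
    return int(lag) <= max_lag


def checksum_command(master_host: str,
                     max_lag: int = MAX_LAG_SECONDS) -> List[str]:
    """Build the argument list a cron wrapper would hand to subprocess.run."""
    return ["pt-table-checksum", f"--max-lag={max_lag}", f"h={master_host}"]


def plan_run(master_host: str,
             replica_statuses: List[Mapping[str, Optional[str]]]) -> Optional[List[str]]:
    """Return the command to run, or None if any replica is unhealthy."""
    if all(replica_is_healthy(s) for s in replica_statuses):
        return checksum_command(master_host)
    return None
```

A cron wrapper would fetch the status rows with whatever client library is in use, call plan_run with the master's host name, and log and exit without running anything when it returns None.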
Daniel: testing inventory.
- Fraser has expanded/clarified Daniel's test plan from ST#103822, put in MySQLHATesting
- Daniel to follow these tests and put results in ST#103822
Anthony: packaging scripts done; mysql changes pulled into salt; 10gb need to restart.
- Anthony to check ona, then tell Lori if it's a hw problem.
- then test backup by blowing away 104 (and write docs)
- will check with Dave about data-integrity regarding mount options for ext4.
- Daniel will get Anthony the IP to add to mysql-102 for mysqltest (ST#104139)
Lori: got Devon's nagios scripts and is starting to code them. Fraser to advise Lori on percona additions to monitoring.
How is recovery implemented? Restarting automatically has a failure mode if the master goes down.
Proposal: set auto_start to no on all three; sysadmin goes to mysqladmin on the container to restart manually.
Anthony will write the first draft of the manual.
- the checklist will be a work in progress
- but we will do the basics now before deployment
Adding to the list of things to do later: change databases from MyISAM to InnoDB. Fraser wants to consider how to make mysqld go down and stay down when rebooting.
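
For the MyISAM to InnoDB item, one possible way to prepare the work, offered as a sketch rather than an agreed plan: list (TABLE_SCHEMA, TABLE_NAME, ENGINE) rows from information_schema.TABLES and generate one ALTER TABLE statement per MyISAM table for review before anything is run. The helper names and the example schema and table names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Schemas that belong to the server itself and should not be converted.
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema"}


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def innodb_conversions(
    rows: Iterable[Tuple[str, str, Optional[str]]]
) -> List[str]:
    """Turn (schema, table, engine) rows into ALTER TABLE statements.

    Only MyISAM tables outside the system schemas are included.
    """
    statements = []
    for schema, table, engine in rows:
        if schema in SYSTEM_SCHEMAS:
            continue
        if (engine or "").lower() != "myisam":
            continue
        statements.append(
            f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} ENGINE=InnoDB;"
        )
    return statements
```

Generating the statements as text first keeps the conversion reviewable and lets it be scheduled table by table.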
-- DanielAllen - 2016-03-09