Meeting 3 March 2016, 2pm

Attended: drallen a2brenna fhgunn ldpaniak


  • Progress on Milestones
  • Timeline
  • Brief summary since the last meeting

Progress on Milestones:

  • Backups set up and tested (due date reset to today). Remaining: testing restore.
    • Fraser now redundantly backs up pt-show-grants data, for ease of use


  • Moving inventory: likely not finished until Friday (11th), a few more days of slip.
    • Schedule shows wrapup April 4 (!) where:
      • Mon 14 Mar - Fri 18 Mar is moving the rest of apps (Fraser and Daniel);
      • Fri 18 Mar - Tue 22 Mar is benchmarking apps (Lori and Daniel);
      • Tue 22 Mar - Tue 29 Mar is finalizing monitoring, tuning, maintenance documentation (Anthony, Fraser, Daniel). Fraser is only available until the 24th (he is here the following week, but the realtime lab may interfere).
      • Tue 29 Mar - Mon 4 April is wrapup: Fixing remaining issues, writing up lessons learned, proposed plans for student cluster, handoff to Ken.
    • Can we gain time back?

Brief summary since the last meeting, and upcoming week ("this week")

Fraser: pt-table-checksum is tested (see ST#103822 Fri, Mar 4 2016 12:39). It is not currently run under cron. Should it be?

  • pt-table-checksum can break replication, but Fraser can include checks to avoid that. He will work on this. (monitoring item - ST#104108 )
  • inventory is again copied and running; needs testing
  • Fraser will try slaving another db from mysql-172 to have a user-transparent copy. (This is not critical for this project but will be helpful for phase two. Is there more critical work?)
    • Yes: Fraser to merge configuration from old to new, which Anthony then needs to move to salt. (new item for this)
      • and fixing pt-table-checksum
  • config work with Anthony
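
One possible shape for the "checks to avoid that" item, as a rough sketch only: a cron wrapper could look at each replica's SHOW SLAVE STATUS row and skip the pt-table-checksum run unless replication is running and not lagging. The function names and the 60-second lag threshold are assumptions made for illustration, not what Fraser implemented; the dictionary keys are the standard SHOW SLAVE STATUS column names.

```python
def replication_is_healthy(status, max_lag_seconds=60):
    """Return True if one replica looks healthy enough for a checksum run.

    `status` is a dict holding one row of SHOW SLAVE STATUS output.
    The 60-second default is an illustrative assumption.
    """
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # NULL lag means the replica cannot report how far behind it is.
        return False
    return int(lag) <= max_lag_seconds


def should_run_checksum(replica_statuses, max_lag_seconds=60):
    """Run only if we have status for at least one replica and all are healthy."""
    statuses = list(replica_statuses)
    if not statuses:
        # No information at all: be conservative and skip the run.
        return False
    return all(replication_is_healthy(s, max_lag_seconds) for s in statuses)
```

A cron job would gather the status rows first and call pt-table-checksum only when `should_run_checksum` returns True; a skipped run could be reported through the monitoring item instead.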

Daniel: testing inventory.

  • Fraser has expanded/clarified Daniel's test plan from ST#103822, put in MySQLHATesting
    • Daniel to follow these tests and put results in ST#103822

Anthony: packaging scripts done; mysql changes pulled into salt; 10gb needs to restart.

  • Anthony to check ona, then tell Lori if it's a hw problem.
  • then test backup by blowing away 104 (and write docs)
  • will check with Dave about data-integrity regarding mount options for ext4.
  • Daniel will get Anthony the IP to add to mysql-102 for mysqltest (ST#104139)

Lori: got Devon's nagios scripts and is starting to code them. Fraser to advise Lori on percona additions to monitoring.


How is recovery implemented? Restarting automatically has a failure mode if the master goes down. Proposal: set auto_start to no on all three; the sysadmin goes to mysqladmin on the container to restart manually. Anthony will write the first draft of the manual.

  • the checklist will be a work in progress
  • but we will do the basics now before deployment
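
As a possible first step for the manual-restart checklist, the sketch below reports which nodes currently answer `mysqladmin ping`, so the sysadmin can see the state of all three before restarting anything. It is only an illustration: the host names in the usage note are hypothetical, the helper names are invented here, and it assumes mysqladmin is on the PATH of the machine running the check.

```python
import subprocess


def mysqld_is_up(host, run=subprocess.run):
    """Return True if `mysqladmin ping` reports a running server on `host`.

    mysqladmin exits with status 0 when the server is running.
    `run` is injectable so the check can be exercised without a real server.
    """
    result = run(["mysqladmin", "--host=" + host, "ping"], capture_output=True)
    return result.returncode == 0


def restart_report(hosts, run=subprocess.run):
    """Map each host name to True (responding) or False (not responding)."""
    return {host: mysqld_is_up(host, run) for host in hosts}
```

For example, `restart_report(["db-a", "db-b", "db-c"])` (placeholder names) returns a dictionary the sysadmin can read before deciding which node to bring back first.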

Adding to the list of things to do later: change databases from MyISAM to InnoDB. Fraser wants to consider how to make mysqld go down and stay down when rebooting.
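
For the MyISAM to InnoDB item, one way to prepare the work is to generate the ALTER TABLE statements for review rather than running them directly. The sketch below assumes the input rows come from a query such as `SELECT table_schema, table_name, engine FROM information_schema.tables`; the function name and the list of schemas to skip are assumptions for illustration, and the table names in the usage note are made up.

```python
# Schemas that belong to the server itself and should be left alone.
SYSTEM_SCHEMAS = {"mysql", "information_schema", "performance_schema", "sys"}


def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def innodb_conversion_statements(tables):
    """Build ALTER TABLE statements for every MyISAM table in `tables`.

    `tables` is an iterable of (schema, table, engine) tuples.
    Views have a NULL engine, so None is treated as "not MyISAM".
    """
    statements = []
    for schema, table, engine in tables:
        if schema.lower() in SYSTEM_SCHEMAS:
            continue
        if (engine or "").lower() != "myisam":
            continue
        statements.append(
            "ALTER TABLE {}.{} ENGINE=InnoDB;".format(
                quote_identifier(schema), quote_identifier(table)
            )
        )
    return statements
```

Calling it with `[("inventory", "items", "MyISAM")]` yields a single statement for that table; the output can be pasted into a ticket and applied table by table once backups and restore testing are in place.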

-- DanielAllen - 2016-03-09

Topic revision: r3 - 2016-03-09 - DanielAllen