Meeting 27 January 2016, 2pm

Attended: drallen a2brenna fhgunn ldpaniak.

Lori noted that the "Update Server Room High Speed Network Interconnects" task might be time-consuming and cautioned that it might not be completely independent (i.e., it may be necessary for the cluster to be production-ready). He volunteered to do the Mellanox deployment. Status is unknown; Daniel will ask Dave.

Database hardware status:

  • hardware racked, powered, network connectivity? hostnames? inventory? Yes. Anthony reports:
    • Ubuntu 14.04; 3.16+ kernel
    • mc-3015-411.cloud.cs.uwaterloo.ca
    • dc-3558-411.cloud.cs.uwaterloo.ca
    • m3-3101-411.cloud.cs.uwaterloo.ca

These are in inventory (e.g. mc-3015-441.cloud.cs) but not in Machine Notes.

  • Networking isn't working quite as expected, but the machines are currently accessible via linux.cscf.
    • Anthony ran into issues with networking on the salted system; he will need to follow up with Dave. Lori will deliver example Ubuntu stanzas to Anthony.
  • Active/passive replication works, following Clayton's instructions. Switchover from passive to active is manual.
  • "Install Cluster Mysql container and with mysql application and ppa's" is done.
  • Anthony is working on "Manage Mysql configuration (N-node cluster versus single instance)"; there is an ongoing issue with connecting to the systems via ssh under salt. This was the Feb 1 item; he will have it done for Wed Feb 3.
  • Manual failover seems prone to excess downtime. Can we do some sort of "active-active" setup instead of "active-passive"? Anthony cautions this might be incompatible with three servers; he'll look into "active-active-active."

  • Where is the documentation for yubikey? ST#98955 refers to https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-in-mysql, which Anthony used; there is also info about active-active.
  • Anthony's work on "Manage Mysql configuration (N-node cluster versus single instance)" ran into an issue with mysql auth; Fraser offered Ubuntu/mysql auth advice. Anthony will approach him later, once he's tried a few things.
  • Benchmark current mysql.cs performance: Fraser and Lori to do this, with initial low-level tests before we next meet. (The mysql service log has details; Lori will ask Fraser.)
    • Can we track end-user usage stats? Can we instrument inventory and marmoset to make charts? ST already has this per request, but not as averages.
    • however, speedups are not as critical a part of the project as HA. We could still do "cheap" benchmarks without taking too much staff time.

We don't yet have timing info on the migration and the steps that follow it, so we can't estimate how far from done we are.

Migrating data:

  • We will move inventory using the active/passive process described in the cluster documentation. Anthony to do this for inventory; we will need to be able to teach someone in TOP, probably Nick. (Daniel to confirm with Omar.)

Testing failover:

  • Can we do "active active active"? Likely not, but Anthony is investigating.
  • Can we set a realistic goal of 99.999% uptime? (equivalent to 25.9 seconds downtime per month / 5.26 minutes per year)
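
The downtime figures quoted for 99.999% can be checked with a short calculation. This is only a sketch: the function name is hypothetical, and it assumes a 30-day month and a 365.25-day year, which are the period lengths that reproduce the numbers above.

```python
def allowed_downtime_seconds(availability_percent, period_seconds):
    """Seconds of downtime permitted in a period at a given availability."""
    return period_seconds * (1 - availability_percent / 100)

# Assumed period lengths (not stated in the minutes).
MONTH_SECONDS = 30 * 24 * 3600
YEAR_SECONDS = 365.25 * 24 * 3600

per_month_seconds = allowed_downtime_seconds(99.999, MONTH_SECONDS)
per_year_minutes = allowed_downtime_seconds(99.999, YEAR_SECONDS) / 60

print(f"{per_month_seconds:.1f} seconds per month")
print(f"{per_year_minutes:.2f} minutes per year")
```

At that budget, a single manual switchover that takes longer than about five minutes would use up the whole year's allowance.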

Near the end of the meeting, Lori noted that asynchronous replication has a chance of data loss if a master goes down before it is fully synced with the slaves. This doesn't look like a good definition of "Highly Available" to him. A question for management, since Dave wasn't here: Is this good enough? Or is some synchronous solution a minimum requirement? Lori suggested we might build one cluster with the existing asynchronous plan, and then consider building out synchronous replication for the second cluster (over the summer). If Lori and Dave can't come to agreement soon, Daniel and Dave are to ask Ken.
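
Lori's concern can be illustrated with a toy model. This is not MySQL code and does not reflect how MySQL replication is implemented; the class and method names are hypothetical. It only shows why writes acknowledged by the master but not yet shipped to a slave are lost if the master fails at that moment.

```python
class ToyMaster:
    """Toy model of a master with one slave (illustration only)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.log = []        # writes acknowledged to clients
        self.slave_log = []  # writes the slave has received

    def write(self, item):
        self.log.append(item)
        if self.synchronous:
            # Synchronous: the slave has the write before it is acknowledged.
            self.slave_log.append(item)

    def replicate(self, count):
        # Asynchronous catch-up: ship up to `count` pending writes.
        pending = self.log[len(self.slave_log):]
        self.slave_log.extend(pending[:count])

    def lost_on_failover(self):
        # Writes that exist only on the failed master.
        return self.log[len(self.slave_log):]


async_master = ToyMaster(synchronous=False)
sync_master = ToyMaster(synchronous=True)
for item in ["w1", "w2", "w3"]:
    async_master.write(item)
    sync_master.write(item)
async_master.replicate(2)  # the slave lags one write behind

async_lost = async_master.lost_on_failover()
sync_lost = sync_master.lost_on_failover()
print(async_lost, sync_lost)
```

In the model the asynchronous master loses the one write the slave had not yet received, while the synchronous one loses nothing; the trade-off, which the model does not capture, is that every synchronous write waits on the slave.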

Next meeting: 2pm Wed 3 Feb. 2016

Topic revision: r3 - 2016-02-17 - DanielAllen
 