As of Tuesday 12 April 2016, the deliverables are either complete or only minor steps away from completion.
A formal project plan and planning process contributed to keeping this project on target. We also benefited from having a project manager who was responsible for keeping track of timeline, individual responsibilities, STs being updated, and coordination. The project manager role was also helpful because all three of CSCF's groups contributed resources to this project.
Compared to many CSCF projects, this project involved considerable communication within the team. We had 12 weekly meetings for setting strategy and coordinating work. Focused meetings proved very helpful toward project success, and we got better at focus as the project progressed. We also set up and followed communication plans for alerting our clients, including the rest of CSCF, other groups in CS, and MFCF.
Early on, we agreed that successful completion would require good maintenance documentation. We produced it and used "cross-training" between project members to verify that it was complete and sufficiently specific. Writing and reviewing documentation has been a substantial portion of the project and is nearly complete as of Tuesday 12 April.
Throughout this project, CSCF managers encouraged the project members and gave us sufficient time to work on this priority. Without this support, the project would likely have failed to complete in the allotted time. In early April a higher priority for the School required a project member for much of his remaining time, and we were able to re-assign tasks among the other team members to accommodate.
All members of the team reported learning many technical details about mysql and its available toolsets such as Percona. We have recorded below some areas of useful further investigation that we decided were out of scope for this project. A critical area of discovery: mysqldump does not, by default, dump stored procedures for a database; it does so only with the optional --routines parameter. Our investigation did not explain why stored procedures are excluded by default while triggers are dumped by default. This default caused the only data loss during the database move, for one application database.
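As a minimal sketch of the point above, the following shell helpers print a mysqldump command line that includes stored routines, and a query for counting the routines a database holds before it is moved. The function names and the database name "appdb" are hypothetical placeholders, not the commands actually used in this project; connection options (host, user, password) are omitted and would need to be added.

```shell
#!/bin/sh
# Hypothetical helpers; names and the example database are assumptions.

# Print a mysqldump command line for one database.
#   --routines  include stored procedures and functions (off by default)
#   --triggers  include triggers (on by default; spelled out for clarity)
dump_command() {
    db="$1"
    printf 'mysqldump --routines --triggers %s\n' "$db"
}

# Print a query that counts stored routines in one database, so the
# count on the old server can be compared with the new one after import.
routine_count_query() {
    db="$1"
    printf "SELECT COUNT(*) FROM information_schema.ROUTINES WHERE ROUTINE_SCHEMA='%s';\n" "$db"
}

# Example (review the printed command before running it):
#   dump_command appdb
#   routine_count_query appdb | mysql -N
```

Printing the command rather than executing it keeps the sketch safe to try; comparing the routine count before and after a move is one way the missing stored procedures might have been noticed earlier.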
In retrospect, the project design document would have been strengthened by a more comprehensive list of risks and mitigations; we might have uncovered some of the areas that we didn't discover and discuss until later in the project, such as: "Risk: setting up a newer mysql requires value judgements about configurations that were set on a previous version. Mitigation: spending enough time to make good choices about configurations to copy or keep the mysql default."
Finally, keeping the project sponsor in the loop throughout the project is important; as Project Manager, I think I should've apprised the Sponsor of the project spec as soon as it was settled in the first few weeks, since the spec had changed from what the Sponsor thought we were planning to do this term.
A potential future project is replicating the process in order to set up a dedicated mysql cluster for marmoset, which also could benefit from an automated failover process; CS students would find this a visible improvement over the current setup. While a replica of this project, as-is, would be a quick project, it would not solve the problem of automated failover. Either one would be an improvement over the status quo for marmoset.
A separate future project is developing a similar process for setting up a postgres cluster. However, the technical details are quite different. We would recommend that a formal project plan and planning process be followed for that project.
None of the following items has an ST item yet, and none is seen as a high priority for CSCF for Spring 2016.
service mysqld start doesn't know whether it's starting a master or a slave. We avoid the risk of two active masters ruining the database by writing the maintenance docs to require disabling mysql on a downed master; it would be more foolproof to have an accurate automated method to tell the master from the slaves.
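One possible shape for such an automated check is sketched below. It is not something this project built or tested. It assumes a convention that is not part of the current setup: every node has read_only=1 in its configuration file, and only a deliberate promotion sets read_only=0 at runtime. Under that assumption, a downed master that is restarted comes back read-only and is reported as "unknown" rather than "master". The function name classify_role is hypothetical.

```shell
#!/bin/sh
# classify_role SLAVE_STATUS READ_ONLY   (hypothetical helper)
#   SLAVE_STATUS: output of   mysql -e 'SHOW SLAVE STATUS\G'
#                 (empty when the node is not configured as a slave)
#   READ_ONLY:    output of   mysql -N -e 'SELECT @@global.read_only'
#                 (0 or 1)
# Assumes read_only=1 is the configured default on every node and is
# cleared only by an explicit promotion step.
classify_role() {
    slave_status="$1"
    read_only="$2"
    if [ -n "$slave_status" ]; then
        # Replication is configured on this node.
        echo slave
    elif [ "$read_only" = "0" ]; then
        # No replication configured and writes are enabled.
        echo master
    else
        # Not a slave, but not promoted either: needs a human decision.
        echo unknown
    fi
}

# Example:
#   classify_role "$(mysql -e 'SHOW SLAVE STATUS\G')" \
#                 "$(mysql -N -e 'SELECT @@global.read_only')"
```

Treating the ambiguous case as "unknown" instead of guessing keeps the sketch conservative: it never declares a second master on its own.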
pt-table-sync: we use 1 of its 3 modes of operation. There might be a faster method of recovery that we haven't investigated yet.
-- DanielAllen - 2016-04-08