CS Linux Working Group Meeting - 2021-01-12
Attendees
Dave, Adrian, Anthony, Clayton, Fraser, Gordon, Guoxiang, Lori, Nathan
Agenda
Reboot policy / schedule
Resolution RT#1129029: each host administrator maintains a "/etc/uwcs-cscf.d/reboot-info" file that is in one of three states:
- doesn't exist: the file hasn't been set up yet. It is unknown whether it's safe to reboot; talk to contacts.
- empty: safe to reboot.
- has contents: not safe to reboot; follow the instructions in the file or talk to contacts.
- Have molly-guard list the virtual systems running on a private cloud node when it asks for confirmation that a "reboot" is really intended.
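The three-state convention is simple enough to check mechanically, for instance from a pre-reboot hook or a fleet audit. A minimal Python sketch follows; it assumes only the path agreed in the resolution. The `RebootState` enum, the `reboot_state` function, and treating a whitespace-only file as "empty" are illustrative choices, not part of RT#1129029.

```python
from enum import Enum
from pathlib import Path

# Path agreed in RT#1129029.
REBOOT_INFO = Path("/etc/uwcs-cscf.d/reboot-info")


class RebootState(Enum):
    UNKNOWN = "file missing: not set up yet, talk to contacts"
    SAFE = "file empty: safe to reboot"
    NOT_SAFE = "file has contents: follow its instructions or talk to contacts"


def reboot_state(path: Path = REBOOT_INFO) -> tuple[RebootState, str]:
    """Classify a host's reboot-info file into one of the three agreed states.

    Returns the state and the file's text (non-empty only for NOT_SAFE).
    """
    try:
        text = path.read_text()
    except FileNotFoundError:
        return RebootState.UNKNOWN, ""
    # Assumption (not in the resolution): whitespace-only counts as "empty".
    if not text.strip():
        return RebootState.SAFE, ""
    return RebootState.NOT_SAFE, text


if __name__ == "__main__":
    state, instructions = reboot_state()
    print(state.value)
    if instructions:
        print(instructions)
```

Returning the file's text alongside the state lets a caller show the administrator's instructions verbatim instead of just refusing the reboot.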
Problem: container hosts are regularly reaching extreme uptimes, which means security updates are going unapplied.
- Is the live patching (Ksplice) doing anything for us?
- Are the problems documented?
- What about software updates that require a reboot but are not kernel updates?
- Must rebooting an IaaS host be a "planned" event?
- We have virtual hosts in the private cloud that cannot be freely restarted without manual intervention.
- We have "clusters" that do not automatically promote a slave unit when the active node fails.
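One way to see which container hosts are affected is to look at uptime together with any pending-reboot flag. A minimal Python sketch follows, under stated assumptions: the 60-day `MAX_UPTIME_DAYS` threshold is hypothetical (the group did not set a number), and `/var/run/reboot-required` is an Ubuntu convention (written via update-notifier-common), so it may be absent on other distributions. Uptime alone can overstate the kernel risk on hosts where Ksplice applies patches live, so the pending-reboot flag is reported as a separate reason; it also covers non-kernel updates that need a reboot.

```python
from pathlib import Path

# Hypothetical policy threshold; the meeting did not set a number.
MAX_UPTIME_DAYS = 60

# Assumption: Ubuntu-style flag file created when an installed package
# (kernel or otherwise) needs a reboot; other distributions may not use it.
REBOOT_REQUIRED = Path("/var/run/reboot-required")


def uptime_days(proc_uptime: Path = Path("/proc/uptime")) -> float:
    """Return days since boot, from the first field of /proc/uptime (seconds)."""
    seconds = float(proc_uptime.read_text().split()[0])
    return seconds / 86400


def needs_attention(days: float, reboot_flag: bool,
                    max_days: float = MAX_UPTIME_DAYS) -> list[str]:
    """List the reasons a host should be scheduled for a reboot, if any."""
    reasons = []
    if days > max_days:
        reasons.append(f"uptime {days:.0f} days exceeds {max_days} days")
    if reboot_flag:
        reasons.append("an installed update is waiting for a reboot")
    return reasons
```

On a Linux host, `needs_attention(uptime_days(), REBOOT_REQUIRED.exists())` returns an empty list when the host is within the threshold and has no pending reboot.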
Having all the server room VLANs
Overall, the group feels it is worth the additional $50k to switch the OSPF area 4 ABRs to Cisco gear located in the IST MC and EC2 server rooms (generator power and redundant A/C).