CS Linux Working Group Meeting - 2021-01-12


Dave, Adrian, Anthony, Clayton, Fraser, Gordon, Guoxiang, Lori, Nathan


Reboot policy / schedule

Resolution RT#1129029: each host administrator maintains a "/etc/uwcs-cscf.d/reboot-info" file that is in one of three states:

    • doesn't exist: the file hasn't been set up yet. It is unknown whether it's safe to reboot; talk to the contacts.
    • empty: safe to reboot.
    • has contents: not safe to reboot; follow the instructions in the file or talk to the contacts.
  1. Have molly-guard display the virtual systems running on a private cloud node when asking for confirmation that a reboot is intended.
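The three-state convention is simple enough to check mechanically. The following is a minimal sketch, not an agreed tool: the function names `reboot_state` and `main` are hypothetical, and treating a whitespace-only file as "empty" is an assumption the resolution does not spell out.

```python
import os
from enum import Enum

REBOOT_INFO = "/etc/uwcs-cscf.d/reboot-info"


class RebootState(Enum):
    UNKNOWN = "file not set up yet; talk to the contacts"
    SAFE = "safe to reboot"
    RESTRICTED = "not safe to reboot; follow the instructions or talk to the contacts"


def reboot_state(path: str = REBOOT_INFO) -> tuple[RebootState, str]:
    """Classify a host's reboot-info file using the three-state convention."""
    if not os.path.exists(path):
        return RebootState.UNKNOWN, ""
    with open(path, encoding="utf-8") as f:
        contents = f.read()
    # Assumption: a file holding only whitespace counts as empty.
    if contents.strip() == "":
        return RebootState.SAFE, ""
    return RebootState.RESTRICTED, contents


def main(path: str = REBOOT_INFO) -> int:
    """Print the state (and any instructions); return 0 only when safe."""
    state, text = reboot_state(path)
    print(state.value)
    if text:
        print(text.rstrip())
    return 0 if state is RebootState.SAFE else 1
```

A wrapper could call `sys.exit(main())` so that anything other than an empty file yields a non-zero exit status; how (or whether) that is wired into molly-guard is left open here.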

Problem: Container hosts regularly reach extreme uptimes, which means installed security updates have not taken effect.

  • Is live patching (Ksplice) doing anything for us?
    • are its problems documented?
    • what about software updates that require a reboot but are not kernel updates?
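One way to see which hosts are affected, including non-kernel updates that live patching cannot cover, is to combine uptime with the package system's own reboot flag. This is a sketch under stated assumptions: it assumes Debian/Ubuntu hosts, where packages needing a reboot create `/var/run/reboot-required` and list themselves in `/var/run/reboot-required.pkgs`; the 60-day threshold and all function names are hypothetical, since the group did not set a limit.

```python
import os

# Assumption: Debian/Ubuntu flag files written when a package needs a reboot.
REBOOT_REQUIRED = "/var/run/reboot-required"
REBOOT_REQUIRED_PKGS = "/var/run/reboot-required.pkgs"

# Hypothetical threshold; not a policy agreed by the group.
MAX_UPTIME_DAYS = 60


def uptime_days(proc_uptime: str = "/proc/uptime") -> float:
    """The first field of /proc/uptime is seconds since boot."""
    with open(proc_uptime, encoding="ascii") as f:
        seconds = float(f.read().split()[0])
    return seconds / 86400


def pending_reboot_packages(pkgs_path: str = REBOOT_REQUIRED_PKGS) -> list[str]:
    """Return the de-duplicated, sorted package names awaiting a reboot."""
    if not os.path.exists(pkgs_path):
        return []
    with open(pkgs_path, encoding="utf-8") as f:
        return sorted({line.strip() for line in f if line.strip()})


def needs_attention(
    proc_uptime: str = "/proc/uptime",
    flag_path: str = REBOOT_REQUIRED,
    pkgs_path: str = REBOOT_REQUIRED_PKGS,
    max_days: float = MAX_UPTIME_DAYS,
) -> list[str]:
    """List the reasons this host should be scheduled for a reboot."""
    reasons = []
    days = uptime_days(proc_uptime)
    if days > max_days:
        reasons.append(f"uptime {days:.0f} days exceeds {max_days}")
    if os.path.exists(flag_path):
        reasons.append("reboot-required flag set")
    pkgs = pending_reboot_packages(pkgs_path)
    if pkgs:
        reasons.append("packages awaiting reboot: " + ", ".join(pkgs))
    return reasons
```

An empty list means neither check fired; it does not by itself mean the host is safe to reboot, which is still governed by the reboot-info file.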

  • Must rebooting an IaaS host be a "planned" event?
    • we have virtual hosts in the private cloud that cannot be freely restarted without manual intervention
    • we have "clusters" that do not dynamically promote a slave unit when the active node fails.

Having all the server room VLANs

Overall, the group feels it is worth the additional $50k to switch the OSPF area 4 ABRs to Cisco gear located in the IST MC and EC2 server rooms (which have generator power and redundant A/C).
Topic revision: r2 - 2021-01-13 - FraserGunn