When you receive email notices about system failures, check the "From" address. If it is not icinga.cscf.uwaterloo.ca, the notice did not come from the CSCF central monitoring service, and you will need to contact the "Admin Contact" or service expert for the host the email came from.
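A minimal Python sketch of that check, assuming the sender address ends in "@icinga.cscf.uwaterloo.ca". The exact local part is not known here, so only the domain is compared, and the function name `from_central_monitor` is hypothetical. A From header can be forged, so treat this as a triage aid, not authentication.

```python
from email import message_from_string
from email.utils import parseaddr

# Host of the CSCF central monitoring service (from the note above).
CENTRAL_MONITOR = "icinga.cscf.uwaterloo.ca"


def from_central_monitor(raw_message: str) -> bool:
    """Return True if the From address's domain is the central Icinga host.

    Assumption: the sender is "<something>@icinga.cscf.uwaterloo.ca";
    the local part is not checked.
    """
    msg = message_from_string(raw_message)
    _name, addr = parseaddr(msg.get("From", ""))
    if "@" not in addr:
        # Missing or malformed From header: not from central monitoring.
        return False
    domain = addr.rpartition("@")[2].lower()
    return domain == CENTRAL_MONITOR


if __name__ == "__main__":
    sample = "From: Icinga <icinga@icinga.cscf.uwaterloo.ca>\nSubject: PROBLEM\n\nbody\n"
    print(from_central_monitor(sample))
```

If this returns False, follow up with the host's "Admin Contact" or service expert rather than the central monitoring team.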
Private Cloud Node disk (partition) layout
Drive Assignments
/ (root: distribution-only drive with an empty /var) - minimum 1TB
Space for virtual hosts gets its own drive
/srv/virtual_services is the mount point for this drive - minimum ~2TB; the current standard is more than 3.5TB
/srv/virtual_services/{lxc,libvirt,...} get bind mounted to /var/lib/{lxc,libvirt,...}
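For illustration, a Python sketch that generates /etc/fstab bind-mount entries for this layout. It assumes the bind mounts are configured in /etc/fstab (systemd mount units would work equally well) and that only lxc and libvirt are in use; the helper name `bind_mount_lines` is hypothetical. In fstab, the /srv/virtual_services entry must come before these bind lines so the source directories exist when they are mounted.

```python
from pathlib import PurePosixPath

# Mount point of the virtual-services drive (from the layout notes).
VIRTUAL_SERVICES = PurePosixPath("/srv/virtual_services")
# Subdirectories bind mounted onto /var/lib; lxc and libvirt are from the
# notes, any others in use would be added here.
SERVICES = ["lxc", "libvirt"]


def bind_mount_lines(services, base=VIRTUAL_SERVICES):
    """Return fstab lines binding base/<svc> onto /var/lib/<svc>."""
    lines = []
    for svc in services:
        source = base / svc
        target = PurePosixPath("/var/lib") / svc
        # fstab fields: source, mount point, fs type, options, dump, pass
        lines.append(f"{source} {target} none bind 0 0")
    return lines


if __name__ == "__main__":
    print("\n".join(bind_mount_lines(SERVICES)))
```

Generating the lines from one list keeps the /srv/virtual_services subdirectories and the /var/lib targets from drifting apart.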
Should /var/log be its own volume?
Mixed opinions.
ZFS - not for base OS setups; use it where necessary (where very large volumes, checksumming, incremental snapshots, ... are needed).
Document the minimum amount of "free" space required
These hosts are in a critical state, and their disk space needs to be dealt with ASAP:
MC-3015-201.cloud.cs, the MySQL server host nodes, MC-3015-211.cloud.cs
Monitoring of disk space
It is up to the hardware/service "Admin Contact" to decide what is to be monitored and how, and then to engage Monitoring (Devon) and SaltStack (Anthony, Nathan) via RT to see that it gets implemented.
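As a starting point for that discussion, a Python sketch of a free-space check using the Nagios/Icinga plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL). The 20%/10% thresholds and the names `classify` and `check` are placeholders, not agreed values, since the minimum free space is still to be documented. The standard check_disk plugin from the Monitoring Plugins project may already cover this in practice.

```python
import shutil
from typing import Tuple

# Placeholder thresholds (percent free); the Admin Contact sets the real ones.
WARN_FREE_PCT = 20.0
CRIT_FREE_PCT = 10.0

# Nagios/Icinga plugin exit-code convention.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def free_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is free."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def classify(free_pct: float, warn: float = WARN_FREE_PCT,
             crit: float = CRIT_FREE_PCT) -> int:
    """Map a free-space percentage to a plugin status code."""
    if free_pct < crit:
        return CRITICAL
    if free_pct < warn:
        return WARNING
    return OK


def check(path: str = "/") -> Tuple[int, str]:
    """Return (status, one-line plugin message) for `path`."""
    pct = free_percent(path)
    status = classify(pct)
    return status, f"DISK {LABELS[status]} - {path}: {pct:.1f}% free"


if __name__ == "__main__":
    status, message = check("/")
    print(message)
    # A real Icinga plugin would finish with sys.exit(status).
```

For the private cloud nodes, the interesting call would be `check("/srv/virtual_services")`, since that drive holds the virtual hosts.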
Friday, Mar 19: MC 3015 equipment failures
We tried replacing the batteries in the rack beside the DataSci system; on power-on, the UPS "inverter?" smoked. This also fried the sPDU and one of six equipment power supplies. Details to be documented later this week.
Saturday, Mar 20 power outage post-mortem:
Preliminary shutdown schedule:
Friday morning: reduce the linux.student.cs DNS round-robin
Add the MC servers back to the linux.student.cs DNS round-robin
Network redundancy no longer exists.
What can/should CSCF do about this?
See [[https://rt.uwaterloo.ca/Ticket/Display.html?id=1144540][RT #1144540]] about physics server room redundancy (temporarily disabled 2 years ago).
Is CSCF monitoring of IST-provided services required, i.e., establishing quality-of-service indicators for the services IST provides to CS?
Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.