Linux Working Group
Meeting Date
Invitees - Attendees
- Dave, Anthony, Adrian, Clayton, Guoxiang, Fraser, Lori, Nathan, Devon, Nick
Review and accept previous meeting minutes.
Proposed Agenda Items
Switching linux.student.cs and linux.cs to 20.04
Switch Date: Monday May 3, 2021
- Change the linux{.student,}.cs.uwaterloo.ca CNAME to point to ubuntu2004-*{.student,}.cs.uwaterloo.ca
Other Important Dates:
- Monday April 19 - Exams start.
- Monday April 26 - Last day of Winter exams. (Revised from April 24 due to COVID-19.)
- ???? - Grades need to be submitted.
- Monday May 3 - Spring Term ISG co-ops start work
- Monday May 10 - Spring classes begin. (Revised from May 3 due to COVID-19.)
Things to do, when:
April 22: Rack two new hardware systems (dual 32-core AMD Ryzen)
April 23: Remove all but a pair of redundant ubuntu1804-*{.student,}.cs.uwaterloo.ca
from the ubuntu1804{.student,}.cs.uwaterloo.ca round-robin DNS.
(Check that min 500GB SSD OS drive exists.)
April 27: Have the new hardware systems (ubuntu2004-00{2,4}.student.cs?) ready for production
and use them to initiate the ubuntu2004.student.cs.uwaterloo.ca DNS round-robin.
April 27: Turn off ubuntu1604-*{.student,}.cs.uwaterloo.ca systems.
April 29: Reinstall with Ubuntu 20.04 the ubuntu1804-*{.student,}.cs.uwaterloo.ca systems
that were removed from the DNS round-robin.
May 24: Reinstall ubuntu1604-*{.student,}.cs.uwaterloo.ca systems with Ubuntu 20.04
iff there is no critical reason for us to continue providing an ubuntu1604 service.
(Check that min 500GB SSD OS drive exists.)
What still needs to be done before the Summer term?
User communication - Omar and/or Lawrence to get announcement out.
Testing by users ASAP - how can we encourage this?
- New ISAs start May 3 - make sure they're aware that the OS distro was upgraded, and ask them to test the things they need ASAP.
More Ubuntu 20.04 general-use servers to reach 6 systems for teaching and 2 systems for cs-general regions.
- The current single container is NOT to be in round-robin DNS and is in addition to the 6 and 2 systems. (It is for pre-production new service setup and integration testing, including SaltStack recipe development.)
- Will order fiscal year 2021-2022 general-use servers ASAP (2 for linux.student.cs and 1 for linux.cs)
- Get the broken general-use Supermicro system sent in for warranty repair once shutdown is over.
- Can we detect and auto-heal servers being killed by the VS Code 150k-threads bug?
- Get icinga to monitor and graph the number of runnable threads and (more importantly) total number of threads. - Fraser, Anthony and Devon
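
A minimal sketch of what such a thread-count check could look like, assuming a Nagios/Icinga-style plugin (exit code 0/1/2/3 plus a status line with performance data after the `|`). It reads the fourth field of /proc/loadavg, which Linux reports as runnable/total scheduling entities. The thresholds and the function names are hypothetical and are not taken from any existing CSCF check.

```python
#!/usr/bin/env python3
"""Sketch of an Icinga/Nagios-style check for thread counts on Linux."""

# Hypothetical thresholds (assumption); tune per host.
WARN_TOTAL = 50_000
CRIT_TOTAL = 100_000


def parse_loadavg(text):
    """Return (runnable, total) from the 4th field of /proc/loadavg."""
    runnable, total = text.split()[3].split("/")
    return int(runnable), int(total)


def evaluate(runnable, total, warn=WARN_TOTAL, crit=CRIT_TOTAL):
    """Return (exit_code, status_line) following plugin conventions."""
    if total >= crit:
        code, label = 2, "CRITICAL"
    elif total >= warn:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    # Perfdata after the "|" is what the monitoring system can graph.
    perfdata = f"total_threads={total};{warn};{crit} runnable_threads={runnable}"
    return code, f"THREADS {label} - {total} total, {runnable} runnable | {perfdata}"


def main():
    """Read /proc/loadavg, print the status line, return the exit code."""
    try:
        with open("/proc/loadavg") as f:
            runnable, total = parse_loadavg(f.read())
    except (OSError, ValueError, IndexError) as exc:
        print(f"THREADS UNKNOWN - {exc}")
        return 3
    code, line = evaluate(runnable, total)
    print(line)
    return code
```

When installed as a plugin, the script would end with `import sys; sys.exit(main())` so the monitoring system sees the exit code. Alerting is keyed on the total count because that is the number the minutes call more important; the runnable count is only reported for graphing.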
Dynamic round-robin DNS for student.cs login systems? (ldpaniak)
- Can we automate the removal of unhealthy systems from round-robin DNS?
- Left as something for people to think about.
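
As one starting point for that discussion, the selection step could be sketched as below. This is an illustration only: the health probe (a TCP connection to the SSH port), the `min_hosts` safety floor, and all function names are assumptions, and the step that would actually rewrite the DNS records is deliberately left out.

```python
import socket


def ssh_reachable(host, port=22, timeout=3.0):
    """Crude health probe: can we open a TCP connection to the SSH port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def select_pool(candidates, probe=ssh_reachable, min_hosts=2):
    """Return the hosts that should stay in the round-robin.

    If fewer than min_hosts pass the probe, return the full candidate
    list unchanged: a mass failure more likely means the probe or the
    network is broken than that every server is down.
    """
    healthy = [h for h in candidates if probe(h)]
    if len(healthy) < min_hosts:
        return list(candidates)
    return healthy
```

Calling `select_pool(hosts)` with the current round-robin members returns the list to publish. An open SSH port does not prove a server is usable (a host drowning in threads may still accept connections), so a real probe would probably also need a load or thread-count signal.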
icinga.cscf ubuntu1804-010 USERS WARNING - 22 users currently logged in (fhgunn)
- We get many of these alerts per day. That's bad.
- The warning threshold (20 users) is too low. It is normal for the number of users on a linux.student.cs server to be 100 to 300. It would make more sense to alert if the number is below 100.
- The number of users on ubuntu1804-010 is low only because it was rebooted recently and was out of the round-robin.
- Fixed. Devon greatly increased the warning and critical thresholds from the defaults.
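
For illustration, the threshold logic discussed above could be sketched as follows. The numeric defaults are hypothetical (the minutes do not record the values Devon chose), and the optional low-side warning reflects the suggestion to alert below 100 users rather than anything known to be deployed.

```python
def classify_users(count, warn_high=400, crit_high=500, warn_low=None):
    """Classify a logged-in user count.

    warn_high and crit_high are hypothetical upper thresholds.
    warn_low, if set, also warns when the count is unusually low.
    """
    if count >= crit_high:
        return "CRITICAL"
    if count >= warn_high:
        return "WARNING"
    if warn_low is not None and count < warn_low:
        return "WARNING"
    return "OK"
```

With these assumed defaults, the 22 users that triggered the original alert classify as OK. Passing `warn_low=100` would flag that count again, which is why a low-side threshold would need to tolerate recently rebooted servers that are still outside the round-robin.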