Linux Working Group 
 Meeting Date 
  
 Invitees - Attendees 
 
-  Adrian, Anthony (group leader), Clayton, Guoxiang, Lori, Fraser, Devon, Nathan, Nick, Todd, Dave, Lawrence, Omar
 Review and accept previous meeting minutes. 
 Proposed Agenda Items 
 NetApp Retirement - Deadline January 2022 
 Migrate remaining data 
 
-  In General 
-  Any progress on moving Dan Berry's /opt/csw? 
-  RSG discuss moving to jerusalem directly (lfolland) 
-  Configure an NFS share via gateways: RT#1194157 (ctucker)
-  Guoxiang - tell us the size of the filesystem
-  Nathan - create an appropriate CephFS
-  Clayton - create the NFS share
 
 
-  New storage for apache web logs? 
-  RT# 1196145 - Will use DFSc since web service already relies on DFSc (for homedirs).  Configuration to be done and verified at end of term reboot. (a2brenna, ijmorlan, nfish) 
-  Anthony feels we're in good shape.  Will update the ticket with next steps to be done at term end
 
 
 
-  TEACHING - Progress reports on 
-  Unmounting /oldhome immediately (gxshen). Appears to be done on all general use servers? 
-  Guoxiang - is this gone now? Yes, done.
 
-  Perform final (end of year) backup of /oldhome data on the NetApp and remove data from NetApp in January (gxshen) 
-  Guoxiang: did a full backup in May 2020
-  /opt/CSCF/packages is also being unmounted (collections of really old packaged software - no longer used). 
-  Appears to be done on all general use servers. (a2brenna)
 
 
 
-  XHIER 
-  fs-homedirs.student.cs.uwaterloo.ca:/regional/.software/regional 
-  Seems to be mounted on only one linux.student.cs machine... Can we remove this? (a2brenna)
-  RT#1067477  
 
-  fs-homedirs.cs.uwaterloo.ca:/regional_core 
-  Seems to be mounted on only one linux.cs machine... Can we remove this? (a2brenna)
-  RT#1067480 
-  Guoxiang to check with Isaac and Adrian re: Odyssey and web servers
 
 
-  Are we still on track for moving mail in December? 
-  lfolland sent notification for moving CS mail to IST on 2021-11-24 - Done 
-  Do not move mail homes to DFSC until after move to IST 
-  lfolland to discuss with sdinney. Progress?
 
-  /var/mail (still on NetApp) 
-  TEACHING is done
-  CS-GENERAL is done
-  Fraser notes that he is still getting some new mail on CS (can be from CS to CS)
-  Adrian points out there are several reasons mail still comes directly 
-  spammers, Let's Encrypt, other CS mail users, CS systems using mx.cs
-  servers to be changed to send mail directly to the destination - has that happened? Not yet. (a2brenna) 
-  configuration change to postfix required; will need a recipe for other machines that may be set up to do that (arpepper, a2brenna)
 
 
-  need to change CS hosts to send mail directly, rather than through mx.cs 
-  do we have a ticket for this?
-  Anthony and Adrian to find/create a ticket and discuss
-  Anthony has some configurations that may be easily adapted to work 
 
 
 
 Monitoring 
 
-  We now have an Icinga GPU monitoring plugin for nvidia based cards - RT#1198875 
-  Should we look into support for ATI cards? 
-  does not appear that we do - possible future development
 
-  Do we have any Icinga agent hosts that we can use to collect some test data? 
-  basilisk.cs
-  if we add it everywhere, then it should start showing up
-  Anthony will add it to standard installation process
 
-  Anthony has a script for adding Icinga client-side - down to one step 
-  get ticket from Launch Wizard
 
 
-  Reminder - we have a lot of host problems in Icinga that need to be cleaned up to make it more useful 
-  a lot of these are research, I know   
 
-  host groups 
-  Lawrence to create a ticket for Devon with a big list of Host Groups and contacts - needs to be done manually by Devon
 
 CS Teaching - slowness of systems - how to address? (All) 
 
-  Ceph: 
-  Gateway systems (NFS/Samba) upgrade status? Schedule denylisting of old client. Was this scheduled? (ctucker, ldpaniak) 
-  has there been any progress in turning off the old ganesha servers?
-  concerns about security and performance
-  Lori - wants us to move to the latest supported software versions
-  Clayton - decommission old servers as soon as possible
 
-  Memory depletion on login servers.  Reserve 10% memory for system/root/ceph use? 
-  unclear whether this is the cause of the problems or a symptom of the problems
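
To put a number on the 10% reservation raised above, the following sketch reads MemTotal from /proc/meminfo and prints one tenth of it in kB. The function name reserve_kb is hypothetical, and the meeting did not decide how (or whether) such a reserve would be enforced; this only computes the figure.

```shell
#!/bin/sh
# reserve_kb [MEMINFO_FILE]
# Print 10% of MemTotal in kB, rounded down.
# MEMINFO_FILE defaults to /proc/meminfo; the argument exists only so the
# function can be tried against a saved copy from another host.
# (reserve_kb is a hypothetical name, not an existing tool.)
reserve_kb() {
    meminfo="${1:-/proc/meminfo}"
    # Take the numeric field of the "MemTotal:" line (reported in kB).
    total_kb=$(awk '$1 == "MemTotal:" { print $2; exit }' "$meminfo")
    echo $(( total_kb / 10 ))
}
```

Calling reserve_kb with no argument reads the live /proc/meminfo on the login server in question.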
 
 
 Other Issues 
 *-postgres-2004 
 
-  How much disk space does it actually need? 500 + 256 GB.  Current *-203.cloud.cs.uwaterloo.ca are adequate. 
-  problem was that one was using 2TB
 
 Avoid rebooting troubled systems (a2brenna) 
 
-  eg: ubuntu2004-????.???  last night
-  a reboot makes the problem (nearly) impossible to diagnose afterwards
-  If you can access the running OS at all, there are better options
-  Urgency is often an illusion
-  In the case of machines in the linux.student.cs round-robin, just take them out of the round robin
-  Lori - counter-concern - a single slow system may impact all the systems using the same filesystem
-  Lawrence - suggestion to have an agreed-upon communications channel to deal with emergency issues (eg: Teams Channel, eg: Emergencies)  
-  Anthony - has installed a method to analyze crash dumps, if crashed appropriately 
-  To crash a machine in a useful fashion that generates a dump: 'echo c > /proc/sysrq-trigger'
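
A deliberate crash only yields a dump if a crash (kdump) kernel is loaded at the time. The sketch below checks /sys/kernel/kexec_crash_loaded before writing to /proc/sysrq-trigger. The function names are hypothetical, and it assumes the dump method Anthony installed is kdump-based, which the minutes do not state.

```shell
#!/bin/sh
# crash_kernel_loaded [STATE_FILE]
# Succeeds if STATE_FILE contains "1", i.e. a crash kernel is loaded.
# STATE_FILE defaults to /sys/kernel/kexec_crash_loaded.
crash_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# crash_for_dump (hypothetical wrapper, must be run as root)
# Panics the machine only when a dump can actually be captured.
crash_for_dump() {
    if crash_kernel_loaded; then
        echo c > /proc/sysrq-trigger
    else
        echo "no crash kernel loaded; a panic would not leave a dump" >&2
        return 1
    fi
}
```

Running crash_kernel_loaded on its own is harmless and shows whether a host is ready to produce a dump; crash_for_dump takes the machine down immediately, so it should only be run after the host has been taken out of the round robin.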
 
 Joining new linux hosts to AD 
 
-  higher volume of container creation and rebuilds could benefit from more automation and more authorized users 
-  Clayton has a tool that follows INF standards of nscd and kerberos
-  Anthony - interested in the part that generates a keytab file
-  Clayton: needs to be run on linux.cscf and handles the appropriate tickets
-  Lori: what about net join ads?
 
 rebooting INF machines 
 
-  last day of exams is Dec 23rd
-  either Dec 28th (Tues) or 29th (Wed)
-  Anthony and Guoxiang will reboot on the 29th
-  Lawrence to send out email to everybody in SCS: start time 1pm, expected end time 5pm? 
-  all CS Teaching and CS General, plus other services
 
 Last meeting Action items 
 
-  Anthony/Adrian - work on new postfix recipe to have servers send mail out directly
-  Anthony/Guoxiang - initiate warranty - RT#1196085 - under warranty until 2025-03-30
-  Devon - collect power data to show Plant Operations 
-  yes
-  still doing 5V deviance
-  someone to communicate with PlantOps?
 
-  Lori - Possibly recover 960GB Optane cards from 422 systems? 
-  probably more use in a database server
 
-  Guoxiang/Lori - create a ticket (or report the existing ticket #) for Storage option catalogue
-  Omar - create ticket(s) for VScode/git workflow 
-  no RT, but Nick is discussing with faculty
 
 This meeting Action Items 
-  Anthony/Adrian - work on new postfix recipe to have servers send mail out directly
-  Anthony/Guoxiang - initiate warranty - RT#1196085 - under warranty until 2025-03-30
-  Lawrence to create a ticket for Devon with a big list of Host Groups and contacts - needs to be done manually by Devon
-  RT#1194157 - /opt/csw 
-  Guoxiang - tell us the size of the filesystem
-  Nathan - create an appropriate CephFS
-  Clayton - create the NFS share
 
-  Devon - create ticket to document power issues to show Plant Operations
-  Lori/Nathan - consider whether we can recover 960GB Optane cards from 422 systems
-  Guoxiang/Lori - create a ticket (or report the existing ticket #) for Storage option catalogue (low priority)
-  Clayton - document process of adding hosts to AD and move to a generally accessible place - create a ticket
-  Lawrence to send out email to everybody in SCS: start time 1pm, expected end time 5pm