Linux Working Group

AGENDA LOCKED


Invitees - Attendees

Invited: Anthony, Adrian, Guoxiang, Clayton, Lori, Fraser, Nathan, Dave, Omar, Devon, Todd, Nick, Lawrence
Attended: Adrian, Anthony (group leader), Clayton, Lori, Fraser, Devon, Nathan, Todd, Omar

Review and accept previous meeting minutes.

CsLWGMeeting20220309

Last meeting's tasks [10 minutes]

Lawrence - confirm with Daniel the maintenance of OpenDCIM
- the OpenDCIM project is still being maintained. We need to catch up on some updates, but there are no security issues
- tentatively planning an update in the summer
Lawrence - work with Graham to update the OpenDCIM data
- will be working with Graham on this project
Clayton - document process of adding hosts to AD and move to a generally accessible place
- keytab work
- automatic updates to Samba 4.13 were causing issues; the package is currently pinned
- at a point where Clayton believes that people can use it
  - open question: where will it be documented and where will it live?
- Clayton has updated most keytabs; let him know of any that still need updating
Lawrence / RSG - update jerusalem and graceland to mount new NFS share - RT#1194157
- not yet, will be working with Tom
Dave - put up Beta version of Virtual Host Index / Anthony to create a ticket - RT#1211603 -> working on it
- Dave was absent; no new version is up yet, but Anthony understands that it should be ready soon
"Dirty Pipe" - https://rt.uwaterloo.ca/Ticket/Display.html?id=1213217
- https://ubuntu.com/security/CVE-2022-0847
- Present from 5.8 onward, possibly on older kernels via backport
- Fraser: prudent to rebuild all login servers from scratch. O(hours) to rebuild all student login machines
- ctucker: reasonable to replace keytab files, invalidate existing keytabs
- Rebuild webservers and other container servers
- Appetite for risk on rebuilding servers? Question for management - RT#???
  - outstanding question: no response from management
  - however, the technical consensus is that we are not prepared to do that now
  - could do it on a round-robin basis (one at a time)
  - Lori: could "cloud init" be used to start with a standard image?
  - Anthony: probably not; it would likely be more work to maintain than it would save
  - would eventually like to get to a stateless setup
- Lori - setuid binaries in user filesystems? Mount a survey (per term?). Set ACL on snapshots? https://rt.uwaterloo.ca/Ticket/Display.html?id=1213685
  - the upcoming Ceph version has a "root squash" feature - add to to-do list
- Run vulnerability check on questionable systems (e.g. 5.4 kernels)
- Fraser to create a ticket for updating the Graphics Lab machines
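A minimal sketch of the setuid survey Lori suggested. Python and the function name `find_setuid_files` are assumptions; the minutes do not say how the survey would be run. It walks a directory tree (e.g. a user filesystem mount point) and lists regular files with the setuid bit set. It does not follow symlinks, but it does cross mount points, and it will be slow on a large home-directory tree. The usual shell equivalent is `find <dir> -xdev -perm -4000 -type f`.

```python
import os
import stat
import sys
from typing import List


def find_setuid_files(root: str) -> List[str]:
    """Return sorted paths of regular files under `root` with the setuid bit set.

    Hypothetical helper for a per-term survey. Symlinks are not followed:
    os.walk does not descend into symlinked directories by default, and
    os.lstat inspects the link itself rather than its target.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the survey.
                continue
            if stat.S_ISREG(st.st_mode) and st.st_mode & stat.S_ISUID:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Usage: python find_setuid.py /path/to/user/filesystem
    for p in find_setuid_files(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(p)
```

Running it against a snapshot rather than the live filesystem would avoid racing with users who are changing files during the scan.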

New proposed agenda items (include name and desired time)

linux.student.cs load average is much higher and more variable than in the past. (Fraser, 10 minutes)
- concern is load average is 10x what it used to be
- afternoons and evenings much higher
- ratio of load average to number of cores
- processes in the "runnable" state, but some may be waiting for disk
- used to be that more processes were waiting for Ceph, but not so much anymore
- right now, on the cs-general general-use hosts, a single user is slamming a machine
  - Peter van Beek; jobs correctly niced (+10); Fraser talked to Peter and moved the CPU- and RAM-intensive research work to ugster73*
- teaching hosts - load from actual compute jobs / VS Code
- Omar - how closely does load average correlate with user response?
  - Anthony: if load stays below number of cores, should not impact response
  - Fraser - would see it jump up to 200 then back to 60 - very jumpy
  - Anthony - does Icinga have process usage monitoring?
  - Devon - used to, with smtp daemon
  - Anthony - will create a ticket for monitoring processor data - will work with Devon
    - Prometheus is installed on most systems
      - exporters provide data over HTTP that can be scraped
      - easy to use and set up
      - query language is easy to use
    - any way to get a breakdown by core? top, htop, cat /proc/loadavg (latter is low impact)
    - Fraser: do we have load average graphs? Devon - yes:
      - https://icinga.cscf.uwaterloo.ca/icingaweb2/monitoring/host/services?host=ubuntu2004-010.student.cs.uwaterloo.ca#!/icingaweb2/monitoring/service/show?host=ubuntu2004-010.student.cs.uwaterloo.ca&service=Load
      - Lawrence - any way to test timing of interactive response?
      - Devon - possibly, maybe there's a python library?
  - Fraser - would like to see a combined load average all on one screen
    - Devon - yes we can do that
    - Lori - would also like that for DFSc
    - Lori - would also like to see network usage (netstat -ai)
      - Devon - there are some stats - are they sufficient? needs to be graphed
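The load-average-to-cores ratio discussed above can be read cheaply from /proc/loadavg. Below is a minimal Python sketch. The language, the function names, and the spawn-latency idea are assumptions, not something agreed at the meeting.

The sketch does three things:
- parses /proc/loadavg
- divides the 1-minute load by `os.cpu_count()` (logical CPUs, so hyperthreads count)
- times how long a trivial child process takes to start, as one crude stand-in for "interactive response"

```python
import os
import subprocess
import sys
import time
from statistics import median
from typing import Dict


def parse_loadavg(text: str) -> Dict[str, float]:
    """Parse /proc/loadavg: "<1min> <5min> <15min> <running>/<total> <last_pid>"."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "running": int(running),
        "total": int(total),
    }


def load_per_core(load: float, cores: int) -> float:
    """Load average divided by core count.

    A sustained value above 1.0 suggests more runnable work than cores; on Linux
    the load average also counts tasks blocked in uninterruptible (disk) waits.
    """
    return load / cores


def spawn_latency_ms(samples: int = 5) -> float:
    """Median wall-clock time (ms) to start and reap a trivial child process.

    Hypothetical responsiveness proxy, not a standard metric; it mostly reflects
    scheduler and exec overhead, so compare values across hosts or times of day.
    """
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True)
        times.append((time.perf_counter() - start) * 1000)
    return median(times)


if __name__ == "__main__":
    with open("/proc/loadavg") as f:
        avg = parse_loadavg(f.read())
    cores = os.cpu_count() or 1
    print(f"load1={avg['load1']:.2f} cores={cores} "
          f"per-core={load_per_core(avg['load1'], cores):.2f}")
    print(f"spawn latency ~{spawn_latency_ms():.1f} ms")
```

Reading /proc/loadavg is a single small file read, which keeps it low impact compared with running top or htop. A per-core breakdown would need /proc/stat or a Prometheus exporter instead.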

Looking for some out-of-production hardware for OpenNebula work (Lawrence, 5 minutes)
- need 2 machines with 32 GB+ RAM and SSDs
  - Devon has an R815 we can use
  - Fraser offered the ugster200s, although they have only 16 GB RAM and a single 7200 rpm SATA drive each (0 or all 5)
  - as an aside - CS013499 (formerly ubuntu2004-006) is dead and needs RMA

Action items

Clayton - document process of adding hosts to AD and move to a generally accessible place
Lawrence / RSG - update jerusalem and graceland to mount new NFS share - RT#1194157
Dave - put up Beta version of Virtual Host Index / Anthony to create a ticket - RT#1211603 -> working on it
Fraser to create a ticket for updating the Graphics Lab machines
Anthony - will create a ticket for monitoring processor data - will work with Devon
Fraser - create request for Devon for combined general use cpu load graph (and possibly other metrics)
Lori - create ticket for Devon to create combined graph for DFSc
Lori - create RT for graphing network statistics data
Lawrence - follow-up with SuperMicro re: RT#1079451

Topic revision: r10 - 2022-03-26 - FraserGunn

Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.
