Meeting Date

  • TEAMS: 2024-06-12

Invited

Anthony (group leader), Lori, Clayton, Guoxiang, Nathan, Nick, Todd, Ed, Devon, Gwen

Attendees

Anthony (group leader), Nathan, Nick, Todd, Ed, Devon

Review and accept previous meeting minutes.

CsLWGMeeting20240529

Review last meeting's Action Items

linux.cscf.uwaterloo.ca

  • Still blocked on a comprehensive review of the authentication/authorization configuration; work has stalled. Need to schedule in-person meetings

New DFSc Hardware

  • Has been racked
  • Network cards found and installed, cabling complete
  • Should be able to get them running this week - a2brenna

DFSc Performance

  • Snapshots blocked on cleanup of old homedirs
    • Waiting on completion of the ongoing tape backup before moving and deleting approximately 9400 retired homedirs, which account for approximately 20% of space usage in TEACHING
      • Need to double-check that no course accounts are included in the retired homedirs
      • Should also email users whose homedirs will be deleted - Nick
  • Need to notify users when snapshots are re-enabled (login message, which only applies to login shells; other channels?), with links to the appropriate end-user documentation - Anthony
    • Will use the MOTD to point to more comprehensive documentation
    • Engage management on documentation location (Anthony)
      • Nick Lee maintains a useful page on www.cs.uwaterloo.ca about the general-use TEACHING environment; colocating the documentation there is an option.

Endpoint Protection

  • IST wants to deploy SentinelOne endpoint security agents on all devices
  • Planning a test deployment to linux.student.cs (on a single node) in passive ("watch and complain") mode for the spring term
    • Waiting on homedir backups to complete so we can get a clear picture of the DFSc traffic caused by S1 (SentinelOne)
    • IST has been informed and is generally supportive

NFS Ganesha server issues (nfs-files.student.cs)

  • The CCO event ran last week in MC 3003 - https://cemc.uwaterloo.ca/contests/ccc-cco.html
  • The NFS Ganesha server kept crashing during lab sessions for the event (possibly triggered by a specific operation or by load) https://rt.uwaterloo.ca/Ticket/Display.html?id=1316860
  • Anthony is collecting crash dumps (at least 3-4 distinct bugs noticed so far)
  • The server recovers automatically, but lab machines can take up to 5 minutes to recover (remaining frozen until then)
  • Analysis of crashes is blocked due to lack of staff time
  • Looking into alternatives is also blocked due to lack of staff time
    • File locking is a concern with other options
  • NFS Ganesha v5.9 was released last week; need to check the changelog and decide whether we can/should update (currently running v5.7?)
    • We still need to build a third server anyway; we might use it to test the latest release.
      • New server deployed; bugs persist; more analysis is required, pending staff time

Future of RT

  • CSCF and MFCF to discuss - June 12 @ 1PM
    • INF proceeding with work to take on RT (and only moving CSCF + MFCF data)
      • Latest indications are that MFCF will be taking on management of the frontend as they are more heavily invested
    • Time commitment of 0.2-0.25 FTE
    • Detailed questions have been forwarded to management so they can be resolved in negotiations with MFCF/IST.
  • RT is currently breaking frequently (potentially related to the recent RT upgrade); unclear when these issues will be resolved

Backups

  • Investigated using Borg; see RT #1316774

Comments

Topic revision: r2 - 2024-06-12 - ToddLichty
 