Meeting Date

  • TEAMS: 2024-02-21

Invited

Anthony (group leader), Lori, Dave, O, Clayton, Guoxiang, Nathan, Nick, Todd, Ed, Devon, Gwen

Attendees

  • Anthony, Dave, Clayton, Guoxiang, Nathan, Nick, Ed, Devon, Gwen

Review and accept previous meeting minutes.

CsLWGMeeting20240207

Review last meeting's Action Items

Ongoing problems with Inventory and IPAM are hobbling Infrastructure operations

  • Will raise Management's failure to honour staffing commitments for existing critical services at the CSCF Staff meeting

Ongoing problems with Ganesha service RT#1303795

  • Production instance of Ganesha now running version 5.6+ stable
    • Standby instance now also upgraded
  • Numerous independent bugs, including a possible cause of the previously noted tendency to hang
  • New monitoring dashboard
  • Debug symbols
  • This service (using NFS-Ganesha) has a tendency to hang (i.e. mounts disappear on clients, and the server refuses new connections)
    • New evidence suggests this causality is backwards, and that Ganesha distress causes blocked operations on CephFS clients (a2brenna)
      • Further evidence from new Ganesha instance
  • The only current remedy is to reboot every active server so that clients work again
  • Yet another example of important work that went undone until it became urgent... Devon and Anthony were up until 2am. Will bring up at CSCF Staff meeting.
  • Currently, switching servers requires a reboot or remount
  • Will switch to m3 tonight

Monitoring Services

  • The number of false alerts is a concern.
    • Important alerts continue to be missed as a result of alert fatigue
  • Container networking will not survive a reboot
    • Delayed due to lack of staff time
  • Possible issue with communication between inventory and webserver: RT#1257072, updates to DNS fields cause a hang
  • Lack of service maintenance outside standard working hours has been more of a problem lately.
    • Management is aware and needs to review this.
      • Will bring this up at upcoming staff meeting (a2brenna)

linux.cscf.uwaterloo.ca

  • New linux.cscf.uwaterloo.ca running Ubuntu 22.04 is almost ready - soft roll out next week
    • Testing has revealed some issues, rollout delayed until they're resolved.
    • Duo policy discussion to be had with Dave
      • No Duo at this time, all CSCF staff have yubikeys
    • More delays due to security concerns

New DFSc Hardware

  • No word yet on hardware
  • Investigating what can be cobbled together with existing hardware to handle anticipated load spikes in the next month (CS 136)
  • Prioritizing this will cause delays in other work

Topic revision: r2 - 2024-02-21 - NathanFish