DFSc Working Group 
 Meeting Date 
  
 Invitees - Attendees 
 
-  Anthony,  Gouxiang, Lori, Nathan, Nick, Lawrence
 Review and accept previous meeting minutes. 
  
 Proposed Agenda Items 
 Old business 
 Action items for next meeting 2022-02-15 
 
-  Nathan - create an RBD image with the --thick-provision option -> Pending
-  Clayton - create a ticket to get container made for v4 ganesha node -> Done
-  Fraser - create a ticket for the plans for local storage option -> https://rt.uwaterloo.ca/Ticket/Display.html?id=1209206  
-  Nathan/Lori will update Ceph configuration to no longer allow insecure connections -> Scheduled
-  Lori to generate a schedule for upgrading the 42x systems -> Done
 Not in the near future 
 
-  move one or more home directories to Rados NFS device
-  move one or more home directories to NetApp
 New business 
 Increase number of PGs for cs-teaching: nfish 
    * Currently 256; going to 512
    * Some extra mon load.  Those on 422 systems
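A minimal sketch of how the PG increase above could be scripted. The pool name is hypothetical (the minutes only name the cs-teaching filesystem, not its pool), and the script defaults to a dry run that prints the commands rather than running them. On recent Ceph releases pgp_num follows pg_num automatically; if the PG autoscaler is enabled on the pool it may need to be checked first.

```shell
#!/bin/bash
# Sketch: raise pg_num on a pool from 256 to 512.
# POOL is a hypothetical name; substitute the real cs-teaching pool.
POOL="${POOL:-cephfs_data_example}"
TARGET_PGS="${TARGET_PGS:-512}"
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

increase_pgs() {
  # Show the current value, apply the new one, then check cluster health.
  run ceph osd pool get "$POOL" pg_num
  run ceph osd pool set "$POOL" pg_num "$TARGET_PGS"
  run ceph status
}
```

Calling `increase_pgs` with the defaults prints the three commands; setting `DRY_RUN=0` would execute them on a host with admin access to the cluster.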
 Replace failing drive causing data inconsistencies/scrub errors: gxshen 
    * After pg rebalance (2022-02-23 or later)
    * GNOME VFS 
    * lsof gives funny results
 Diagnostics running? Latency per request 
@ubuntu2004-002% cat /usr/local/bin/cephfs-trace 
#!/bin/bash
bpftrace -e 'kprobe:ceph_mdsc_do_request { @start[tid] = nsecs; } kretprobe:ceph_mdsc_do_request /@start[tid]/ { $duration = nsecs - @start[tid]; printf("CephFS MDS Request by %d comm: %s pid: %d tid: %d dur: %d\n", uid, comm, pid, tid, $duration); delete(@start[tid]); }' | logger
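The lines logged by cephfs-trace can be reduced to per-command latency figures. The following is a sketch only: the function name is hypothetical, it assumes the log lines keep the printf format shown above (with dur in nanoseconds), and it takes only the first word of comm if the command name contains spaces.

```shell
#!/bin/bash
# Summarize "dur:" values (nanoseconds) from cephfs-trace log lines.
# Reads log lines on stdin; prints one line per command name:
#   comm  request_count  avg_microseconds  max_microseconds
summarize_cephfs_latency() {
  awk '
    /CephFS MDS Request/ {
      comm = ""; dur = ""
      # Locate the fields by label so a syslog prefix does not matter.
      for (i = 1; i <= NF; i++) {
        if ($i == "comm:") comm = $(i + 1)
        if ($i == "dur:")  dur  = $(i + 1)
      }
      if (comm == "" || dur == "") next
      n[comm]++
      sum[comm] += dur
      if (dur + 0 > max[comm]) max[comm] = dur + 0
    }
    END {
      for (c in n)
        printf "%s %d %.1f %.1f\n", c, n[c], sum[c] / n[c] / 1000, max[c] / 1000
    }
  ' | sort
}
```

Usage, assuming logger output ends up in /var/log/syslog on the client: `grep 'CephFS MDS Request' /var/log/syslog | summarize_cephfs_latency`.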
 strace: writes on cs-teaching are almost always fast. Problem case: rm -rf under load, across snapshots 
 
-  sticky on unlink:  unlinkat(4, "url", AT_REMOVEDIR)
 Dynamic MDS allocation 
 
-  Need smaller changes in cap allocation
-  Spot heating for periods of time
 Upgrades 
 Server side 
 
-  Want to upgrade to Pacific by end of summer at the latest. Reasons: strays, upgrade path, less OSD spill from RocksDB (sharding; currently 3, 30, 300GB...), mclock scheduler, Grafana daemons. Octopus is out of support 2022-06-01, so early May?
-  One MDS problem
-  Remove all snapshots? 
-  Real downtime with low/no cluster load
-  ceph-deploy is deprecated: migrate to cephadm (Docker containerization), then upgrade
-  splitting cs-teaching into multiple real filesystems (?) 
-  cephadm upgrade on the 902s
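The "3, 30, 300GB" figures in the RocksDB spill item can be reproduced with a little arithmetic. This sketch assumes the stock RocksDB level sizing (256 MiB base level, each level 10x the previous); those values are not stated in the minutes and the cluster's actual settings may differ. A level only stays on the fast DB device if it fits entirely, so the useful DB sizes are the cumulative totals.

```shell
#!/bin/bash
# Print per-level and cumulative RocksDB level sizes in MiB.
# Arguments (all optional): base level size in MiB, multiplier, level count.
# Defaults are assumed RocksDB defaults, not values taken from the cluster.
rocksdb_level_thresholds() {
  local base_mib="${1:-256}" multiplier="${2:-10}" levels="${3:-4}"
  local size=$base_mib total=0 level
  for ((level = 1; level <= levels; level++)); do
    total=$((total + size))
    echo "L${level}: level=${size}MiB cumulative=${total}MiB"
    size=$((size * multiplier))
  done
}

rocksdb_level_thresholds
```

With these assumptions the cumulative totals for L2, L3 and L4 are 2816, 28416 and 284416 MiB, which round to the 3, 30 and 300GB steps noted above.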
 Client side 
  
 Scratch(ish) drives on 211 systems (fhgunn) 
  
 Upcoming maintenance 
 
-  New upgrade schedule for 42x systems (ldpaniak) 
-  rebooting these systems seems to be helping after an update
 
-  Reading Week maintenance: Feb 20-27 (nfish/ldpaniak) 
-  Increase number of pgs for cs-teaching pool.  Start on 2022-02-23
-  PS: Feb 22 is now a University Holiday
 
 To do 
 
-  cephadm upgrade on the 902s (ldpaniak/nfish)