DFSc Working Group 
 Meeting Date 
  
 Invitees - Attendees 
 
-  Anthony, Dave, Fraser, Gouxiang, Lori, Nathan, Nick
 Review and accept previous meeting minutes. 
 Proposed Agenda Items 
 Old business 
 global_id reclaim 
https://docs.ceph.com/en/latest/security/CVE-2021-20288/ 
Mitigating this CVE requires upgrading userspace Ceph clients. Samba has been upgraded; only the nfs-ganesha servers still need upgrading.
This is the cause of these warnings:
   clients are using insecure global_id reclaim
   mons are allowing insecure global_id reclaim
Ganesha PPA: version 3.5+
saltstack formula for ganesha (dlgawley)
Once the NFS servers are updated, set the toggle to disallow insecure global_id reclaim.
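A minimal sketch of how the two warnings above could be used to decide when the toggle is safe. It assumes the text comes from `ceph health detail --format json` and that the JSON has a top-level `checks` mapping keyed by health code; the function name `reclaim_status` and the returned strings are illustrative only.

```python
import json

# Health codes that correspond to the two warnings listed above.
CLIENTS_INSECURE = "AUTH_INSECURE_GLOBAL_ID_RECLAIM"
MONS_ALLOWING = "AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED"


def reclaim_status(health_json: str) -> str:
    """Classify the cluster state from health JSON text (assumed layout)."""
    checks = json.loads(health_json).get("checks", {})
    if CLIENTS_INSECURE in checks:
        # At least one connected client still reclaims its global_id insecurely.
        return "clients still need upgrading"
    if MONS_ALLOWING in checks:
        # No insecure clients seen, but the mons would still accept them.
        return "safe to disallow insecure reclaim"
    # Neither warning is raised (or both are muted).
    return "no reclaim warnings present"
```

When the result is "safe to disallow insecure reclaim", the toggle described in the CVE page is `ceph config set mon auth_allow_insecure_global_id_reclaim false`.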
 Update on bug report/patch for Ubuntu 5.11 kernel on ceph append bug 
Apparently in 5.11.40
https://tracker.ceph.com/issues/51948
https://rt.uwaterloo.ca/Ticket/Display.html?id=1194773
Roll to 100,000 (a2brenna) EOD 2021-11-23.
fhgunn to test.
If OK, roll to prod thereafter.
 Timeline to get client systems back to the 5.11 kernel 
https://icinga.cscf.uwaterloo.ca/grafana/d/03EnhXZGz/dfsc-monitoring?var-hostname=mc-3015-422.cloud.cs.uwaterloo.ca&orgId=1&from=1623347299875&to=1634149134520 
See above...
Start with 004, which is down for a BIOS update.
 New business 
 ubuntu2004-012 
This is a VM? Why so many old ceph processes?  Network issue?
a2brenna: Highest uptime, most threads
kworkers do go away eventually...
Can we track the kworkers back to user activity?
lfolland: reboot systems regularly?
a2brenna: Uptime expectations, not addressing the real issues. Does the machine really have a problem?
ldpaniak: Take 012 out of pool?  dlgawley has removed from round-robin.
Needs further research...
-c hostname; ps uax |grep Nov |grep msgr
ldpaniak@charon:~/Temp$ ./check-ceph-student.sh 
ubuntu2004-002
root       8097  0.0  0.0      0     0 ?        I<   Nov17   0:00 [ceph-msgr]
ldpaniak 143969  0.0  0.0   9492  3376 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr 
ubuntu2004-004
root       8175  0.0  0.0      0     0 ?        I<   Nov07   0:00 [ceph-msgr]
ldpaniak 131204  0.0  0.0   9492  3280 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr 
ubuntu2004-008
root       5641  0.0  0.0      0     0 ?        I<   Nov16   0:00 [ceph-msgr]
ldpaniak  47748  0.0  0.0   9492  3324 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr 
ubuntu2004-010
root       4669  0.0  0.0      0     0 ?        I<   Nov13   0:00 [ceph-msgr]
ldpaniak  90585  0.0  0.0   9492  3316 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
ubuntu2004-012
root       5068  0.0  0.0      0     0 ?        I<   Nov01   0:00 [ceph-msgr]
root       6457  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/90:2-ceph-msgr]
root      18180  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/12:0-ceph-msgr]
root      85071  0.0  0.0      0     0 ?        I    Nov20   0:19 [kworker/91:0-ceph-msgr]
root     107070  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/26:2-ceph-msgr]
root     122108  0.0  0.0      0     0 ?        I    Nov19   0:00 [kworker/64:0-ceph-msgr]
root     148385  0.0  0.0      0     0 ?        I    Nov20   0:13 [kworker/16:0-ceph-msgr]
root     181730  0.0  0.0      0     0 ?        I    Nov20   0:33 [kworker/15:1-ceph-msgr]
ldpaniak 205256  0.0  0.0   9492  3184 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
root     208581  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/65:0-ceph-msgr]
root     228965  0.0  0.0      0     0 ?        I    Nov20   0:51 [kworker/108:0-ceph-msgr]
ubuntu2004-014
root       4500  0.0  0.0      0     0 ?        I<   Nov10   0:00 [ceph-msgr]
ldpaniak  62494  0.0  0.0   9492  3220 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
ubuntu2004-016
root       3962  0.0  0.0      0     0 ?        I<   Nov10   0:00 [ceph-msgr]
ldpaniak  98678  0.0  0.0   9492  3452 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
ldpaniak@charon:~/Temp$ 
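To compare hosts over time, the output above could be reduced to a per-host count of `kworker/...-ceph-msgr` threads. This is a rough sketch only: the hostname pattern `ubuntu2004-NNN` is inferred from the listing, and `count_kworkers` is a hypothetical helper, not part of check-ceph-student.sh.

```python
import re
from collections import Counter

# Hostname lines as they appear in the listing (pattern assumed from the output).
HOST_RE = re.compile(r"^ubuntu2004-\d+$")
# Kernel worker threads currently attributed to the ceph messenger.
KWORKER_RE = re.compile(r"\[kworker/[^\]]*ceph-msgr\]")


def count_kworkers(report: str) -> Counter:
    """Count kworker ceph-msgr threads per host in the script's output."""
    counts = Counter()
    host = None
    for line in report.splitlines():
        line = line.strip()
        if HOST_RE.match(line):
            host = line
            counts[host] = 0  # record hosts that have no kworkers too
        elif host and KWORKER_RE.search(line):
            counts[host] += 1
    return counts
```

The plain `[ceph-msgr]` thread and the `bash -c` line from the check itself are not counted, so a host with no lingering kworkers reports zero.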
 Client churn 
Why does the number of clients on cs-teaching change all the time?
ctucker working on ganesha servers?
nfish checking on mon reporting for client (dis)connects.
https://icinga.cscf.uwaterloo.ca/grafana/d/03EnhXZGz/dfsc-monitoring?var-hostname=mc-3015-422.cloud.cs.uwaterloo.ca&orgId=1&from=1636348065232&to=1636658833526
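One possible way to quantify the churn is to diff two snapshots of the connected client list. The sketch below assumes the client identifiers have already been collected by some means (the collection step is not shown, and the identifiers in the usage line are made up); `churn` is a hypothetical helper.

```python
def churn(before: set, after: set) -> dict:
    """Summarise client changes between two snapshots of client identifiers."""
    return {
        "connected": sorted(after - before),      # new since the first snapshot
        "disconnected": sorted(before - after),   # gone since the first snapshot
        "steady": len(before & after),            # present in both snapshots
    }
```

For example, `churn({"a", "b"}, {"b", "c"})` reports one new client, one departed client, and one steady client.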
 Time keeping on client systems 
Please use CSCF NTP servers
a2brenna will investigate
ldpaniak@charon:~/Temp$ ./check-time-student.sh 
ubuntu2004-002
synchronised to NTP server (129.97.167.4) at stratum 2 
   time correct to within 30 ms
   polling server every 1024 s
ubuntu2004-004
synchronised to NTP server (129.97.167.12) at stratum 2 
   time correct to within 43 ms
   polling server every 1024 s
ubuntu2004-008
synchronised to NTP server (129.97.167.12) at stratum 2 
   time correct to within 47 ms
   polling server every 1024 s
ubuntu2004-010
synchronised to NTP server (254.173.0.178) at stratum 2 
   time correct to within 27 ms
   polling server every 1024 s
ubuntu2004-012
synchronised to NTP server (254.173.0.178) at stratum 2 
   time correct to within 32 ms
   polling server every 1024 s
ubuntu2004-014
synchronised to NTP server (24.174.107.122) at stratum 2 
   time correct to within 26 ms
   polling server every 1024 s
ubuntu2004-016
synchronised to NTP server (254.173.0.178) at stratum 2 
   time correct to within 24 ms
   polling server every 1024 s
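The listing above could be checked automatically for hosts that are not synchronised to a campus server. This is a sketch under a stated assumption: it treats anything in 129.97.0.0/16 (the range of the 129.97.167.x servers seen above) as a CSCF NTP server, which should be replaced with the real server list. `off_campus_hosts` is a hypothetical helper, not part of check-time-student.sh.

```python
import ipaddress
import re

# ASSUMPTION: CSCF NTP servers live in this range; adjust to the real list.
ASSUMED_CAMPUS_NET = ipaddress.ip_network("129.97.0.0/16")

HOST_RE = re.compile(r"^ubuntu2004-\d+$")
SERVER_RE = re.compile(r"synchronised to NTP server \(([0-9.]+)\)")


def off_campus_hosts(report: str, allowed=ASSUMED_CAMPUS_NET) -> dict:
    """Map each host to its NTP server when that server is outside `allowed`."""
    result = {}
    host = None
    for line in report.splitlines():
        line = line.strip()
        if HOST_RE.match(line):
            host = line
            continue
        match = SERVER_RE.search(line)
        if host and match:
            addr = ipaddress.ip_address(match.group(1))
            if addr not in allowed:
                result[host] = str(addr)
    return result
```

Hosts synchronised to a server inside the assumed range are omitted, so an empty result means every host already uses a campus server.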
 Ceph updates 
New point release is out. Hold off on updating for now because it is the end of term. ldpaniak to set a regular update schedule.
Ceph upgrade to Pacific. Best to move to containers.  Only one MDS per filesystem per host.
 Future configurations to evaluate 
UofT export of NFS/ZFS on cluster block.  Here, use RBD for backend.
Progressive testing: parts of a filesystem at a time.
Demo NFS server with upcoming hardware on RBD.