DFSc Working Group



Meeting Date

  • TEAMS: 2021-11-23

Invitees - Attendees

  • Anthony, Dave, Fraser, Gouxiang, Lori, Nathan, Nick

Review and accept previous meeting minutes.

Proposed Agenda Items

Old business

global_id reclaim

https://docs.ceph.com/en/latest/security/CVE-2021-20288/

Mitigating this CVE requires upgrading userspace Ceph clients. Samba has been upgraded; only the nfs-ganesha servers still need upgrading. This is the cause of these warnings:

   clients are using insecure global_id reclaim
   mons are allowing insecure global_id reclaim

Ganesha PPA: version 3.5+

saltstack formula for ganesha (dlgawley)

If the NFS servers are updated, use the toggle to disallow insecure reclaim.
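The toggle is the mon option `auth_allow_insecure_global_id_reclaim` described on the linked CVE page. The sketch below guards the switch so it only happens once no client still relies on insecure reclaim, since unpatched clients can no longer reconnect after it is set to false. It assumes `ceph health detail` lists the two warnings above under the codes `AUTH_INSECURE_GLOBAL_ID_RECLAIM` (clients) and `AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED` (mons) in `CODE: summary` form. The function names and the `--apply` flag are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: disallow insecure global_id reclaim once no client still relies on it.
# Assumes health codes appear in `ceph health detail` as "CODE: summary".

# Succeeds (safe) when the health text on stdin has no client-side warning.
#   AUTH_INSECURE_GLOBAL_ID_RECLAIM:          clients still use insecure reclaim
#   AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED:  mons still allow it (expected until toggled)
safe_to_disallow() {
    ! grep -q 'AUTH_INSECURE_GLOBAL_ID_RECLAIM:'
}

# Reads cluster health once, then flips the mon option only if the check passes.
apply_toggle() {
    local health
    health=$(ceph health detail) || { echo "cannot read cluster health" >&2; return 1; }
    if safe_to_disallow <<<"$health"; then
        ceph config set mon auth_allow_insecure_global_id_reclaim false
    else
        echo "clients still use insecure global_id reclaim; not toggling:" >&2
        printf '%s\n' "$health" >&2
        return 1
    fi
}

# Hypothetical entry point: nothing changes unless --apply is given.
if [[ "${BASH_SOURCE[0]}" == "$0" && "${1:-}" == "--apply" ]]; then
    apply_toggle
fi
```

Run from an admin node with `--apply` after the ganesha servers are upgraded. Without the flag the script only defines the functions.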

Update on the bug report/patch for the Ceph append bug in the Ubuntu 5.11 kernel

Apparently fixed in 5.11.40: https://tracker.ceph.com/issues/51948 https://rt.uwaterloo.ca/Ticket/Display.html?id=1194773

Roll to 100,000 (a2brenna) EOD 2021-11-23. fhgunn to test. If OK, roll to prod thereafter.

Timeline to get client systems back on the 5.11 kernel

https://icinga.cscf.uwaterloo.ca/grafana/d/03EnhXZGz/dfsc-monitoring?var-hostname=mc-3015-422.cloud.cs.uwaterloo.ca&orgId=1&from=1623347299875&to=1634149134520

See above...

Start with 004, which is down for a BIOS update.
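Before rolling a host back to 5.11, its running kernel can be compared against the fixed release. The sketch below uses GNU `sort -V` for the comparison. It assumes, without confirmation in these notes, that "5.11.40" means Ubuntu's `5.11.0-40` kernel ABI and that the input looks like `uname -r` output (`5.11.0-40-generic`). `kernel_has_fix` and `MIN_KERNEL` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: is a running kernel at least the assumed fixed release?
# ASSUMPTION: "5.11.40" in the minutes means Ubuntu's 5.11.0-40 kernel ABI.
MIN_KERNEL="5.11.0-40"

# Argument: a release string as printed by `uname -r`, e.g. 5.11.0-40-generic.
kernel_has_fix() {
    local release="${1%-*}"   # drop the flavour suffix: 5.11.0-40-generic -> 5.11.0-40
    local lowest
    # GNU version sort orders 5.4 < 5.11 and -38 < -40 numerically
    lowest=$(printf '%s\n%s\n' "$MIN_KERNEL" "$release" | sort -V | head -n 1)
    [[ "$lowest" == "$MIN_KERNEL" ]]
}
```

On each client: `kernel_has_fix "$(uname -r)" || echo "$(hostname): kernel older than $MIN_KERNEL"`.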

New business

ubuntu2004-012

This is a VM? Why so many old ceph processes? Network issue?

a2brenna: Highest uptime, most threads

kworkers do go away eventually...

Can we track the kworkers back to user activity?

lfolland: Reboot systems regularly?

a2brenna: That is about uptime expectations; it does not address the real issues. Does the machine really have a problem?

ldpaniak: Take 012 out of the pool?

dlgawley has removed it from the round-robin.

Needs further research...

-c hostname; ps uax |grep Nov |grep msgr
ldpaniak@charon:~/Temp$ ./check-ceph-student.sh 
ubuntu2004-002
root       8097  0.0  0.0      0     0 ?        I<   Nov17   0:00 [ceph-msgr]
ldpaniak 143969  0.0  0.0   9492  3376 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr 
ubuntu2004-004
root       8175  0.0  0.0      0     0 ?        I<   Nov07   0:00 [ceph-msgr]
ldpaniak 131204  0.0  0.0   9492  3280 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr 
ubuntu2004-008
root       5641  0.0  0.0      0     0 ?        I<   Nov16   0:00 [ceph-msgr]
ldpaniak  47748  0.0  0.0   9492  3324 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr 
ubuntu2004-010
root       4669  0.0  0.0      0     0 ?        I<   Nov13   0:00 [ceph-msgr]
ldpaniak  90585  0.0  0.0   9492  3316 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
ubuntu2004-012
root       5068  0.0  0.0      0     0 ?        I<   Nov01   0:00 [ceph-msgr]
root       6457  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/90:2-ceph-msgr]
root      18180  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/12:0-ceph-msgr]
root      85071  0.0  0.0      0     0 ?        I    Nov20   0:19 [kworker/91:0-ceph-msgr]
root     107070  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/26:2-ceph-msgr]
root     122108  0.0  0.0      0     0 ?        I    Nov19   0:00 [kworker/64:0-ceph-msgr]
root     148385  0.0  0.0      0     0 ?        I    Nov20   0:13 [kworker/16:0-ceph-msgr]
root     181730  0.0  0.0      0     0 ?        I    Nov20   0:33 [kworker/15:1-ceph-msgr]
ldpaniak 205256  0.0  0.0   9492  3184 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
root     208581  0.0  0.0      0     0 ?        I    Nov20   0:00 [kworker/65:0-ceph-msgr]
root     228965  0.0  0.0      0     0 ?        I    Nov20   0:51 [kworker/108:0-ceph-msgr]
ubuntu2004-014
root       4500  0.0  0.0      0     0 ?        I<   Nov10   0:00 [ceph-msgr]
ldpaniak  62494  0.0  0.0   9492  3220 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
ubuntu2004-016
root       3962  0.0  0.0      0     0 ?        I<   Nov10   0:00 [ceph-msgr]
ldpaniak  98678  0.0  0.0   9492  3452 ?        Ss   22:33   0:00 bash -c hostname; ps uax |grep Nov |grep msgr
ldpaniak@charon:~/Temp$ 
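The `grep Nov` filter above depends on the current month, and it also matches its own `bash -c` command line. The sketch below is a sturdier per-host count of long-lived `kworker/...-ceph-msgr` threads. It reads `ps -eo etimes=,args=`, which gives elapsed seconds plus the bracketed kernel-thread names seen in the listing. The function name and the one-day threshold in the usage line are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: count kworker threads last used by the ceph-msgr workqueue that
# have been alive for at least MIN_AGE seconds.
# Input lines: "<elapsed seconds> <args>", e.g. from `ps -eo etimes=,args=`.
# Kernel threads show up as bracketed names such as [kworker/90:2-ceph-msgr];
# the permanent [ceph-msgr] workqueue thread and user commands are ignored.

# Usage: count_stale_msgr_kworkers MIN_AGE < ps-output
count_stale_msgr_kworkers() {
    awk -v min="$1" '
        $2 ~ /^\[kworker\/.*-ceph-msgr]$/ && $1 + 0 >= min + 0 { n++ }
        END { print n + 0 }
    '
}
```

On each host, run `ps -eo etimes=,args= | count_stale_msgr_kworkers 86400`. With that one-day threshold, 012 as listed above would count all nine of its kworkers, since they date from Nov 19 and 20.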

Client churn

Why does the number of clients on cs-teaching change all the time?

ctucker working on the ganesha servers? nfish is checking mon reporting of client (dis)connects.

[https://icinga.cscf.uwaterloo.ca/grafana/d/03EnhXZGz/dfsc-monitoring?var-hostname=mc-3015-422.cloud.cs.uwaterloo.ca&orgId=1&from=1636348065232&to=1636658833526]
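Diffing successive snapshots of client IDs shows whether the cs-teaching count reflects real connects and disconnects or only a moving total. The sketch below assumes each snapshot is a file with one client ID per line. Such a file could, for example, be extracted from `ceph tell mds.<name> session ls` with a JSON tool, but that pipeline is an assumption. `client_churn` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: which client IDs connected or disconnected between two snapshots.
# ASSUMPTION: each file holds one client ID per line (hypothetical format,
# e.g. extracted from `ceph tell mds.<name> session ls`).

# Usage: client_churn OLD_SNAPSHOT NEW_SNAPSHOT
# Runs in a subshell so the LC_ALL override does not leak to the caller.
client_churn() (
    export LC_ALL=C   # sort and comm must agree on collation order
    # comm -13: lines only in NEW; comm -23: lines only in OLD
    comm -13 <(sort -u "$1") <(sort -u "$2") | sed 's/^/connected: /'
    comm -23 <(sort -u "$1") <(sort -u "$2") | sed 's/^/disconnected: /'
)
```

Taking snapshots a few minutes apart and running `client_churn` on consecutive pairs would show whether the same clients are repeatedly dropping and returning.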

Time keeping on client systems

Please use the CSCF NTP servers.

a2brenna will investigate

ldpaniak@charon:~/Temp$ ./check-time-student.sh 
ubuntu2004-002
synchronised to NTP server (129.97.167.4) at stratum 2 
   time correct to within 30 ms
   polling server every 1024 s
ubuntu2004-004
synchronised to NTP server (129.97.167.12) at stratum 2 
   time correct to within 43 ms
   polling server every 1024 s
ubuntu2004-008
synchronised to NTP server (129.97.167.12) at stratum 2 
   time correct to within 47 ms
   polling server every 1024 s
ubuntu2004-010
synchronised to NTP server (254.173.0.178) at stratum 2 
   time correct to within 27 ms
   polling server every 1024 s
ubuntu2004-012
synchronised to NTP server (254.173.0.178) at stratum 2 
   time correct to within 32 ms
   polling server every 1024 s
ubuntu2004-014
synchronised to NTP server (24.174.107.122) at stratum 2 
   time correct to within 26 ms
   polling server every 1024 s
ubuntu2004-016
synchronised to NTP server (254.173.0.178) at stratum 2 
   time correct to within 24 ms
   polling server every 1024 s
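The listing above can be screened automatically. The sketch below reads the same layout (a hostname line followed by `ntpstat`-style output) and prints every host that is synchronised to a server outside an allow-list or is unsynchronised. The allow-list is an assumption: it uses the two 129.97.167.x servers seen above as placeholders and must be replaced with the actual CSCF NTP servers.

```shell
#!/usr/bin/env bash
# Sketch: flag hosts not synchronised to an allowed NTP server.
# ASSUMPTION: placeholder allow-list taken from the output above; replace
# with the real CSCF NTP server addresses.
CSCF_NTP="129.97.167.4 129.97.167.12"

# Input: a bare hostname line, then that host's ntpstat output, repeated.
# Output: "host server" for each disallowed server, "host unsynchronised" otherwise.
# Other "synchronised to ..." forms (e.g. local clock) are not checked.
flag_non_cscf_ntp() {
    awk -v allowed="$CSCF_NTP" '
        BEGIN { n = split(allowed, list, " "); for (i = 1; i <= n; i++) ok[list[i]] = 1 }
        # checked before the hostname rule: this line is also a single word
        /^unsynchronised/ { print host, "unsynchronised"; next }
        # a single word at column 0 is a hostname header
        NF == 1 && /^[^ \t]/ { host = $1; next }
        /^synchronised to NTP server/ {
            # the server address is the text between the parentheses
            if (match($0, /\([0-9.]+\)/)) {
                server = substr($0, RSTART + 1, RLENGTH - 2)
                if (!(server in ok)) print host, server
            }
        }
    '
}
```

Pipe the output of `./check-time-student.sh` into `flag_non_cscf_ntp`. An empty result means every host uses an allowed server.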

Ceph updates

A new point release is out. Hold off on updating for now (end of term). ldpaniak to set a regular update schedule.

Ceph upgrade to Pacific. Best to move to containers. Only one MDS per filesystem per host.

Future configurations to evaluate

UofT exports NFS from ZFS on cluster block storage. Here, we would use RBD as the backend.

Progressive testing: parts of a filesystem at a time.

Demo NFS server with upcoming hardware on RBD.

Topic revision: r5 - 2021-11-23 - LoriPaniak