Linux Working Group

Meeting Date

  • TEAMS: 2023-02-22


  • Invited: Anthony (group leader), Clayton, Guoxiang, Lori, Fraser, Devon, Nathan, Nick, Todd, Dave, O


  • Attendees: Anthony, Guoxiang, Clayton, Lori, Devon, Todd, O, Fraser

Review and accept previous meeting minutes.

Review last meeting's Action Items

New Items

Ceph troubleshooting:


  • CERN paper (attached below): one bad disk can ruin cluster performance; the paper also covers various tunings and erasure coding configs
    • Annotations... We don't use erasure coding;
    • OSDs are underperforming because of configuration; hardware itself is performing as expected
    • OSD queues have been observed to be idle most of the time.
    • The vast majority of requests from general-use host clients to the MDS layer are file locks/unlocks.
  • High cap counts (over 200k) on an MDS lead to poor performance:
    • This is a rare occurrence. Are journaling latencies due to large dumps as opposed to "amortized" dumps?


(Possible) Actions (Currently waiting on cluster to heal/data collection)

  • Increase the number of PGs in the (cs-teaching) metadata pool (a2brenna)
  • Increase mds_recall_max_caps (improve cap recovery from clients) (nfish)
  • Set objecter_inflight_op_bytes to 10485760000 (see attached CERN paper)
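
The candidate actions above could be staged roughly as follows. This is a dry-run sketch, not the group's procedure: the `run` helper, the `APPLY` switch, and the METADATA_POOL_NAME, NEW_PG_NUM and NEW_RECALL_MAX_CAPS placeholders are hypothetical, and applying objecter_inflight_op_bytes to the `client` section is an assumption (the minutes do not say which daemons or clients it targets). Only the 10485760000 value comes from the notes.

```shell
# Dry-run helper (hypothetical): prints each command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Placeholders, not real values: the minutes give no pool name or targets.
META_POOL="${META_POOL:-METADATA_POOL_NAME}"
PG_NUM="${PG_NUM:-NEW_PG_NUM}"
RECALL_MAX_CAPS="${RECALL_MAX_CAPS:-NEW_RECALL_MAX_CAPS}"

# 1. More PGs for the cs-teaching metadata pool.
run ceph osd pool set "$META_POOL" pg_num "$PG_NUM"

# 2. Allow more caps to be recalled from a client session per recall.
run ceph config set mds mds_recall_max_caps "$RECALL_MAX_CAPS"

# 3. Raise the in-flight op byte limit (value from the minutes;
#    the "client" section is an assumption).
run ceph config set client objecter_inflight_op_bytes 10485760000
```

With APPLY unset the script only echoes the commands. Running `ceph config get mds mds_recall_max_caps` first would show the current value before anything is changed.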


Sick MDS
  • cs-teaching/high-load filesystems damage the MDSs, affecting their ability to recall/process caps (and IO)
  • Can concentrated load drive up the caps count on an MDS (demand outstripping recall?), leading to problems? Consider cap velocity/acceleration. See daily caps load ticket above
  • Single-MDS home directory: assign mds.11 on cs-teaching to /u6/ldpaniak. Performance is consistent and excellent, even while other cs-teaching users on the same client machine see very poor performance
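
Pinning a directory tree to one MDS is normally done in CephFS with the `ceph.dir.pin` extended attribute. A minimal sketch, assuming "mds.11" in the note means MDS rank 11 and that /u6/ldpaniak is the path as seen on a client mount; the `pin_cmd` helper is hypothetical and only prints the command rather than running it.

```shell
# Hypothetical helper: print the command that pins a directory tree
# to an MDS rank. Nothing is executed against the filesystem.
pin_cmd() {
    rank="$1"
    dir="$2"
    echo "setfattr -n ceph.dir.pin -v $rank $dir"
}

# Assumption: "mds.11" is rank 11; the path is taken from the minutes.
pin_cmd 11 /u6/ldpaniak
```

Setting the attribute to -1 removes the pin and returns the directory to the default balancing behaviour.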

Insufficient parallelization of MDS OSD workload

Networking latency mimics OSD drive failure
  • System-to-system pings around the HS100 ring can show multi-millisecond times (typical high-performance networking ping times are on the order of a hundred microseconds)
  • As mentioned in CERN paper, a single bad OSD device can "caus[e] small IO requests to take longer than 2s on average"
  • Is intermittent high-latency networking impacting the cluster the same way as a (room full of) failing OSD device(s)?
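
One way to collect evidence for the question above is to sample ping times between ring members and flag averages in the millisecond range. A rough sketch, assuming the iputils `ping` summary format (`rtt min/avg/max/mdev = ...`); the helper names and the 1.0 ms default threshold are hypothetical choices, not values from the minutes.

```shell
# Extract the average RTT (ms) from iputils ping output read on stdin.
avg_rtt_ms() {
    awk -F'[ /=]+' '/^rtt / { print $7 }'
}

# Classify an average RTT against a limit in ms.
# The 1.0 ms default is an assumed threshold.
classify_rtt() {
    awk -v avg="$1" -v limit="${2:-1.0}" \
        'BEGIN { if (avg + 0 > limit + 0) print "SLOW"; else print "ok" }'
}

# Hypothetical wrapper: ping a host 20 times and report the result.
check_host() {
    avg=$(ping -q -c 20 "$1" | avg_rtt_ms)
    echo "$1 avg=${avg}ms $(classify_rtt "$avg")"
}
```

`check_host` would be run from each system on the ring against its neighbours; the minutes do not list host names, so none are filled in here. Repeating the check over time matters, since the suspected latency is intermittent.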

Can filesystem activity on one client (ubuntu2004-012) be correlated with OSD latency?
@ubuntu2004-012% date && time tar xf gcc-9.1.0.tar
Tue 21 Feb 2023 11:16:20 PM EST

real	2m48.426s
user	0m0.718s
sys	0m7.031s
@ubuntu2004-012% date && time rm -rf gcc-9.1.0
Tue 21 Feb 2023 11:20:36 PM EST

real	2m28.312s
user	0m0.239s
sys	0m5.214s
@ubuntu2004-012% date && time tar xf gcc-12.2.0.tar
Tue 21 Feb 2023 11:37:14 PM EST

real	3m21.589s
user	0m1.218s
sys	0m9.050s
@ubuntu2004-012% date && time rm -rf gcc-12.2.0
Tue 21 Feb 2023 11:43:25 PM EST

real	1m49.031s
user	0m0.230s
sys	0m6.318s

@ubuntu2004-002% date && time tar xf gcc-9.1.0.tar
Tue 21 Feb 2023 11:31:17 PM EST

real	0m35.288s
user	0m0.492s
sys	0m4.670s
@ubuntu2004-002% date && time rm -rf gcc-9.1.0
Tue 21 Feb 2023 11:35:16 PM EST

real	0m53.688s
user	0m0.139s
sys	0m3.240s
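
To put the transcripts above on a common scale, the `real` times can be converted to seconds and compared. A small sketch; `to_seconds` and `slowdown` are hypothetical helper names, the inputs are the gcc-9.1.0 timings from the two clients, and because the runs were taken about 15 minutes apart the ratios are only indicative.

```shell
# Convert a `time` value such as 2m48.426s to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two `real` times, rounded to one decimal place.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# real times: ubuntu2004-012 first, ubuntu2004-002 second.
echo "tar xf gcc-9.1.0.tar: $(slowdown 2m48.426s 0m35.288s)x slower on ubuntu2004-012"
echo "rm -rf gcc-9.1.0:     $(slowdown 2m28.312s 0m53.688s)x slower on ubuntu2004-012"
```

With these inputs the extract is roughly 4.8 times slower and the delete roughly 2.8 times slower on ubuntu2004-012 than on ubuntu2004-002, while user and sys times stay small on both, which suggests the time is spent waiting on the filesystem and not on the client CPU.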


Topic attachments
  • s41781-021-00071-1.pdf (PDF, r1, 2075.6 K, 2023-02-21 14:20, LoriPaniak)
Topic revision: r5 - 2023-02-22 - OmNafees