Linux Working Group
Meeting Date
Invited
- Anthony (group leader), Clayton, Guoxiang, Lori, Fraser, Devon, Nathan, Nick, Todd, Dave, O
Attendees
- Anthony, Guoxiang, Clayton, Lori, Devon, Todd, O, Fraser
Review and accept previous meeting minutes.
Review last meeting's Action Items
New Items
Ceph troubleshooting:
References
- CERN paper (attached below): one bad disk can ruin cluster performance; various tunings; erasure-coding configs
- Annotations: we don't use erasure coding.
- OSDs are underperforming because of configuration; hardware itself is performing as expected
- OSD queues have been observed to be idle most of the time.
- The vast majority of requests from general-use host clients to the MDS layer are file locks/unlocks.
- A high cap count (over 200k) on an MDS leads to poor performance (see the cap-count check below): https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/
- This is a rare occurrence; are the journaling latencies due to large dumps, as opposed to "amortized" dumps?
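One hedged way to check which clients are holding caps: list sessions on an MDS and sort by cap count. The daemon name mds.cs-teaching-a is a placeholder, and the JSON field names (id, num_caps) can vary slightly across Ceph releases.
# List the 10 sessions holding the most caps on one MDS (daemon name is a placeholder).
ceph tell mds.cs-teaching-a session ls | jq -r 'sort_by(-.num_caps) | .[0:10][] | "\(.id)\t\(.num_caps)"'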
Observations
(Possible) Actions (currently waiting on the cluster to heal / on data collection)
- Increase the number of PGs for the (cs-teaching) metadata pool (a2brenna)
- Increase mds_recall_max_caps (to improve cap recall from clients) (nfish)
- Increase objecter_inflight_op_bytes to 10485760000 (see attached CERN paper; sketch of all three changes after this list)
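A possible way to apply the three changes above. The pool name cephfs.cs-teaching.meta, the pg_num target, and the mds_recall_max_caps value are placeholders to verify against the live cluster; only the objecter_inflight_op_bytes value comes from the action item itself.
# Raise PG count on the metadata pool (pool name and target are placeholders).
ceph osd pool set cephfs.cs-teaching.meta pg_num 256
# Raise mds_recall_max_caps so the MDS recalls caps from clients more aggressively (value illustrative).
ceph config set mds mds_recall_max_caps 30000
# Value from the action item above; whether to scope this to mds or client should follow the CERN paper's setup.
ceph config set mds objecter_inflight_op_bytes 10485760000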
Hypotheses
Sick MDS
- cs-teaching/high-load filesystems damage MDS daemons, affecting their ability to recall/process caps (and IO)
- Concentrated load can drive up the cap count on an MDS (demand outstripping recall?), leading to problems; look at cap velocity/acceleration. See daily caps load ticket above
- Single MDS home directory: assign mds.11 on cs-teaching to /u6/ldpaniak (pinning sketch below). Result: consistent, excellent performance, even while other cs-teaching users on the same client machine see very poor performance: https://rt.uwaterloo.ca/Ticket/Display.html?id=1273593#txn-31548531
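For reference, directory-to-MDS pinning is done with an extended attribute on a mounted CephFS path; a minimal sketch, assuming /u6/ldpaniak is visible on a CephFS client mount:
# Pin the /u6/ldpaniak subtree to MDS rank 11, then verify the pin.
setfattr -n ceph.dir.pin -v 11 /u6/ldpaniak
getfattr -n ceph.dir.pin /u6/ldpaniak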
Insufficient parallelization of the MDS-to-OSD workload
Networking latency mimics OSD drive failure
- System-to-system pings around the HS100 ring can show multi-millisecond times (typical high-performance network ping times are around a hundred microseconds); see the ping sweep sketch after this list: https://rt.uwaterloo.ca/Ticket/Display.html?id=1270078#txn-31490589
- As mentioned in CERN paper, a single bad OSD device can "caus[e] small IO requests to take longer than 2s on average"
- Is intermittent high-latency networking impacting the cluster the same way as a (room full of) failing OSD device(s)?
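A hedged sketch for quantifying ring latency; the hostnames hs100-01..hs100-08 are placeholders for the actual ring members.
# Ping each ring member and print ping's min/avg/max/mdev round-trip summary.
for h in hs100-0{1..8}; do
  printf '%s: ' "$h"
  ping -c 20 -q "$h" | tail -1
done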
Can filesystem activity on one client (ubuntu2004-012) be correlated with OSD latency? Timing transcript below; a correlation sketch follows it.
@ubuntu2004-012% date && time tar xf gcc-9.1.0.tar
Tue 21 Feb 2023 11:16:20 PM EST
real 2m48.426s
user 0m0.718s
sys 0m7.031s
@ubuntu2004-012% date && time rm -rf gcc-9.1.0
Tue 21 Feb 2023 11:20:36 PM EST
real 2m28.312s
user 0m0.239s
sys 0m5.214s
@ubuntu2004-012% date && time tar xf gcc-12.2.0.tar
Tue 21 Feb 2023 11:37:14 PM EST
real 3m21.589s
user 0m1.218s
sys 0m9.050s
@ubuntu2004-012% date && time rm -rf gcc-12.2.0
Tue 21 Feb 2023 11:43:25 PM EST
real 1m49.031s
user 0m0.230s
sys 0m6.318s
@ubuntu2004-002% date && time tar xf gcc-9.1.0.tar
Tue 21 Feb 2023 11:31:17 PM EST
real 0m35.288s
user 0m0.492s
sys 0m4.670s
@ubuntu2004-002% date && time rm -rf gcc-9.1.0
Tue 21 Feb 2023 11:35:16 PM EST
real 0m53.688s
user 0m0.139s
sys 0m3.240s
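One hedged way to test the correlation above: sample per-OSD latency while the tar/rm test runs on the client. ceph osd perf is a standard command; the 5 s interval and top-5 cut are arbitrary choices.
# Timestamped samples of per-OSD commit/apply latency, to line up with the tar/rm timings above.
# Run on an admin node during the client test.
while sleep 5; do
  date
  ceph osd perf | tail -n +2 | sort -k2 -nr | head -5
done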
Comments