DFSc Working Group



Meeting Date

  • TEAMS: 2022-03-17

Invitees - Attendees

  • Anthony, Gouxiang, Lori, Nathan, Nick, Lawrence

Review and accept previous meeting minutes.

Proposed Agenda Items

Old business

Action items for this meeting 2022-03-17

  • Nathan - create an RBD image with the --thick-provision option -> Pending
  • cephadm upgrade on the 902s (ldpaniak/nfish)
    • no action yet
    • Lori will look at it and plan to have Nathan rebuild
    • serves as a pilot step toward the full upgrade/rebuild
    • provides a high-fidelity test bed for the real system
  • Fraser - create a ticket for the local storage option plans -> https://rt.uwaterloo.ca/Ticket/Display.html?id=1209206
    • Anthony will discuss available hardware with Fraser to determine next steps; ticket updated
  • Nathan/Lori will update Ceph configuration to no longer allow insecure connections -> Done
  • Increase the number of PGs for cs-teaching: nfish -> Done
    • done for general; considering the other ones
    • see RT#1211370
    • chopped the data into smaller pieces so it can be distributed across the hardware
    • the optimizer balances the number of PGs per OSD, not the size of the data, so it can still leave the data imbalanced
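The point that the optimizer balances PG counts rather than data size can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Ceph's balancer or CRUSH: the PG sizes are hypothetical, placement is plain round-robin so every OSD holds the same number of PGs, and splitting a PG is modelled as cutting it into equal pieces (a crude stand-in for raising pg_num).

```python
"""Toy model: equal PG *counts* per OSD do not imply equal *bytes* per OSD.

Hypothetical illustration only; this is not Ceph's balancer or CRUSH.
"""


def place_by_count(pg_sizes, num_osds):
    """Assign PGs round-robin so each OSD gets the same number of PGs.

    Returns the total data (same units as pg_sizes) landing on each OSD.
    """
    per_osd = [0.0] * num_osds
    for i, size in enumerate(pg_sizes):
        per_osd[i % num_osds] += size
    return per_osd


def split_pgs(pg_sizes, factor):
    """Split every PG into `factor` equal pieces (crude stand-in for raising pg_num)."""
    return [size / factor for size in pg_sizes for _ in range(factor)]


def spread(per_osd):
    """Gap between the fullest and the emptiest OSD."""
    return max(per_osd) - min(per_osd)


if __name__ == "__main__":
    # Hypothetical PG sizes in GB: a couple of very large PGs among small ones.
    pgs = [400, 10, 10, 10, 300, 20, 20, 20]
    num_osds = 4
    for factor in (1, 2, 4):
        per_osd = place_by_count(split_pgs(pgs, factor), num_osds)
        print(f"{len(pgs) * factor:>2} PGs -> per-OSD GB {per_osd}, spread {spread(per_osd)}")
```

Every OSD always holds the same number of PGs, yet in this contrived example the gap between the fullest and emptiest OSD falls from 670 GB to 335 GB to 0 as the same data is cut into 8, 16 and 32 PGs. Real PGs will not split this evenly, but the trend is the reason for chopping cs-teaching into more, smaller PGs.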

New business

  • root_squash on Octopus / suid problem: https://tracker.ceph.com/issues/42451
    • a user who gains root access can leave exploits (e.g. suid binaries) on mounted filesystems
    • root_squash could be a solution
    • mount keys that don't allow root to write
    • in NFS, root_squash also interferes with root's ability to read, so we need to bear that in mind
    • work-around: allow root access from more privileged machines rather than general-use machines
    • might not be a problem in the future if we stop using shared mounts
  • Access to DFSc performance counters, RT#1211673 dlgawley
    • Dave's not here - leave for next meeting
  • Diagnostics a2brenna - also RT#1211673
    • debug symbols
    • Anthony wants them installed
    • Lori asked what the debug symbols are used for
    • Anthony: symbolic information used to resolve stack traces when debugging; inert on disk and takes space, but is never executed
    • split into separate packages back when disk space was very limited
    • why install them on the servers? No meaningful profiling is possible without them on the server itself
    • could perhaps be used from another host if all the other binaries matched exactly
    • but no profiling of a running system is possible if the debug symbols are not on the host itself
    • Lori - what problems are we solving?
    • Anthony - has profiled the client side and found no problems, but cannot profile the server side
    • Anthony - sampling profiler is non-invasive, unlike strace
    • Lori - the current software is EOL in 6-8 weeks; is it worth doing on this version?
    • Anthony - nothing in the next release's notes seems to indicate major fixes, so he will want to start reviewing
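The root_squash discussion turns on a single rule: requests from uid 0 are remapped to an unprivileged uid before permission checks. The hypothetical model below uses simplified POSIX mode bits to show both effects raised in the meeting: squashed root can no longer write into other users' directories, but it also loses access to files only root can read (the NFS caveat). The anonymous uid 65534 and the owner/other-only permission check are assumptions for illustration; this is not how CephFS or NFS actually implement the check.

```python
from dataclasses import dataclass

ROOT_UID = 0
ANON_UID = 65534  # assumed "nobody" uid; the real value is deployment-specific


@dataclass
class Inode:
    owner_uid: int
    mode: int  # permission bits only, e.g. 0o755


def effective_uid(client_uid: int, root_squash: bool) -> int:
    """Return the uid this model checks permissions against."""
    if root_squash and client_uid == ROOT_UID:
        return ANON_UID
    return client_uid


def allowed(inode: Inode, uid: int, want: str) -> bool:
    """Simplified permission check: owner and 'other' bits only, groups ignored."""
    if uid == ROOT_UID:
        return True  # in this model, unsquashed root bypasses the mode bits
    bit = {"r": 4, "w": 2, "x": 1}[want]
    shift = 6 if uid == inode.owner_uid else 0  # owner bits vs 'other' bits
    return bool((inode.mode >> shift) & bit)


if __name__ == "__main__":
    user_dir = Inode(owner_uid=1000, mode=0o755)       # a user's home directory
    root_only = Inode(owner_uid=ROOT_UID, mode=0o600)  # e.g. a root-owned secret
    for squash in (False, True):
        uid = effective_uid(ROOT_UID, squash)
        print(
            f"root_squash={squash}: "
            f"write user dir={allowed(user_dir, uid, 'w')}, "
            f"read root-only file={allowed(root_only, uid, 'r')}"
        )
```

With squashing off, root can do both; with it on, both checks fail, which is the trade-off noted above between blocking planted exploits and keeping root's read access.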

Upgrades

Update on status of the Ceph Dashboard: RT#973431 dlgawley

Start with 902 systems. Practice upgrades, work out bugs nfish/a2brenna/ldpaniak

Server side

  • Want to upgrade to Pacific by end of summer at the latest: strays, upgrade path, less OSD spillover from RocksDB (sharding; currently 3, 30, 300GB...), mclock scheduler, Grafana daemons. Octopus out of support 2022-06-01: early May?
  • One MDS problem
  • Remove all snapshots?
  • Real downtime with low/no cluster load
  • ceph-deploy is deprecated: migrate to cephadm (Docker containerization), then upgrade
  • splitting cs-teaching into multiple real filesystems: u0-u19(?)
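The "3, 30, 300GB" figures follow from RocksDB's level sizing. Under the simplifying assumption that a level stays on the BlueStore DB device only if it fits there entirely (otherwise it spills to the slow device), only whole levels count, so DB space beyond the last level that fits goes unused. The sketch below does that arithmetic; the 256 MiB base level and 10x per-level multiplier are assumed typical RocksDB settings, not values read from our cluster, and WAL and L0 space are ignored.

```python
"""Rough estimate of how much of a BlueStore DB device RocksDB levels can use.

Assumptions (not taken from our cluster config): 256 MiB level-1 target,
10x growth per level, a level stays on the DB device only if it fits
entirely, and WAL/L0 space is ignored.
"""

BASE_MIB = 256   # assumed target size of level 1
MULTIPLIER = 10  # assumed size ratio between consecutive levels


def level_sizes_mib(num_levels: int) -> list[int]:
    """Target sizes of L1..Ln in MiB."""
    return [BASE_MIB * MULTIPLIER ** i for i in range(num_levels)]


def usable_db_mib(db_device_mib: int, num_levels: int = 6) -> int:
    """Cumulative size of the whole levels that fit on the DB device."""
    used = 0
    for size in level_sizes_mib(num_levels):
        if used + size > db_device_mib:
            break  # this level and every larger one spill to the slow device
        used += size
    return used


if __name__ == "__main__":
    for device_gib in (3, 30, 100, 300):
        usable = usable_db_mib(device_gib * 1024)
        print(f"{device_gib:>3} GiB DB device -> ~{usable / 1024:.1f} GiB holds whole levels")
```

Under these assumptions a 100 GiB DB device holds no more whole levels than a 30 GiB one (about 28 GiB in both cases), which is the usual explanation for sizing DB partitions near the 3/30/300 steps.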

Client side

  • 5.13 kernel on 2004-002,016 at this time
  • Wait until end of term -> Done
  • NFS servers to ganesha -> ctucker

Scratch(ish) drives on 211 systems (fhgunn)

  • ZFS sends for sync

Upcoming maintenance

Action items for next meeting

  • Lori will create a ticket regarding high sustained data access (500 MB/s), assign it to Dave to determine if he wants it investigated further https://rt.uwaterloo.ca/Ticket/Display.html?id=1214817
  • cephadm upgrade on the 902s (ldpaniak/nfish) - review and rebuild
  • Anthony will discuss available hardware with Fraser to determine next steps for purchased drives - RT#1209206
  • Install of debug symbols and profiler application bits on server side RT#1211673 -> a2brenna, nfish
Topic revision: r7 - 2022-03-17 - LoriPaniak