DFSc Working Group



Meeting Date

  • TEAMS: 2022-02-02

Invitees - Attendees

  • Anthony, Dave, Guoxiang, Lori, Nathan, Nick, Lawrence, Omar, Fraser

Review and accept previous meeting minutes.

Proposed Agenda Items

Old business

Immediate (from 2022-01-11)

  • communicate with ISG/students to use specific linux.student.cs hosts rather than the round-robin name - Nick
    • done
    • no noticeable issues recently, but before vs. after is unclear; may just be low load
    • any impact of VSCode on CS-General?
      • not clear how much those machines are actually used for such work
  • create a Rados block device - Lori and Nathan
    • increase to 40TB
    • done, Anthony running tests on it
    • realized that a bunch of random writes all over the device should have been done before starting, for better science
    • initial allocations creating wildly different results
    • Nathan suggests pre-allocating the whole device when creating the RBD, option: --thick-provision
    • Anthony: sounds like what we want
    • Nathan to create a new RBD with the --thick-provision option
  • create NFS bridge - Anthony and Guoxiang
    • have one setup, not currently exporting
    • RT#1197177
  • talk to MFCF about continued use of NetApp - Lawrence/Dave
    • done
    • they've agreed to the end of S22
    • will appreciate if we're covering the maintenance/licensing costs
    • plans for moving a bucket to the NetApp?
      • likely needs to rsync the data a couple times, unmount and remount
      • will need to coordinate with Guoxiang
    • will want to use test data for now before moving things live
    • create a shadow copy of home directories? Or just create some artificial, generated data
    • reconsider this topic at the next meeting
  • consider Ceph tweaks to current environment - Lori/Nathan
    • see also New Business
    • each /uN is in its own MDS, therefore roughly half the load for each MDS
      • may even go further (eg: cs*)
    • how many MDSes can we run?
      • currently 20 active ones
      • depends on the amount of RAM available
  • review Ceph data - Lori/Anthony
    • discussion of CAPs, loads on MDS
    • software bugs correlated with load (?)
    • updating code bases around the system as much as possible
    • Nathan has filed some bug reports over the past couple years, some useful responses
    • original design had different use-cases in mind?
    • pool rate
      • has been low recently?
  • kernel updates?
    • 5.13 is running on 002
  • communicate with faculty - Omar/Lori - forward message to this group
    • was done
    • also asked for schedule of due dates, put on cscf-away?

near future

  • move one or more home directories to Rados NFS device
  • move one or more home directories to NetApp

Changes to cs-teaching

New business

Update of ganesha to v 4.0

  • what is the schedule?
  • need a container for a test machine - Anthony
  • Clayton - create a ticket to get container made for v4 ganesha node

Local storage options for teaching login servers (fhgunn)

  • looking for machines in CS environment with 3.5" drives
  • is there a ticket? -> Fraser

Mounts with the nowsync ("No Write Sync") option showed no appreciable improvement for the rm -rf workload

Have permission to use NetApp through summer

  • noted

Update on RBD/ZFS/NFS

  • see above

No insecure connections since older NFS gateways shut down (2022-01-13). Going to enforce secure id reclaim on the cluster (ldpaniak, nfish)

  • Clayton shut down some older ganesha gateways in the past couple of weeks
  • will no longer allow insecure connections
  • Nathan/Lori will update Ceph configuration to no longer allow insecure connections

Upcoming maintenance

  • New upgrade schedule for 42x systems (ldpaniak)
    • rebooting these systems seems to be helping after an update
  • Reading Week maintenance: Feb 20-27 (nfish/ldpaniak)
    • Increase the number of PGs (placement groups) for the cs-teaching pool, starting on 2022-02-20
    • PS: Feb 22 is now a University Holiday

Action items for next meeting 2022-02-16

  • Nathan - create an RBD with the --thick-provision option
  • Clayton - create a ticket to get container made for v4 ganesha node
  • Fraser - create a ticket for the plans for local storage option
  • Nathan/Lori will update Ceph configuration to no longer allow insecure connections
  • Lori to generate a schedule for upgrading the 42x systems
Topic revision: r5 - 2022-02-02 - LawrenceFolland
 