Meeting: 2017-01-06 DC-2102

Attendance: Guoxiang Shen, Lori D. Paniak, Nathan Fish, Devon Merner

Agenda:

Discussion of remaining tasks:

  • Filesystem
  • Containers/wiring
  • Salt
  • Documentation
  • Testing

Discussion:

Ceph:

  • difficulty remounting OSDs during restart recovery after a power cut on the DFS system. Requires running the mount_start_osd script, usually several times. gxshen will add logic to the script to wait for md devices to come up before starting OSDs.
  • questions about backing up OSD keys (later led to the realization that OSD encryption keys are stored in plaintext on the device(s) they encrypt...)
  • would like a command to show the master ceph mon/mds for monitoring.
  • ldpaniak to reset DFS system with ceph at idle and measure the time for ceph to recover (only a few seconds, as expected).
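
The wait logic proposed for mount_start_osd could look roughly like the following. This is a minimal sketch, not the actual script: it assumes array state is read from /proc/mdstat-style text, and the function names, the timeout and the polling interval are all hypothetical.

```python
import time


def active_md_devices(mdstat_text):
    """Return names of md arrays that /proc/mdstat-style text reports as active."""
    active = set()
    for line in mdstat_text.splitlines():
        # Array lines look like: "md0 : active raid1 sdb1[1] sda1[0]"
        parts = line.split()
        if (len(parts) >= 3 and parts[0].startswith("md")
                and parts[1:3] == [":", "active"]):
            active.add(parts[0])
    return active


def wait_for_md(expected, read_mdstat, timeout=120.0, interval=2.0,
                sleep=time.sleep, clock=time.monotonic):
    """Poll until every expected array is active; return False on timeout."""
    deadline = clock() + timeout
    while True:
        missing = set(expected) - active_md_devices(read_mdstat())
        if not missing:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller might pass `lambda: open("/proc/mdstat").read()` as `read_mdstat` and start the OSDs only when `wait_for_md` returns True. The array names to wait for would come from the DFS host configuration.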

Containers/wiring:

  • dmerner has 10Gbit SFP cable to finish hardware install. Should be complete by end of 2017-01-06.
  • Discussion of network configuration and interface names in *-211 container hosts.
  • Ideally looking for trunk connection for 10/40GbE interfaces to bridges on container hosts for maximum flexibility of (future) configuration and support of new containers/services.

Salt:

  • salt configurations for container hosts, nextcloud and haproxy containers essentially ready to go.
  • salt configuration for DFS *421 systems to be solidified on ceph cluster rebuild.

Monitoring:

  • do UPSes support delayed start to reduce chance of power loss/return/loss bounce?
  • need to saltify all monitoring configurations for DFS, hosts, containers and services.
  • possibly add hidden file at root of ceph mount to check for mount. Nagios test to write to this file to check for cephfs health.
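
The hidden-file check could be sketched as a Nagios-style plugin along these lines. This is an illustration under assumptions: the marker file name, the function names and the `is_mount` hook are hypothetical. Only the exit codes (0 for OK, 2 for CRITICAL) follow the standard Nagios plugin convention.

```python
import os
import time

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def check_cephfs(mount_point, marker_name=".nagios_cephfs_check",
                 is_mount=os.path.ismount):
    """Return (status, message) after writing and reading back a marker file."""
    if not is_mount(mount_point):
        return CRITICAL, f"CRITICAL - {mount_point} is not a mount point"
    marker = os.path.join(mount_point, marker_name)  # hypothetical hidden file
    stamp = str(time.time())
    try:
        with open(marker, "w") as fh:
            fh.write(stamp)
            fh.flush()
            os.fsync(fh.fileno())  # push the write through to the filesystem
        with open(marker) as fh:
            if fh.read() != stamp:
                return CRITICAL, f"CRITICAL - read-back mismatch on {marker}"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - cannot use {marker}: {exc}"
    return OK, f"OK - wrote and read back {marker}"


def main(argv):
    """Print the status line and return the exit code for Nagios."""
    status, message = check_cephfs(argv[1])
    print(message)
    return status
```

A wrapper script would end with `sys.exit(main(sys.argv))`. A hung cephfs mount may block the write rather than raise an error, so the check would still rely on the Nagios plugin timeout to flag that case.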

Documentation:

Other:

  • Moving production start date back to 2017-01-23
  • Need a name for system. Need to take into account latency in certificate procurement to meet deadlines.
  • possibility of using ceph test cluster hardware for backup. gxshen estimates tape backup can hold O(10TB) of DFS data. Would likely use old hardware to build a cold storage cluster with minimal performance and maximum redundancy to compensate for the age of the hardware.

To do:

  • gxshen+ldpaniak will rebuild DFS/ceph from scratch by 2017-01-11. Need preseed and salt configuration details. https://cs.uwaterloo.ca/cscf/internal/request_debug/UpdateRequest?108628
  • ldpaniak to review journal sizing requirements in light of 18TB OSDs in our configuration.
  • ldpaniak needs to change 40GbE switch trunking to support
  • dmerner to look into UPS delayed start and UPS for switching. List of relevant UPSes.
  • ldpaniak to check reboot/OSD restart times against ceph "no out" limits. (The default "no out" limit is 300s; the DFS reboot (150s) and OSD restart (120s) fit within it.)
  • ldpaniak and dmerner to RMA current bad DFS HDD.
  • ldpaniak to review encryption options to ride on top of Nextcloud, e.g. VeraCrypt and Arq.
  • nfish to look at quotas in Nextcloud for users.
  • ldpaniak to look at quotas in ceph and possibility of adding additional ceph filesystems.
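
As a quick check of the figures in the "no out" item above, the remaining margin can be computed as below. This assumes the DFS reboot and the OSD restart run one after the other; the notes do not say whether they overlap. The function name is hypothetical.

```python
def recovery_margin(limit_s, *phase_durations_s):
    """Seconds left in the window if the phases run back to back.

    A negative result means the phases overrun the limit.
    """
    return limit_s - sum(phase_durations_s)


# Figures from the notes: 300s limit, 150s reboot, 120s OSD restart.
margin = recovery_margin(300, 150, 120)
print(margin)  # 30
```

Under that assumption the margin is only 30s, which is why the measured times need to be checked against the limit.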

-- LoriPaniak - 2017-01-07

Topic revision: r1 - 2017-01-07 - LoriPaniak