Meeting: 2016-11-18 DC-2102

Attendance: Guoxiang Shen, Lori D. Paniak, Nathan Fish

Agenda: Ceph on "fat" OSD, current obstacles

Discussion: Reviewed building OSDs manually on md/dm devices, as outlined in: https://cs.uwaterloo.ca/cscf/internal/request_debug/UpdateRequest?106956 The Nov 18 script from ldpaniak needs its md and sd device names replaced by UUIDs, since kernel device names are not guaranteed to stay the same across reboots. 6 GB journals per OSD were seen as a good initial size.
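The UUID replacement could be approached along the following lines. This is only a sketch, not the script referenced above: it assumes the devices carry a filesystem (or other signature) so that udev lists them under /dev/disk/by-uuid, and the function names uuid_map and stable_name are hypothetical. Raw disks without a signature would not appear there and might need /dev/disk/by-id instead.

```python
import os


def uuid_map(by_uuid_dir="/dev/disk/by-uuid"):
    """Return {resolved device path: UUID} built from udev's by-uuid symlinks."""
    mapping = {}
    for uuid in os.listdir(by_uuid_dir):
        link = os.path.join(by_uuid_dir, uuid)
        # Each entry is a symlink such as <uuid> -> ../../md0
        mapping[os.path.realpath(link)] = uuid
    return mapping


def stable_name(device, mapping):
    """Return 'UUID=<uuid>' for a kernel device name, or None if not listed."""
    uuid = mapping.get(os.path.realpath(device))
    return f"UUID={uuid}" if uuid else None
```

For example, stable_name("/dev/md0", uuid_map()) would give a UUID=... string suitable for use in place of the kernel name, or None if the device has no entry.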

Discussed failure rates for 3+1 RAID5 vs 6+2 RAID6 OSDs. Concluded that the estimated data loss probability of <0.01% over 5 years for 3+1 RAID5 is essentially correct.
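A rough way to compare the two layouts is sketched below. It is not the calculation used in the meeting: it assumes independent drive failures at a constant annual failure rate and a fixed rebuild window, and the parameter values in the usage section (3% AFR, 24 hour rebuild) are placeholders chosen only for illustration. The result is very sensitive to both assumptions.

```python
import math


def loss_probability(n_drives, parity, afr, rebuild_hours, years):
    """Approximate probability that one array loses data within `years`.

    Assumes independent failures at a constant rate `afr` (per drive-year)
    and that data is lost when parity + 1 drives are failed at once.
    """
    window = rebuild_hours / (24 * 365)  # rebuild window in years
    # The first failure starts a rebuild; each additional failure must land
    # inside a rebuild window among the drives that are still healthy.
    rate = n_drives * afr
    for k in range(1, parity + 1):
        rate *= (n_drives - k) * afr * window
    return 1 - math.exp(-rate * years)


if __name__ == "__main__":
    # Placeholder inputs, not measured values.
    p_r5 = loss_probability(n_drives=4, parity=1, afr=0.03, rebuild_hours=24, years=5)
    p_r6 = loss_probability(n_drives=8, parity=2, afr=0.03, rebuild_hours=24, years=5)
    print(f"3+1 RAID5: {p_r5:.2e}   6+2 RAID6: {p_r6:.2e}")
```

The model ignores unrecoverable read errors during rebuild and correlated failures, both of which would raise the real figure.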

Decided that the extra effort to build the system is worth it if it reduces maintenance load in production. Customization should not increase maintenance load or impede the system's upgrade path.

An md-based OSD removes Ceph's direct monitoring of the hard drives. It is essential that HDD health be monitored by secondary means (e.g. smartd), reported (e.g. Nagios), and acted on (e.g. drive replacement) in a timely manner.
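One possible shape for the reporting step is a small check that turns a drive's SMART health verdict into a Nagios-style status code. This is a sketch under assumptions: it shells out to smartctl -H rather than reading smartd's own reports, it matches only the two health lines smartctl commonly prints for ATA and SCSI drives, and the names classify and check_drive are hypothetical.

```python
import subprocess

# Standard Nagios plugin exit codes (WARNING = 1 is unused here).
OK, CRITICAL, UNKNOWN = 0, 2, 3


def classify(smartctl_output):
    """Map the text printed by `smartctl -H` to a Nagios status code."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment" in line or "SMART Health Status" in line:
            verdict = line.split(":", 1)[1].strip()
            return OK if verdict in ("PASSED", "OK") else CRITICAL
    # No health line found: drive missing, SMART unsupported, or no permission.
    return UNKNOWN


def check_drive(device):
    """Run smartctl on one drive (typically needs root) and classify it."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return classify(result.stdout)
```

A wrapper would call check_drive for each member of the md arrays and exit with the worst code seen, so that a failing member drive is flagged even though Ceph only sees the array.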

nfish suggested updating and rebooting the DFS systems separately each month to test for HDD spin-down problems and complications from updates. It will be useful to have a (VM) analogue of the DFS cluster where updates can be applied first.

Suggestion to use the existing Ceph cluster on AMD hardware as a backup for the DFS cluster.

To do: Build md/dm-backed OSD with journal on SSD partition.

-- LoriPaniak - 2016-11-21

Topic revision: r1 - 2016-11-21 - LoriPaniak
 