Meeting: 2016-11-18 DC-2102
Attendance:
Guoxiang Shen, Lori D. Paniak, Nathan Fish
Agenda:
Ceph on "fat" OSD, current obstacles
Discussion:
Review of building OSDs on md/dm manually as outlined in:
https://cs.uwaterloo.ca/cscf/internal/request_debug/UpdateRequest?106956
The Nov 18 script from ldpaniak needs its md and sd device names replaced with UUIDs.
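A sketch of what that replacement could look like; device names here are illustrative, not taken from the ticket:

```shell
# /dev/sdX names can reorder across reboots; pin arrays and members by UUID.

# Print the UUID of an md device (illustrative name):
blkid -s UUID -o value /dev/md0

# Persist array definitions by UUID so assembly does not depend on sdX order;
# mdadm emits "ARRAY ... UUID=..." lines:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u

# Stable per-device paths usable in fstab and OSD setup:
ls -l /dev/disk/by-uuid/
```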
6 GB journals per OSD seen as good initial sizing.
Discussion of failure rates for 3+1 RAID5 vs 6+2 RAID6 OSDs.
Concluded that the estimated data loss probability of <0.01% over 5 years for 3+1 RAID5 is essentially correct.
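A back-of-envelope check of the 3+1 RAID5 figure. All inputs here (annualized failure rate, rebuild window) are assumptions for illustration, not numbers from the meeting, and this models loss of a single RAID set; Ceph replication across OSDs reduces the cluster-wide data loss probability further.

```shell
# Rough model: probability that any drive in a 4-disk set fails over 5 years,
# times the probability a second drive fails before the rebuild completes.
awk 'BEGIN {
  afr     = 0.02          # assumed annualized HDD failure rate (2%)
  drives  = 4             # 3+1 RAID5 set
  rebuild = 24 / 8760     # assumed 24 h rebuild window, in years
  years   = 5
  p = years * drives * afr * (drives - 1) * afr * rebuild
  printf "P(set loss over 5y) ~ %.4f%%\n", p * 100
}'
```

With these assumed inputs the result lands below the 0.01% threshold discussed; a longer rebuild window or higher failure rate pushes it above.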
Decided that extra effort to build system was worth it if it reduced maintenance load in production.
Customization should not increase maintenance load or impede upgrade path for the system.
md-based OSDs remove direct monitoring of hard drives by Ceph. It is essential that HDD health be
monitored by secondary means (e.g. smartd), reported (e.g. via Nagios), and acted on (e.g. drive replacement)
in a timely manner.
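A sketch of a smartd configuration for this kind of secondary monitoring (Debian path /etc/smartd.conf); device names, schedules, and mail target are illustrative:

```
# Monitor all SMART attributes (-a), run a short self-test daily at 02:00
# and a long test on the 1st of each month at 03:00 (-s), mail root on
# failure (-m):
/dev/sda -a -s (S/../.././02|L/../01/./03) -m root

# Or scan and monitor every device not listed explicitly:
DEVICESCAN -a -m root
```

smartd alerts would then feed the reporting layer (e.g. a Nagios check on smartd output or syslog).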
Suggestion by nfish for monthly updates/reboots of DFS systems, done separately per system, to surface HDD spin-down problems and complications from updates.
It will be useful to have a (VM) analogue of the DFS cluster where updates can be applied first.
Suggestion to use existing Ceph cluster on AMD hardware as a backup for the DFS cluster.
To do:
Build md/dm-backed OSD with journal on SSD partition.
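The to-do item might look roughly like the following; this is a sketch under assumed device names (sdb-sde as data disks, sda5 as an SSD journal partition), not the actual script from the ticket:

```shell
# Build the md array backing the OSD (3+1 RAID5, illustrative devices):
mdadm --create /dev/md0 --level=5 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Jewel-era ceph-disk takes the data device followed by the journal device;
# pre-size the SSD journal partition to the agreed 6 GB:
ceph-disk prepare --cluster ceph /dev/md0 /dev/sda5
ceph-disk activate /dev/md0p1
```

Once built, the array definition should be persisted by UUID (mdadm --detail --scan) rather than by sdX name.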
--
LoriPaniak - 2016-11-21