Meeting: 2016-06-09 13:00
Attended: ldpaniak (project manager), a2brenna, cscflab, gxshen, hchotara
Objectives:
- Decide on details of DFS configuration, software and hardware.
- Learn more about existing network filesystems in use in CS and Math.
Work already underway:
- High-speed dedicated storage network (Devon/Dan/Lori)
- Options for active-active NFS service from DFS (cscflab-Nathan)
- (Re-)Build of Ceph cluster with latest version on Ubuntu 16.04 (a2brenna)
To be determined:
- Expected typical/max load of FileShare/OwnCloud gateway in production - how much hardware do we need for active/passive configuration? Containers OK?
- Statistics on file-size distribution for current CS NetApp usage (gxshen). Apparently many small files in the student environment.
Discussion:
- Review of Math DFS work with hchotara. Some investigation into Ceph.
- Latest Math NetApp purchase: 2 controller heads + 24xSSD read accelerator ~$186k.
- Math is not currently using Kerberized NFS; it uses SMB/CIFS.
- One limitation of the NetApp in the past was file indexing at logon for Apple clients. The DFS here should aim to handle ~30 workers performing random reads at reasonable performance.
- Deduplication saves 30% of capacity in the CS environment.
- No known file share/sync option supports client-side encryption of data. Client data will, in principle, be readable at the service host before encryption and storage on the backing media (DFS).
- Discussion of Ceph configuration on current test cluster:
Three nodes, each with an OSD (backing storage presented as a single volume from the RAID controller), an MDS, and a monitor daemon.
Placement rules specified in the configuration give the redundancy characteristics for a Ceph pool (~filesystem); these produce a CRUSH map for the distribution of data on the OSDs. Different pools can have different redundancy characteristics (example commands below). It is unclear whether the distribution of data can change autonomously in unexpected ways.
Ceph characteristics: no compression, no deduplication, snapshots supported.
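A minimal command sketch of these per-pool redundancy controls, assuming a hypothetical pool name dfs-test and placement-group counts chosen only for a small three-node test cluster (Jewel-era command names):
    # create a replicated pool with 128 placement groups
    ceph osd pool create dfs-test 128 128 replicated
    # redundancy: keep 3 copies, allow I/O with at least 2 available
    ceph osd pool set dfs-test size 3
    ceph osd pool set dfs-test min_size 2
    # inspect which CRUSH rule the pool uses and how it places data
    ceph osd pool get dfs-test crush_ruleset
    ceph osd crush rule dump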
Decisions:
- Gluster and Ceph options for the software layer will use the same backing hardware. Proceed with quote procurement for the following spec:
3x Large storage servers
each
SSG-6048R-E1CR36L
2x Intel E5-2640v4 CPU
8x 32GB DDR4 ECC RDIMM
2x Intel S3510 240GB SSD
MCP-220-82609-0N rear drive kit
36x 6TB SAS2 7200RPM 3.5" HDD (e.g. Seagate ST6000NM0034)
Mellanox dual-port 50GbE x16 MCX416A-GCAT
2x 2m 50GbE QSFP28 cables MCP1600-C002
3yr depot warranty (5yr option)
Spares (whichever option is less expensive):
SSG-6048R-E1CR36L barebone chassis
or
2x PSU
1x motherboard
1x front drive backplane
1x rear drive backplane
Additionally
2x sticks of RAM as above
4x HDD as above
- The block-device model is a DFS block product with LUKS and the filesystem layered at the service host. a2brenna has demonstrated resizing (grow) at each level for RBD/LUKS/BTRFS (see the command sketch after this list).
- All end-user services will be mediated by systems/containers/VMs attached to the DFS 40GbE ring network. No direct end-user access to DFS products will be provided.
- Mediating services decouples service upgrades, maintenance, security and configuration from DFS core functionality. This modularity will be invaluable as the number and type of services using DFS products increases.
- Mediating services also allows for per-service data encryption and isolation.
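For reference, a rough sketch of the grow-at-each-level sequence, assuming a hypothetical image svc01 in pool dfs mapped on the service host, with a LUKS mapping named svc01crypt and a BTRFS filesystem mounted at /srv/svc01 (names and sizes are placeholders):
    # 1. grow the RBD image (size argument is in MB)
    rbd resize --size 20480 dfs/svc01
    # 2. grow the LUKS mapping to fill the enlarged block device
    cryptsetup resize svc01crypt
    # 3. grow the BTRFS filesystem online to fill the mapping
    btrfs filesystem resize max /srv/svc01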
Next meeting:
2016-06-17 11:00
Deliverables:
cscflab (Nathan Fish): Investigate options for a multi-server (parallel) NFS system that uses DFS products (block, GlusterFS, CephFS, etc.) to provide Kerberized NFS to Math/CS client systems (a minimal export sketch follows).
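A hedged starting point for the Kerberos piece only (a single knfsd server, not the parallel part), with a hypothetical export path and client pattern; it assumes an nfs/ service principal in the server keytab and rpc.gssd/rpc.svcgssd running:
    # /etc/exports entry requiring Kerberos authentication with privacy (krb5p)
    /export/home  *.example.org(rw,sync,sec=krb5p)
    # re-export and verify
    exportfs -ra
    exportfs -v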
a2brenna: Build the latest Ceph release on Ubuntu 16.04 with three nodes. Demonstrate cluster configuration at the command line, including features such as adding/removing an OSD (sketch below).
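A sketch of the add/remove demonstration following the Jewel-era documented procedure; the hostname, device, and OSD id are placeholders:
    # add an OSD backed by node1's /dev/sdb (run via ceph-deploy from the admin node)
    ceph-deploy osd create node1:/dev/sdb
    # remove OSD 5: drain it, stop the daemon on its host, then delete it from the cluster
    ceph osd out 5
    ssh node1 systemctl stop ceph-osd@5
    ceph osd crush remove osd.5
    ceph auth del osd.5
    ceph osd rm 5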
ldpaniak: Use Gluster 3.8 to investigate DFS features with NSR (https://www.gluster.org/community/roadmap/4.0/). Understand how to recover/restore encrypted OwnCloud service data per-user from backing-storage snapshots (snapshot sketch below).
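A hedged sketch of the Gluster snapshot workflow behind the restore question, assuming a hypothetical replica-3 volume dfsvol with thin-LVM bricks on the three storage nodes (a prerequisite for Gluster snapshots); NSR-specific settings are not shown:
    # create and start a 3-way replicated volume
    gluster volume create dfsvol replica 3 node1:/bricks/dfsvol node2:/bricks/dfsvol node3:/bricks/dfsvol
    gluster volume start dfsvol
    # take and list a snapshot of the volume
    gluster snapshot create snap1 dfsvol
    gluster snapshot list
    # restoring requires the volume to be stopped; use the snapshot name reported by 'list'
    gluster volume stop dfsvol
    gluster snapshot restore snap1
    gluster volume start dfsvol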
--
LoriPaniak - 2016-06-09