An active/active configuration consists of two storage systems (nodes) whose
controllers are connected to each other either directly or through switches.
The nodes are connected through a cluster adapter or an NVRAM
adapter, which allows one node to serve the data on the disks of its failed partner
node. Each node continually monitors its partner, and each mirrors the
contents of the other's nonvolatile RAM (NVRAM).
In the student environment we have a FAS3240 filer cluster (shared with MFCF) with two systems (nodes),
fs02.student.cs and fs03-admin.student.math. Fs02 serves home directories,
regional files, mail and csw software. An active/active configuration
is set up for both nodes. If one of the two nodes becomes impaired, the
partner node assumes the identity of the failed node and
serves its data to its clients in addition to serving its own data.
After the failed node is fixed, we need to run cf giveback
on the partner node to let the failed node resume normal operation. During its reboot we will
see Waiting for giveback
on its console. Sometimes
a node is taken over because it loses its network connection (for example, the
switch it is attached to is rebooted). In that case we may need to check the network
cable and the switch before running cf giveback.
In the core.cs region, we have a FAS3240 cluster (shared with MFCF as well) with two nodes,
fs102-mgmt.cs and fs105-admin.math. A vfiler, fs102-san.cs, is set up on fs102-mgmt.cs.
Fs102-san.cs serves mail (including mail in maildir format),
regional files, csw software and home directories.
The active/active configuration
is set up for fs102 and fs105 too. As mentioned above, cf giveback
must be run from the partner node when a node is rebooted after it has been taken over.
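The giveback step described above can be sketched as a small shell helper. This is only an illustration: run and give_back are hypothetical names, the short host name fs03 is an assumption about how the nodes are reached over ssh, and the network cable and switch still have to be checked by hand before using it. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# Sketch only: run and give_back are hypothetical helpers, not NetApp tools.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = the node that took over (the partner of the failed node).
give_back() {
    partner=$1
    # Show the cluster state first so the operator can confirm the takeover.
    run ssh "$partner" cf status || return 1
    # Hand the failed node's identity and data back to it.
    run ssh "$partner" cf giveback
}

# Intended use (assumes fs03 took over for fs02):
#   DRY_RUN=1; give_back fs03
```

Running it in dry-run mode first makes it easy to confirm that the commands go to the node that performed the takeover, not to the node that is waiting for giveback.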
Aggregate and volume
An aggregate is a RAID-level collection of disks; it can
contain more than one RAID group. A volume is a logical
unit of storage. The disk space that a volume occupies
is provided by an aggregate. There are two types of volumes: flexible and
traditional. A flexible volume can be grown or shrunk in size.
An aggregate can contain multiple, completely
independent flexible volumes.
All volumes on our NetApp filers are flexible volumes.
We have only one aggregate on each of our NetApp filers fs02
and fs102. These aggregates (all named aggr0) consist of RAID-DP
(Double Parity) RAID group(s). That means each RAID group can survive the failure
of two disks at the same time. We usually receive a
replacement disk from NetApp within four hours when a disk
fails. We also keep some spare disks on fs02 and fs102.
Use sysconfig -r
to find disk information.
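As a rough illustration of picking the spare disks out of the sysconfig -r report, the filter below keeps only lines that begin with the word spare. Both the helper name list_spares and that assumption about the output layout are mine, so check them against the real output on the filer.

```shell
# Sketch only: list_spares is a hypothetical helper. It assumes that the
# per-disk lines for spare disks in the sysconfig -r output start with the
# word "spare" followed by whitespace.
list_spares() {
    # Reads sysconfig -r output on stdin and prints the matching lines.
    grep '^spare[[:space:]]'
}

# Intended use (needs ssh access to the filer):
#   ssh fs02 sysconfig -r | list_spares
```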
We can unassign a disk from one node and reassign it to the other as follows.
Unassign it from fs02:
fs02> disk assign 0c.00.16 -s unowned
Assign it to fs03:
fs03> disk assign 0c.00.16
Add a disk to an existing aggregate by running aggr add aggrname -d disk1:
fs02> aggr add aggr0 -d 0b.60
Set up ssh to NetApps using public key authentication
It is convenient to log on to a filer without typing its password. We need
to do two things:
- Enable ssh on the filer by running secureadmin setup ssh
- Copy the public key of the host(s) to filer:/etc/sshd/root/.ssh/authorized_keys
There is no need to turn ssh.enable off
and back on
on the filer.
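The key-copy step can be sketched as follows, assuming the filer's /etc directory is reachable from the admin host as an ordinary directory (for example through a mount of the filer's root volume). The helper name add_filer_key and the example mount point /mnt/fs02/etc are hypothetical. Only the sshd/root/.ssh/authorized_keys path below that directory comes from the text above.

```shell
# Sketch only: add_filer_key is a hypothetical helper.
# $1 = public key file on the admin host, e.g. $HOME/.ssh/id_rsa.pub
# $2 = local path where the filer's /etc is reachable (an assumption),
#      e.g. /mnt/fs02/etc
add_filer_key() {
    pubkey=$1
    filer_etc=$2
    keydir=$filer_etc/sshd/root/.ssh
    keyfile=$keydir/authorized_keys

    [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }
    mkdir -p "$keydir" || return 1

    # Skip the append if this exact key line is already present.
    if [ -f "$keyfile" ] && grep -qxF -f "$pubkey" "$keyfile"; then
        return 0
    fi
    cat "$pubkey" >> "$keyfile"
}

# Intended use:
#   add_filer_key "$HOME/.ssh/id_rsa.pub" /mnt/fs02/etc
```

Appending instead of overwriting keeps keys from other hosts in place, and the duplicate check makes the helper safe to run more than once.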
Read man page from NetApp filer
All commands available on a NetApp filer can be listed by running
ssh fs02 \?
To read the man page of a command, run
ssh filer man command
For example, ssh fs02 man sysconfig
- Disable and re-enable the cluster using cf disable and cf enable from the console or an ssh session.
  - Disable the cluster by typing cf disable on one of the nodes.
  - Type cf status to make sure the cluster is disabled.
  - Type halt on each node.
  - Power down both nodes.
- Using halt -f
  - Type halt -f
  - Power down both nodes.
  - The first filer to come up may show the following status when you type cf status:
    - filer may be down, takeover disabled because that partner halted in notakeover mode.
  - When the second filer comes up, it will automatically try to enable the cluster.
- Power on the switches.
- Power on the drive shelves for each filer.
- Power on the heads or nodes (filers), one at a time.
- Type cf enable from the console or an ssh session if the cluster is disabled.
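The cf disable shutdown path in the list above can be sketched as a script. This is an illustration under assumptions: run and shutdown_cluster are hypothetical helper names, the nodes are assumed to be reachable over ssh under the short names passed in, and powering down the hardware remains a manual step. With DRY_RUN=1 the commands are only printed.

```shell
# Sketch only: run and shutdown_cluster are hypothetical helpers.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 and $2 = the two nodes of the cluster.
shutdown_cluster() {
    node_a=$1
    node_b=$2

    # Disable the cluster on one node, then show the resulting state.
    run ssh "$node_a" cf disable || return 1
    run ssh "$node_a" cf status || return 1

    # In a real run, let the operator read the cf status output before halting.
    if [ "${DRY_RUN:-0}" != 1 ]; then
        printf 'Is the cluster disabled? Halt both nodes? [y/N] '
        read answer
        [ "$answer" = y ] || return 1
    fi

    # The ssh session may drop while the node halts, so the exit status
    # of these two commands is not checked.
    run ssh "$node_a" halt
    run ssh "$node_b" halt
}

# Intended use:
#   DRY_RUN=1; shutdown_cluster fs02 fs03
```

The confirmation prompt mirrors the "make sure the cluster is disabled" step, so neither node is halted while takeover is still enabled.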