An active/active configuration consists of two storage systems (nodes) whose controllers are connected to each other either directly or through switches. The nodes are connected through a cluster adapter or an NVRAM adapter, which allows one node to serve data that resides on the disks of its failed partner node. Each node continually monitors its partner, and the two nodes mirror the data in each other's nonvolatile RAM (NVRAM).
In the student environment we have a FAS3020 filer cluster with two systems (nodes), fs04.student.cs and fs06.student.cs. Fs04 serves home directories and regional files. Fs06 serves mail and csw software. An active/active configuration is set up for both nodes. If one of the two nodes becomes impaired, the partner node assumes the identity of the failed node and serves its data to its clients in addition to serving its own data. After the failed node is fixed and rebooted, it shows Waiting for giveback on its console during boot; we then need to run cf giveback on its partner node to let it resume normal operation. Sometimes a node is taken over because it loses its network connection (for example, the switch it is attached to is rebooted). In that case we may need to check the network cable and the switch before running cf giveback.
In the core.cs region, we have a FAS2050 cluster with two nodes, fs104.cs and fs106.cs. Fs104 serves home directories and mail. Fs106 serves regional files and csw software. The active/active configuration is set up for fs104 and fs106 too. As mentioned above, when a node is rebooted after it has been taken over, cf giveback needs to be run from its partner node.
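The giveback sequence described above might look like the following on the partner's console, taking fs06 as the node that has taken over fs04. This is a minimal sketch: cf status is added here only as a sanity check before and after the giveback, the prompt shown while in takeover mode may differ from the plain fs06> used below, and command output is omitted because it varies by Data ONTAP release.

```
fs06> cf status
fs06> cf giveback
fs06> cf status
```

Running cf giveback only after the failed node's console shows Waiting for giveback, and after any network problem has been ruled out, avoids handing the identity back to a node that would be taken over again immediately.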
An aggregate is a RAID-level collection of disks; it can contain more than one RAID group. A volume is a logical unit of storage. The disk space that a volume occupies is provided by an aggregate. There are two types of volumes, flexible and traditional. A flexible volume may be grown or shrunk in size. An aggregate can contain multiple, completely independent flexible volumes. All volumes on our NetApp filers except vol0 on fs106.cs are flexible volumes.
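As a rough illustration of what "flexible" means in practice, the commands below list the volumes on a filer together with their type, grow one flexible volume, and then check its new size. The volume name vol1 and the 10 GB increment are hypothetical and chosen only for the example; a traditional volume cannot be resized this way.

```
fs04> vol status
fs04> vol size vol1 +10g
fs04> df -h /vol/vol1
```

The growth only succeeds if the containing aggregate still has enough free space, so it is worth checking the aggregate first.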
We have only one aggregate on each of our NetApp filers fs04/fs06 and fs104/fs106. These aggregates (all named aggr0) consist of RAID-DP (Double Parity) RAID group(s). That means each RAID group can survive the failure of two disks at the same time. If a disk fails, we usually receive a replacement from NetApp within four hours. We also have some spare disks on fs04/fs06 and one spare disk on fs104/fs106. One can use sysconfig -r to find disk information. The spare disk on fs104/fs106 is currently owned by fs104. If a disk fails on fs106 and we cannot get a replacement in time, we can assign that spare disk to fs106 as follows:
Unassign it from fs104:
    fs104> disk assign 0c.00.16 -s unowned
Assign it to fs106:
    fs106> disk assign 0c.00.16
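To confirm the reassignment, one could check disk ownership and the spare pool along the following lines. This is a sketch that assumes software disk ownership (which the disk assign commands imply); output is omitted because it depends on the system. Run between the two steps, disk show -n should list 0c.00.16 as unowned. After the second step, disk show -v on fs106 should show fs106 as the owner, and aggr status -s should list the disk among its spares.

```
fs104> disk show -n
fs106> disk show -v
fs106> aggr status -s
```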
Add a disk to an existing aggregate by running aggr add aggrname -d disk1. For example:
    fs04> aggr add aggr0 -d 0b.60
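A slightly fuller version of the same operation, with checks before and after, might look like the sketch below. The surrounding commands are additions for illustration rather than part of the original procedure: aggr status -s lists the available spares so the disk name can be confirmed, aggr status -r shows the RAID layout of the aggregate after the add, and df -A reports the aggregate's space.

```
fs04> aggr status -s
fs04> aggr add aggr0 -d 0b.60
fs04> aggr status -r aggr0
fs04> df -A aggr0
```

Checking the spare list first matters because a disk added to an aggregate cannot simply be removed again.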
There is no need to turn ssh.enable off and back on again on the filer.