An active/active configuration consists of two storage systems (nodes) whose controllers are connected to each other, either directly or through switches. The nodes are connected through a cluster adapter or an NVRAM adapter, which allows one node to serve data that resides on the disks of its failed partner node. Each node continually monitors its partner, and the two nodes mirror the data in each other's nonvolatile RAM (NVRAM).
In the student environment we have a FAS3240 filer cluster (shared with MFCF) with two systems (nodes), fs02.student.cs and fs03-admin.student.math. Fs02 serves home directories, regional files, mail and csw software. An active/active configuration is set up for the two nodes. If one of the nodes becomes impaired, the partner node assumes the identity of the failed node and serves its data to its clients in addition to serving its own data. After the failed node is fixed and rebooted, it shows Waiting for giveback on its console; we then need to run cf giveback on its partner node to let it resume normal operation. Sometimes a node is taken over because it loses its network connection (for example, the switch it is attached to is rebooted); in that case we may need to check the network cable and the switch before running cf giveback.
In the core.cs region, we have a FAS3240 cluster (also shared with MFCF) with two nodes, fs102-mgmt.cs and fs105-admin.math. A vfiler, fs102-san.cs, is set up on fs102-mgmt.cs. Fs102-san.cs serves mail (including mail stored in maildir), regional files, csw software and home directories. An active/active configuration is set up for fs102 and fs105 as well. As mentioned above, when one node is rebooted after being taken over, cf giveback needs to be run from its partner node.
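The takeover and giveback sequence described above can be illustrated with a small toy model. This is only a sketch for reasoning about the states, not NetApp code: the HAPair and Node classes and their method names are hypothetical, and the short node names fs02 and fs03 are used purely as labels.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    # Identities this node is currently serving (normally only its own).
    serving: set = field(default_factory=set)

    def __post_init__(self):
        self.serving.add(self.name)


class HAPair:
    """Toy model of an active/active pair (hypothetical, for illustration)."""

    def __init__(self, a: str, b: str):
        self.nodes = {a: Node(a), b: Node(b)}

    def partner_of(self, name: str) -> Node:
        (other,) = [n for n in self.nodes.values() if n.name != name]
        return other

    def fail(self, name: str) -> None:
        """The named node fails; its partner assumes its identity."""
        failed = self.nodes[name]
        partner = self.partner_of(name)
        if not partner.up:
            raise RuntimeError("both nodes are down; no takeover possible")
        failed.up = False
        failed.serving.clear()
        partner.serving.add(name)

    def giveback(self, issued_on: str) -> None:
        """Models running 'cf giveback' on the surviving partner node."""
        survivor = self.nodes[issued_on]
        partner = self.partner_of(issued_on)
        if partner.name not in survivor.serving:
            raise RuntimeError("nothing to give back")
        survivor.serving.discard(partner.name)
        partner.up = True
        partner.serving.add(partner.name)


pair = HAPair("fs02", "fs03")
pair.fail("fs02")        # fs03 now serves both identities
pair.giveback("fs03")    # issued on the partner, not on the repaired node
```

The point the model makes is that giveback is always issued on the node that performed the takeover, and that until it is issued the repaired node serves nothing.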
An aggregate is a RAID-level collection of disks; it can contain more than one RAID group. A volume is a logical unit of storage. The disk space that a volume occupies is provided by an aggregate. There are two types of volumes: flexible and traditional. A flexible volume can be grown or shrunk in size. An aggregate can contain multiple, completely independent flexible volumes. All volumes on our NetApp filers are flexible volumes.
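The relationship between an aggregate and its flexible volumes can be sketched as follows. This is a deliberately simplified model under stated assumptions: the Aggregate class, the volume names and the sizes are hypothetical, and it ignores real-world details such as snapshot reserve and space guarantees.

```python
class Aggregate:
    """Simplified pool of space shared by flexible volumes (hypothetical)."""

    def __init__(self, name, size_gb):
        self.name = name
        self.size_gb = size_gb
        self.volumes = {}  # volume name -> size in GB

    def free_gb(self):
        return self.size_gb - sum(self.volumes.values())

    def create_volume(self, vol, size_gb):
        if vol in self.volumes:
            raise ValueError(f"volume {vol} already exists")
        if size_gb > self.free_gb():
            raise ValueError("not enough free space in aggregate")
        self.volumes[vol] = size_gb

    def resize_volume(self, vol, new_size_gb):
        # Growing consumes free space; shrinking returns it to the aggregate.
        growth = new_size_gb - self.volumes[vol]
        if growth > self.free_gb():
            raise ValueError("not enough free space in aggregate")
        self.volumes[vol] = new_size_gb


aggr0 = Aggregate("aggr0", size_gb=1000)   # size is an assumed example value
aggr0.create_volume("home", 400)           # hypothetical volume names
aggr0.create_volume("mail", 300)
aggr0.resize_volume("home", 600)           # grow: uses 200 GB of free space
aggr0.resize_volume("mail", 200)           # shrink: returns 100 GB
```

In this model the volumes are independent of each other and compete only for the aggregate's free space, which is the property that lets a flexible volume grow or shrink.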
We have only one aggregate on each of our NetApp filers fs02 and fs102. These aggregates (both named aggr0) consist of one or more RAID-DP (Double Parity) RAID groups. That means each RAID group can tolerate the failure of two disks at the same time. When a disk fails, we usually receive a replacement disk from NetApp within four hours. We also have some spare disks on fs02 and fs102. Use sysconfig -r to display disk information. A disk can be unassigned from one node and reassigned to the other as below:
Unassign the disk from fs02:
    fs02> disk assign 0c.00.16 -s unowned
Assign it to fs03:
    fs03> disk assign 0c.00.16
To add a disk to an existing aggregate, run aggr add aggrname -d disk1. For example:
    fs02> aggr add aggr0 -d 0b.60
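The double-parity protection of the RAID-DP aggregates described earlier can be checked with a short sketch. It assumes only the general RAID-DP property that a RAID group survives up to two simultaneous disk failures; the function name, the RAID group layout and the disk names are hypothetical, and spare disks and rebuild time are ignored.

```python
def aggregate_survives(raid_groups, failed_disks, parity_disks=2):
    """Return True if no RAID group has lost more disks than it can tolerate.

    raid_groups maps a group name to the list of disks in that group.
    parity_disks=2 reflects RAID-DP (double parity).
    """
    failed = set(failed_disks)
    return all(
        len(failed & set(disks)) <= parity_disks
        for disks in raid_groups.values()
    )


# Hypothetical layout: one aggregate made of two RAID-DP groups.
groups = {
    "rg0": ["d1", "d2", "d3", "d4"],
    "rg1": ["d5", "d6", "d7", "d8"],
}

two_in_one_group = aggregate_survives(groups, ["d1", "d2"])
two_in_each_group = aggregate_survives(groups, ["d1", "d2", "d5", "d6"])
three_in_one_group = aggregate_survives(groups, ["d1", "d2", "d3"])
```

The first two cases survive and the third does not, which shows that the two-disk limit applies per RAID group rather than to the filer as a whole.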
We don't need to turn ssh.enable off and back on on the filer.