NetApp Filer

Active/active configuration

An active/active configuration consists of two storage systems (nodes) whose controllers are connected to each other either directly or through switches. The nodes are connected through a cluster adapter or an NVRAM adapter, which allows one node to serve data from the disks of its failed partner. Each node continually monitors its partner, and the two nodes mirror each other's nonvolatile RAM (NVRAM) data.

In the student environment we have a FAS3020 filer cluster with two nodes, fs04.student.cs and fs06.student.cs. Fs04 serves home directories and regional files; fs06 serves mail and csw software. Both nodes are in an active/active configuration: if one node becomes impaired, its partner assumes the identity of the failed node and serves its data to clients in addition to serving its own. When the failed node has been fixed and rebooted, it stops with "Waiting for giveback" on its console; we then run cf giveback on its partner so that it resumes normal operation. Sometimes a node is taken over because it loses its network connection (for example, because the switch it is attached to was rebooted); in that case, check the network cable and the switch before running cf giveback.
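The takeover and giveback cycle described above can be pictured as a small state machine. The Python sketch below is illustrative only: the `HAPair` class and its state names are hypothetical and do not correspond to any ONTAP interface. It encodes just the rules stated on this page: the surviving partner serves the failed node's data, and `cf giveback` is run on the partner once the repaired node is waiting for giveback.

```python
from enum import Enum


class NodeState(Enum):
    UP = "up"                         # serving its own data
    TAKEN_OVER = "taken over"         # failed; the partner serves its data
    WAITING = "waiting for giveback"  # repaired and rebooted, not yet serving


class HAPair:
    """Hypothetical model of an active/active pair of filer nodes."""

    def __init__(self, a: str, b: str) -> None:
        self.partner = {a: b, b: a}
        self.state = {a: NodeState.UP, b: NodeState.UP}

    def serving(self, node: str) -> set:
        """Return the names of the nodes whose data `node` currently serves."""
        if self.state[node] is not NodeState.UP:
            return set()
        other = self.partner[node]
        if self.state[other] is NodeState.UP:
            return {node}
        return {node, other}  # after takeover: own data plus the partner's

    def fail(self, node: str) -> None:
        """The partner detects the failure and assumes the node's identity."""
        if self.state[self.partner[node]] is not NodeState.UP:
            raise RuntimeError("partner is not up, so no takeover is possible")
        self.state[node] = NodeState.TAKEN_OVER

    def reboot_repaired(self, node: str) -> None:
        """A repaired node boots only as far as 'Waiting for giveback'."""
        if self.state[node] is NodeState.TAKEN_OVER:
            self.state[node] = NodeState.WAITING

    def cf_giveback(self, on_node: str) -> None:
        """Model `cf giveback`, which is run on the node that did the takeover."""
        failed = self.partner[on_node]
        if self.state[on_node] is not NodeState.UP:
            raise RuntimeError(f"{on_node} is not up")
        if self.state[failed] is not NodeState.WAITING:
            raise RuntimeError(f"{failed} is not waiting for giveback")
        self.state[failed] = NodeState.UP
```

For example, after `pair = HAPair("fs04.student.cs", "fs06.student.cs")` and `pair.fail("fs04.student.cs")`, `pair.serving("fs06.student.cs")` contains both node names, and giveback is refused until `reboot_repaired` has moved fs04 to the waiting state.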

In the core.cs region, we have a FAS2050 cluster with two nodes, fs104.cs and fs106.cs. Fs104 serves home directories and mail; fs106 serves regional files and csw software. Fs104 and fs106 are also in an active/active configuration, so, as above, when a node reboots after being taken over, cf giveback must be run on its partner.
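Because giveback is always issued on the partner, it can help to look the partner up rather than rely on memory. A minimal Python sketch, assuming ssh key access as described in the ssh section of this page; `PARTNERS` and `giveback_plan` are hypothetical helpers whose node names come from this page. It only builds the commands (inspect `cf status` first, then `cf giveback`) and does not run them.

```python
# Partner of each node, from the two clusters described on this page.
PARTNERS = {
    "fs04.student.cs": "fs06.student.cs",
    "fs06.student.cs": "fs04.student.cs",
    "fs104.cs": "fs106.cs",
    "fs106.cs": "fs104.cs",
}


def giveback_plan(rebooted_node: str) -> list:
    """Return ssh argv lists to run on the partner of a taken-over node:
    first `cf status` to inspect the pair, then `cf giveback`."""
    try:
        partner = PARTNERS[rebooted_node]
    except KeyError:
        raise ValueError(f"unknown filer: {rebooted_node}") from None
    return [
        ["ssh", partner, "cf", "status"],
        ["ssh", partner, "cf", "giveback"],
    ]
```

Each list could be passed to `subprocess.run(..., check=True)`; reviewing the `cf status` output, and checking the network cable and switch as noted above, before running the second command is deliberately left to the operator.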

Aggregate and volume

An aggregate is a RAID-level collection of disks; it can contain more than one RAID group. A volume is a logical unit of storage whose disk space is provided by an aggregate. There are two types of volumes: flexible and traditional. A flexible volume can be grown or shrunk, and an aggregate can contain multiple, completely independent flexible volumes. All volumes on our NetApp filers except vol0 on fs106.cs are flexible volumes.
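The relationship between an aggregate and its flexible volumes can be sketched as a toy model: the aggregate supplies a fixed pool of space, and each flexible volume draws from that pool and can be grown or shrunk independently. This Python sketch is hypothetical; it assumes every volume fully reserves its size and ignores snapshot reserve, WAFL overhead and thin provisioning. On the filer itself, flexible volumes are resized with the vol size command.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Toy model: an aggregate is a pool of space shared by flexible volumes."""

    name: str
    size_gb: int
    volumes: dict = field(default_factory=dict)  # flexible volume name -> size in GB

    def free_gb(self) -> int:
        return self.size_gb - sum(self.volumes.values())

    def create_flexvol(self, vol: str, size_gb: int) -> None:
        if vol in self.volumes:
            raise ValueError(f"volume {vol} already exists")
        if size_gb <= 0 or size_gb > self.free_gb():
            raise ValueError("invalid size or not enough free space in the aggregate")
        self.volumes[vol] = size_gb

    def resize_flexvol(self, vol: str, new_size_gb: int) -> None:
        """Grow or shrink a volume; growth is limited by the aggregate's free space."""
        if new_size_gb <= 0:
            raise ValueError("size must be positive")
        growth = new_size_gb - self.volumes[vol]  # negative when shrinking
        if growth > self.free_gb():
            raise ValueError("not enough free space in the aggregate")
        self.volumes[vol] = new_size_gb
```

Shrinking one volume returns space to the aggregate, which another volume in the same aggregate can then use; that independence is what distinguishes flexible volumes in this model.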

We have only one aggregate on each of our NetApp filers (fs04, fs06, fs104 and fs106). These aggregates (all named aggr0) consist of RAID-DP (double-parity) RAID groups, so each filer can survive the simultaneous failure of any two disks. If a disk fails, NetApp usually delivers a replacement within four hours. We also have several spare disks on fs04/fs06 and one spare disk on fs104/fs106; use sysconfig -r to view disk and RAID information. The spare disk on fs104/fs106 is currently owned by fs104. If a disk on fs106 fails and we cannot get a replacement in time, we can assign that spare disk to fs106 as follows:

    Unassign it from fs104:
        fs104> disk assign 0c.00.16 -s unowned
    Assign it to fs106:
        fs106> disk assign 0c.00.16
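The RAID-DP claim above can be checked with a little arithmetic: each RAID-DP group dedicates two disks to parity (row and diagonal), so it keeps serving data with up to two failed disks, and any two failures on a filer are therefore survivable no matter which groups they hit. A minimal Python sketch; the group sizes in the example are hypothetical, and the real layout should be read from sysconfig -r.

```python
def raid_dp_data_disks(group_sizes: list) -> int:
    """Data disks left after RAID-DP parity: two parity disks per group."""
    for n in group_sizes:
        if n < 3:
            raise ValueError("a RAID-DP group needs at least 3 disks")
    return sum(n - 2 for n in group_sizes)


def survives(failures_per_group: list) -> bool:
    """True if no RAID-DP group has lost more than two disks."""
    return all(f <= 2 for f in failures_per_group)
```

For example, an aggregate with hypothetical groups of 16 and 14 disks has 26 data disks; it survives two failures in one group (`survives([2, 0])`) but not a third in the same group (`survives([3, 0])`). Reconstructing a failed disk onto a spare restores the two-disk margin, which is why the spare matters.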

To add a disk to an existing aggregate, run aggr add aggrname -d disk1. For example:

    fs04> aggr add aggr0 -d 0b.60
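When several disks are added at once, aggr add takes them as a space-separated list after -d. A small Python sketch that builds such a command line for pasting into the console; `aggr_add_command` is a hypothetical helper, and its disk-name check only assumes names shaped like the two examples on this page (0b.60 and 0c.00.16).

```python
import re

# Assumption: disk names look like this page's examples,
# "0b.60" (adapter.id) or "0c.00.16" (adapter.shelf.bay).
DISK_NAME = re.compile(r"^\d+[a-z]\.(?:\d+\.)?\d+$")


def aggr_add_command(aggr: str, disks: list) -> str:
    """Build an `aggr add <aggr> -d <disk> ...` line for the filer console."""
    if not disks:
        raise ValueError("at least one disk is required")
    for d in disks:
        if not DISK_NAME.match(d):
            raise ValueError(f"unexpected disk name: {d}")
    return f"aggr add {aggr} -d {' '.join(disks)}"
```

The name check is only a guard against typos such as pasting a host device name; the filer remains the authority on which spare disks exist.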

Set up ssh to NetApps using public key authentication

It is convenient to log on to a filer without typing its password. Two things are needed:

- Enable ssh on the filer by running secureadmin setup ssh.
- Copy the public key of each client host to filer:/etc/sshd/root/.ssh/authorized_keys.

There is no need to turn ssh.enable off and back on on the filer afterwards.
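Copying the key amounts to an idempotent append of one line to authorized_keys. A minimal Python sketch, assuming the filer's root volume (which holds /etc) is NFS-mounted on the admin host at a hypothetical mount point and is writable there; `add_authorized_key` is a hypothetical helper, not a NetApp tool.

```python
from pathlib import Path


def add_authorized_key(authorized_keys: Path, pubkey_line: str) -> bool:
    """Append one public key line to authorized_keys unless it is already there.

    Returns True if the key was appended, False if it was already present.
    """
    key = pubkey_line.strip()
    if not key:
        raise ValueError("empty public key")
    authorized_keys.parent.mkdir(parents=True, exist_ok=True)
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "\n" if text and not text.endswith("\n") else ""
    with authorized_keys.open("a") as f:
        f.write(prefix + key + "\n")
    return True
```

With the root volume mounted at, say, /mnt/fs104 (hypothetical), a call would look like `add_authorized_key(Path("/mnt/fs104/etc/sshd/root/.ssh/authorized_keys"), (Path.home() / ".ssh/id_rsa.pub").read_text())`. Running it twice leaves a single copy of the key.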

Read man pages on a NetApp filer

To list all commands available on a NetApp filer, run ssh fs104.cs \?. To read the man page of a command, run ssh filer man command; for example, ssh fs104 man sysconfig.
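The same commands can be run from a script. A minimal Python sketch, assuming passwordless ssh access as set up in the previous section; `filer_argv` and `filer_man` are hypothetical helpers built on the standard `subprocess.run`.

```python
import subprocess


def filer_argv(filer: str, *command: str) -> list:
    """argv that runs a console command on `filer` over ssh."""
    return ["ssh", filer, *command]


def filer_man(filer: str, command: str) -> str:
    """Return the man page text for `command` as printed by the filer."""
    result = subprocess.run(filer_argv(filer, "man", command),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(filer_man("fs104", "sysconfig"))` prints the sysconfig man page, and `subprocess.run(filer_argv("fs104.cs", "?"))` lists the available commands. Because the arguments are passed as a list, no local shell sees the `?`, so the backslash escape needed on the command line is unnecessary here.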

Power Down

Using cf disable (from the console or an ssh session; re-enable the cluster later with cf enable):
- Disable the cluster by typing cf disable on one of the nodes.
- Type cf status to make sure the cluster is disabled.
- Type halt on each node, then power down both nodes.
Using halt -f (halts a node without letting its partner take over):
- Type halt -f on each node.
- Power down both nodes.
- When you type cf status on the first filer to come up, it may show the following status:
  - filer may be down, takeover disabled because that partner halted in notakeover mode
- When the second filer comes up, it will automatically try to enable the cluster.

Power Up

- Power on the switches.
- Power on the drive shelves for each filer.
- Power on the filers, one at a time.
- Type cf enable if the cluster is disabled.

Topic revision: r5 - 2010-04-23 - GuoxiangShen

Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.
