NetApp Filer

Active/active configuration

An active/active configuration consists of two storage systems (nodes) whose controllers are connected to each other either directly or through switches. The nodes are connected through a cluster adapter or an NVRAM adapter, which allows one node to serve data from the disks of its failed partner node. Each node continually monitors its partner and mirrors the data in its partner's nonvolatile RAM (NVRAM).

In the student environment we have a FAS3240 filer cluster (shared with MFCF) with two systems (nodes), fs02.student.cs and fs03-admin.student.math. Fs02 serves home directories, regional files, mail and csw software. An active/active configuration is set up for both nodes. If one of the two nodes becomes impaired, the partner node assumes the identity of the failed node and serves its data to its clients in addition to serving its own data. After the failed node is fixed and rebooted, it shows Waiting for giveback on its console; we then need to run cf giveback on the partner node so that the repaired node can resume normal operation. Sometimes a node is taken over because it lost its network connection (for example, the switch it is attached to was rebooted); in that case, check the network cable and the switch before running cf giveback.

In the core.cs region, we have a FAS3240 cluster (also shared with MFCF) with two nodes, fs102-mgmt.cs and fs105-admin.math. A vfiler, fs102-san.cs, is set up on fs102-mgmt.cs. Fs102-san.cs serves mail (including mail stored in maildir), regional files, csw software and home directories. An active/active configuration is set up for fs102 and fs105 as well. As mentioned above, cf giveback needs to be run from the partner node when a node is rebooted after being taken over.
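The giveback steps described above can be sketched as a small shell script. This is only a sketch under assumptions: run_on and DRY_RUN are hypothetical names introduced here (they are not part of Data ONTAP), key-based ssh to the filers is assumed to be working, and the short host name fs03 is used for illustration. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: give a repaired node its resources back from the partner that took over.
# run_on and DRY_RUN are hypothetical helpers, not Data ONTAP features.

# Print the ssh command when DRY_RUN is 1 (the default); run it otherwise.
run_on() {
    node=$1
    shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ssh $node $*"
    else
        ssh "$node" "$@"
    fi
}

# $1 = partner node that performed the takeover (e.g. fs03 when fs02 failed)
giveback_from() {
    partner=$1
    run_on "$partner" cf status      # check that the partner really is in takeover mode
    run_on "$partner" cf giveback    # hand resources back to the repaired node
    run_on "$partner" cf status      # check the cluster state after the giveback
}

giveback_from fs03
```

Check the cf status output by hand before letting the giveback run for real (DRY_RUN=0), and check the network cable and switch first if the takeover was caused by a lost connection.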

Aggregate and volume

An aggregate is a RAID-level collection of disks; it can contain more than one RAID group. A volume is a logical unit of storage. The disk space that a volume occupies is provided by an aggregate. There are two types of volumes: flexible and traditional. A flexible volume can be grown or shrunk in size. An aggregate can contain multiple, completely independent flexible volumes. All volumes on our NetApp filers are flexible volumes.

We have only one aggregate on each of our NetApp filers fs02 and fs102. These aggregates (all named aggr0) consist of RAID-DP (Double Parity) RAID group(s). That means each RAID group can tolerate the failure of two disks at the same time. When a disk fails, we usually receive a replacement disk from NetApp within four hours. We also have some spare disks on fs02 and fs102. Use sysconfig -r to display disk information. A disk can be unassigned from one node and reassigned to the other as shown below:

    unassign from fs02
    fs02> disk assign 0c.00.16 -s unowned
    assign it to fs03
    fs03> disk assign 0c.00.16
   

To add a disk to an existing aggregate, run aggr add aggrname -d disk1. For example: fs02> aggr add aggr0 -d 0b.60
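The disk reassignment above can be wrapped in a small script with a check between the two steps. This is a sketch under assumptions: run_on and DRY_RUN are hypothetical names introduced here, key-based ssh to both nodes is assumed, and the disk show -n (list unowned disks) and aggr status -s (list spare disks) checks are additions that are not in the original notes. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: move a disk from one node of the cluster to the other.
# run_on and DRY_RUN are hypothetical helpers, not Data ONTAP features.

# Print the ssh command when DRY_RUN is 1 (the default); run it otherwise.
run_on() {
    node=$1
    shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ssh $node $*"
    else
        ssh "$node" "$@"
    fi
}

# move_disk DISK FROM_NODE TO_NODE
move_disk() {
    disk=$1
    from=$2
    to=$3
    run_on "$from" disk assign "$disk" -s unowned   # release ownership on the old node
    run_on "$to" disk show -n                       # the disk should now be listed as unowned
    run_on "$to" disk assign "$disk"                # take ownership on the new node
    run_on "$to" aggr status -s                     # the disk should now show up as a spare
}

move_disk 0c.00.16 fs02 fs03
```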

Set up ssh to NetApps using public key authentication

It is convenient to log on to a filer without typing its password. Two things are needed:
  • Enable ssh on the filer by running secureadmin setup ssh.
  • Copy the public key of the host(s) to filer:/etc/sshd/root/.ssh/authorized_keys.

There is no need to turn ssh.enable off and back on on the filer afterwards.
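The key-copy step can be sketched as a shell function. This is only a sketch under assumptions: install_key is a hypothetical name, and it assumes the filer's /etc directory is reachable from the admin host as an ordinary directory (for example through an NFS mount of the filer's root volume), which these notes do not describe. The function appends the key only if it is not already present.

```shell
#!/bin/sh
# Sketch: append a host's public key to a filer's authorized_keys file.
# install_key PUBKEY_FILE FILER_ETC
#   PUBKEY_FILE  public key of the admin host (one line)
#   FILER_ETC    the filer's /etc directory as seen from the admin host
#                (assumed to be mounted somewhere; the mount is not set up here)
install_key() {
    pubkey=$1
    keydir=$2/sshd/root/.ssh
    mkdir -p "$keydir" || return 1
    touch "$keydir/authorized_keys" || return 1
    # Append only if this exact key line is not already in the file.
    if ! grep -qxF -e "$(cat "$pubkey")" "$keydir/authorized_keys"; then
        cat "$pubkey" >> "$keydir/authorized_keys"
    fi
}
```

For example, with the root volume of fs02 mounted at a hypothetical /mnt/fs02-root, one would run install_key ~/.ssh/id_rsa.pub /mnt/fs02-root/etc and then test with ssh fs02 version.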

Read man page from NetApp filer

All commands available on a NetApp filer can be listed by running ssh fs02 \?. To read the man page of a command, run ssh filer man command. For example: ssh fs02 man sysconfig

Power Down

  • Disable and re-enable the cluster using cf disable and cf enable from the console or ssh session.
    • Disable the cluster by typing cf disable on one of the nodes.
    • Type cf status to make sure the cluster is disabled.
    • Type halt on each node.
    • Power down both nodes.
  • Using halt -f
    • Type halt -f
    • Power down both nodes.
    • The first filer to come up may show the following status when you type cf status:
      • filer may be down, takeover disabled because that partner halted in notakeover mode.
    • When the second filer comes up, it will automatically try to enable the cluster.
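The first method above (cf disable, then halt on each node) can be sketched as a script. This is a sketch under assumptions: run_on and DRY_RUN are hypothetical names introduced here, key-based ssh to both nodes is assumed, and short host names are used for illustration. By default it only prints the commands. Physically powering down the nodes is still a manual step afterwards.

```shell
#!/bin/sh
# Sketch: halt both nodes of a cluster after disabling takeover.
# run_on and DRY_RUN are hypothetical helpers, not Data ONTAP features.

# Print the ssh command when DRY_RUN is 1 (the default); run it otherwise.
run_on() {
    node=$1
    shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "ssh $node $*"
    else
        ssh "$node" "$@"
    fi
}

# shutdown_cluster NODE_A NODE_B
shutdown_cluster() {
    a=$1
    b=$2
    run_on "$a" cf disable   # disable takeover so neither node takes over the other
    run_on "$a" cf status    # check by hand that the cluster is reported as disabled
    run_on "$a" halt
    run_on "$b" halt
}

shutdown_cluster fs02 fs03
```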

Power Up

  • Power on the switches.
  • Power on the drive shelves for each filer.
  • Power on the heads or nodes (filers), one at a time.
  • Type cf enable from the console or an ssh session if the cluster is disabled.

Topic revision: r8 - 2012-06-08 - GuoxiangShen
Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.

