Legato Networker

We use Legato to back up the majority of our systems. Here are some usage notes.

Client Setup

Checking backup results in Legato (CSCF staff)

  • log in to backup.cs (cscf.cs will also work)
  • (ssh to cscf, suw, ssh backup.cs)
  • run: nwadmin
    • note that nwadmin is an X application -- if you're running from Windows, make sure your X Server is running
  • Go to Clients -> Indexes (or just click the "Indexes" icon)
  • scroll down to the machine in question
  • Click on a file system
  • click on "Instances"
Note: To check backups, nwadmin can be run from any Unix machine. Not everyone has an account on backup.cs, so be very careful when running nwadmin as root on backup.cs. nwadmin has not been available since version 7.3.3.

One can run mminfo on any Unix host, for example:

ubuntu1204-102[48]% mminfo -s backup.cs -c m160.cs -t "3 days ago" -ot
 volume        client       date      size   level  name
77001          m160.cs     13-05-01   2 KB    full  /data2
77001          m160.cs     13-05-01   2 KB    full  /data
77001          m160.cs     13-05-01 545 GB    full  /
Disk202        m160.cs     13-05-02   4  B    incr  /data2
Disk202        m160.cs     13-05-02   4  B    incr  /data
Disk202        m160.cs     13-05-02 8345 MB   incr  /
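
When many save sets are listed, the mminfo report can be summarized with a small filter. The following is only a sketch and not part of our tooling: the function name mminfo_totals is hypothetical, it assumes the column layout shown above (size and unit as separate fields, level in the sixth field), and it assumes the units are powers of 1024.

```shell
# Sum the sizes in an mminfo report per backup level (output in bytes).
# Assumption: columns are volume, client, date, size, unit, level, name.
mminfo_totals() {
    awk '
        $1 == "volume" { next }                 # skip the header line
        NF >= 7 {
            mult = 1
            if ($5 == "KB") mult = 1024
            else if ($5 == "MB") mult = 1048576
            else if ($5 == "GB") mult = 1073741824
            else if ($5 == "TB") mult = 1099511627776
            total[$6] += $4 * mult
        }
        END { for (lvl in total) printf "%s %.0f\n", lvl, total[lvl] }
    ' | sort
}
```

For example: mminfo -s backup.cs -c m160.cs -t "3 days ago" -ot | mminfo_totals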

Restore files from backups

  • Quick guide:
    • Run the command recover (become root first if it is for users other than yourself)
      • help to get a list of available commands
      • changetime to change the time you wish to recover files from
      • ls to list files
      • cd to change directories
      • add to add files or directories for recovery
      • rel if you need to relocate where recovered files are finally restored to.
      • recover to actually do the restore after using the previously listed commands
  • See also https://cs.uwaterloo.ca/cscf/internal/infrastructure/setups/services/backup/restore.shtml

Monitor what Legato is doing

  • on cscf.cs, run:
    • nsrwatch
It's safer to run this from the client machine.

Alternatively, you can log in to the client machine, run recover, cd to whatever directory for which you want to see information, and then type versions.

Brief Information about backups

The backup server is backup-0.cs (backup.cs). It is a Dell PowerEdge R815 running Red Hat Enterprise Linux Server release 6.3. We are running NetWorker 8.1.1. There are four LTO 5 tape drives in a Qualstar RLS-8500 library (jukebox). The four drive devices are:

     /dev/tape/by-id/scsi-350050763124cb325-nst  (drive 1)
     /dev/tape/by-id/scsi-35005076312486d24-nst  (drive 2)
     /dev/tape/by-id/scsi-350050763124c5454-nst  (drive 3)
     /dev/tape/by-id/scsi-350050763124c485f-nst  (drive 4) 

The jukebox consists of two cabinets. The top one is an expansion unit (slots 1 to 114); the bottom one is the main unit (slots 115 to 222). The four drives are in the top unit (Cabinet 1). There is one I/O port in the expansion unit on its left side (ports 1 - 4) and two I/O ports in the main unit, one on each side (ports 5 - 8 on the left, 9 - 12 on the right). Each I/O port has 4 ports. The jukebox is operated through a touch screen. For example, to open an I/O port, touch the "open" button next to it; push the port back in to close it. An operation manual for the jukebox is kept on top of it, and is also attached to this page as 511000.pdf

Sometimes the jukebox reports a "Fault". Read the information on the touch screen, fix the problem, then press "Dismiss" on the screen. There are 222 slots in the Qualstar library; the cleaning tape is in slot 222. Run nsrjb to list the tapes in all slots and the tapes mounted on all drives.

We do backups at four different levels: incremental, weekly, monthly and full (term end). We clone all special saves, i.e., weekly, monthly and full saves. Special saves are spread across the week to reduce load. On any single day, an incremental save and one of the special saves are running, and a full save of the bootstrap runs every day. We have moved all incremental and weekly saves to disk for all machines.

We divide all clients into three groups: CSCF_Default_Group, CSCF_Mac_Group and CSCF_Pc_Group. We also back up some MFCF machines, which are in MFCF_Default_Group. The naming convention for tapes (volumes) is:

         
                     LTO 3 tapes or disk volumes              LTO 5 tapes
Incremental                Disk2XX
Weekly                     Disk3XX, 35XXX                         37XXX
Weekly Clone               45XXX                                  47XXX
Monthly                    55XXX                                  57XXX
Monthly Clone              65XXX                                  67XXX
Full                       75XXX                                  77XXX
Full Clone                 85XXX                                  87XXX
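
The naming convention above can be decoded mechanically, which is handy when reading nsrjb or mminfo output. This is a minimal sketch: the function name classify_volume and the output wording are hypothetical, and it assumes every X in the table stands for a single digit.

```shell
# Decode a volume label into "<level> <media>" using the table above.
# Assumption: each X in the convention is one decimal digit.
classify_volume() {
    vol=$1
    case $vol in
        Disk2[0-9][0-9]) echo "incremental disk"; return 0 ;;
        Disk3[0-9][0-9]) echo "weekly disk"; return 0 ;;
        [3-8][57][0-9][0-9][0-9]) ;;            # tape label, decoded below
        *) echo "unknown"; return 1 ;;
    esac
    case $vol in                                # first digit gives the level
        3*) level="weekly" ;;
        4*) level="weekly-clone" ;;
        5*) level="monthly" ;;
        6*) level="monthly-clone" ;;
        7*) level="full" ;;
        8*) level="full-clone" ;;
    esac
    case $vol in                                # second digit gives the media
        ?5*) media="LTO3" ;;
        ?7*) media="LTO5" ;;
    esac
    echo "$level $media"
}
```

For example, classify_volume 77001 reports a full save on an LTO 5 tape, and classify_volume Disk202 reports an incremental disk volume.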

NetApps (fs03.student.math, fs102.cs, and fs02.student.cs) are backed up once a week, with a special save on Saturdays. (Backing up fs02 is much slower than backing up fs102.)

Shut Down and Start NetWorker Server

Always check whether any user is recovering files or any backups are running before shutting down the NetWorker server. Use nsr_shutdown to shut down the NetWorker server. Use /etc/init.d/networker start to start the server, and /etc/init.d/gst start to start the NetWorker Management Console (NMC). Run these two commands if the NetWorker server or NMC has not started after a reboot of the host backup.cs.

Add New Client

Use /u/gxshen/bin/{add_networker_NT_cs_client,add_networker_cs_client} to add a new client, depending on the architecture of the client.

Withdraw and Deposit Tapes

  • Use "nsrjb -w [ -P port ] volume" to withdraw a tape; if no port is given, the tape is moved to port 1. For example, if we know tape 75149 is in the left storage matrix of the bottom unit (i.e., slots 115 - 168), we could choose any of ports 5 - 8: run nsrjb -w -P 5 75149, then press the Open button to open the I/O port and take volume 75149 out of it.
  • To deposit a tape in the jukebox, open the I/O port and load the tape into it. Use nsrjb -d [-P port] -S slot_number volume, or nsrjb -d [-P port] -S slot for an unlabeled tape with a barcode. For example, nsrjb -d -P 7 -S 150 75147. With version 8.1.1, wait for the following prompt and answer yes:
Load the cartridges into the ports, and enter Yes to continue.
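
Choosing a nearby port for a withdrawal can be sketched as a lookup from slot number to port range. This is a rough illustration only: the function name ports_for_slot is hypothetical, the mapping of slots 1 - 114 to ports 1 - 4 and of slots 115 - 168 to ports 5 - 8 follows the description on this page, and the mapping of slots 169 - 222 to ports 9 - 12 is an assumption (this page does not state which slots form the right-hand matrix).

```shell
# Print the I/O port range closest to a given slot.
# Assumption: slots 169-222 are the right-hand matrix of the main unit.
ports_for_slot() {
    slot=$1
    case $slot in
        ''|*[!0-9]*) return 1 ;;                # reject non-numeric input
    esac
    if [ "$slot" -ge 1 ] && [ "$slot" -le 114 ]; then
        echo "1-4"                              # expansion unit, left side
    elif [ "$slot" -ge 115 ] && [ "$slot" -le 168 ]; then
        echo "5-8"                              # main unit, left side
    elif [ "$slot" -ge 169 ] && [ "$slot" -le 222 ]; then
        echo "9-12"                             # main unit, right side (assumed)
    else
        return 1                                # outside the 222 slots
    fi
}
```

For a tape in slot 150, ports_for_slot 150 suggests ports 5 - 8, so nsrjb -w -P 5 volume would be a reasonable choice.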

Inventory

We need to do an inventory sometimes. In the nwadmin GUI, choose an empty drive (if no empty drive is available, unmount the tape from a drive that is not currently in use), then click Media -> Inventory, fill in the first slot and last slot, and click OK. Alternatively, run nsrjb -I -f device -S first[-last_slot], where device is one of the four drive devices (see above), which can be found by typing "nsrjb".

Label Tapes

A tape can be used only after it is labeled. Use nwadmin to label one or more tapes: choose a drive without a tape mounted, then choose Media -> Label -> Pool, fill in the first slot and last slot, and click OK. Alternatively, use /u/gxshen/bin/labelLto5Tapes

Note: nwadmin is not available in version 8.1.1; use the NetWorker Management Console to label tapes. The password for the NetWorker Management Console can be found at the usual place under letter B. The NetWorker Management Console can be started by launching mozilla on backup.cs and pointing the browser to http://backup-0.cs:8000

Reenable a Drive by Using NetWorker Management Console

Sometimes a drive is disabled or put in service mode after a certain number of errors. The drive needs to be reenabled before it can be used again. Check the drive first before reenabling it.

  • Click NetWorker from NetWorker Management Console to launch the Administration window.
  • In the Administration window, click Devices. The Devices detail table appears.
  • Right-click the drive to be enabled, and select Properties. The Properties window appears.
  • On the General tab, in the Status area, set Enabled to Yes.
  • Click OK.

Clone tapes in MC 3015

We store the most recent clone tapes in MC 3015. Run /u/gxshen/bin/getOffsiteTapes to get a list of tapes. We usually do this once a week.

Daily Operation

on cscf.cs

  • Run nwadmin (/software/networker-7_client/bin/nwadmin) to monitor backups and recoveries, or run nsrwatch on any host (a backup client).

  • To expire tape volume(s), run /u/gxshen/bin/expireTapes on backup.cs

  • To expire disk volume(s), run /u/gxshen/bin/expireDiskVolumes
    • One will see something like "space recovered from volume Disk20X or Disk30X" from the output of nsrwatch.

Index files of the expired save sets on Disk volumes are removed immediately. Since 'nsrim' runs only once every 24 hours, the data on an expired Disk volume is removed after the next 'nsrim' run. To mount a disk volume, run /u/gxshen/bin/mountdiskvolume Disk20X or Disk30X.

Troubleshooting

Backup logs are in /nsr/logs on backup.cs. /nsr/logs/message and /nsr/logs/OLD/daemon.raw.1 are the most helpful. daemon.raw files must be read with 'nsr_render_log', for example nsr_render_log /nsr/logs/OLD/daemon.raw.1 > /tmp/gs1

  • "Waiting for drive" on LCD display of Qualstar jukebox
Usually power-cycling the jukebox fixes the problem. If a save process is running, kill the process, then power-cycle the jukebox.

  • "Library Fault" The fault messages are
Library Faulted:
Problem in Cabinet 2 ...
Faulted while attempting to place cartridge in destination slot
Press "Dismiss" to Restart

Check the message on the touch screen of the library; if nothing is unusual, press "Dismiss".

  • Drive in Service Mode
Go to the machine room to have a look.

If a red letter is lit on the drive, pressing the reset button (on the front of the drive) will usually fix the problem; then enable the drive (use the NetWorker Management Console).

If the tape has been ejected from the drive but NetWorker has not taken it out, run nsrjb to find out which tape is in the drive and which slot the tape is stored in; take the tape out of the drive, then put it back in; enable the drive, and run

nsrjb -u -f device

where device is one of the four drive devices (see above), which can be found by typing "nsrjb". If the above doesn't fix the problem, put the tape in the drive; shut down and restart NetWorker; then run

nsrjb -u -f device

An inventory of the slot(s) found above is usually needed afterwards: since NetWorker 7.6.4, the slot in which the tape was stored seems to change in this situation.

  • "E" code on a drive
If an orange "E" is flashing on a drive, we have usually lost the device file for that drive, /dev/nstN. Normally we have four:

[root@backup-0 ]# ls -l /dev/nst?
crw-rw---- 1 root tape 9, 128 Sep 19 14:40 /dev/nst0
crw-rw---- 1 root tape 9, 129 Sep 25 09:59 /dev/nst1
crw-rw---- 1 root tape 9, 130 Sep 19 14:44 /dev/nst2
crw-rw---- 1 root tape 9, 131 Sep 19 14:45 /dev/nst3

If the OS can still see the library (jukebox), for example,

[root@backup-0 ]# /usr/local/sbin/atinfo -i all
[...]

B:T:L Vendor Product / Rev Type Capacity
Serial Number
--------- -------- ---------------- ---- ------- ----------
0:1:0 IBM ULTRIUM-HH5 D2A1 Tape N/A
SN:9068051328
0:1:1 QUALSTAR RLS-85 007D Changer N/A
SN:213034410
0:2:0 IBM ULTRIUM-HH5 D2A1 Tape N/A
SN:1068074057
0:3:0 IBM ULTRIUM-HH5 D2A1 Tape N/A
SN:1068073885

The "QUALSTAR" line is the jukebox. One drive is missing from the above output. We can go to the back of the jukebox and slide the drive that has the "E" error out and back in. The "ls -l /dev/nst?" output above shows that /dev/nst1 has a newer time stamp; that is the result of sliding the drive out and back in. If the OS cannot see the jukebox, slide the drive in question out and back in anyway; if the OS still cannot see the jukebox, power-cycle the jukebox and restart NetWorker.
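
The check for a lost device file can be scripted. The sketch below is illustrative only: the function name missing_nst is hypothetical, it assumes the four device files are named nst0 through nst3 as in the listing above, and the directory argument exists only so the function can be tried outside /dev.

```shell
# List the expected tape device files that are absent.
# Assumption: a healthy server has nst0..nst3; the directory defaults to /dev.
missing_nst() {
    dev_dir=${1:-/dev}
    for i in 0 1 2 3; do
        [ -e "$dev_dir/nst$i" ] || echo "nst$i"
    done
}
```

Running missing_nst on backup.cs prints nothing when all four device files are present.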

  • cannot access the hardware

If we see this error message in the output of 'nsrwatch', wait for 2 minutes; it is usually followed by a message like "Hardware status of jukebox changed from 'cannot access the hardware' to 'ready' ". Check the log /nsr/logs/message. The message appears when we open an I/O port or when the library cleans a tape drive. When the error is real: if the OS can see all drives and the jukebox, restart NetWorker; if the OS cannot see any drive, or can see all drives but not the jukebox (see above), power-cycle the jukebox and restart NetWorker. The jukebox uses the first drive for the library interface. Since setting drive 1 to read only, we have never needed to power-cycle the jukebox when we see the "cannot access the hardware" message.

  • Waiting for more available space

We use the 'adv_file' type for Disk volumes, so a save set will not continue from one Disk volume onto another. If you see a message like '(alert) Waiting for more available space on filesystem /jet_disk1 for device /jet_disk1', run /u/gxshen/bin/deleoldsavesets, or kill the save process on backup.cs. Mark the volume as full (nsrmm -o full -y Diskvolume), mount another available Disk volume (look for hints from /u/gxshen/bin/mountdiskvolume), and run a make-up save for the aborted clients (found from logs or emails).

Please see the phone numbers at the end of 'Disaster Recovery' for hardware and EMC NetWorker support.



Upgrade Networker Server

  • Record the latest bootstrap save set ID and its associated volume label. Run mminfo -B to get this information.

  • Save a copy of the current configuration, i.e., /nsr/res

  • Save a copy of /etc/{rpc,syslog.conf}

  • Shut down the NetWorker server by running nsr_shutdown.

  • Shut down NetWorker Console server by running /etc/init.d/gst stop

  • Remove the earlier NetWorker release in the following order:
    • lgtonmc (NetWorker Management Console)
    • lgtoserv (server package)
    • lgtonode (storage node package)
    • lgtolicm (licensing manager package)
    • lgtoclnt (client package)
    • lgtoman (Man Pages)
  • Install the new NetWorker release in the following order:
    • lgtoclnt
    • lgtonode
    • lgtoserv
    • lgtolicm
    • lgtoman
    • lgtonmc

  • Start the NetWorker daemons by running /etc/init.d/networker start.
  • Start the NetWorker Console server by running /etc/init.d/gst start

  • Check daemons nsrd, nsrexecd, nsrindexd, nsrmmdbd and nsrmmd are running.
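
The daemon check in the last step can be sketched as a filter over a process list. This is a minimal sketch under stated assumptions: the function name missing_daemons is hypothetical, and it assumes each daemon appears under exactly its own name in the command column of ps.

```shell
# Read process names on stdin (one per line) and print the NetWorker
# daemons that are not among them.
missing_daemons() {
    procs=$(cat)
    for d in nsrd nsrexecd nsrindexd nsrmmdbd nsrmmd; do
        # -x matches whole lines, so nsrmmd does not match nsrmmdbd
        printf '%s\n' "$procs" | grep -qxF -- "$d" || echo "$d"
    done
}
```

For example: ps -e -o comm= | missing_daemons prints nothing when all five daemons are running.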
You may need to enter the license enabler code after upgrading. To enter the license enabler code:

  • Start the NetWorker Management Console software if it is not started.
  • Open the Administration window:
    • In the Console window, click Enterprise.
    • In the left pane, click a NetWorker server in the Enterprise list.
    • In the right pane, click the application.
    • From the Enterprise menu, select Launch Application. The Administration window is launched as a separate application.
  • In the Administration window, click Configuration.
  • In the left pane, select Registrations.
  • From the File menu, select New. The Create Registration dialog box appears.
  • In the Enabler Code attribute, type the enabler code.
  • In the Name attribute, type the name of the license.
  • (Optional) In the Comment attribute, type a description of the license.
  • Click OK.
If you need to enter the enabler code, you also need an authorization code; to get one, send an email to licensing@emc.com. After getting the code, enter it using the above procedure. This time you don't need to select New; just click the new registration you created and enter the authorization code in the auth code field.

Legato NetWorker Directive File

General Description

During backup processes, Legato NetWorker uses directives to control how particular files are to be backed up, how descendant directories are searched, and how subsequent directives are processed.

We use directives on the backup server to skip /tmp, /cdrom, /var/tmp, /mnt, and /floppy, and to back up /var/mail using mail-style file locking while preserving the "new mail has arrived" flag.

A .nsr directive file is parsed before any file in that directory is backed up, so any user can create a .nsr file and place it in his or her home directory (or subdirectories) to exclude files from backup. A privileged user can place a .nsr file in the root directory (/) to exclude a whole file system from backup. Each line of a .nsr file contains one directive. The most useful directive for a user is the skip directive, which prevents the specified files and directories from being backed up. The standard shell file pattern matching (*, [...], [!...], [x-y], ?) can be used to match file names. If a "+" precedes skip, the directive is propagated to subdirectories.

Examples of .nsr File

A /.nsr file containing:

  << /usr/src >>
      +skip: core *.o
      +compressasm: .

will skip all files named core or *.o in /usr/src and its subdirectories. Other files in /usr/src will be compressed during backup (and will be set up for automatic decompression on recover).

The following .nsr file will skip everything in the directory (and subdirectories) it is placed in. This is useful to skip some directory used for large temporary files.

  
  << . >>
      +skip: .

Having a .nsr file containing

  << . >>
      skip: *.jpg *.gif

without the "+" sign will skip files named *.jpg or *.gif only in that directory, not in its subdirectories.

The following example will skip everything in /toberaw and /toberaw2

  << /toberaw >>
      +skip: .?* *
  << /toberaw2 >>
      +skip: .?* *

For more information about .nsr files, see the man pages nsr(5), nsr_directive(5), and uasm(1). This note is based on those man pages.

Disaster Recovery

Legato NetWorker can recover a client machine, and the backup server as well. If our backup server, backup-0.cs, crashes, we need to do a disaster recovery. Currently our NetWorker server is installed under /nsr -> /fsys/nsr.

1. Re-install the operating system; we are running Red Hat Enterprise Linux Server release 6.3. We have one Jetstor disk shelf attached to backup.cs.

2. Reinstall the NetWorker software. Get it from the Legato web site; the version we are currently running is NetWorker 8.1.1. Be sure to install the packages in the correct order, i.e.,

  
   lgtoclnt
   lgtonode
   lgtoserv
   lgtolicm
   lgtoman
   lgtonmc

   For example
   # rpm -ivh  lgtoclnt-8.1.1.2-1.x86_64.rpm lgtonode-8.1.1.2-1.x86_64.rpm  \
       lgtoserv-8.1.1.2-1.x86_64.rpm lgtolicm-8.1.1.2-1.x86_64.rpm
   # rpm -ivh lgtoman-8.1.1.2-1.x86_64.rpm
   # rpm -ivh lgtonmc-8.1.1-1.x86_64.rpm
   # /opt/lgtonmc/bin/nmc_config 
   

Installation guide can be found at docu50625_NetWorker-8.1-SP1-Installation-Guide--.pdf

3. Start NetWorker server by running /etc/init.d/networker start

4. Configure the jukebox by running jbconfig

  Jbconfig is running on host backup-0.cs (Linux 2.6.32-279.14.1.el6.x86_64),
  and is using backup-0.cs as the NetWorker server.

         1) Configure an AlphaStor Library.
         2) Configure an Autodetected SCSI Jukebox.
         3) Configure an Autodetected NDMP SCSI Jukebox.
         4) Configure an SJI Jukebox.
         5) Configure an STL Silo.
         6) Exit.

  which activity do you want to perform? [1] 2
  14484:jbconfig: Scanning SCSI buses; this may take a while ... 
  Installing 'Qualstar' jukebox - scsidev@6.0.0. 

  What name do you want to assign to this jukebox device? lto_jukebox

  Turn NetWorker auto-cleaning on (yes / no) [yes]? 

  The following drive(s) can be auto-configured in this jukebox:
   1> LTO Ultrium-3 @ 6.1.0 ==> /dev/nst0
   2> LTO Ultrium-3 @ 6.2.0 ==> /dev/nst1
   3> LTO Ultrium-3 @ 6.3.0 ==> /dev/nst2
  These are all the drives that this jukebox has reported.

  To change the drive model(s) or configure them as shared or NDMP drives, 
  you need to bypass auto-configure. Bypass auto-configure? (yes / no) [no] 

  Jukebox has been added successfully

  The following configuration options have been set:

  > Jukebox description to the control port and model.
  > Autochanger control port to the port at which we found it.
  > Networker managed tape autocleaning on.
  > Barcode reading to on.
  > Volume labels that match the barcodes.
  > Slot intended to hold cleaning cartridge to 132.  Please insure that a
          cleaning cartridge is in that slot
  > Number of times we will use a new cleaning cartridge to 5.
  > Cleaning interval for the tape drives to 6 months.

  You can review and change the characteristics of the autochanger and its
         associated devices using the NetWorker Management Console.

  Would you like to configure another jukebox? (yes/no) [no]
 

5. Reset jukeboxes by running nsrjb -HE

6. Recover the media database and resource configuration files.

1) Get the latest bootstrap save set ID from cscf_nw_maint@backup (currently gxshen, dlgawley, wcwince, daroloso, jjohnsto). That information is most likely stored on a 77XXX or 87XXX tape; run "nsrjb" to find out which slot that tape is stored in, then

run nsrjb -Inv -S# -f device-name, where # is the slot number found above and device-name is the drive of your choice, say /dev/nst1

If you cannot get the bootstrap save set ID, run "scanner -B device-name" to find it.

For example, if the first and last lines of the bootstrap file are

    date       time            level    ssid            file    record   volume
    ...
    05/28/2014 03:43:24 AM   full 2911212444   229       0        77057
    

2) Run mmrecov -v (for example)

      What is the name of the device you plan on using [/dev/nst0]? /dev/nst1
      Enter the latest bootstrap save set id: 2911212444
      Enter starting file number (if known) [0]: 229
      Enter starting record number (if known) [0]: 0

      Please insert the volume on which save set id  2911212444  started
      into /dev/nst0.  When you have done this, press <RETURN>:

      Scanning /dev/nst0 for save set  2911212444; this may take a while...
      

7. Stop the Legato server by running /etc/init.d/networker stop

8. Move the res directory away and put res.R in its place: cd /nsr; mv res res.tmp; mv res.R res

9. Restart Legato server by running /etc/init.d/networker start

10. Reset the jukebox by running nsrjb -j lto_jukebox -HE, then re-inventory it with nsrjb -j lto_jukebox -Iv

11. Run nsrck -L7 to recover the indexes

12. Recover /.software/local, or the whole xhier tree /.software if needed, from backups.

13. Run a test backup and recovery to make sure the server is fully recovered.

14. For more information, see NetWorker_8.1_Server_Disaster_Recovery_and_Availability_Best_Practices_Guide.pdf

15. We have a hardware service contract with Qualstar; the phone number is 1-877-444-1744, or email support@qualstar.com

16. For software support, contact EMC at 1-877-534-2867. Our site ID is 4347277
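
The values that mmrecov asks for in step 6 can be pulled from the last line of the bootstrap report. The following is a sketch only: the function name bootstrap_fields is hypothetical, and it assumes the line layout shown in the example above (date, time, AM/PM, level, ssid, file, record, volume).

```shell
# Read a bootstrap report on stdin and print the fields of its last entry.
# Assumption: the time is in 12-hour format, so the ssid is the fifth field.
bootstrap_fields() {
    awk '
        NF >= 8 && $5 ~ /^[0-9]+$/ { ssid = $5; file = $6; rec = $7; vol = $8 }
        END {
            if (ssid != "")
                printf "ssid=%s file=%s record=%s volume=%s\n", ssid, file, rec, vol
        }
    '
}
```

The header line and any non-data lines are ignored because their fifth field is not numeric.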

Debian / Ubuntu clients

See LinuxLegatoClientSetup.

Topic attachments
Attachment                                                                          Rev  Size      Date              Who           Comment
511000.pdf                                                                          r1   6819.7 K  2013-05-03 14:35  GuoxiangShen  RLS-85XX Tape Library Installation and Operation Manual
NetWorker_8.1_Server_Disaster_Recovery_and_Availability_Best_Practices_Guide.pdf    r1   1551.4 K  2014-05-28 16:40  GuoxiangShen
docu50625_NetWorker-8.1-SP1-Installation-Guide--.pdf                                r1   2179.8 K  2014-05-28 16:22  GuoxiangShen
itmpt-5.07.01.zip                                                                   r1   615.6 K   2007-07-16 14:39  GuoxiangShen
legato_disaster_7.0.pdf                                                             r1   1811.0 K  2007-07-16 13:30  GuoxiangShen
sd.conf                                                                             r1   1.2 K     2007-07-16 14:38  GuoxiangShen
ssd.conf                                                                            r1   1.1 K     2009-06-02 15:23  GuoxiangShen
Topic revision: r45 - 2020-02-20 - MikeGore
 