Legato Networker (Migrated to: https://uwaterloo.atlassian.net/wiki/x/T4FcUwo)

We use Legato to back up the majority of our systems. Here are some usage notes.

Client Setup

Checking backup results in Legato

Run mminfo on any Unix host, for example:
@ubuntu2204-102[175]% mminfo -s backup.cs -c m160.cs -t "3 days ago" -ot
 volume        client       date      size   level  name
Disk302        m160.cs   02/21/2024  28 GB    incr  /
Disk303        m160.cs   02/22/2024  27 GB    incr  /
Disk303        m160.cs   02/23/2024  28 GB    incr  /
@ubuntu2204-102[176]% 
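
The report can also be summarized with standard tools. The following is a minimal sketch, assuming the default column layout shown above (level in the sixth whitespace-separated column). count_by_level is a hypothetical helper name, not part of NetWorker.

```shell
# Count save sets per backup level in an mminfo report read from stdin.
# Assumes the default report layout: volume client date size unit level name.
count_by_level() {
    awk '
        $1 == "volume" { next }      # skip the header line
        NF >= 7 { count[$6]++ }      # column 6 holds the level (incr, full, ...)
        END { for (lvl in count) print lvl, count[lvl] }
    ' | sort
}
```

For example: mminfo -s backup.cs -c m160.cs -t "3 days ago" -ot | count_by_level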

Restore files from backups

  • Quick guide:
    • Run the command recover (become root first if it is for a user other than yourself)
      • help to get a list of available commands
      • changetime to change the time you wish to recover files from
      • ls to list files
      • cd to change directories
      • add to add files or directories for recovery
      • rel if you need to relocate where recovered files are finally restored to.
      • recover to actually do the restore after using the previously listed commands
  • See also https://cs.uwaterloo.ca/cscf/internal/infrastructure/setups/services/backup/restore.shtml

Monitor what Legato is doing

  • on linux.cscf or linux.cs, run:
    • nsrwatch
It's safer to run this from the client machine.

Alternatively, log in to the client machine, run recover, cd to the directory you want information about, and then type versions.

Brief Information about backups

The backup server is backup-0.cs (backup.cs). It is a Dell PowerEdge R815 running Red Hat Enterprise Linux Server release 8, with NetWorker 19.8. There are four LTO 8 tape drives in a Qualstar Q80 library (jukebox). The four drive devices are:

     /dev/tape/by-id/scsi-35000e111ce1930b5-nst   (drive 1)
     /dev/tape/by-id/scsi-35000e111ce1930bf-nst   (drive 2)
     /dev/tape/by-id/scsi-35000e111ce1930c9-nst   (drive 3)
     /dev/tape/by-id/scsi-35000e111ce1930d3-nst   (drive 4) 

The jukebox consists of two modules. The top one is an expansion (Module 2, slots 71 to 150) and the bottom one is the base (Module 1, slots 1 to 70). The four drives are in the base unit. One Mailslot is enabled on the right side of Module 1, with ports 1 - 10 numbered from bottom to top.

You need an account to log on to the Q80 and operate the jukebox, either remotely or from the touch screen. For example, to open the Mailslot, touch/click "Operation", then touch/click "Open Mailslot", and pull the handle once the Mailslot is unlocked. After loading tape(s) into the Mailslot, push it back to close it. The operation manual for the jukebox is available as Qualstar_Q80_Manual.pdf.

We do backups at three levels: incremental, cumulative incremental, and full. Incremental saves are backed up to disk daily; cumulative incremental (every 2 weeks) and full (each term) saves are backed up to tape. Cumulative incremental and full saves are spread across the week to reduce load. In addition, a full save of the bootstrap runs every day.

Shut Down and Start NetWorker Server

Before shutting down the NetWorker server, always check whether any user is recovering files or any backups are running. Use systemctl stop networker or nsr_shutdown to shut down the NetWorker server, and systemctl start networker to start it. Use systemctl stop gst and systemctl start gst to stop and start the NetWorker Management Console (NMC). Run the two start commands if the NetWorker server or NMC has not started after the host backup.cs reboots.
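
As a rough sketch of the post-reboot check described above, a small shell function can start a unit only when systemd reports it as inactive. ensure_running is a hypothetical helper name; networker and gst are the unit names used in this section.

```shell
# Start a systemd unit only if it is not already active.
ensure_running() {
    unit=$1
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is already running"
    else
        echo "starting $unit"
        systemctl start "$unit"
    fi
}
```

Run as root on backup.cs: ensure_running networker, then ensure_running gst.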

Add New Client

Use NMC or /u/gxshen/bin/add_networker_cs_client to add a new client.

Withdraw and Deposit Tapes

  • Use "nsrjb -w [ -P port ] volume" to withdraw a tape to the Mailslot. For example, nsrjb -w -P 10 79001 withdraws tape 79001 to port 10 (the topmost port in the Mailslot); then open the Mailslot to retrieve volume 79001.
  • Open the Mailslot and load the tape into a port. To deposit the tape in the jukebox, use nsrjb -d [-P port] -S slot_number volume, or nsrjb -d [-P port] -S slot for an unlabeled tape with a barcode. For example, nsrjb -d -P 7 -S 72 79001. With version 19.8, wait for the following prompt and answer yes:
Load the cartridges into the ports, and enter Yes to continue.

Inventory

An inventory is needed from time to time. In the nwadmin GUI, choose an empty drive (if no drive is empty, unmount the tape from a drive that is not currently in use), then click Media -> Inventory, fill in the first and last slot, and click OK. Alternatively, run nsrjb -I -f device -S first[-last_slot], where device is one of the four drive devices (see above); typing "nsrjb" lists them.

Label Tapes

A tape can be used only after it is labeled. Use nwadmin to label one or more tapes: choose a drive with no tape mounted, then choose Media -> Label -> Pool, fill in the first and last slot, and click OK.

The NetWorker Management Console is needed to label tapes. The password for the NetWorker Management Console can be found at the usual place under letter B. To start the NetWorker Management Console, launch mozilla on backup.cs and point the browser to https://backup-0.cs:9000

Reenable a Drive by Using NetWorker Management Console

Sometimes a drive is disabled or put in service mode after a certain number of errors. The drive needs to be reenabled before it can be used again. Check the drive before reenabling it.

  • Click NetWorker in the NetWorker Management Console to launch the Administration window.
  • In the Administration window, click Devices. The Devices detail table appears.
  • Right-click the drive to be enabled, and select Properties. The Properties window appears.
  • On the General tab, in the Status area, set Enabled to Yes.
  • Click OK.

Routine Operation

  • Run nsrwatch on any backup client (ubuntu2204-102.cs for example) for monitoring backups and recoveries.

  • Check backup notifications
    • Backup logs are in /nsr/logs on backup; older ones are under /nsr/logs/OLD. policy_notifications.log and /nsr/logs/OLD/daemon.raw.1 are the more helpful ones. To read daemon.raw, use 'nsr_render_log', for example:
nsr_render_log /nsr/logs/OLD/daemon.raw.1 > /tmp/gs1

  • Check if there are enough tapes in the jukebox using "nsrjb -C -v |egrep -v full |egrep 'volume|no'"
        slot  volume                                     used  pool            barcode   volume id        recyclable
           1: AYV480L8                                   100%  SCS Full Pool   AYV480L8  3443852592       no
           2: AYV481L8                                   100%  SCS Full Pool   AYV481L8  3427075543       no
           3: AYV482L8                                    24%  SCS Full Pool   AYV482L8  174130262        no
           4: AYV483L8                                     0%  SCS Full Pool   AYV483L8  157353236        no
           5: AYV484L8                                     0%  SCS Full Pool   AYV484L8  3469543210       no
           6: AYV485L8                                     0%  SCS Full Pool   AYV485L8  3486320263       no
           7: AYV486L8                                     0%  SCS Full Pool   AYV486L8  3452766162       no
          22: AYX199L8                                     0%  DFSc Full Pool  AYX199L8  1758949483       no
          23: AYX198L8                                     0%  DFSc Full Pool  AYX198L8  1775726507       no
          24: AYX196L8                                     0%  DFSc Full Pool  AYX196L8  847634421        no
          40: AYX195L8                                   100%  DFSc Full Pool  AYX195L8  864411375        no
         115: 79083L8                                     52%  SCS Full Pool   79083L8   3460629610       no

An LTO 8 tape delivers 12 TB of native capacity and up to 30 TB of compressed capacity. A tape shown as 100% used has at least 12 TB of data written to it. 'mminfo' is handy for checking how full a tape is:

 [gxshen@backup-0 ~]$ mminfo -mv -q volume=AYV482L8
          state volume                  written  (%)  expires     read mounts capacity volid      next type
                AYV482L8                2823 GB  24% 2032-03-15   0 KB     2     12 TB 174130262   391 LTO Ultrium-8
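
To see at a glance which tapes are still unused, the jukebox listing can be filtered further. This is only a sketch: it assumes the "nsrjb -C -v" column layout shown above (slot, volume, used as the first three whitespace-separated columns), and empty_tapes is a hypothetical helper name.

```shell
# Print the volumes reported as 0% used in "nsrjb -C -v" output read from stdin.
# Only lines whose first column looks like a slot number ("12:") are considered.
empty_tapes() {
    awk '$1 ~ /^[0-9]+:$/ && $3 == "0%" { print $2 }'
}
```

For example: nsrjb -C -v | empty_tapes | wc -l gives the number of unused tapes.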

  • To expire disk volume(s), run /u/gxshen/bin/expireDiskVolumes
    • One will see something like "space recovered from volume Disk30X" from the output of nsrwatch.

Index files for the expired save sets on Disk volumes are removed immediately. Since 'nsrim' runs only once every 24 hours, the data on an expired Disk volume is removed after the next 'nsrim' run. To mount a disk volume, run /u/gxshen/bin/mountdiskvolume Disk30X.

Troubleshooting

Check the logs for hints.

  • "Waiting for drive" on the LCD display of the Qualstar jukebox
Power-cycling the jukebox usually fixes the problem. If a save process is running, kill the process first, then power-cycle the jukebox.

  • "Library Fault" The fault messages are
Library Faulted:
Problem in Cabinet 2 ...
Faulted while attempting to place cartridge in destination slot
Press "Dismiss" to Restart

Check the message on the touch screen of the library. If nothing is unusual, press "Dismiss".

  • Drive in Service Mode
Go to the machine room to have a look.

If a red letter is lit on the drive, pressing the reset button (on the front of the drive) usually fixes the problem. Then enable the drive (use the NetWorker Management Console).

If the tape has been ejected from the drive but NetWorker has not taken it out, run nsrjb to find out which tape is in the drive and which slot it is stored in. Take the tape out of the drive, put it back in, enable the drive, and run

nsrjb -u -f device

where device is one of the four drive devices (see above), which can be found by typing "nsrjb". If that doesn't fix the problem, put the tape in the drive, shut down and restart NetWorker, then run

nsrjb -u -f device

An inventory of the slot(s) found above is usually needed afterwards, because since NetWorker 7.6.4 the slot where the tape was stored seems to change in this situation.

  • "E" code on a drive
If an orange "E" is flashing on a drive, we have usually lost the device file for that drive, /dev/nstN. Normally there are four:

[root@backup-0 ]# ls -l /dev/nst?
crw-rw---- 1 root tape 9, 128 Sep 19 14:40 /dev/nst0
crw-rw---- 1 root tape 9, 129 Sep 25 09:59 /dev/nst1
crw-rw---- 1 root tape 9, 130 Sep 19 14:44 /dev/nst2
crw-rw---- 1 root tape 9, 131 Sep 19 14:45 /dev/nst3
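
The same check can be scripted. The sketch below counts the nstN device nodes and warns when fewer than the expected four are present; check_tape_devices is a hypothetical helper name, and the directory argument exists only so the function can be tried against a scratch directory.

```shell
# Count non-rewinding tape device nodes (nst0..nst9) in a directory and
# warn when fewer than the expected number exist.
# Usage: check_tape_devices [dir] [expected]   (defaults: /dev, 4)
check_tape_devices() {
    dev_dir=${1:-/dev}
    expected=${2:-4}
    found=0
    for node in "$dev_dir"/nst[0-9]; do
        # An unmatched glob stays literal, so test that the node really exists.
        if [ -e "$node" ]; then
            found=$((found + 1))
        fi
    done
    if [ "$found" -lt "$expected" ]; then
        echo "WARNING: only $found of $expected tape devices present in $dev_dir"
        return 1
    fi
    echo "OK: $found tape devices present in $dev_dir"
}
```

Running check_tape_devices with no arguments on backup-0 checks /dev for four devices.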

If OS can still see the Library (jukebox), for example,

[root@backup-0 ]# /usr/local/sbin/atinfo -i all
[...]

B:T:L     Vendor   Product / Rev         Type    Capacity
          Serial Number
--------- -------- ---------------- ---- ------- ----------
0:1:0     IBM      ULTRIUM-HH5      D2A1 Tape    N/A
          SN:9068051328
0:1:1     QUALSTAR RLS-85           007D Changer N/A
          SN:213034410
0:2:0     IBM      ULTRIUM-HH5      D2A1 Tape    N/A
          SN:1068074057
0:3:0     IBM      ULTRIUM-HH5      D2A1 Tape    N/A
          SN:1068073885

The "QUALSTAR" line is the jukebox. One drive is missing from the output above: only three tape drives are listed. In that case, go to the back of the jukebox and slide the drive showing the "E" error out and back in. In the "ls -l /dev/nst?" output above, /dev/nst1 has a newer time stamp; that is the result of sliding the drive out and back in. If the OS cannot see the jukebox, slide the drive in question out and back in anyway; if the OS still cannot see the jukebox, power-cycle the jukebox and restart NetWorker.

  • cannot access the hardware

If this error message appears in the output of 'nsrwatch', wait 2 minutes; it is usually followed by a message like "Hardware status of jukebox changed from 'cannot access the hardware' to 'ready' ". Check the log /nsr/logs/message. The message appears when an I/O port is opened or when the library cleans a tape drive. When the error is real: if the OS can see all drives and the jukebox, restart NetWorker; if the OS cannot see any drive, or can see all drives but not the jukebox (see above), power-cycle the jukebox and restart NetWorker. The jukebox uses the first drive for the library interface. Since drive 1 was set to read only, we have never needed to power-cycle the jukebox on a "cannot access hardware" message.

  • Waiting for more available space

We use the 'adv_file' type for Disk volumes, so a save set will not continue from one Disk volume onto another. If you see a message like '(alert) Waiting for more available space on filesystem /jet_disk1 for device /jet_disk1', run /u/gxshen/bin/deleoldsavesets, or kill the save process on backup.cs. Mark the volume as full (nsrmm -o full -y Diskvolume), mount another available Disk volume (look for hints in /u/gxshen/bin/mountdiskvolume), and run a make-up save for the aborted clients (found from logs or emails).

Please see phone numbers below for tape library and EMC NetWorker support.

Upgrade Networker Server

  • Put the NetWorker databases in a consistent state. Run nsrim -X, nsrck -m, and nsrck -L6

  • Record the latest bootstrap save set ID and its associated volume label. Run mminfo -B to get this information.

  • Record the current location of the NetWorker client file indexes using nsrls

  • Record the range of ports the NetWorker software uses using nsrports

  • Shut down the NetWorker server by running nsr_shutdown or systemctl stop networker

  • Shut down NetWorker Console server by running systemctl stop gst

  • Use 'rpm -qa | grep lgto' command to display the list of installed NetWorker packages.
        [gxshen@backup-0 ~]$ rpm -qa | grep lgto
        lgtoserv-19.8.0.2-1.x86_64
        lgtoxtdclnt-19.8.0.2-1.x86_64
        lgtoman-19.8.0.2-1.x86_64
        lgtoauthc-19.8.0.2-1.x86_64
        lgtoclnt-19.8.0.2-1.x86_64
        lgtonmc-19.8.0.2-1.x86_64
        lgtonode-19.8.0.2-1.x86_64
        [gxshen@backup-0 ~]$

  • Use "rpm -Uvh package [package]..." where package [package]... is the list of software packages for the installation type, that is, the newer versions of the seven packages shown in the output of 'rpm -qa | grep lgto'.

  • Start the NetWorker daemons by running systemctl start networker.

  • Start the NetWorker Console server by running systemctl start gst

  • Check that the daemons nsrd, nsrexecd, nsrindexd, nsrmmdbd and nsrmmd are running.

  • Test backups and recovery
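
The daemon check in the steps above can be sketched as a small filter over a list of process names. missing_daemons is a hypothetical helper name; the daemon names are the five listed above.

```shell
# Read process names (one per line) from stdin and print every expected
# NetWorker daemon that does not appear in the list.
missing_daemons() {
    names=$(cat)
    for d in nsrd nsrexecd nsrindexd nsrmmdbd nsrmmd; do
        # -x matches whole lines only, so "nsrmmd" does not satisfy "nsrmmdbd".
        printf '%s\n' "$names" | grep -qx "$d" || echo "$d"
    done
}
```

For example: ps -e -o comm= | missing_daemons prints nothing when all five daemons are running.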

Legato NetWorker Directive File

General Description

During backup processes, Legato NetWorker uses directives to control how particular files are to be backed up, how descendant directories are searched, and how subsequent directives are processed.

We use directives on the backup server to skip /tmp, /cdrom, /var/tmp, /mnt, and /floppy, and to back up /var/mail using mail-style file locking while preserving the "new mail has arrived" flag.

A .nsr directive file is parsed before any file in that directory is backed up, so any user can create a .nsr file in his or her home directory (or its subdirectories) to exclude files from backup. A privileged user can place a .nsr file in the root directory (/) to exclude a whole file system from backup. Each line of a .nsr file contains one directive. The most useful directive for a user is the skip directive, which tells NetWorker not to back up the specified files and directories. The standard shell file pattern matching (*, [...], [!...], [x-y], ?) can be used to match file names. If a "+" precedes skip, the directive is propagated to subdirectories.

Examples of .nsr File

A /.nsr file containing:

  << /usr/src >>
      +skip: core *.o
      +compressasm: .

will skip all files named core or *.o in /usr/src and its subdirectories. The other files in /usr/src will be compressed during backup (and set up for automatic decompression on recover).

The following .nsr file will skip everything in the directory (and subdirectories) it is placed in. This is useful to skip some directory used for large temporary files.

  
  << . >>
      +skip: .

A .nsr file containing

  << . >>
      skip: *.jpg *.gif

(without the "+" sign) skips files matching *.jpg or *.gif only in that directory, not in its subdirectories.

The following example will skip everything in /toberaw and /toberaw2

  << /toberaw >>
      +skip: .?* *
  << /toberaw2 >>
      +skip: .?* *

For more information about .nsr files, see the man pages nsr(5), nsr_directive(5), and uasm(1). This note is based on these man pages.

Support information

Debian / Ubuntu clients

See LinuxLegatoClientSetup.

Topic attachments
  • NetWorker_8.1_Server_Disaster_Recovery_and_Availability_Best_Practices_Guide.pdf (r1, 1551.4 K, 2014-05-28 16:40, GuoxiangShen)
  • Qualstar_Q80_Manual.pdf (r1, 6700.6 K, 2024-02-23 10:56, GuoxiangShen): Q80 Tape Library Installation and Operations Manual
  • docu50625_NetWorker-8.1-SP1-Installation-Guide--.pdf (r1, 2179.8 K, 2014-05-28 16:22, GuoxiangShen)
  • itmpt-5.07.01.zip (r1, 615.6 K, 2007-07-16 14:39, GuoxiangShen)
  • legato_disaster_7.0.pdf (r1, 1811.0 K, 2007-07-16 13:30, GuoxiangShen)
  • sd.conf (r1, 1.2 K, 2007-07-16 14:38, GuoxiangShen)
  • ssd.conf (r1, 1.1 K, 2009-06-02 15:23, GuoxiangShen)
Topic revision: r50 - 2024-11-19 - MariHassanzada
 