We use Legato to back up the majority of our systems. Here are some usage notes.
Client Setup
Checking backup results in Legato
Run mminfo on any Unix host, for example:
@ubuntu2204-102[175]% mminfo -s backup.cs -c m160.cs -t "3 days ago" -ot
volume client date size level name
Disk302 m160.cs 02/21/2024 28 GB incr /
Disk303 m160.cs 02/22/2024 27 GB incr /
Disk303 m160.cs 02/23/2024 28 GB incr /
@ubuntu2204-102[176]%
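To turn a listing like the one above into a quick check, the mminfo output can be piped through a small filter that counts save sets per client. This is only a sketch: count_saves is a hypothetical helper name (not part of NetWorker), and it assumes the column layout shown above, with the client name in the second column.

```shell
# count_saves CLIENT: read mminfo output on stdin and print how many
# save set lines belong to CLIENT. Hypothetical helper; assumes the
# client name is the second whitespace-separated column.
count_saves() {
    awk -v client="$1" '$2 == client { n++ } END { print n + 0 }'
}

# Demo on the sample listing from above (prints 3).
sample='volume client date size level name
Disk302 m160.cs 02/21/2024 28 GB incr /
Disk303 m160.cs 02/22/2024 27 GB incr /
Disk303 m160.cs 02/23/2024 28 GB incr /'
printf '%s\n' "$sample" | count_saves m160.cs
```

On a real host the input would come from mminfo itself, for example mminfo -s backup.cs -c m160.cs -t "3 days ago" -ot | count_saves m160.cs. A result of 0 suggests the client has no save sets in that window.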
Restore files from backups
- Quick guide:
- Run the command recover (become root first if it is for a user other than yourself)
- help to get a list of available commands
- changetime to change the time you wish to recover files from
- ls to list files
- cd to change directories
- add to add files or directories for recovery
- rel if you need to relocate where recovered files are finally restored to.
- recover to actually do the restore after using the previously listed commands
- See also https://cs.uwaterloo.ca/cscf/internal/infrastructure/setups/services/backup/restore.shtml
Monitor what Legato is doing
- on linux.cscf or linux.cs, run:
It's safer to run this from the client machine.
Alternatively, you can log in to the client machine, run recover, cd to the directory you want information about, and then type versions.
Brief Information about backups
The backup server is backup-0.cs (backup.cs). It is a Dell PowerEdge R815 running Red Hat Enterprise Linux Server release 8. We are running NetWorker 19.8. There are four LTO 8 tape drives in a Qualstar Q80 library (jukebox). The four drive devices are:
/dev/tape/by-id/scsi-35000e111ce1930b5-nst (drive 1)
/dev/tape/by-id/scsi-35000e111ce1930bf-nst (drive 2)
/dev/tape/by-id/scsi-35000e111ce1930c9-nst (drive 3)
/dev/tape/by-id/scsi-35000e111ce1930d3-nst (drive 4)
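A quick way to confirm that all four device files are present is to test each path for a character device. The sketch below assumes nothing beyond the four paths listed above; missing_devices is a hypothetical helper name.

```shell
# missing_devices PATH...: print each PATH that is not a character device
# (symlinks such as /dev/tape/by-id/* are followed). Hypothetical helper.
missing_devices() {
    for dev in "$@"; do
        [ -c "$dev" ] || printf '%s\n' "$dev"
    done
}

# The four drive devices from the list above.
missing_devices \
    /dev/tape/by-id/scsi-35000e111ce1930b5-nst \
    /dev/tape/by-id/scsi-35000e111ce1930bf-nst \
    /dev/tape/by-id/scsi-35000e111ce1930c9-nst \
    /dev/tape/by-id/scsi-35000e111ce1930d3-nst
```

No output means all four drives have a device file; any path that is printed points at a drive worth checking.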
The jukebox consists of two modules. The top one is an expansion (Module 2, slots 71 to 150) and the bottom one is the base (Module 1, slots 1 to 70). The four drives are in the base unit. One Mailslot is enabled on the right side of Module 1, with ports 1 to 10 numbered from bottom to top. An account is needed to log on to the Q80 and operate the jukebox, either remotely or using the touch screen. For example, to open the Mailslot, touch or click "Operation", then "Open Mailslot", and pull the handle once the Mailslot is unlocked. After loading tape(s) into the Mailslot, push it back to close it. An operation manual for the jukebox can be found at
Qualstar_Q80_Manual.pdf
We do backups at three levels: incremental, cumulative incremental and full. Incremental saves are written to disk daily; cumulative incremental saves (every 2 weeks) and full saves (each term) are written to tape. Cumulative incremental and full saves are spread across the week to reduce load. A full save of the bootstrap also runs every day.
Shut Down and Start NetWorker Server
Always check whether any user is recovering files or any backups are running before shutting down the NetWorker server.
Use systemctl stop networker or nsr_shutdown to shut down the NetWorker server. Use systemctl start networker to start the server, and systemctl stop gst or systemctl start gst to stop or start the NetWorker Management Console (NMC). Run the two start commands if the NetWorker server or NMC has not started after a reboot of the host backup.cs.
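The stop and start commands above can be collected into one wrapper. This is a minimal sketch, not an existing script on backup.cs: restart_networker is a hypothetical name, and by default it only prints the commands (a dry run) so that nothing is stopped by accident. The order follows the one used in the upgrade notes on this page (server first, then NMC).

```shell
# restart_networker: stop and then start the NetWorker server and NMC.
# Hypothetical wrapper. Dry run by default: each command is only echoed.
# Call it with RUN set to the empty string (as root) to really execute.
restart_networker() {
    run=${RUN-echo}
    $run systemctl stop networker
    $run systemctl stop gst
    $run systemctl start networker
    $run systemctl start gst
}

# Dry run: prints the four systemctl commands in order.
restart_networker
```

Check for running recoveries and backups first, as noted above, before using it with RUN= restart_networker.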
Add New Client
Use NMC or /u/gxshen/bin/add_networker_cs_client to add a new client.
Withdraw and Deposit Tapes
- Use "nsrjb -w [ -P port ] volume" to withdraw a tape to the Mailslot. For example, nsrjb -w -P 10 79001 withdraws tape 79001 to port 10 (the topmost port in the Mailslot); then open the Mailslot to retrieve volume 79001.
- To deposit a tape in the jukebox, open the Mailslot and load the tape into a port. Then use nsrjb -d [-P port] -S slot_number volume, or nsrjb -d [-P port] -S slot for an unlabeled tape with a barcode. For example, nsrjb -d -P 7 -S 72 79001. With version 19.8, wait for the following prompt and answer yes:
Load the cartridges into the ports, and enter Yes to continue.
Inventory
We need to run an inventory from time to time. Use the GUI nwadmin: choose an empty drive (unmount the tape from a drive that is not currently in use if no empty drive is available), then click Media -> Inventory, fill in the first slot and last slot, and click OK. Alternatively, run nsrjb -I -f device -S first[-last_slot], where device is one of the four drive devices (see above), which can be found by typing "nsrjb".
Label Tapes
A tape can be used only after it is labeled. Use nwadmin to label a tape (or tapes): choose a drive without a tape mounted, then choose Media -> Label -> Pool, fill in the first slot and last slot, and click OK.
The NetWorker Management Console needs to be used to label tapes. The password for the NetWorker Management Console can be found at the usual place under letter B. The NetWorker Management Console can be started by launching mozilla on backup.cs and pointing the browser to
https://backup-0.cs:9000
Reenable a Drive by Using NetWorker Management Console
Sometimes a drive is disabled or put in service mode after a certain number of errors. The drive needs to be reenabled before it can be used again. Check the drive first before reenabling it.
- Click NetWorker in the NetWorker Management Console to launch the Administration window.
- In the Administration window, click Devices. The Devices detail table appears.
- Right-click the drive to be enabled, and select Properties. The Properties window appears.
- On the General tab, in the Status area, set Enabled to Yes.
- Click OK.
Routine Operation
- Run nsrwatch on any backup client (ubuntu2204-102.cs for example) for monitoring backups and recoveries.
- Check backup notifications
- Backup logs are in /nsr/logs on backup; older ones are under /nsr/logs/OLD. policy_notifications.log and /nsr/logs/OLD/daemon.raw.1 are the more helpful ones. To read daemon.raw, use nsr_render_log, for example nsr_render_log /nsr/logs/OLD/daemon.raw.1 > /tmp/gs1.
- Check if there are enough tapes in the jukebox using "nsrjb -C -v |egrep -v full |egrep 'volume|no'"
slot volume used pool barcode volume id recyclable
1: AYV480L8 100% SCS Full Pool AYV480L8 3443852592 no
2: AYV481L8 100% SCS Full Pool AYV481L8 3427075543 no
3: AYV482L8 24% SCS Full Pool AYV482L8 174130262 no
4: AYV483L8 0% SCS Full Pool AYV483L8 157353236 no
5: AYV484L8 0% SCS Full Pool AYV484L8 3469543210 no
6: AYV485L8 0% SCS Full Pool AYV485L8 3486320263 no
7: AYV486L8 0% SCS Full Pool AYV486L8 3452766162 no
22: AYX199L8 0% DFSc Full Pool AYX199L8 1758949483 no
23: AYX198L8 0% DFSc Full Pool AYX198L8 1775726507 no
24: AYX196L8 0% DFSc Full Pool AYX196L8 847634421 no
40: AYX195L8 100% DFSc Full Pool AYX195L8 864411375 no
115: 79083L8 52% SCS Full Pool 79083L8 3460629610 no
An LTO 8 tape delivers 12 TB of native capacity and up to 30 TB of compressed capacity. A tape shown as 100% used has had at least 12 TB of data written to it. mminfo is handy for checking how full a tape is:
[gxshen@backup-0 ~]$ mminfo -mv -q volume=AYV482L8
state volume written (%) expires read mounts capacity volid next type
AYV482L8 2823 GB 24% 2032-03-15 0 KB 2 12 TB 174130262 391 LTO Ultrium-8
- To expire disk volume(s), run /u/gxshen/bin/expireDiskVolumes
- The output of nsrwatch will then show something like "space recovered from volume Disk30X".
Index files of the expired save sets on Disk volumes are removed immediately.
Since nsrim runs only once every 24 hours, data on an expired Disk volume is removed after the next nsrim run. To mount a disk volume, run /u/gxshen/bin/mountdiskvolume Disk30X.
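The tape check in the list above (the filtered nsrjb -C -v listing) can be summarized by a small awk filter that prints only the tapes with space left. This is a sketch under the assumption that the listing has the layout shown above, with the volume in the second column and the used percentage in the third; tapes_with_space is a hypothetical helper name.

```shell
# tapes_with_space: read the filtered nsrjb listing on stdin, print
# "volume used" for every tape below 100%, then a total. Hypothetical
# helper; assumes column 2 is the volume and column 3 the used percentage.
tapes_with_space() {
    awk '$3 ~ /^[0-9]+%$/ {
             used = $3; sub(/%/, "", used)
             if (used + 0 < 100) { print $2, $3; n++ }
         }
         END { print "tapes with space:", n + 0 }'
}

# Demo on a few rows taken from the listing above.
rows='slot volume used pool barcode volume id recyclable
1: AYV480L8 100% SCS Full Pool AYV480L8 3443852592 no
3: AYV482L8 24% SCS Full Pool AYV482L8 174130262 no
4: AYV483L8 0% SCS Full Pool AYV483L8 157353236 no
40: AYX195L8 100% DFSc Full Pool AYX195L8 864411375 no'
printf '%s\n' "$rows" | tapes_with_space
```

On backup.cs the input would be the output of the nsrjb pipeline quoted above instead of the sample rows.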
Troubleshooting
Check the logs for any hints.
- "Waiting for drive" on the LCD display of the Qualstar jukebox
Power-cycling the jukebox usually fixes the problem. If a save process is running, kill the process, then power-cycle the jukebox.
- "Library Fault" The fault messages are
Library Faulted:
Problem in Cabinet 2 ...
Faulted while attempting to place cartridge in destination slot
Press "Dismiss" to Restart
Check the message on the touch screen of the Library, if nothing is unusual, press "Dismiss".
Go to the machine room to have a look.
If a red letter is lit on the drive, pressing the reset button (on the front of the drive) will usually fix the problem; then enable the drive (use the NetWorker Management Console).
If the tape has been ejected from the drive but NetWorker has not removed it, run nsrjb to find out which tape is in the drive and which slot the tape is stored in; take the tape out of the drive, then put it back in; enable the drive, and run
nsrjb -u -f device
where device is one of the four drive devices (see above), which can be found by typing "nsrjb". If the above doesn't fix the problem, put the tape in the drive; shut down NetWorker, restart NetWorker; then run
nsrjb -u -f device
An inventory of the slot(s) found above is usually needed afterwards, because since NetWorker version 7.6.4 NetWorker seems to change the slot where the tape was stored in this situation.
If an orange "E" is flashing on a drive, we have usually lost the device file for that drive, /dev/nstN. Normally we have four:
[root@backup-0 ]# ls -l /dev/nst?
crw-rw---- 1 root tape 9, 128 Sep 19 14:40 /dev/nst0
crw-rw---- 1 root tape 9, 129 Sep 25 09:59 /dev/nst1
crw-rw---- 1 root tape 9, 130 Sep 19 14:44 /dev/nst2
crw-rw---- 1 root tape 9, 131 Sep 19 14:45 /dev/nst3
If the OS can still see the library (jukebox), for example,
[root@backup-0 ]# /usr/local/sbin/atinfo -i all
[...]
B:T:L Vendor Product / Rev Type Capacity
Serial Number
--------- -------- ---------------- ---- ------- ----------
0:1:0 IBM ULTRIUM-HH5
D2A1 Tape N/A
SN:9068051328
0:1:1 QUALSTAR RLS-85 007D Changer N/A
SN:213034410
0:2:0 IBM ULTRIUM-HH5
D2A1 Tape N/A
SN:1068074057
0:3:0 IBM ULTRIUM-HH5
D2A1 Tape N/A
SN:1068073885
The "QUALSTAR" line is the jukebox. One drive is missing from the above output. We can go to the back of the jukebox and slide the drive that has the "E" error out and back in. The "ls -l /dev/nst?" output above shows that /dev/nst1 has a newer time stamp, which is the result of sliding the drive out and back in. If the OS cannot see the jukebox, slide the drive in question out and back in anyway; if it still cannot see the jukebox, power-cycle the jukebox and restart NetWorker.
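Counting the device types in the atinfo output makes a missing drive easier to spot than reading the listing by eye. The filter below is a sketch that assumes the output resembles the sample above, where the type appears as a separate word (Tape or Changer); count_type is a hypothetical helper name.

```shell
# count_type TYPE: read atinfo output on stdin and count the lines that
# contain TYPE as a separate word (for example Tape or Changer).
# Hypothetical helper; assumes one such word per device.
count_type() {
    awk -v t="$1" '{ for (i = 1; i <= NF; i++) if ($i == t) { n++; break } }
                   END { print n + 0 }'
}

# Demo on the sample output above: three drives and one changer.
atinfo_sample='B:T:L Vendor Product / Rev Type Capacity
0:1:0 IBM ULTRIUM-HH5
D2A1 Tape N/A
0:1:1 QUALSTAR RLS-85 007D Changer N/A
0:2:0 IBM ULTRIUM-HH5
D2A1 Tape N/A
0:3:0 IBM ULTRIUM-HH5
D2A1 Tape N/A'
printf '%s\n' "$atinfo_sample" | count_type Tape
printf '%s\n' "$atinfo_sample" | count_type Changer
```

On backup-0 the input would be /usr/local/sbin/atinfo -i all. Fewer than four tape devices means a drive is missing, and a changer count of 0 means the OS cannot see the jukebox.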
- cannot access the hardware
If we see this error message in the output of nsrwatch, wait for 2 minutes; it is usually followed by a message like "Hardware status of jukebox changed from 'cannot access the hardware' to 'ready'". Check the log /nsr/logs/message. That message appears when we open an I/O port, or when the library cleans a tape drive. When a real error does occur and the OS can see all drives and the jukebox, restart NetWorker. If the OS cannot see any drive, or can see all drives but not the jukebox (see above), power-cycle the jukebox and restart NetWorker. The jukebox uses the first drive for the library interface. Since drive 1 was set to read only, we have never needed to power-cycle the jukebox when we see the "cannot access hardware" message.
- Waiting for more available space
We use the 'adv_file' type for Disk volumes, so save sets are not written continuously from one Disk volume to another. If you see a message like '(alert) Waiting for more available space on filesystem /jet_disk1 for device /jet_disk1', run /u/gxshen/bin/deleoldsavesets, or kill the save process on backup.cs. Mark the volume as full (nsrmm -o full -y Diskvolume), mount another available Disk volume (look for hints in /u/gxshen/bin/mountdiskvolume), and run a make-up save for the aborted clients (found from logs or emails).
Please see phone numbers below for tape library and EMC NetWorker support.
Upgrade NetWorker Server
- Put the NetWorker databases in a consistent state. Run nsrim -X, nsrck -m, and nsrck -L6
- Record the latest bootstrap save set ID and its associated volume label. Run mminfo -B to get this information.
- Record the current location of the NetWorker client file indexes using nsrls
- Record the range of ports the NetWorker software uses using nsrports
- Shut down the NetWorker server by running nsr_shutdown or systemctl stop networker
- Shut down NetWorker Console server by running systemctl stop gst
- Use 'rpm -qa | grep lgto' command to display the list of installed NetWorker packages.
[gxshen@backup-0 ~]$ rpm -qa | grep lgto
lgtoserv-19.8.0.2-1.x86_64
lgtoxtdclnt-19.8.0.2-1.x86_64
lgtoman-19.8.0.2-1.x86_64
lgtoauthc-19.8.0.2-1.x86_64
lgtoclnt-19.8.0.2-1.x86_64
lgtonmc-19.8.0.2-1.x86_64
lgtonode-19.8.0.2-1.x86_64
[gxshen@backup-0 ~]$
- Use "rpm -Uvh package [package]..." where package [package]... is the list of software packages for the installation type, that is, the newer versions of the seven packages shown in the output of 'rpm -qa | grep lgto'.
- Start the NetWorker daemons by running systemctl start networker.
- Start the NetWorker Console server by running systemctl start gst
- Check daemons nsrd, nsrexecd, nsrindexd, nsrmmdbd and nsrmmd are running.
- Test backups and recovery
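The daemon check near the end of the steps above can be scripted by comparing a process list against the five daemon names. This is a minimal sketch: missing_daemons is a hypothetical helper name, and it expects one process name per line on standard input.

```shell
# missing_daemons: read process names on stdin (one per line) and print
# each NetWorker daemon from the checklist that is not among them.
# Hypothetical helper.
missing_daemons() {
    names=$(cat)
    for d in nsrd nsrexecd nsrindexd nsrmmdbd nsrmmd; do
        printf '%s\n' "$names" | grep -qx "$d" || printf '%s\n' "$d"
    done
}

# Demo with a made-up process list: prints nsrindexd and nsrmmdbd.
printf 'nsrd\nnsrexecd\nnsrmmd\n' | missing_daemons
```

On the server the process list would come from ps, for example ps -e -o comm= | missing_daemons. No output means all five daemons are running.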
Legato NetWorker Directive File
General Description
During backup processes, Legato NetWorker uses directives to control how particular files are to be backed up, how descendant directories are searched, and how subsequent directives are processed.
We use directives on the backup server to skip /tmp, /cdrom, /var/tmp, /mnt, and /floppy, and to back up /var/mail using mail-style file locking while preserving the "new mail has arrived" flag.
A .nsr directive file is parsed before any file in that directory is backed up, so any user can create a .nsr file and place it in his or her home directory (or subdirectories) to exclude files from backup. A privileged user can place a .nsr file in the root directory (/) to exclude a whole file system from backup. Each line of a .nsr file contains one directive. The most useful directive for a user is the skip directive, which excludes the specified files and directories from backup. Standard shell file pattern matching (*, [...], [!...], [x-y], ?) can be used to match file names. If a "+" precedes skip, the directive is propagated to subdirectories.
Examples of .nsr File
A /.nsr file containing:
<< /usr/src >>
+skip: core *.o
+compressasm: .
will skip all files named core or *.o in /usr/src and its subdirectories. All other files in /usr/src will be compressed during backup (and will be set up for automatic decompression on recover).
The following .nsr file will skip everything in the directory (and subdirectories) it is placed in. This is useful for skipping a directory used for large temporary files.
<< . >>
+skip: .
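Creating that file for a scratch directory takes one command. The sketch below writes exactly the two lines shown above; skip_dir_from_backup is a hypothetical helper name, and note that it overwrites any .nsr file already present in the directory.

```shell
# skip_dir_from_backup DIR: create DIR/.nsr so that everything in DIR and
# its subdirectories is skipped. Hypothetical helper; overwrites an
# existing DIR/.nsr.
skip_dir_from_backup() {
    printf '%s\n' '<< . >>' '+skip: .' > "$1/.nsr"
}

# Demo in a throwaway directory.
scratch=$(mktemp -d)
skip_dir_from_backup "$scratch"
cat "$scratch/.nsr"
```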
Having a .nsr file containing
<< . >>
skip: *.jpg *.gif
without the "+" sign will skip files named *.jpg or *.gif only in that directory, not in its subdirectories.
The following example will skip everything in /toberaw and /toberaw2:
<< /toberaw >>
+skip: .?* *
<< /toberaw2 >>
+skip: .?* *
For more information about the .nsr file, please see the man pages nsr(5), nsr_directive(5), and uasm(1). This note is based on these man pages.
Support information
Debian / Ubuntu clients
See LinuxLegatoClientSetup.