Console Access
- cscf.cs# rsh curuhead.cs
- cu -s38400 -l/dev/ttyd2
Power-up
- There are four power buttons next to a green backlit LCD display; each shows the power status.
- Make sure each unit is on: curuhead is the top part and the bottom two are curupira.
Manual Power up of curupira
- log onto cscf and become root
- suw-2.03# rsh curuhead.cs
IRIX Release 6.5 IP22 curuhead
Copyright 1987-2002 Silicon Graphics, Inc. All Rights Reserved.
Last login: Wed Jul 3 11:33:32 EDT 2013 by root@cscf.cs.uwaterloo.ca
curuhead 1# cu -s38400 -l/dev/ttyd2
Connected
System Maintenance Menu
1) Start System
2) Install System Software
3) Run Diagnostics
4) Recover System
5) Enter Command Monitor
Option? 5
Command Monitor. Type "exit" to return to the menu.
>> auto
Hardware Problems
The bottom two sections of curupira shut down frequently.
- This is likely due to the following problems:
curupira.cs console login: 001c04
001c04 ATTN: 1.5V low warning limit reached @ 1.340V.
WARNING: 001c04 ATTN: 1.5V low warning limit reached @ 1.340V.
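To gauge how serious this warning is, compare the reading with the nominal rail voltage: (1.5 V - 1.340 V) / 1.5 V is about 10.7% below nominal. The Python sketch below does that arithmetic for a captured console line. parse_voltage_warning is a hypothetical helper, and its pattern is inferred only from the two warning lines above, so other console messages may not match it.

```python
import re

# Pattern inferred from the console lines quoted above, e.g.
#   001c04 ATTN: 1.5V low warning limit reached @ 1.340V.
# Other warning formats are not handled (assumption).
WARNING_RE = re.compile(
    r"(?P<module>\w+) ATTN: (?P<nominal>\d+(?:\.\d+)?)V low warning limit "
    r"reached @ (?P<reading>\d+(?:\.\d+)?)V"
)


def parse_voltage_warning(line):
    """Return (module, nominal V, measured V, percent below nominal) or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    nominal = float(match.group("nominal"))
    reading = float(match.group("reading"))
    percent_below = (nominal - reading) / nominal * 100.0
    return match.group("module"), nominal, reading, percent_below


if __name__ == "__main__":
    line = "WARNING: 001c04 ATTN: 1.5V low warning limit reached @ 1.340V."
    module, nominal, reading, pct = parse_voltage_warning(line)
    print(f"{module}: {reading:.3f}V on the {nominal}V rail, {pct:.1f}% below nominal")
```

Lines that do not contain the warning, such as the interleaved login prompt, return None.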
Disks dksc0d118vol and dksc0d125vol are dead
- Note: this array will not be fixed; see RT#85393.
- See the D-Brick notes below.
- /dev/xlv/xlv0 (mounted on /share/disk/curupira1) and /dev/xlv/xlv1 (mounted on /share/disk/curupira1.mirror) are broken.
- During startup you will see the messages below about the failed disks dksc0d118vol and dksc0d125vol.
- There are two RAID0 arrays: disks dksc0d114vol...dksc0d118vol form one, and disks dksc0d119vol...dksc0d122vol plus dksc0d125vol form the other.
- The two arrays were mirrored for redundancy. However, one disk in each array has failed (dksc0d118vol in the first, dksc0d125vol in the second), and because RAID0 has no redundancy of its own, both arrays, and therefore the mirror, are broken.
- When we backed up images of the good disks, we found that a few of the other disks also had bad blocks.
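The failure logic in this list can be sketched in a few lines: a RAID0 stripe is lost as soon as any one member fails, and a mirrored pair of stripes is lost only when both stripes are. In this hypothetical Python sketch, group membership is taken from the Drive grouping notes further down (SCSI target numbers only). Both that grouping and the list above place dksc0d118vol and dksc0d125vol in different stripes, which is all the conclusion depends on. The function names are illustrative only.

```python
def stripe_ok(members, failed):
    """A RAID0 stripe survives only if none of its member disks has failed."""
    return not set(members) & set(failed)


def mirror_ok(stripes, failed):
    """A mirror of stripes survives if at least one stripe is still intact."""
    return any(stripe_ok(stripe, failed) for stripe in stripes)


# SCSI target numbers of the two striped groups (assumption: taken from the
# "Drive grouping" notes on this page).
GROUP_A = [116, 117, 118, 119]
GROUP_B = [120, 121, 122, 125]
FAILED = [118, 125]  # dksc0d118vol and dksc0d125vol

if __name__ == "__main__":
    for name, group in (("A", GROUP_A), ("B", GROUP_B)):
        print(f"stripe {name}:", "intact" if stripe_ok(group, FAILED) else "lost")
    print("mirror:", "intact" if mirror_ok([GROUP_A, GROUP_B], FAILED) else "lost")
    # With only one failed disk, the other stripe would still hold the data.
    print("mirror with only 118 failed:",
          "intact" if mirror_ok([GROUP_A, GROUP_B], [118]) else "lost")
```

The last line shows why the mirror was worth having: a single dead disk would have been survivable. The loss came from having one failure in each stripe.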
Selecting Default Server
NOTICE: Starting failsoftd
dksc0d118vol: Device not ready, spinning up
dksc0d118vol: Device not ready: Not ready to perform command (asc=0x4, asq=0x0) (FRU=0x2)
dksc0d118vol: Device spin up failed, unable to use device -- corrective action necessary
ioconfig: ERROR:scsi_ctlr_walk_fn : Cannot open the file : /hw/module/001c04/Ibrick/xtalk/15/pci/1/scsi_ctlr/0/target/118/lun/0/disk/volume/char
error is: I/O error
dksc0d125vol: Device not ready, spinning up
dksc0d125vol: Device not ready: Not ready to perform command (asc=0x4, asq=0x0) (FRU=0x2)
dksc0d125vol: Device spin up failed, unable to use device -- corrective action necessary
ioconfig: ERROR:scsi_ctlr_walk_fn : Cannot open the file : /hw/module/001c04/Ibrick/xtalk/15/pci/1/scsi_ctlr/0/target/125/lun/0/disk/volume/char
error is: I/O error
...
...
mount: /dev/xlv/xlv0 on /share/disk/curupira1: No such file or directory
mount: giving up on:
/share/disk/curupira1
mount: /dev/xlv/xlv1 on /share/disk/curupira1.mirror: No such file or directory
mount: giving up on:
/share/disk/curupira1.mirror
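To pull the dead disks and failed mounts out of a long captured console log, a small filter is enough. This Python sketch is an assumption-based helper: summarize_boot_log is a hypothetical name, and the two patterns are inferred only from the startup messages above. Other IRIX releases may word these messages differently.

```python
import re

# Patterns inferred from the startup messages quoted on this page (assumption).
SPINUP_FAIL_RE = re.compile(r"^(\S+): Device spin up failed")
MOUNT_FAIL_RE = re.compile(r"^mount: (\S+) on (\S+): No such file or directory")


def summarize_boot_log(lines):
    """Return (failed_devices, failed_mounts) found in console output."""
    failed_devices = []
    failed_mounts = []
    for raw in lines:
        line = raw.strip()
        match = SPINUP_FAIL_RE.match(line)
        if match:
            failed_devices.append(match.group(1))
            continue
        match = MOUNT_FAIL_RE.match(line)
        if match:
            failed_mounts.append((match.group(1), match.group(2)))
    return failed_devices, failed_mounts


SAMPLE = """\
dksc0d118vol: Device not ready, spinning up
dksc0d118vol: Device spin up failed, unable to use device -- corrective action necessary
dksc0d125vol: Device spin up failed, unable to use device -- corrective action necessary
mount: /dev/xlv/xlv0 on /share/disk/curupira1: No such file or directory
mount: /dev/xlv/xlv1 on /share/disk/curupira1.mirror: No such file or directory
"""

if __name__ == "__main__":
    devices, mounts = summarize_boot_log(SAMPLE.splitlines())
    print("failed disks:", ", ".join(devices))
    for device, mount_point in mounts:
        print(f"failed mount: {device} on {mount_point}")
```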
Hardware notes
Curupira is an SGI 3200 server with C-, G-, I-, and D-Bricks. This page describes the D-Brick hard drive configuration.
Viewing the D-Brick
The D-Brick has 12 drive bays, of which 10 are populated. The bays are numbered differently depending on which SGI manual or section you consult. The layout below is viewed from the front of the machine; each bay is labelled with its column/row, its drive number (#), and its system device name (dks0dXXXvh). X marks an empty bay:
1/1 #9       2/1 #10      3/1 #11      4/1 #12
dks0d122vh   X            X            dks0d125vh
1/2 #5       2/2 #6       3/2 #7       4/2 #8
dks0d118vh   dks0d119vh   dks0d120vh   dks0d121vh
1/3 #2       2/3 #3       3/3 #4       4/3 #1
dks0d115vh   dks0d116vh   dks0d117vh   dks0d114vh
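When a boot message names a dead disk, the bay layout can be turned into a lookup table to find it physically. This Python sketch encodes the front view above. Assumption: the dksc0dNNNvol names in the boot messages and the dks0dNNNvh names in the layout share the same SCSI target number NNN (the ioconfig error above shows target/118 for dksc0d118vol). BAYS, target_number, and locate are hypothetical names.

```python
import re

# Front view of the D-Brick: (column, row) -> (drive #, device name).
# None marks an empty bay (X in the layout).
BAYS = {
    (1, 1): (9, "dks0d122vh"), (2, 1): (10, None),
    (3, 1): (11, None), (4, 1): (12, "dks0d125vh"),
    (1, 2): (5, "dks0d118vh"), (2, 2): (6, "dks0d119vh"),
    (3, 2): (7, "dks0d120vh"), (4, 2): (8, "dks0d121vh"),
    (1, 3): (2, "dks0d115vh"), (2, 3): (3, "dks0d116vh"),
    (3, 3): (4, "dks0d117vh"), (4, 3): (1, "dks0d114vh"),
}


def target_number(name):
    """Extract the target number from a dks0dNNNvh or dksc0dNNNvol name."""
    match = re.search(r"d(\d+)(?:vh|vol)$", name)
    return int(match.group(1)) if match else None


def locate(name):
    """Return (column, row, drive #) for a device name, or None if not found."""
    target = target_number(name)
    if target is None:
        return None
    for (col, row), (drive, device) in BAYS.items():
        if device is not None and target_number(device) == target:
            return col, row, drive
    return None


if __name__ == "__main__":
    for dead in ("dksc0d118vol", "dksc0d125vol"):
        col, row, drive = locate(dead)
        print(f"{dead}: bay {col}/{row}, drive #{drive}")
```

Under that naming assumption, the two dead disks sit in bay 1/2 (#5) and bay 4/1 (#12).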
Drive grouping
The two drives in the bottom corners, dks0d115vh and dks0d114vh, are reserved for the system.
The drives dks0d116vh, dks0d117vh, dks0d118vh, and dks0d119vh are striped and mirrored by the drives dks0d120vh, dks0d121vh, dks0d122vh, and dks0d125vh.
The two drive groups are managed by XLV, the XFS logical volume manager:
curupira 26# mount
/dev/root on / type xfs (rw,raw=/dev/rroot)
/hw on /hw type hwgfs (rw)
/proc on /proc type proc (rw)
/dev/fd on /dev/fd type fd (rw)
/dev/xlv/xlv0 on /share/disk/curupira1 type xfs (rw,grpid,raw=/dev/rxlv/xlv0)
/dev/dsk/dks0d1s4 on /home type xfs (rw,grpid,raw=/dev/rdsk/dks0d1s4)
/dev/xlv/xlv1 on /share/disk/curupira1.mirror type xfs (rw,grpid,raw=/dev/rxlv/xlv1)
/dev/dsk/dks0d1s3 on /.software type xfs (rw,grpid,raw=/dev/rdsk/dks0d1s3)
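After a reboot, one way to check whether the two XLV volumes came back is to filter captured mount output for /dev/xlv devices. If a volume failed to mount, as in the startup log earlier on this page, it is simply absent from the result. This Python sketch assumes the line format of the sample above; parse_mount and xlv_volumes are hypothetical names.

```python
import re

# "<device> on <mount point> type <fstype> (<options>)", as in the sample above.
MOUNT_LINE_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \(([^)]*)\)$")


def parse_mount(text):
    """Parse mount output into a list of dicts, skipping unrecognized lines."""
    entries = []
    for line in text.splitlines():
        match = MOUNT_LINE_RE.match(line.strip())
        if match:
            device, mount_point, fstype, options = match.groups()
            entries.append({
                "device": device,
                "mount_point": mount_point,
                "type": fstype,
                "options": options.split(","),
            })
    return entries


def xlv_volumes(entries):
    """Return (device, mount point) pairs for XLV logical volumes only."""
    return [(e["device"], e["mount_point"])
            for e in entries if e["device"].startswith("/dev/xlv/")]


SAMPLE = """\
/dev/root on / type xfs (rw,raw=/dev/rroot)
/hw on /hw type hwgfs (rw)
/proc on /proc type proc (rw)
/dev/fd on /dev/fd type fd (rw)
/dev/xlv/xlv0 on /share/disk/curupira1 type xfs (rw,grpid,raw=/dev/rxlv/xlv0)
/dev/dsk/dks0d1s4 on /home type xfs (rw,grpid,raw=/dev/rdsk/dks0d1s4)
/dev/xlv/xlv1 on /share/disk/curupira1.mirror type xfs (rw,grpid,raw=/dev/rxlv/xlv1)
/dev/dsk/dks0d1s3 on /.software type xfs (rw,grpid,raw=/dev/rdsk/dks0d1s3)
"""

if __name__ == "__main__":
    for device, mount_point in xlv_volumes(parse_mount(SAMPLE)):
        print(f"{device} mounted on {mount_point}")
```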
--
GordBoerke - 05 Nov 2012