LSI MegaRAID controllers are available in various configurations.
See www.lsi.com
The RAID array can be set up via the Adapter BIOS on boot.
Save the Adapter configuration to a file:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -help| grep CfgSave
MegaCli -CfgSave -f filename -aN
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -CfgSave -f /admhome/cscf-adm/MegaRaidAdapterConfiguration -a0
Config data is saved to the file.
Exit Code: 0x00
To retrieve the configuration:
MegaCli -CfgRestore -f filename -aN
To install MegaCli on a Debian-based system, convert the vendor RPM with alien and install the resulting package:
cscf-adm@hops:~$ sudo alien MegaCli-8.07.06-1.noarch.rpm
[sudo] password for cscf-adm:
Warning: Skipping conversion of scripts in package MegaCli: postinst postrm
Warning: Use the --scripts parameter to include the scripts.
megacli_8.07.06-2_all.deb generated
cscf-adm@hops:~$ sudo dpkg -i megacli_8.07.06-2_all.deb
Selecting previously unselected package megacli.
(Reading database ... 181600 files and directories currently installed.)
Unpacking megacli (from megacli_8.07.06-2_all.deb) ...
Setting up megacli (8.07.06-2) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
cscf-adm@hops:~$ ls /opt
MegaRAID
cscf-adm@hops:/opt/MegaRAID/MegaCli$ ls -la
total 5576
drwxr-xr-x 2 root root 4096 Jan 30 15:17 .
drwxr-xr-x 3 root root 4096 Jan 30 15:17 ..
-r--r--r-- 1 root root 510200 Nov 14 02:42 libstorelibir-2.so.13.05-0
-rwxr-xr-x 1 root root 2467036 Nov 14 02:42 MegaCli
-rwxr-xr-x 1 root root 2716224 Nov 14 02:42 MegaCli64
cscf-adm@hops:/opt/MegaRAID/MegaCli$ ./MegaCli64 -h
... provides us with a whole bunch of command options
Normally one would run the CLI from its own directory, e.g. ./MegaCli64 options. On this system, however, that fails to find the controller:
cscf-adm@hops:/opt/MegaRAID/MegaCli$ ./MegaCli64 -CfgDsply -a0
User specified controller is not present.
Failed to get CpController object.
Exit Code: 0x01
cscf-adm@hops:/opt/MegaRAID/MegaCli$ ./MegaCli64 -AdpAllinfo -aALL
Exit Code: 0x00
The problem lies in the architecture, which must be specified in the command by running MegaCli through setarch (the --uname-2.6 flag additionally makes the kernel report a 2.6 version number to the program):
cscf-adm@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpCount
Controller Count: 1.
Exit Code: 0x01
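Because every invocation on this page repeats the same setarch prefix, a small shell function can save typing. This is only a sketch: the function name megacli and the variables MEGACLI_BIN and MEGACLI_WRAP are illustrative assumptions and are not part of the MegaCli package; the binary path is the one used on this system.

```shell
# Hypothetical convenience wrapper; the names are illustrative, not from the MegaCli package.
# Path of the 64-bit binary as installed on this system.
MEGACLI_BIN="${MEGACLI_BIN:-/opt/MegaRAID/MegaCli/MegaCli64}"
# Prefix that selects the architecture and reports a 2.6 kernel to the program.
MEGACLI_WRAP="${MEGACLI_WRAP-setarch x86_64 --uname-2.6}"

megacli() {
    # MEGACLI_WRAP is deliberately unquoted so that it splits into separate words.
    $MEGACLI_WRAP "$MEGACLI_BIN" "$@"
}
```

Once the function is defined in a root shell, "megacli -adpCount" should run the same command line as the full invocation shown above.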
sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -help
setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL
Here we see that the RAID array was created by dividing 44 of the 45 drives into two groups called "Spans", each with 22 drives, called "PDs" (Physical Devices). The remaining drive is left over for use as a hot spare.
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL|grep -E 'SPAN|Span\ Ref|Number\ of'
Number of DISK GROUPS: 1
SPANNED DISK GROUP: 0
Number of Spans: 2
SPAN: 0
Span Reference: 0x00
Number of PDs: 22
Number of VDs: 1
Number of dedicated Hotspares: 0
SPAN: 1
Span Reference: 0x01
Number of PDs: 22
Number of VDs: 1
Number of dedicated Hotspares: 0
To silence the adapter alarm:
cscf-adm@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -AdpSetProp AlarmSilence -aALL
Adapter 0: Set alarm to Silenced success.
Exit Code: 0x00
Finding a faulty drive may be easy on some systems. A faulty drive may indicate as a flashing red LED on the drive bay. The RAID adapter may also sound an alarm.
To get information about the entire RAID controller:
cscf-adm@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0
One interesting piece of information resulting from the above command:
Device Present
================
Virtual Drives : 1
Degraded : 1
Offline : 0
Physical Devices : 48
Disks : 45
Critical Disks : 0
Failed Disks : 1
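For monitoring scripts it can be handy to pull a single counter out of the "Device Present" section. The sketch below assumes the section is printed one "name : value" pair per line, as MegaCli normally does. The function name adp_counter is hypothetical; it reads the -adpallinfo output on standard input.

```shell
# Hypothetical helper: print one counter from the "Device Present" section.
# Usage: <adpallinfo output> | adp_counter "Failed Disks"
adp_counter() {
    awk -v key="$1" '
        /Device Present/ { in_section = 1; next }   # start of the section
        in_section {
            n = index($0, ":")
            if (n == 0) next                        # skip the ===== ruler line
            name = substr($0, 1, n - 1)
            value = substr($0, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)       # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            if (name == key) { print value; exit }  # exact match on the name
        }
    '
}
```

For example, piping the output of "-adpallinfo -a0" into adp_counter "Failed Disks" would print 1 for the listing above. A non-zero result could be used to trigger an alert.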
The logical drive information will not give a specific drive, but will report whether the array is degraded. If a "hot spare" drive is faulty the array won't be degraded.
The next command asks for "ldinfo" Logical Drive Info, "lall" All Logical devices, "a0" for Adapter 0.
cscf-adm@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -ldinfo -lall -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 72.753 TB
Sector Size : 512
Parity Size : 7.275 TB
State : Partially Degraded
Strip Size : 64 KB
Number Of Drives per span:22
Span Depth : 2
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Default Power Savings Policy: Controller Defined
Current Power Savings Policy: None
Can spin up in 1 minute: Yes
LD has drives that support T10 power conditions: No
LD's IO profile supports MAX power savings with cached writes: Yes
Is VD Cached: No
Exit Code: 0x00
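The "Size" and "Parity Size" figures can be cross-checked against the span layout. The sketch below assumes that each RAID 6 span gives up two drives' worth of capacity to parity, that MegaCli's "TB" means 2^40 bytes, and that every drive has the coerced size 0xe8d00000 sectors (3905945600 in decimal) of 512 bytes reported by -PdInfo elsewhere on this page. The function name raid60_sizes is hypothetical.

```shell
# Hypothetical helper: expected data and parity capacity of a RAID 60.
# Usage: raid60_sizes SPANS DRIVES_PER_SPAN COERCED_SECTORS (sectors in decimal)
raid60_sizes() {
    awk -v spans="$1" -v drives="$2" -v sectors="$3" 'BEGIN {
        per_drive = sectors * 512 / (1024 ^ 4)     # drive size in units of 2^40 bytes
        data = spans * (drives - 2) * per_drive    # RAID 6: two parity drives per span
        parity = spans * 2 * per_drive
        printf "data=%.3f TB parity=%.3f TB\n", data, parity
    }'
}

raid60_sizes 2 22 3905945600
```

Under those assumptions the result agrees with the reported 72.753 TB and 7.275 TB to within rounding of the last digit.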
To get a log of the controller:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpeventlog -getevents -f lsi-events.log -a0 -nolog
Success in AdpEventLog
Exit Code: 0x00
root@hops:/opt/MegaRAID/MegaCli# ls -lat
total 16428
-rw-r--r-- 1 root root 9764271 Feb 26 16:08 lsi-events.log
-rw-r--r-- 1 root root 1226454 Feb 26 16:06 MegaSAS.log
Log entry pertaining to the faulty drive:
cscf-adm@hops:/opt/MegaRAID/MegaCli$ less lsi-events.log
...searched for "Jan 30" in this long file containing log data from "Oct 3"
Time: Wed Jan 30 15:51:49 2013
Code: 0x00000071
Class: 0
Locale: 0x02
Event Description: Unexpected sense: PD 0c(e0x21/s3) Path 50030480015b134f, CDB: 28 00 86 8c 04 80 00 00 80 00, Sense: 3/11/00
Event Data:
===========
Device ID: 12
Enclosure Index: 33
Slot Number: 3
CDB Length: 10
CDB Data:
0028 0000 0086 008c 0004 0080 0000 0000 0080 0000 0000 0000 0000 0000 0000 0000
Sense Length: 18
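The event description names the drive as "PD 0c(e0x21/s3)": the device ID and the enclosure are in hexadecimal (0x0c = 12, 0x21 = 33), while the slot is decimal, which matches the "Event Data" fields beneath it. A small conversion helper, sketched here under the hypothetical name pd_to_es, turns that notation into the [E:S] form that -PhysDrv expects.

```shell
# Hypothetical helper: convert "e0x21/s3" from the event log into "[33:3]".
pd_to_es() {
    encl_hex="${1%%/*}"          # "e0x21/s3" -> "e0x21"
    encl_hex="${encl_hex#e}"     # "e0x21"    -> "0x21"
    slot="${1##*/s}"             # "e0x21/s3" -> "3"
    # printf %d accepts the 0x prefix and prints the decimal value.
    printf '[%d:%s]\n' "$encl_hex" "$slot"
}

pd_to_es e0x21/s3
```

The same conversion maps e0x37 to enclosure 55, the second enclosure ID used on this system.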
To get information about all the drives in the array use the pdlist option on the array controller 0:
cscf-adm@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0
To determine which drive is faulty:
cscf-adm@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -E 'Inquiry|Firmware\ state:\ Failed|Slot|ID|Unconfigured'
Inquiry displays the drive's serial number.
Firmware states are either "Failed", "Online, Spun Up", "Online, Spun Down", "Unconfigured(bad)", "Unconfigured(good), Spun down", "Hotspare, Spun down", "Hotspare, Spun up" or "not Online".
If a hot spare has been built into the array to compensate for the failed drive, then the above command may not show the failed drive as failed. Use the next command to view the drive:
setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [E:S] -a0 #Where E is the enclosure ID and S is the drive bay slot.
If the hot spare is built into the array it will report as "Online, Spun Up". Hence, it can't be determined that it was previously the hot spare.
In the "pdlist" command "Slot" will display drive bay number.
ID displays the enclosure.
The MegaRAID SAS9285CV-8e has two divisions. In this example 24 drives are connected to ID 33 at the front of the JBOD drive bay enclosure and 21 drives to ID 55 at the back of the JBOD enclosure.
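To see at a glance which bay is in which state, the -pdlist output can be condensed to one line per drive. The following is a sketch assuming the usual one-field-per-line layout of the listing. The function name pd_states is hypothetical; it reads the listing on standard input.

```shell
# Hypothetical helper: print "[enclosure:slot] firmware state" for every drive.
pd_states() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }                   # drop trailing blanks
        /^Enclosure Device ID/ { encl = $2 }       # remember the enclosure ID
        /^Slot Number/         { slot = $2 }       # remember the drive bay
        /^Firmware state/      { printf "[%s:%s] %s\n", encl, slot, $2 }
    '
}
```

Piping the output of "-pdlist -a0" through pd_states and then through grep -v Online would leave only the drives that are failed, unconfigured, rebuilding or acting as hot spares.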
Here's a sample output of "setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL" showing only the failed drive.
Notice the media error count. Also, notice no SMART alert is reported, yet the drive is faulty:
Enclosure Device ID: 33
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 0, Arm: 3
Enclosure position: 1
Device Id: 12
WWN: 5000c5004537e2fb
Sequence Number: 3
Media Error Count: 401
Other Error Count: 5
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Failed
Device Firmware Level: CC49
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x50030480015b134f
Connected Port Number: 0(path0)
Inquiry Data: ATA ST2000DM001-9YN1CC49 W2406BSP
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature :31C (87.80 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No
A mention of drive identification within the enclosure is in order.
Most drive enclosures will have disk drive bays with LEDs to alert a fault or proper functioning. A drive bay LED can be "blinked" to determine its location. Note that the drive must have the correct firmware to allow this function.
As an example, this Seagate Barracuda drive will light the blue LED on the drive bay enclosure:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [33:0] -a0
Enclosure Device ID: 33
Slot Number: 0
...
Device Firmware Level: CC49
...
Inquiry Data: W1E05QBAST2000DM001-9YN164 CC49
A similar Seagate Barracuda drive will not light the blue LED. It has a newer model number and different firmware.
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [33:18] -a0
Enclosure Device ID: 33
Slot Number: 18
...
Device Firmware Level: CC24
...
Inquiry Data: Z240K4H2ST2000DM001-1CH164 CC24
Although the drive with model number 1CH164 won't turn on its drive bay blue LED it will still blink its red LED with the command below:
Start the blinking:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[E:S] -aALL
Where E is the enclosure and S is the slot.
Stop the blinking:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[E:S] -aALL
The JBOD enclosure may not have all drives assigned to the array. An unused drive will show as "Unconfigured(good), Spun down".
It may require several minutes (maybe half an hour) for the array controller to recognize a new drive install.
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [55:20] -a0
Enclosure Device ID: 55
Slot Number: 20
Enclosure position: 1
Device Id: 54
WWN: 5000c500452e1aa7
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Unconfigured(good), Spun down
Device Firmware Level: CC49
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5003048001943c5c
Connected Port Number: 1(path0)
Inquiry Data: W1E065X6ST2000DM001-9YN164 CC49
...
To make the drive a global Hot Spare
Note there is a difference between dedicated and global hot spares. In this example there are two spans as this is a RAID 60. A dedicated hot spare would be assigned to only one of the two spans. A global hot spare will work for either span, hence for the whole array.
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv [55:20] -a0
Adapter: 0: Set Physical Drive at EnclId-55 SlotId-20 as Hot Spare Success.
Exit Code: 0x00
If the system has a faulty drive at this point, the array will immediately begin rebuilding onto the hot spare, as seen in the "Firmware state" field, and the drive bay will blink its red LED:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [55:20] -a0
Enclosure Device ID: 55
Slot Number: 20
Drive's position: DiskGroup: 0, Span: 0, Arm: 3
Enclosure position: 1
Device Id: 54
WWN: 5000c500452e1aa7
Sequence Number: 3
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Rebuild
Device Firmware Level: CC49
The auto rebuild option can be seen in the adapter "Settings". The "Device Present" will continue to show "degraded" until the rebuild is complete:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0
...
Settings
================
Current Time : 20:47:28 2/5, 2013
Predictive Fail Poll Interval : 300sec
Interrupt Throttle Active Count : 16
Interrupt Throttle Completion : 50us
Rebuild Rate : 30%
PR Rate : 30%
BGI Rate : 30%
Check Consistency Rate : 30%
Reconstruction Rate : 30%
Cache Flush Interval : 4s
Max Drives to Spinup at One Time : 4
Delay Among Spinup Groups : 2s
Physical Drive Coercion Mode : Disabled
Cluster Mode : Disabled
Alarm : Enabled
Auto Rebuild : Enabled
...
Device Present
================
Virtual Drives : 1
Degraded : 1
Offline : 0
Physical Devices : 48
Disks : 45
Critical Disks : 0
Failed Disks : 1
If the hot spare is at any point replaced, the replacement drive may need to be reset as a hot spare. Simply run the command again...
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv [55:20] -a0
Adapter: 0: Set Physical Drive at EnclId-55 SlotId-20 as Hot Spare Success.
A RAID 60 is composed of two RAID 6 arrays striped together as RAID 0. Each RAID 6 array can sustain two failed drives and retain data integrity.
If a "hot spare" is built into the array after the first drive failure, then one of the RAID 6 arrays can sustain three failed drives in total (provided the rebuild onto the spare completes before the third failure), but the other RAID 6 array can still sustain only two.
Here we see 45 drives (one of which is a hot spare), one failed drive, and the "hot spare" [55:20] as on-line. This may indicate that the hot spare has already been built into the array (as it is no longer degraded) and that a second drive has failed.
root@hops:/opt/MegaRAID# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0
...
Device Present
================
Virtual Drives : 1
Degraded : 0
Offline : 0
Physical Devices : 48
Disks : 45
Critical Disks : 0
Failed Disks : 1
...
root@hops:~# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0|grep -E 'Inquiry|Firmware|Slot|ID'
...
Enclosure Device ID: 55
Slot Number: 20
Firmware state: Online, Spun Up
Device Firmware Level: CC49
Inquiry Data: W1E065X6ST2000DM001-9YN164 CC49
Take the faulty drive offline. It may start the alarm once offline, so stop the alarm.
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv [33:3] -a0
Adapter: 0: EnclId-33 SlotId-3 state changed to OffLine.
Exit Code: 0x00
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aAll
...
Number of enclosures on adapter 0 -- 3
Enclosure 0:
Device ID : 33
Number of Slots : 24
Number of Power Supplies : 2
Number of Fans : 5
Number of Temperature Sensors : 1
Number of Alarms : 1
Number of SIM Modules : 0
Number of Physical Drives : 23
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [33:3] -a0 | grep state
Firmware state: Unconfigured(bad)
A faulty drive may not go into the off-line mode. Check to see if the array is in the "spun down" state. If it is spun down then access or edit a file in the array. That will then spin up the drives. Then try to "off-line" the faulty drive.
myself@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdList -a0|grep -E 'Inquiry|Firmware|Slot|ID'
Enclosure Device ID: 33
Slot Number: 0
Firmware state: Online, Spun down
myself@hops:/usr/local/MegaRAID Storage Manager$ sudo setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0
...
Settings
============
Current Time : 15:52:21 11/15, 2013
...
Max Drives to Spinup at One Time : 4
Delay Among Spinup Groups : 2s
...
Maximum number of direct attached drives to spin up in 1 min : 120
If the drive won't go off-line then just go ahead with the drive replacement. However, note that the new drive may not be immediately recognized by the adapter. It may require up to half an hour for the controller to acknowledge the new drive.
Check the drive status again. It may show that the drive no longer exists in the array or that it is still faulty. After waiting half an hour you may want to re-seat the drive.
Showing drive in "Slot Number: 3" missing:
root@hops:/usr/local/MegaRAID Storage Manager# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdList -a0|grep -E 'Inquiry|Firmware|Slot|ID'
...
Enclosure Device ID: 33
Slot Number: 2
Firmware state: Online, Spun Up
Device Firmware Level: CC49
Inquiry Data: W1E06XMMST2000DM001-9YN164 CC49
Enclosure Device ID: 33
Slot Number: 4
Firmware state: Online, Spun Up
Device Firmware Level: CC49
Inquiry Data: W1E05NLZST2000DM001-9YN164 CC49
...
Enclosure Device ID: 55
Slot Number: 20
Firmware state: Online, Spun Up
Device Firmware Level: CC49
Inquiry Data: W1E065X6ST2000DM001-9YN164 CC49
Fit the replacement drive into the drive bay carrier and insert it into the enclosure. It is hot-swappable and should immediately be accepted by the array.
If a "Hot Spare" was present the hot spare will now copy its data to the new drive. The alarm may sound, the blue LED should be on, and the red LED flashing indicating a "copy back". As well you should notice all other drives flashing:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0
...
Device Present
================
Virtual Drives : 1
Degraded : 0
Offline : 0
Physical Devices : 48
Disks : 45
Critical Disks : 0
Failed Disks : 0
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [33:3] -a0
Enclosure Device ID: 33
Slot Number: 3
Enclosure position: 1
Device Id: 12
WWN: 5000c5005cc361d1
Sequence Number: 10
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Copyback
Device Firmware Level: CC24
...
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [55:20] -a0
Enclosure Device ID: 55
Slot Number: 20
Drive's position: DiskGroup: 0, Span: 0, Arm: 3
Enclosure position: 1
Device Id: 54
WWN: 5000c500452e1aa7
Sequence Number: 4
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: CC49
Copying the data back from the hot spare to the new drive may take several hours. In the example above the hot spare is [55:20] and the new drive is [33:3]. For a RAID 60 of two 22-drive spans, a copy-back requires approximately 3.5 hours. Without a "hot spare copy-back", a rebuild requires approximately 6.5 hours.
After copy back completes the new drive is online and the hot spare once again shows "Hot Spare":
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDCpyBk -ShowProg -PhysDrv[33:3] -a0
Physical Drive is not in Copyback state.
Exit Code: 0x00
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [33:3] -a0
Enclosure Device ID: 33
Slot Number: 3
Drive's position: DiskGroup: 0, Span: 0, Arm: 3
Enclosure position: 1
Device Id: 12
WWN: 5000c5005cc361d1
Sequence Number: 11
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: CC24
...
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [55:20] -a0
Enclosure Device ID: 55
Slot Number: 20
Enclosure position: 1
Device Id: 54
WWN: 5000c500452e1aa7
Sequence Number: 5
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Hotspare Information:
Type: Global, is revertible
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.818 TB [0xe8d00000 Sectors]
Sector Size: 0
Firmware state: Hotspare, Spun down
Device Firmware Level: CC49
...
Note the hot spare may continue to show a red flashing LED after the copy-back. This will occur once it is in PowerSave mode. It is in a "ready" state and will automatically rebuild in case of another drive failure.
To return the hot spare to the spun-up state, remove it as a hot spare and then set it again:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [55:20] -a0|grep Firmware\ state
Firmware state: Hotspare, Spun down
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Rmv -PhysDrv [55:20] -a0
Adapter: 0: Remove Physical Drive at EnclId-55 SlotId-20 as Hot Spare Success.
Exit Code: 0x00
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [55:20] -a0|grep Firmware\ state
Firmware state: Unconfigured(good), Spun Up
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv [55:20] -a0
Adapter: 0: Set Physical Drive at EnclId-55 SlotId-20 as Hot Spare Success.
Exit Code: 0x00
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -PhysDrv [55:20] -a0|grep Firmware\ state
Firmware state: Hotspare, Spun Up
If two drives have failed, the hot spare should already have been built into the array. Replace one of the failed drives. The hot spare may not show that it is copying back data; it reserves that for the other failed drive. The rebuild of the replaced drive can be seen with the command:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [55:5] -a0
Rebuild Progress on Device at Enclosure 55, Slot 5 Completed 5% in 17 Minutes.
Exit Code: 0x00
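The progress line can be turned into a rough estimate of the time remaining. The sketch below assumes the rebuild proceeds at a constant rate, which is only an approximation, since the rate depends on the controller's "Rebuild Rate" setting and on array load. The function name rebuild_eta is hypothetical; it reads the -ShowProg output on standard input and prints the estimated minutes left.

```shell
# Hypothetical helper: estimate remaining rebuild minutes by linear extrapolation.
rebuild_eta() {
    awk '
        match($0, /Completed [0-9]+% in [0-9]+ Minutes/) {
            split(substr($0, RSTART, RLENGTH), w, " ")
            pct = w[2] + 0        # "5%" is coerced to the number 5
            mins = w[4] + 0       # elapsed minutes
            if (pct > 0) printf "%d\n", mins * (100 - pct) / pct
        }
    '
}

echo "Rebuild Progress on Device at Enclosure 55, Slot 5 Completed 5% in 17 Minutes." | rebuild_eta
```

For the sample line this gives 323 minutes remaining, about 5.7 hours in total, which is in the same range as the roughly 6.5 hours observed for a full rebuild on this array.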
To help determine drive, fan, temperature, etc. information use the enclosure info command:
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aALL
Number of enclosures on adapter 0 -- 3
Enclosure 0:
Device ID : 33
Number of Slots : 24
Number of Power Supplies : 2
Number of Fans : 5
Number of Temperature Sensors : 1
Number of Alarms : 1
Number of SIM Modules : 0
Number of Physical Drives : 23
Status : Normal
Position : 1
Connector Name : Port B
Enclosure type : SES
FRU Part Number : N/A
Enclosure Serial Number : N/A
ESM Serial Number : N/A
Enclosure Zoning Mode : N/A
Partner Device Id : 65535
Inquiry data :
Vendor Identification : LSI CORP
Product Identification : SAS2X36
Product Revision Level : 0717
Vendor Specific : x36-55.7.23.0
Number of Voltage Sensors :2
Voltage Sensor :0
Voltage Sensor Status :OK
Voltage Value :5000 milli volts
Voltage Sensor :1
Voltage Sensor Status :OK
Voltage Value :11700 milli volts
Number of Power Supplies : 2
Power Supply : 0
Power Supply Status : OK
Power Supply : 1
Power Supply Status : OK
Number of Fans : 5
Fan : 0
Fan Speed :High Speed
Fan Status : OK
Fan : 1
Fan Speed :High Speed
Fan Status : OK
Fan : 2
Fan Speed :High Speed
Fan Status : OK
Fan : 3
Fan Status : Not Installed
Fan : 4
Fan Status : Not Installed
Number of Temperature Sensors : 1
Temp Sensor : 0
Temperature : 36
Temperature Sensor Status : OK
Number of Chassis : 1
Chassis : 0
Chassis Status : OK
... output continues with enclosure 1 containing 21 drives and 2 containing no drives.
To check the battery backup unit (BBU) status (on this system the command segfaults after printing the summary):
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aALL
BBU status for Adapter: 0
BatteryType: iBBU-09
Voltage: 4073 mV
Current: 0 mA
Temperature: 24 C
Battery State: Optimal
Segmentation fault (core dumped)
Battery Write-back cache should be enabled
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0 | grep -i cache
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Disk Cache Policy : Disk's Default
LD's IO profile supports MAX power savings with cached writes: Yes
Is VD Cached: No
Get the event information from the Adapter
root@hops:/opt/MegaRAID/MegaCli# setarch x86_64 --uname-2.6 /opt/MegaRAID/MegaCli/MegaCli64 -adpeventlog -getevents -f lsi-events.log -a0 -nolog
View the event log.
root@hops:/opt/MegaRAID/MegaCli# less lsi-events.log
Shown next is the output after the drive in enclosure 55 slot 15 failed and was replaced:
Failed ->
Time: Thu Jun 13 17:09:29 2013
Code: 0x000000b9
Class: 2
Locale: 0x04
Event Description: Enclosure PD 37(c Port A/p1) phy bad for slot 15
Event Data:
===========
Device ID: 55
Enclosure Index: 1
Slot Number: 1
Index: 15
Replaced ->
Time: Thu Jun 13 17:40:50 2013
Code: 0x000000f7
Class: 0
Locale: 0x02
Event Description: Inserted: PD 31(e0x37/s15) Info: enclPd=37, scsiType=0, portMap=01, sasAddr=5003048001943c57,0000000000000000
Event Data:
===========
Device ID: 49
Enclosure Device ID: 55
Enclosure Index: 2
Slot Number: 15
SAS Address 1: 5003048001943c57
SAS Address 2: 0
seqNum: 0x00017939
Time: Thu Jun 13 17:40:50 2013
Code: 0x00000119
Class: 0
Locale: 0x02
Event Description: CopyBack automatically started on PD 31(e0x37/s15) from PD 36(e0x37/s20)
Event Data:
===========
None
seqNum: 0x0001793a
Time: Thu Jun 13 17:40:50 2013
Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 31(e0x37/s15) from UNCONFIGURED_GOOD(0) to COPYBACK(20)
Event Data:
===========
Device ID: 49
Enclosure Index: 55
Slot Number: 15
Previous state: 0
New state: 32
seqNum: 0x0001793b
seqNum: 0x00017926
Finished copy back ->
Time: Thu Jun 13 21:51:06 2013
Code: 0x00000116
Class: 0
Locale: 0x02
Event Description: CopyBack complete on PD 31(e0x37/s15) from PD 36(e0x37/s20)
Event Data:
===========
None
seqNum: 0x000179f9
Time: Thu Jun 13 21:51:06 2013
Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 31(e0x37/s15) from COPYBACK(20) to ONLINE(18)
Event Data:
===========
Device ID: 49
Enclosure Index: 55
Slot Number: 15
Previous state: 32
New state: 24
seqNum: 0x000179fa
Time: Thu Jun 13 21:51:06 2013
Code: 0x00000087
Class: 0
Locale: 0x42
Event Description: Global Hot Spare created on PD 36(e0x37/s20) (global,rev)
Event Data:
===========
Device ID: 54
Enclosure Index: 55
Slot Number: 20
Spare Type: Revertible
Arrays Dedicated to:
seqNum: 0x000179fb
Time: Thu Jun 13 21:51:06 2013
Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 36(e0x37/s20) from ONLINE(18) to HOT SPARE(2)
Event Data:
===========
Device ID: 54
Enclosure Index: 55
Slot Number: 20
Previous state: 24
New state: 2
seqNum: 0x000179fc
...
Time: Thu Jun 13 22:24:28 2013
Code: 0x0000014b
Class: 0
Locale: 0x02
Event Description: Power state change on PD 36(e0x37/s20) from ON(0) to POWERSAVE(1)
Event Data:
===========
None
seqNum: 0x00017a09
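Since the saved log runs to several megabytes, it can help to reduce each event to a single line before searching it. The following sketch assumes every event has a "Time:" line followed later by an "Event Description:" line, as in the excerpts above. The function name event_summary is hypothetical; it reads the log on standard input.

```shell
# Hypothetical helper: print "timestamp | description" for each logged event.
event_summary() {
    awk '
        /^Time: / {
            when = $0
            sub(/^Time: /, "", when)                 # keep only the timestamp
        }
        /^Event Description: / {
            desc = $0
            sub(/^Event Description: /, "", desc)    # keep only the description
            print when " | " desc
        }
    '
}
```

For example, running event_summary with lsi-events.log as input and filtering the result with grep 'e0x37/s15' would list only the events for the drive in enclosure 55, slot 15.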
-- GordBoerke - 26 Feb 2013