Troubleshooting

Dell™ Remote Access Controller Installation and Setup Guide

Basic Troubleshooting

Using the RAC Trace Log

Troubleshooting Network Problems

RAC Log Messages

DRAC III LED Indicators

The purpose of this section is to help you diagnose and solve problems that may occur with your RAC.

Basic Troubleshooting

This section provides solutions to common problems.

Problem

Console redirection does not work.

You may see the following message on the bottom menu of the console redirect frame:

Please wait - initial screen loading.

Solution

If you perform a hard reset on the RAC (using the racadm racreset command), the RAC driver cannot communicate with the RAC controller until the system is rebooted. Therefore, the hard reset should be reserved for extreme situations (for instance, a system lockup). Before using a hard reset, you should first try using the soft reset (using the soft reset function of the Web-based interface or the racadm racreset soft command), which does not terminate communication between the RAC driver and the RAC controller.

NOTE: Both hard and soft resets terminate all user sessions. Consequently, issuing any kind of reset causes all user interfaces to fail (for instance, when redirecting the console through the Web-based interface). After issuing a hard or soft reset, you must first log out and wait until the RAC is back online before logging on again.
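
The soft-before-hard guidance above can be sketched as a small wrapper around the two racadm commands named in this section. This is a minimal sketch, assuming racadm is installed and on the path of the system where it runs; the helper names reset_command and reset_rac are hypothetical and are not part of racadm.

```python
import subprocess

# The two reset forms named in this section.
SOFT_RESET = ["racadm", "racreset", "soft"]
HARD_RESET = ["racadm", "racreset"]


def reset_command(allow_hard: bool = False) -> list[str]:
    """Return the racadm argument list; a soft reset is the default."""
    return list(HARD_RESET if allow_hard else SOFT_RESET)


def reset_rac(allow_hard: bool = False, runner=subprocess.run):
    """Issue the reset. `runner` is injectable so the sketch can be
    exercised without racadm present (hypothetical convenience)."""
    return runner(reset_command(allow_hard), check=True)
```

Calling reset_rac() issues the soft reset; pass allow_hard=True only for extreme situations such as a system lockup, and expect to reboot the managed system afterward.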

Problem

The console redirect frame shows Please wait - initial screen loading, and seems to hang in this mode when the managed system is up, and VNC and PPP are running on the managed system.

Solution

The PPP connection between the managed system and the RAC firmware may not be established. Rebooting the managed system may correct this problem. It is not necessary to reboot the RAC.

Problem

Console redirection fails to show the operating system boot menu in the Chinese, Japanese, and Korean versions of Microsoft® Windows® 2000.

Solution

To correct this problem, on systems running Windows 2000 that can boot to multiple operating systems, you can change the default boot operating system by performing the following steps:

Right-click the My Computer icon and select Properties.
Click the Advanced tab.
Click Startup and Recovery.
Select the new default operating system from the Startup list.
In the Show list for box, type the number of seconds that the list of choices should be displayed before the default operating system automatically boots.

Problem

The redirected console screen does not refresh.

Solution

Click Refresh in the console redirection window.

Problem

The following message is displayed: Warning: remote console is not available.

Solution

This warning indicates one of the following conditions:

The managed system is rebooting.
The managed system screen is switching between text and graphic modes.
Communication between the browser and the RAC has failed.

After waiting a few minutes for the system to restart or the screen to change modes, try restarting the console redirect window to correct the problem.

Problem (DRAC III only)

The IPMI interfaces do not provide the correct information.

Solution

On Dell™ PowerEdge™ 1650 systems, the DRAC III is installed on a riser board. The riser board plugs into the RISER connector on the system board and is considered an extension of the system board. There are two riser board configurations for the PowerEdge 1650. The first features two 64-bit, 66-MHz expansion slots. The second features one 64-bit, 66-MHz expansion slot (PCI2) and one 32-bit, 33-MHz expansion slot (PCI1) for 5-V cards. In PowerEdge 1650 systems, if the DRAC III is installed on a riser card equipped with two 64-bit slots, the card operates, but the IPMI interface information provided is incorrect. To correct this problem, ensure that the DRAC III is installed on a 5-V riser card equipped with one 32-bit slot. For more information, see "Installing the DRAC III Hardware."

Problem (DRAC III and DRAC III/XT only)

Text redirection is not occurring when using console redirection.

This situation could occur if the DRAC III or DRAC III/XT is not installed on the primary PCI bus. To ensure that the DRAC III is installed in the correct PCI slot, see "Installing the DRAC III Hardware." To ensure that the DRAC III/XT is installed in the correct PCI slot, see "Installing the DRAC III/XT Hardware."

Solution

Ensure that the DRAC III or DRAC III/XT is installed on the primary PCI bus. For more information, see "Installing the DRAC III Hardware" or "Installing the DRAC III/XT Hardware."

Problem

Cannot connect to the remote access interface and the DNS sends back the IP address of the RAC instead of the managed system.

Solution

Due to functional details that are specific to Windows Dynamic DNS servers, the RAC internal PPP IP address is broadcast to the Dynamic DNS service running on Windows 2000 systems. The Dynamic DNS service stores that IP address in its DNS lookup table and associates it with the name of the managed system hosting the RAC. This action causes problems with Active Directory under Windows. The default value for a RAC's internal PPP IP address is 192.168.234.235, and it is user configurable. Microsoft has addressed this issue with a hot fix and Microsoft Knowledge Base article Q292822. To solve this problem, download the hot fix and perform the steps in the article.

Problem (DRAC III only)

Cannot connect to or ping a DRAC III from the management station after the dial-out properties have been set.

Solution

To access the management station through two distinct paths from the DRAC III, the DRAC III must have a host-based demand-dial route that does not conflict with the network-based LAN route.

Configure dial-up networking on the management station to assign static IP addresses for dial-in purposes. This configuration requires two addresses: one for the management station and one for the DRAC III.

NOTE: It is important that the static IP addresses used by the management station be on a different subnet from the DRAC III network interface controller (NIC). Otherwise, a routing loop is created.

Typically, the numerically lower address is assigned to the management station and the numerically larger address is assigned to the DRAC III when the dial-in connection is completed.

Configure the static IP address assigned to the management station as the demand-dial destination IP address on the DRAC III; configure this identical address as an SNMP trap destination (for SNMP trap alerts) or as the SMTP server address (for e-mail alerts) on the DRAC III.

NOTE: You can use DHCP to configure the IP addresses, but you must still ensure that both addresses used by the management station for dial-in are on a different subnet from the DRAC III NIC.

The management station is now able to receive alerts from the DRAC III through both the LAN and the dial-in connection.
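
The subnet requirement in the notes above can be checked before the addresses are configured. The following is a minimal sketch using Python's standard ipaddress module; the function names and the sample addresses in the usage note are hypothetical and are not taken from any RAC tool.

```python
import ipaddress


def dial_in_addresses_ok(nic_interface: str, station_ip: str, drac_dial_ip: str) -> bool:
    """Return True when neither dial-in address lies in the DRAC III NIC subnet.

    nic_interface is the NIC address with its prefix length, e.g. "192.168.0.120/24".
    """
    nic_net = ipaddress.ip_interface(nic_interface).network
    return all(
        ipaddress.ip_address(ip) not in nic_net
        for ip in (station_ip, drac_dial_ip)
    )


def assign_roles(addr_a: str, addr_b: str) -> tuple[str, str]:
    """Return (management station, DRAC III) following the typical convention:
    the numerically lower address goes to the management station."""
    low, high = sorted((ipaddress.ip_address(addr_a), ipaddress.ip_address(addr_b)))
    return str(low), str(high)
```

For example, with a NIC at 192.168.0.120/24, the pair 10.0.0.1 and 10.0.0.2 passes the check, while a pair that includes 192.168.0.5 fails because it would create the routing loop described above.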

Problem

Graphics redirection is not occurring when using console redirection.

This situation could occur if the RAC services are not installed properly or are not running.

Solution

After allowing several minutes for the graphics redirection to occur, ensure that the RAC services are running. Try stopping and then starting the services. If the problem persists, reboot the system.

Using the RAC Trace Log

The internal RAC Trace Log can be used by administrators needing to debug alerting, paging, or networking from the RAC. The Trace Log can be accessed from the RAC Web-based remote access interface by clicking the Debug tab, and then clicking Network Debug. From the Network Debug window, select Dump Trace Log, and then click Submit. The Trace Log tracks the following information:

CHAT – Traces modem interactions similar to those found on Red Hat Linux systems. The CHAT protocol includes expect and send character sequences, where certain responses from the modem are expected, and commands are sent to the modem.
DHCP – Traces packets sent to and received from a DHCP server.
IP – Traces only IP packets transmitted through PPP links, not packets transmitted through the NIC.
PPP – Traces negotiation packets.
TAP – Traces TAP interactions used with alphanumeric paging.

NOTE: Settings for CHAT, DHCP, IP, PPP, and TAP (DRAC III only) can be accessed from the RAC remote access GUI by clicking the Debug tab, and then clicking Trace Level.

NOTE: In the RAC Trace Log, nonprintable ASCII characters are translated to printable ASCII characters. If the character code is less than 0x20, or between 0x7f and 0xa0 (inclusive), the value 0x40 is exclusive-or'd with the character before printing, after a "^" is added to the beginning. Thus, the ASCII carriage return character, 0xd, is printed as "^M" in the Trace Log. Nonprintable ASCII characters may occur during tracing of the CHAT and TAP protocols, and occasionally during PPP negotiations.
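
The translation rule in the note above can be reproduced when comparing captured modem traffic against the Trace Log. This is a minimal sketch of the stated rule only; the function name trace_printable is hypothetical, and the firmware's exact output for bytes above 0x7f is an assumption based on applying the exclusive-or literally.

```python
def trace_printable(data: bytes) -> str:
    """Render bytes the way the note describes: codes below 0x20, or from
    0x7f through 0xa0 inclusive, become "^" followed by (code XOR 0x40)."""
    out = []
    for b in data:
        if b < 0x20 or 0x7F <= b <= 0xA0:
            # For codes 0x80-0xa0 the XOR result falls outside 7-bit ASCII;
            # it is emitted as-is here (assumption, see lead-in).
            out.append("^" + chr(b ^ 0x40))
        else:
            out.append(chr(b))
    return "".join(out)
```

A modem response of OK followed by carriage return and line feed therefore appears as OK^M^J.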

Some paging services return a busy signal when a paging request is successfully accepted. This cannot be distinguished from the case where the line is busy and the paging service never answered. Therefore, even though the chat script expects BUSY, this is indicated as a failure in the Trace Log.

A chat script time-out is considered a success indication for numeric paging, because no other error indications were detected. Since numeric paging services do not have a positive confirmation indication that can be detected by the modem, numeric paging is inherently unreliable. For this reason, up to three numeric paging attempts are made, and duplicate numeric pages may be received.

Troubleshooting Network Problems

The RAC provides a standard set of network diagnostic tools, similar to those found on Windows or Red Hat Linux-based systems. Using the RAC Web-based remote access interface, you can access these network debugging tools by clicking the Debug tab and then clicking Network Debug. For more information about the Network Debug feature, see the remote access interface help.

The trace log may also contain RAC operating-system specific error codes (relating to the internal RAC operating system, not the managed system's operating system). Table B-1 can help you diagnose network problems reported by the internal RAC operating system:

Table B-1. Trace Log Codes

Error Code	Description
0x5006	ENXIO: No such address.
0x5009	EBADS: The socket descriptor is invalid.
0x500D	EACCESS: Permission denied.
0x5011	EEXIST: Duplicate entry exists.
0x5016	EINVALID: An argument is invalid.
0x5017	ENFILE: An internal table has run out of space.
0x5020	EPIPE: The connection is broken.
0x5023	EWOULDBLOCK: The operation would block; socket is nonblocking.
0x5024	EINPROGRESS: Socket is nonblocking; connection not completed immediately.
0x5025	EALREADY: Socket is nonblocking; previous connection attempt not complete.
0x5027	EDESTADDRREQ: The destination address is invalid.
0x5028	EMSGSIZE: Message too long.
0x5029	EPROTOTYPE: Wrong protocol type for socket.
0x502A	ENOPROTOOPT: Protocol not available.
0x502B	EPROTONOSUPPORT: Protocol not supported.
0x502D	EOPNOTSUPP: Requested operation not valid for this type of socket.
0x502F	EAFNOSUPPORT: Address family not supported.
0x5030	EADDRINUSE: Address is already in use.
0x5031	EADDRNOTAVAIL: Address not available.
0x5033	ENETUNREACH: Network is unreachable.
0x5035	ECONNABORTED: The connection has been aborted by the peer.
0x5036	ECONNRESET: The connection has been reset by the peer.
0x5037	ENOBUFS: An internal buffer is required but cannot be allocated.
0x5038	EISCONN: The socket is already connected.
0x5039	ENOTCONN: The socket is not connected.
0x503B	ETOOMANYREFS: Too many references, cannot splice.
0x503C	ETIMEDOUT: Connection timed out.
0x503D	ECONNREFUSED: The connection attempt was refused.
0x5041	EHOSTUNREACH: The destination host could not be reached.
0x5046	ENIDOWN: NI_INIT returned -1.
0x5047	ENMTU: The MTU is invalid.
0x5048	ENHWL: The hardware length is invalid.
0x5049	ENNOFIND: The route specified cannot be found.
0x504A	ECOLL: Collision in select call; these conditions already selected by another task.
0x504B	ETID: The task ID is invalid.
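
When reading a saved copy of the Trace Log, the codes in Table B-1 can be looked up automatically. The following is a minimal sketch that holds only a few rows of the table for brevity; the layout of a Trace Log line is not specified here, so the sketch assumes only that a code appears in the text as 0x50 followed by two hexadecimal digits. The names TRACE_LOG_CODES and explain_codes are hypothetical.

```python
import re

# Abbreviated copy of Table B-1; extend with the remaining rows as needed.
TRACE_LOG_CODES = {
    0x5006: ("ENXIO", "No such address."),
    0x5020: ("EPIPE", "The connection is broken."),
    0x5033: ("ENETUNREACH", "Network is unreachable."),
    0x5036: ("ECONNRESET", "The connection has been reset by the peer."),
    0x503C: ("ETIMEDOUT", "Connection timed out."),
    0x503D: ("ECONNREFUSED", "The connection attempt was refused."),
    0x5041: ("EHOSTUNREACH", "The destination host could not be reached."),
    0x5049: ("ENNOFIND", "The route specified cannot be found."),
}

_CODE = re.compile(r"\b0x50[0-9A-Fa-f]{2}\b")


def explain_codes(text: str) -> list[tuple[str, str, str]]:
    """Return (code as written, name, description) for each known code in text."""
    found = []
    for match in _CODE.finditer(text):
        entry = TRACE_LOG_CODES.get(int(match.group(0), 16))
        if entry is not None:
            found.append((match.group(0), entry[0], entry[1]))
    return found
```

Codes that are not in the dictionary are skipped rather than guessed at.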

Troubleshooting Alerting Problems

Use the following information to troubleshoot a particular type of RAC alert:

E-mail paging – E-mail paging uses SMTP. To troubleshoot e-mail paging problems, check SMTP entries in the trace log. All interactions with the SMTP server are logged in the Trace Log by default.
SNMP traps – SNMP trap deliveries are logged in the Trace Log by default. However, since SNMP does not confirm delivery of traps, it is best to trace the packets on the managed system using a network analyzer or a tool such as Microsoft's snmputil.
Numeric paging – (DRAC III only) Numeric paging uses the CHAT protocol. To troubleshoot numeric paging problems, check CHAT entries in the trace log. To trace CHAT, first ensure that CHAT is selected under Debug → Trace Log, and then display the trace log to identify CHAT problems.
Alphanumeric paging – (DRAC III only) Alphanumeric paging uses TAP. To troubleshoot alphanumeric paging problems, check TAP entries in the trace log. To trace TAP, first ensure that TAP is selected under Debug → Trace Log, and then display the trace log to identify TAP problems.

RAC Log Messages

Administrators can use RAC Log messages to debug alerting from the RAC. Table B-2 lists the RAC log message IDs, the message text or description, and the corrective action to take for each message.

NOTE: In Table B-2, the character "L" is sometimes used in the Message ID column. "L" represents the severity level or type of the message, which can be one of the following: W (warning), E (error), S (severe), F (fatal), or A (always).

Table B-2. RAC Log Messages

Message ID	Description	Corrective Action
RAC186L	dhcp: no response from server, need LAN address. The NIC cannot be enabled until a response is received from the DHCP server.	Provides information only. No specific corrective action is indicated. Ensure that the DHCP server is operational.
RAC187L	dhcp: no response from server, using default PPP addresses	Provides information only. No specific corrective action is indicated. Ensure that the DHCP server is operational.
RAC188L	dhcp: no response from server, warm starting with <IP address>	Provides information only. No specific corrective action is indicated. Ensure that the DHCP server is operational.
RAC189L	snmp: trap sent to <IP address>	Provides information only. No corrective action is necessary.
RAC191L	snmp: internal failure during trap generation	Reset the RAC and retry the operation.
RAC192L	numeric page successful	Provides information only. No corrective action is necessary.
RAC193L	numeric paging attempts failed	Ensure that the telephone number is correct and that the paging service is operational.
RAC194L	numeric paging encountered an internal error	Reset the DRAC III and retry the operation.
RAC195L	alphanumeric page successful	Provides information only. No corrective action is necessary.
RAC196L	alphanumeric paging attempts failed	Ensure that the phone number, pager ID, and password are correct. Also, ensure that Paging Central is operational.
RAC197L	alphanumeric paging encountered an internal error	Reset the DRAC III and retry the operation.
RAC198L	E-mail page successful	Provides information only. No corrective action is necessary.
RAC199L	E-mail paging attempts failed, SMTP protocol failure	A trace of the SMTP connection may be found in the trace log. Examine the trace log to identify the source of the protocol failure, such as the connection could not be established (SMTP server is down or an invalid IP address), an invalid e-mail destination address, an invalid domain in the e-mail address, or the SMTP server does not support forwarding e-mail. Correct the problem and try again.
RAC200L	E-mail paging encountered an internal error	Reset the RAC and retry the operation.
RAC201L	trap paging filter passed, entry <number> user paging filter passed	Provides information only. No corrective action is necessary.
RAC253L	PAP peer authentication succeeded for <user> CHAP peer authentication succeeded for <user>	Provides information only. No corrective action is necessary.
RAC254L	PAP peer authentication failed for <user> CHAP peer authentication failed for <user>	Verify that the dial-in or demand dial-out entry remote user name and password are correct. This user name and password are used for the PPP connection only, and are not an administrator log in user name and password.
RAC256L	RAC hardware log event: <formatted hardware log event>	Provides information only. No corrective action is necessary, unless the contents of the hardware log indicate a problem. In this case, the corrective action is based on the problem reported; for example, battery voltage low indicates that the battery may need replacing.
RAC016A	RAC log cleared	Provides information only.
RAC030A	RAC time was set	Provides information only.
RAC048A	RAC firmware update was initiated.	Provides information only.
RAC049A	RAC Firmware Update was initiated with config to defaults option.	Provides information only.
RAC064A	clear crash screen	Provides information only.
RAC065A	RAC hard reset, delay <seconds> was initiated	Provides information only.
RAC066A	RAC soft reset, delay <seconds> was initiated	Provides information only.
RAC067A	RAC graceful reset, delay <seconds> was initiated	Provides information only.
RAC068A	RAC cfg2default reset, delay <seconds> was initiated	Provides information only.
RAC069A	RAC shutdown was initiated	Provides information only.
RAC114A	Requested server {powerdown|powerup|powercycle|hardreset|graceshutdown|gracepowercycle|gracereboot}	Provides information only.
RAC115A	Could not log graceful server action to hardware log	Provides information only.
RAC122A	RAC booted	Provides information only.
RAC138A	Console redirect session enabled	Provides information only.
RAC139A	Console redirect session disabled	Provides information only.
RAC154A	Logout from <IP-address>	Provides information only.
RAC155A	Login from <IP-address>	Provides information only.
RAC156A	session cancelled from <IP-address>, max log in attempts exceeded.	Provides information only.
RAC157A	Session cancelled from <IP-address>, due to inactivity.	Provides information only.
RAC158A	Unvalidated session from <IP-address> cancelled.	Provides information only.
RAC175A	vt-100: log in {successful|authentication failed}	Provides information only.
RAC176A	vt-100: log out	Provides information only.
RAC240A	RAC shutdown through hwmon	Provides information only.
RAC241A	RAC shutdown due to battery runtime limit expired	Provides information only.
RAC242A	RAC shutdown due to voltage below threshold	Provides information only.
RAC243A	RAC shutdown due to non-PCI slot presence	Provides information only.
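
The message ID convention described in the note above Table B-2 (the prefix RAC, a three-digit number, and a severity letter) can be split apart when filtering a saved RAC log. This is a minimal sketch, assuming that IDs appear in the log exactly in that form; the names SEVERITY and parse_message_id are hypothetical.

```python
import re

# Severity letters as listed in the note above Table B-2.
SEVERITY = {"W": "warning", "E": "error", "S": "severe", "F": "fatal", "A": "always"}

_MESSAGE_ID = re.compile(r"^RAC(\d{3})([WESFA])$")


def parse_message_id(message_id: str) -> tuple[int, str]:
    """Split an ID such as "RAC199E" into (199, "error").

    The placeholder letter "L" used in Table B-2 is not a real severity,
    so an ID ending in "L" is rejected.
    """
    match = _MESSAGE_ID.match(message_id)
    if match is None:
        raise ValueError(f"not a RAC log message ID: {message_id!r}")
    return int(match.group(1)), SEVERITY[match.group(2)]
```

For example, parse_message_id("RAC199E") identifies message 199 logged at error severity, which corresponds to the RAC199L row in Table B-2.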

DRAC III LED Indicators

The DRAC III has two LEDs located on the back of the card connector. The top LED is green, and is called the heartbeat LED. The amber LED is below the green, and is called the error LED.

The following are conditions indicated by the DRAC III LEDs:

Normal operation – Approximately 10–15 seconds after power up or reset, the two LEDs toggle for about 2 seconds. The flashing LEDs indicate that the DRAC III is running its self-test. A few seconds later, the green LED starts flashing on and off at 1-second intervals. Sometimes the green LED appears to flash sporadically; this situation may occur at start-up and at any other time the DRAC III processor is under a heavy load.
Error condition – The error LED is illuminated when the following conditions occur:

An unrecoverable hardware error – The error LED is steadily illuminated.
A firmware problem – The error LED flashes at 0.5-second intervals.
A self-test error – The error LED flashes according to a blink code. See "Self-Test Error Blink Codes" for a description of these codes.

Nonrecoverable POST Error

If the amber LED is solid, it indicates a nonrecoverable error. A nonrecoverable error occurs when a POST memory test or core operation has failed, and the DRAC III cannot proceed with a boot process. The DRAC III must be replaced.

Summary for this condition:

Amber on
Green off

Repair Mode

If the amber LED is flashing at 0.5-second intervals, it indicates that the core, firmware, database, or production sector in the DRAC III flash is corrupted. A field technician must replace the DRAC III.

Summary for this condition:

Amber flash at 0.5-second intervals
Green off

Self-Test Error Blink Codes

The following sections define the blink codes that are produced by the amber error LED if an error is detected by any of the self-tests or extended self-tests.

The blink code repeats about every 10 seconds. For example, a code of 3114 (a problem in the uart loopback test) causes the amber LED to flash three times, pause, flash one time, pause, flash one time, pause, flash four times. The sequence then repeats after 10 seconds.
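
The relationship between a four-digit blink code and the flashes seen on the amber LED can be sketched as follows. This is a minimal sketch for illustration; the function names are hypothetical, and the category names are taken from the headings of the blink code lists in this section, keyed by the first digit of the code.

```python
# First digit of the code selects the self-test group (from the headings
# of the blink code lists in this section).
BLINK_CATEGORIES = {
    1: "Internal DRAC III Operating System Problems",
    2: "Memory Test Problems",
    3: "VT-100 Uart Loopback Test",
    4: "GPIO Test",
    5: "On-Board Hardware Monitor",
    6: "IPMI Tests",
    7: "EXPROM Tests",
    8: "Flash Test",
    9: "PCMCIA Tests",
}


def flash_sequence(code: int) -> list[int]:
    """Return the number of flashes in each of the four groups."""
    digits = [int(d) for d in str(code)]
    if len(digits) != 4 or 0 in digits:
        raise ValueError(f"not a four-digit blink code: {code}")
    return digits


def code_from_flashes(counts: list[int]) -> int:
    """Rebuild the blink code from the flash counts observed between pauses."""
    if len(counts) != 4 or any(not 1 <= n <= 9 for n in counts):
        raise ValueError("expected four flash counts between 1 and 9")
    return int("".join(str(n) for n in counts))


def describe(code: int) -> str:
    """Summarize the self-test group and the flash pattern for a code."""
    seq = flash_sequence(code)
    pattern = ", pause, ".join(f"flash {n}x" for n in seq)
    return f"{BLINK_CATEGORIES[seq[0]]}: {pattern}"
```

Counting three, one, one, and four flashes gives code 3114, which belongs to the VT-100 Uart Loopback Test group.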

Internal DRAC III Operating System Problems

1111 = Unable to create a self-test task.

1112 = A self-test task is currently running. (Multiple self-tests cannot be started.)

1113 = Failure creating a self-test visual signal.

1114 = Failure to allocate required DRAC III system memory.

1115 = Failure writing the D_selftest_BDSTATUS.

1116 = Failure attempting to send a debug message.

1117 = Error when accessing the DRAC III database.

Memory Test Problems

2111 = Failure in extended memory testing — Read verify, write.

2112 = Failure in extended memory testing — Read verify write high memory to low.

2113 = Failure in extended memory testing — Read verify, write, write, low-to-high.

2114 = Failure in extended memory testing — Read verify, write, write, high-to-low.

2115 = Failure in extended memory testing — Read verify, high-to-low.

2116 = Failure in extended memory testing — Read verify, low-to-high.

2117 = Failure in marching memory test — Read verify, write.

2118 = Failure in marching memory test — Read verify in low-to-high memory.

VT-100 Uart Loopback Test

3111 = Failure opening uart for external loopback.

3112 = Failure in I/O control to uart driver.

3113 = Failure writing data to the uart.

3114 = Failure reading data from the uart.

3115 = Transmit/receive data miscompare.

3116 = Failure trying to suspend VT-100 task.

GPIO Test

4111 = Failure in the GPIO green LED test.

4112 = Failure in the GPIO LED test.

4113 = SMI connector GPIOs not reading inactive values.

On-Board Hardware Monitor

5111 = More than one power source is selected (internal DRAC III problem).

5112 = No power source is shown to be driving (internal DRAC III problem).

5113 = Failure in the onboard hardware monitor sensors/logic. (The managed system must be powered up or the PCI voltage tests fail.)

5114 = Failure accessing database for hardware monitor parameters.

5115 = Failure in accessing the onboard hardware monitor.

5121 = DRAC III battery voltage is out of range.

5122 = DRAC III external power adapter voltage is out of range.

5123 = PCI AUX 3.3 voltage is out of range.

5124 = PCI +5 voltage is out of range.

5125 = PCI -12 voltage is out of range.

5126 = PCI +12 voltage is out of range.

5127 = DRAC III temperature monitor is out of range.

5128 = DRAC III battery presence is not detected.

5129 = DRAC III external power adapter presence is not detected.

IPMI Tests

6111 = No IPMI connector is detected.

6112 = IPMI Get Chassis Status command to the BMC failed.

EXPROM Tests

7111 = Failure when loading the EXPROM image from the database into shared memory.

7112 = Failure when loading the EXPROM header from the database into shared memory.

7113 = Invalid EXPROM header signature.

7114 = Invalid EXPROM vendor or device ID.

Flash Test

8111 = Failure erasing U16 (Firmware) diagnostic sector.

8112 = Failure writing U16 (Firmware) diagnostic sector.

8113 = Failure read/verify U16 (Firmware) diagnostic sector.

8114 = Failure erasing U17 (DataBase) diagnostic sector.

8115 = Failure writing U17 (DataBase) diagnostic sector.

8116 = Failure read/verify U17 (DataBase) diagnostic sector.

PCMCIA Tests

9111 = Failure in PCMCIA to DRAC III interface.
