
Troubleshooting

Dell™ Remote Access Controller Installation and Setup Guide

  Basic Troubleshooting

  Using the RAC Trace Log

  Troubleshooting Network Problems

  RAC Log Messages

  DRAC III LED Indicators


This section helps you diagnose and solve problems that may occur with your RAC.


Basic Troubleshooting

This section provides solutions to common problems.

Problem

Console redirection does not work.

You may see the following message on the bottom menu of the console redirect frame:

Please wait - initial screen loading.

Solution

If you perform a hard reset on the RAC (using the racadm racreset command), the RAC driver cannot communicate with the RAC controller until the system is rebooted. Reserve the hard reset for extreme situations (for instance, a system lockup). Before using a hard reset, first try a soft reset (using the soft reset function of the Web-based interface or the racadm racreset soft command), which does not terminate communication between the RAC driver and the RAC controller.

NOTE: Both hard and soft resets terminate all user sessions. Consequently, issuing any kind of reset causes all user interfaces to fail (for instance, when redirecting the console through the Web-based interface). After issuing a hard or soft reset, you must log out and wait until the RAC is back online before logging on again.

Problem

The console redirect frame shows Please wait - initial screen loading, and seems to hang in this mode when the managed system is up, and VNC and PPP are running on the managed system.

Solution

A PPP connection may not exist between the managed system and the firmware. Rebooting the managed system may correct this problem. It is not necessary to reboot the RAC.

Problem

Console redirection fails to show the operating system boot menu in the Chinese, Japanese, and Korean versions of Microsoft® Windows® 2000.

Solution

To correct this problem, on systems running Windows 2000 that can boot to multiple operating systems, you can change the default boot operating system by performing the following steps:

  1. Right-click the My Computer icon and select Properties.

  2. Click the Advanced tab.

  3. Click Startup and Recovery.

  4. Select the new default operating system from the Startup list.

  5. In the Show list for box, type the number of seconds that the list of choices should be displayed before the default operating system automatically boots.

Problem

The redirected console screen does not refresh.

Solution

Click Refresh in the console redirection window.

Problem

The following message is displayed: Warning: remote console is not available.

Solution

This warning indicates one of the following conditions:

After waiting a few minutes for the system to restart or the screen to change modes, try restarting the console redirect window to correct the problem.

Problem (DRAC III only)

The IPMI interfaces do not provide the correct information.

Solution

On Dell™ PowerEdge™ 1650 systems, the DRAC III is installed on a riser board. The riser board plugs into the RISER connector on the system board and is considered an extension of the system board. There are two riser board configurations for the PowerEdge 1650. The first features two 64-bit, 66-MHz expansion slots. The second features one 64-bit, 66-MHz expansion slot (PCI2) and one 32-bit, 33-MHz expansion slot (PCI1) for 5-V cards. In PowerEdge 1650 systems, if the DRAC III is installed on a riser card equipped with two 64-bit slots, the card operates, but the IPMI interface information provided is incorrect. To correct this problem, ensure that the DRAC III is installed on a 5-V riser card equipped with one 32-bit slot. For more information, see "Installing the DRAC III Hardware."

Problem (DRAC III and DRAC III/XT only)

Text redirection is not occurring when using console redirection.

This situation could occur if the DRAC III or DRAC III/XT is not installed on the primary PCI bus. To ensure that the DRAC III is installed in the correct PCI slot, see "Installing the DRAC III Hardware." To ensure that the DRAC III/XT is installed in the correct PCI slot, see "Installing the DRAC III/XT Hardware."

Solution

Ensure that the DRAC III or DRAC III/XT is installed on the primary PCI bus. For more information, see "Installing the DRAC III Hardware" or "Installing the DRAC III/XT Hardware."

Problem

Cannot connect to the remote access interface and the DNS sends back the IP address of the RAC instead of the managed system.

Solution

Due to functional details that are specific to Windows Dynamic DNS servers, the RAC internal PPP IP address is broadcast to the Dynamic DNS service running on Windows 2000 systems. The Dynamic DNS service stores that IP address in its DNS lookup table and associates it with the name of the managed system hosting the RAC. This action causes problems with Active Directory under Windows. The default value for a RAC's internal PPP IP address is 192.168.234.235, and it is user configurable. Microsoft has addressed this issue with a hot fix and Microsoft Knowledge Base article Q292822. To solve this problem, download the hot fix and perform the steps in the article.

Problem (DRAC III only)

Cannot connect to or ping a DRAC III from the management station after the dial-out properties have been set.

Solution

To access the management station through two distinct paths from the DRAC III, the DRAC III must have a host-based demand-dial route that does not conflict with the network-based LAN route.

  1. Configure dial-up networking on the management station to assign static IP addresses for dial-in purposes. This configuration requires two addresses: one for the management station and one for the DRAC III.

NOTE: It is important that the static IP addresses used by the management station be on a different subnet from the DRAC III network interface controller (NIC). Otherwise, a routing loop is created.

Typically, the numerically lower address is assigned to the management station and the numerically larger address is assigned to the DRAC III when the dial-in connection is completed.

  2. Configure the static IP address assigned to the management station as the demand-dial destination IP address on the DRAC III; configure this identical address as an SNMP trap destination (for SNMP trap alerts) or as the SMTP server address (for e-mail alerts) on the DRAC III.

NOTE: You can use DHCP to configure the IP addresses, but you must still ensure that both addresses used by the management station for dial-in are on a different subnet from the DRAC III NIC.

The management station is now able to receive alerts from the DRAC III through both the LAN and the dial-in connection.
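
The subnet requirement in the notes above can be checked before the addresses are configured. The following is a minimal sketch, not a Dell-provided tool: the function name dial_in_conflicts and all addresses shown are hypothetical, and the NIC address is assumed to be known together with its prefix length.

```python
import ipaddress
from typing import List


def dial_in_conflicts(nic_interface: str, dial_in_addresses: List[str]) -> List[str]:
    """Return the dial-in addresses that fall inside the DRAC III NIC subnet.

    An empty result means the dial-in addresses are on a different subnet,
    which is what the notes above require.
    """
    nic_network = ipaddress.ip_interface(nic_interface).network
    return [
        address
        for address in dial_in_addresses
        if ipaddress.ip_address(address) in nic_network
    ]


# Hypothetical values: NIC at 192.168.1.20/24, dial-in pair on 10.0.0.0.
print(dial_in_conflicts("192.168.1.20/24", ["10.0.0.1", "10.0.0.2"]))  # []
```

Any address returned by the function shares the NIC subnet and, according to the note above, could create a routing loop.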

Problem

Graphics redirection is not occurring when using console redirection.

This situation could occur if the RAC services are not installed properly or are not running.

Solution

After allowing several minutes for the graphics redirection to occur, ensure that the RAC services are running. Try stopping and then starting the services. If the problem persists, reboot the system.


Using the RAC Trace Log

The internal RAC Trace Log can be used by administrators needing to debug alerting, paging, or networking from the RAC. The Trace Log can be accessed from the RAC Web-based remote access interface by clicking the Debug tab, and then clicking Network Debug. From the Network Debug window, select Dump Trace Log, and then click Submit. The Trace Log tracks the following information:

NOTE: Settings for CHAT, DHCP, IP, PPP, and TAP (DRAC III only) can be accessed from the RAC remote access GUI by clicking the Debug tab, and then clicking Trace Level.

NOTE: In the RAC Trace Log, nonprintable ASCII characters are translated to printable ASCII characters. If the character code is less than 0x20, or between 0x7f and 0xa0 (inclusive), the character is exclusive-ORed with 0x40 and printed with a "^" in front of it. Thus, the ASCII carriage return character, 0xd, is printed as "^M" in the Trace Log. Nonprintable ASCII characters may occur during tracing of the CHAT and TAP protocols, and occasionally during PPP negotiations.
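
The translation rule in the note above can be reproduced to help read CHAT or TAP traces. The following is a minimal sketch of that rule only; the function name trace_log_escape and the sample modem string are hypothetical, and rendering bytes above 0x7f as Latin-1 characters is an assumption, since the note does not say how the RAC displays them.

```python
def trace_log_escape(data: bytes) -> str:
    """Render bytes using the caret rule described in the note above."""
    rendered = []
    for byte in data:
        if byte < 0x20 or 0x7F <= byte <= 0xA0:
            # Nonprintable: XOR with 0x40 and prefix with "^".
            rendered.append("^" + chr(byte ^ 0x40))
        else:
            # Everything else is passed through unchanged (assumption).
            rendered.append(chr(byte))
    return "".join(rendered)


# Hypothetical dial string ending in a carriage return (0x0d).
print(trace_log_escape(b"ATDT5551234\r"))  # ATDT5551234^M
```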

Some paging services return a busy signal when a paging request is successfully accepted. This cannot be distinguished from the case where the line is busy and the paging service never answered. Therefore, even though the chat script expects BUSY, this is indicated as a failure in the Trace Log.

A chat script time-out is considered a success indication for numeric paging, because no other error indications were detected. Since numeric paging services do not have a positive confirmation indication that can be detected by the modem, numeric paging is inherently unreliable. For this reason, up to three numeric paging attempts are made, and duplicate numeric pages may be received.


Troubleshooting Network Problems

The RAC provides a standard set of network diagnostic tools, similar to those found on Windows or Red Hat Linux-based systems. Using the RAC Web-based remote access interface, you can access the following network debugging tools by clicking the Debug tab and then clicking Network Debug. For more information about the Network Debug feature, see the remote access interface help.

The trace log may also contain RAC operating-system specific error codes (relating to the internal RAC operating system, not the managed system's operating system). Table B-1 can help you diagnose network problems reported by the internal RAC operating system:

Table B-1. Trace Log Codes

Error Code  Description
0x5006      ENXIO: No such address.
0x5009      EBADS: The socket descriptor is invalid.
0x500D      EACCESS: Permission denied.
0x5011      EEXIST: Duplicate entry exists.
0x5016      EINVALID: An argument is invalid.
0x5017      ENFILE: An internal table has run out of space.
0x5020      EPIPE: The connection is broken.
0x5023      EWOULDBLOCK: The operation would block; socket is nonblocking.
0x5024      EINPROGRESS: Socket is nonblocking; connection not completed immediately.
0x5025      EALREADY: Socket is nonblocking; previous connection attempt not complete.
0x5027      EDESTADDRREQ: The destination address is invalid.
0x5028      EMSGSIZE: Message too long.
0x5029      EPROTOTYPE: Wrong protocol type for socket.
0x502A      ENOPROTOOPT: Protocol not available.
0x502B      EPROTONOSUPPORT: Protocol not supported.
0x502D      EOPNOTSUPP: Requested operation not valid for this type of socket.
0x502F      EAFNOSUPPORT: Address family not supported.
0x5030      EADDRINUSE: Address is already in use.
0x5031      EADDRNOTAVAIL: Address not available.
0x5033      ENETUNREACH: Network is unreachable.
0x5035      ECONNABORTED: The connection has been aborted by the peer.
0x5036      ECONNRESET: The connection has been reset by the peer.
0x5037      ENOBUFS: An internal buffer is required but cannot be allocated.
0x5038      EISCONN: The socket is already connected.
0x5039      ENOTCONN: The socket is not connected.
0x503B      ETOOMANYREFS: Too many references, cannot splice.
0x503C      ETIMEDOUT: Connection timed out.
0x503D      ECONNREFUSED: The connection attempt was refused.
0x5041      EHOSTUNREACH: The destination host could not be reached.
0x5046      ENIDOWN: NI_INIT returned -1.
0x5047      ENMTU: The MTU is invalid.
0x5048      ENHWL: The hardware length is invalid.
0x5049      ENNOFIND: The route specified cannot be found.
0x504A      ECOLL: Collision in select call; these conditions already selected by another task.
0x504B      ETID: The task ID is invalid.
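
When reading a saved copy of the trace log, the codes in Table B-1 can be looked up automatically. The following is a minimal sketch under stated assumptions: it holds only a four-entry subset of the table, the function name explain_codes is hypothetical, and the sample trace line is invented for illustration because the exact Trace Log line format is not described here.

```python
import re

# Subset of Table B-1; extend with the remaining rows as needed.
TRACE_LOG_CODES = {
    0x5033: "ENETUNREACH: Network is unreachable.",
    0x503C: "ETIMEDOUT: Connection timed out.",
    0x503D: "ECONNREFUSED: The connection attempt was refused.",
    0x5041: "EHOSTUNREACH: The destination host could not be reached.",
}

# Every code in Table B-1 has the form 0x50 followed by two hex digits.
CODE_PATTERN = re.compile(r"0x50[0-9A-Fa-f]{2}")


def explain_codes(line: str) -> list:
    """Return a description for each 0x50xx code found in a line of text."""
    descriptions = []
    for match in CODE_PATTERN.findall(line):
        value = int(match, 16)
        descriptions.append(
            TRACE_LOG_CODES.get(value, match + ": not in this subset")
        )
    return descriptions


# Hypothetical trace line; real Trace Log formatting may differ.
print(explain_codes("smtp: connect failed, error 0x503D"))
```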

Troubleshooting Alerting Problems

Use the following information to troubleshoot a particular type of RAC alert:


RAC Log Messages

RAC Log messages can be used by administrators to debug alerting from the RAC. Table B-2 lists each RAC log message ID with its description and the corrective action to take for that message.

NOTE: In Table B-2, the character "L" is sometimes used in the Message ID column. "L" represents the severity level or type of the message, which can be one of the following: W (warning), E (error), S (severe), F (fatal), or A (always).
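
The message ID convention in the note above can be split into its parts when filtering a saved log. The following is a minimal sketch, assuming that every ID is "RAC", three digits, and one severity letter, as the IDs in Table B-2 suggest; the function name parse_message_id is hypothetical.

```python
import re

# Severity letters as listed in the note above.
SEVERITY = {
    "W": "warning",
    "E": "error",
    "S": "severe",
    "F": "fatal",
    "A": "always",
}

# Assumed shape: "RAC" + three digits + one severity letter.
MESSAGE_ID = re.compile(r"^RAC(\d{3})([WESFA])$")


def parse_message_id(message_id: str):
    """Split an ID such as RAC193E into its number and severity name."""
    match = MESSAGE_ID.match(message_id)
    if match is None:
        raise ValueError("unrecognized RAC message ID: " + message_id)
    return match.group(1), SEVERITY[match.group(2)]


# Hypothetical example: message RAC193L logged at the error level.
print(parse_message_id("RAC193E"))  # ('193', 'error')
```

The number is returned as a string so that leading zeros, as in RAC016A, are preserved for lookup in Table B-2.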

Table B-2. RAC Log Messages

 

Message ID

Description

Corrective Action

RAC186L

dhcp: no response from server, need LAN address. The NIC cannot be enabled until a response is received from the DHCP server.

Provides information only. No specific corrective action is indicated. Ensure that the DHCP server is operational.

RAC187L

dhcp: no response from server, using default PPP addresses

Provides information only. No specific corrective action is indicated. Ensure that the DHCP server is operational.

RAC188L

dhcp: no response from server, warm starting with <IP address>

Provides information only. No specific corrective action is indicated. Ensure that the DHCP server is operational.

RAC189L

snmp: trap sent to <IP address>

Provides information only. No corrective action is necessary.

RAC191L

snmp: internal failure during trap generation

Reset the RAC and retry the operation.

RAC192L

numeric page successful

Provides information only. No corrective action is necessary.

RAC193L

numeric paging attempts failed

Ensure that the telephone number is correct and that the paging service is operational.

RAC194L

numeric paging encountered an internal error

Reset the DRAC III and retry the operation.

RAC195L

alphanumeric page successful

Provides information only. No corrective action is necessary.

RAC196L

alphanumeric paging attempts failed

Ensure that the phone number, pager ID, and password are correct. Also, ensure that Paging Central is operational.

RAC197L

alphanumeric paging encountered an internal error

Reset the DRAC III and retry the operation.

RAC198L

E-mail page successful

Provides information only. No corrective action is necessary.

RAC199L

E-mail paging attempts failed, SMTP protocol failure

A trace of the SMTP connection may be found in the trace log. Examine the trace log to identify the source of the protocol failure, such as the connection could not be established (SMTP server is down or an invalid IP address), an invalid e-mail destination address, an invalid domain in the e-mail address, or the SMTP server does not support forwarding e-mail. Correct the problem and try again.

RAC200L

E-mail paging encountered an internal error

Reset the RAC and retry the operation.

RAC201L

trap paging filter passed, entry <number>

user paging filter passed

Provides information only. No corrective action is necessary.

RAC253L

PAP peer authentication succeeded for <user>

CHAP peer authentication succeeded for <user>

Provides information only. No corrective action is necessary.

RAC254L

PAP peer authentication failed for <user>

CHAP peer authentication failed for <user>

Verify that the dial-in or demand dial-out entry remote user name and password are correct. This user name and password are used for the PPP connection only, and are not an administrator log in user name and password.

RAC256L

RAC hardware log event: <formatted hardware log event>

Provides information only. No corrective action is necessary, unless the contents of the hardware log indicate a problem. In this case, the corrective action is based on the problem reported; for example, battery voltage low indicates that the battery may need replacing.

RAC016A

RAC log cleared

Provides information only.

RAC030A

RAC time was set

Provides information only.

RAC048A

RAC firmware update was initiated.

Provides information only.

RAC049A

RAC Firmware Update was initiated with config to defaults option.

Provides information only.

RAC064A

clear crash screen

Provides information only.

RAC065A

RAC hard reset, delay <seconds> was initiated

Provides information only.

RAC066A

RAC soft reset, delay <seconds> was initiated

Provides information only.

RAC067A

RAC graceful reset, delay <seconds> was initiated

Provides information only.

RAC068A

RAC cfg2default reset, delay <seconds> was initiated

Provides information only.

RAC069A

RAC shutdown was initiated

Provides information only.

RAC114A

Requested server {powerdown|powerup|powercycle|hardreset|graceshutdown|gracepowercycle|gracereboot}

Provides information only.

RAC115A

Could not log graceful server action to hardware log

Provides information only.

RAC122A

RAC booted

Provides information only.

RAC138A

Console redirect session enabled

Provides information only.

RAC139A

Console redirect session disabled

Provides information only.

RAC154A

Logout from <IP-address>

Provides information only.

RAC155A

Login from <IP-address>

Provides information only.

RAC156A

session cancelled from <IP-address>, max log in attempts exceeded.

Provides information only.

RAC157A

Session cancelled from <IP-address>, due to inactivity.

Provides information only.

RAC158A

Unvalidated session from <IP-address> cancelled.

Provides information only.

RAC175A

vt-100: log in {successful|authentication failed}

Provides information only.

RAC176A

vt-100: log out

Provides information only.

RAC240A

RAC shutdown through hwmon

Provides information only.

RAC241A

RAC shutdown due to battery runtime limit expired

Provides information only.

RAC242A

RAC shutdown due to voltage below threshold

Provides information only.

RAC243A

RAC shutdown due to non-PCI slot presence

Provides information only.


DRAC III LED Indicators

The DRAC III has two LEDs located on the back of the card connector. The top LED is green, and is called the heartbeat LED. The amber LED is below the green, and is called the error LED.

The following are conditions indicated by the DRAC III LEDs:

Nonrecoverable POST Error

If the amber LED is solid, it indicates a nonrecoverable error. A nonrecoverable error occurs when a POST memory test or core operation has failed, and the DRAC III cannot proceed with a boot process. The DRAC III must be replaced.

Summary for this condition:

Repair Mode

If the amber LED is flashing at 0.5-second intervals, it indicates that the core, firmware, database, or production sector in the DRAC III flash is corrupted. A field technician must replace the DRAC III.

Summary for this condition:

Self-Test Error Blink Codes

The following sections define the blink codes that are produced by the amber error LED if an error is detected by any of the self-tests or extended self-tests.

The blink code repeats about every 10 seconds. For example, a code of 3114 (a problem in the uart loopback test) causes the amber LED to flash three times, pause, flash one time, pause, flash one time, pause, flash four times. The sequence then repeats after 10 seconds.
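
Counting flashes by eye is error-prone, so it can help to turn the observed groups back into a code. The following is a minimal sketch, assuming that every code has exactly four groups of one to nine flashes (as in all codes listed in this section); the function name code_from_flashes is hypothetical, and the category names are copied from the headings in this section.

```python
# First digit of the blink code mapped to the heading it is listed under.
CATEGORIES = {
    1: "Internal DRAC III Operating System Problems",
    2: "Memory Test Problems",
    3: "VT-100 Uart Loopback Test",
    4: "GPIO Test",
    5: "On-Board Hardware Monitor",
    6: "IPMI Tests",
    7: "EXPROM Tests",
    8: "Flash Test",
    9: "PCMCIA Tests",
}


def code_from_flashes(flashes):
    """Convert four observed flash counts into a blink code and category."""
    if len(flashes) != 4 or any(not 1 <= count <= 9 for count in flashes):
        raise ValueError("expected four groups of 1 to 9 flashes each")
    code = int("".join(str(count) for count in flashes))
    return code, CATEGORIES[flashes[0]]


# Three flashes, pause, one, pause, one, pause, four (the example above).
print(code_from_flashes([3, 1, 1, 4]))  # (3114, 'VT-100 Uart Loopback Test')
```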

Internal DRAC III Operating System Problems

1111 = Unable to create a self-test task.

1112 = A self-test task is currently running. (Multiple self-tests cannot be started.)

1113 = Failure creating a self-test visual signal.

1114 = Failure to allocate required DRAC III system memory.

1115 = Failure writing the D_selftest_BDSTATUS.

1116 = Failure attempting to send a debug message.

1117 = Error when accessing the DRAC III database.

Memory Test Problems

2111 = Failure in extended memory testing — Read verify, write.

2112 = Failure in extended memory testing — Read verify write high memory to low.

2113 = Failure in extended memory testing — Read verify, write, write, low-to-high.

2114 = Failure in extended memory testing — Read verify, write, write, high-to-low.

2115 = Failure in extended memory testing — Read verify, high-to-low.

2116 = Failure in extended memory testing — Read verify, low-to-high.

2117 = Failure in marching memory test — Read verify, write.

2118 = Failure in marching memory test — Read verify in low-to-high memory.

VT-100 Uart Loopback Test

3111 = Failure opening uart for external loopback.

3112 = Failure in I/O control to uart driver.

3113 = Failure writing data to the uart.

3114 = Failure reading data from the uart.

3115 = Transmit/receive data miscompare.

3116 = Failure trying to suspend VT-100 task.

GPIO Test

4111 = Failure in the GPIO green LED test.

4112 = Failure in the GPIO LED test.

4113 = SMI connector GPIOs not reading inactive values.

On-Board Hardware Monitor

5111 = More than one power source is selected (internal DRAC III problem).

5112 = No power source is shown to be driving (internal DRAC III problem).

5113 = Failure in the onboard hardware monitor sensors/logic. (The managed system must be powered up or the PCI voltage tests fail.)

5114 = Failure accessing database for hardware monitor parameters.

5115 = Failure in accessing the onboard hardware monitor.

5121 = DRAC III battery voltage is out of range.

5122 = DRAC III external power adapter voltage is out of range.

5123 = PCI AUX 3.3 voltage is out of range.

5124 = PCI +5 voltage is out of range.

5125 = PCI -12 voltage is out of range.

5126 = PCI +12 voltage is out of range.

5127 = DRAC III temperature monitor is out of range.

5128 = DRAC III battery presence is not detected.

5129 = DRAC III external power adapter presence is not detected.

IPMI Tests

6111 = No IPMI connector is detected.

6112 = IPMI Get Chassis Status command to the BMC failed.

EXPROM Tests

7111 = Failure when loading the EXPROM image from the database into shared memory.

7112 = Failure when loading the EXPROM header from the database into shared memory.

7113 = Invalid EXPROM header signature.

7114 = Invalid EXPROM vendor or device ID.

Flash Test

8111 = Failure erasing U16 (Firmware) diagnostic sector.

8112 = Failure writing U16 (Firmware) diagnostic sector.

8113 = Failure read/verify U16 (Firmware) diagnostic sector.

8114 = Failure erasing U17 (DataBase) diagnostic sector.

8115 = Failure writing U17 (DataBase) diagnostic sector.

8116 = Failure read/verify U17 (DataBase) diagnostic sector.

PCMCIA Tests

9111 = Failure in PCMCIA to DRAC III interface.

