Alpha Servers
Rob Urban

I'm trying to configure an ES40 (not sure if model 1 or 2) for use in a TruCluster via the serial console. Unfortunately, access to the hardware is problematic (a 15-minute walk, plus requesting an access card for the machine room).  The following happens:

I power on the machine, it successfully goes through initialization, and finally arrives at the SRM prompt "P00>>>". I can type commands etc., and all looks OK.  A while later I go back to that terminal and there is no response.  Because of the access problems, I don't know at this point what the system display is saying.  I can get to the RMC console and that works fine. "status" and "env" both produce results that one would expect for a healthy system, with the possible exception that the last alert was "AC loss", as you can see below. Also, "show power" at the SRM console shows no problems. Since I don't yet have Tru64 running on this machine, I don't know whether only the console is failing, or if the console is just a symptom...


I upgraded the Firmware to 7.2-1.


If I use the RMC function "reset" to re-init the system, the SRM console "wakes up" again for a while, until the next time it decides to die.  The timing seems to be unpredictable.


An RMC "halt in", "halt out" does not help.  I have cleared the last RMC alert and am waiting for the next console death.


Does anyone have an idea what the cause of this problem might be?




------------------------------- RMC env and status ------------------------------


On-Chip Firmware Revision: V1.0
Flash Firmware Revision: V2.7
Server Power: ON
System Halt: Deasserted
RMC Power Control: ON
Escape Sequence: ^[^[RMC
Remote Access: Disabled
RMC Password: not set
Alert Enable: Disabled
Alert Pending: YES
Init String:
Dial String:
Alert String:
Com1_mode: THROUGH
Last Alert: AC loss
Logout Timer: 20 minutes
User String:


       System Hardware Monitor

Temperature (warnings at 45.0 C, power-off at 50.0 C)
    CPU0: 24.0 C    CPU1: 24.0 C    CPU2: 27.0 C    CPU3: 22.0 C
    Zone0: 27.0 C    Zone1: 29.0 C    Zone2: 28.0 C
    Fan1: 2205    Fan2: 2205    Fan3: 2235
    Fan4: 2149    Fan5: OFF     Fan6: 2020
Power Supply(OK, FAIL, OFF, '----' means not present)
    PS0 : OK      PS1 : OK      PS2 : ----
    CPU0: OK      CPU1: OK      CPU2: OK      CPU3: OK 
CPU CORE voltage
    CPU0: +1.600V    CPU1: +1.600V    CPU2: +1.616V    CPU3: +1.616V
CPU IO voltage
    CPU0: +1.488V    CPU1: +1.488V    CPU2: +1.504V    CPU3: +1.504V
Bulk voltage
    +3.3V Bulk: +3.328V    +5V Bulk: +5.103V   +12V Bulk: +12.160V
         Vterm: +1.776V       Cterm: +2.000V   -12V Bulk: -12.288V


-------------------------- show power from SRM --------------------------------

P00-top>>>show power
  Power Supply 0         Good
  Power Supply 1         Good
  Power Supply 2         Not Available
  System Fan 1           Good
  System Fan 2           Good
  System Fan 3           Good
  System Fan 4           Good
  System Fan 5           Good
  System Fan 6           Good
  CPU 0 Temperature      Good
  CPU 1 Temperature      Good
  CPU 2 Temperature      Good
  CPU 3 Temperature      Good
  Zone 0 Temperature     Good
  Zone 1 Temperature     Good
  Zone 2 Temperature     Good

...and some more information: the console is just a symptom. The machine is dying, and will be swapped for another ES40.