
Re: proliant DL585 G7 Opteron 6238, vSphere 4.1 U3 die with PSOD PF 14

 
chr3
Occasional Contributor

proliant DL585 G7 Opteron 6238, vSphere 4.1 U3 die with PSOD PF 14

Hello,

We lost two servers last week and are waiting for a solution from HP/VMware. There seem to be a lot of PSODs going around.

 

cheers

chr3kdrs

3 REPLIES
xmate
Valued Contributor

Re: proliant DL585 G7 Opteron 6238, vSphere 4.1 U3 die with PSOD PF 14

Hi

 

When new worlds are created, an integer is incremented. When this integer overflows, the kernel panics and fails with a purple diagnostic screen. In ESXi, new worlds are created for all processes because there is no service console operating system. Therefore, this number increments much faster in ESXi than ESX. In ESXi, this issue occurs when the system has been running for a very long time (over a year) without a reboot and has been actively creating processes. In ESX classic, it is almost impossible to hit this threshold.

 

This seems strange. VMware is aware of this problem and has published that it is resolved in ESX/ESXi 4.1 U3 (the same version you have).
They also write that the issue does not affect ESXi 5.1. (Maybe it is time to upgrade?)

 

Anyway, you can try these steps:

Rebooting the host restarts the counter and eliminates the risk of any failure.

To avoid an unnecessary reboot, you can check whether your system is at risk by running this script in the ESXi command line:
highWID=$(vsish -e ls world | sed 's!/$!!' | sort -n | tail -n 1)
let microFull=highWID/7400
echo ${microFull}

If this script returns a value close to 100,000, it is recommended to schedule a reboot.
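
To make the arithmetic in that check easier to follow, here is a minimal sketch of the same calculation as a function that reads world IDs from standard input, so it can be tried without an ESXi host. The function name `world_id_risk` is hypothetical, and the extra `grep` filter is an addition. The divisor 7400 is taken unchanged from the script above.

```shell
# Hypothetical helper (not part of any VMware tooling): reads world IDs,
# one per line, with the optional trailing slash that `vsish -e ls world`
# prints, and outputs the highest ID divided by 7400 (integer division).
world_id_risk() {
    # Strip trailing slashes, keep purely numeric lines, take the largest.
    high_wid=$(sed 's!/$!!' | grep -E '^[0-9]+$' | sort -n | tail -n 1)
    # Default to 0 when no world IDs were read.
    echo $(( ${high_wid:-0} / 7400 ))
}

# On a host this would be fed from vsish:
#   vsish -e ls world | world_id_risk
# Offline example with made-up IDs:
printf '12/\n5000/\n300/\n' | world_id_risk   # prints 0
```

Because the division is integer division, any host whose highest world ID is below 7400 reports 0, which simply means the counter is far from the threshold.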

 

It's not a good solution, but it's better than nothing.

 

More info here.

chr3
Occasional Contributor

Re: proliant DL585 G7 Opteron 6238, vSphere 4.1 U3 die with PSOD PF 14

Hello xmate,

Thank you for your response. I noticed I was not precise: the vSphere version is ESX.

 

The other servers with this hardware configuration returned zero when I ran your script.

 

The two failed servers had their BIOS updated to the newest version by an HP technician and then started to crash with the PSOD.

 

regards

 

chr3

chr3
Occasional Contributor