Redhat VMWARE Guest hangs or is powered off.

Robert Walker_8
Valued Contributor

Hi,

We have a small setup of two RHEL 4 virtual machines running under VMware Server (VMware advised us to upgrade from GSX) on a ProLiant DL380 (no VT hardware assistance).

On occasion one RHEL guest will hang or freeze, and sometimes gets powered off, when doing large I/O (i.e. a Linux dump or an RMAN backup).

Today we lost the lot: the physical host reported a load average of 91.99 while 98% idle! At other times, dumping one disk to another disk on the same virtual machine has resulted in the guest freezing and then shutting down. A call has been placed with VMware, however the results haven't been promising so far. Has anyone else seen this problem?

Our ProLiant has 5 x 72GB disks in RAID 5 with write-back cache. All systems have been upgraded from RHEL 4 Update 1 to Update 4.

We have also done other things suggested by VMware, including using virtual networking and virtual SCSI buses to distribute the "virtual" loads of the guests. The host has dual 3.4GHz Intel processors. Unfortunately the server was purchased well before dual core and virtualisation technology, and also before RHEL 5 and Xen.

Any ideas would be good,

Robert.
8 REPLIES
Vitaly Karasik_1
Honored Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

1) Does the host Linux system itself have problems with heavy disk I/O?
2) What is the output of iostat/vmstat/top when you get "Load Average of 91.99 while 98% idle"?
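
On Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones, so a very high load on an almost idle CPU often points at processes stuck in D state. A minimal sketch for listing them next to the load averages is below. It assumes a Linux /proc filesystem and Python 3 (which a RHEL 4 host would not ship by default); the function names are made up for illustration, and it only looks at each process's main thread, not the per-thread entries under task/.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' rather than on whitespace.
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    tasks = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were scanning, or the line was odd.
            continue
        if state == "D":
            tasks.append((pid, comm))
    return tasks


def load_averages(path="/proc/loadavg"):
    """Return the 1, 5 and 15 minute load averages as floats."""
    with open(path) as f:
        one, five, fifteen = f.read().split()[:3]
    return float(one), float(five), float(fifteen)


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    print("load averages:", load_averages())
    for pid, comm in uninterruptible_tasks():
        print("D state: pid=%d comm=%s" % (pid, comm))
```

If the list is long and keeps naming the same processes, it would be worth posting it together with the iostat/vmstat output.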
Robert Walker_8
Valued Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

Gday,

iostat on the host showed little happening. That was the first thing we checked, since we expected a backup/RMAN job (some Oracle I/O) to be running, however iostat was quiet. We have been caught out before by our DBAs doing large exports that lock the system up. However this time the large I/O would have been some 36 hours earlier, and the system never recovered.

As long as we didn't run any ps-type commands we got slow but some response. Running a ps command resulted in the session hanging on the physical server.

Unfortunately it is not always possible to reproduce the problem. The dump failed almost every time; in the end we shut down the second virtual machine (even though it was idle) to get the copy. However our Oracle backups happen daily and succeed most of the time. Even our most reliable time, just after a downtime slot in which one, two or all of the virtual machines and the host were rebooted, isn't so reliable these days: a bit later there would be a hang/freeze or a power-off of a guest.

Robert.
Stuart Browne
Honored Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

This may sound like an odd question, but what firmware revision is the RAID controller?

I had this exact behaviour (without the virtual servers) on a few of our servers (DL380s of some generation, not sure which), running 2.58 firmware on the controllers.

Got my datacenter guys to update it to the latest, 2.78, and it's been stable ever since.
One long-haired git at your service...
Robert Walker_8
Valued Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

Gday Stuart,

Yes, I ran up hpacucli and it's 2.34 (extremely old). Will upgrade this to see if it helps. Thanks for the tip.

Robert.
Robert Walker_8
Valued Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

Gday,

It seems upgrading the Smart Array 6400 controller firmware hasn't solved our problem. One of the two virtual machines does an Oracle backup at 1am, and that is still freezing both virtuals.

Robert.
Robert Walker_8
Valued Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

Stuart,

Upgrading the firmware on the controller now seems to cause more problems: our monitoring has reported server down on at least one virtual host for the past few nights. Of course the server doesn't actually crash or hang completely, it just stops responding for a period (long enough to not respond to our ping checker monitor).
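
To tell a short freeze from a real outage, and to line the freezes up against the 1am backup, it may help to record when each unresponsive period starts and how long it lasts. The sketch below is one way to do that, not a description of our actual ping checker: probe, collect and outage_windows are hypothetical names, and it assumes Python 3 and the Linux iputils ping with its -c (count) and -W (reply timeout in seconds) options.

```python
import subprocess
import time


def probe(host, timeout_s=2):
    """Send one ICMP echo with the system ping; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def collect(host, interval_s=10, count=6):
    """Probe the host `count` times and return (timestamp, ok) samples."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), probe(host)))
        time.sleep(interval_s)
    return samples


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (start, end, duration) tuples.

    start is the first failed probe of an outage and end is the first
    successful probe after it. An outage still in progress at the last
    sample is reported with end and duration set to None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts, ts - start))
            start = None
    if start is not None:
        windows.append((start, None, None))
    return windows
```

For example, outage_windows(collect("guest1", interval_s=10, count=360)) would cover roughly an hour around the backup window, where "guest1" stands in for the real guest hostname.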

Any ideas?

Robert.
skt_skt
Honored Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

Below are details of a problem (server went into a read-only state) we had on VMware Linux boxes, from an old communication of mine. Just giving a new direction to look in; not exactly relevant though.

"All our ESX servers use multipathing for shared storage (SAN environment) and do path failover in the event of a path failure. If it happens in the middle of a disk write, ESX notifies the VM's virtual SCSI controller and instructs the controller to wait. The VM interprets this as the disk being inaccessible and the disk write faults, causing an error.

There is a supported fix available online from VMware for Red Hat Enterprise Linux 4 virtual machines. Please refer to the following link for details.

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=51306&sliceId=2&docTypeID=DT_KB_1_1&dialogID=10922410&stateId=0%200%2010918747 "
Robert Walker_8
Valued Contributor

Re: Redhat VMWARE Guest hangs or is powered off.

Try again - this is the one I was supposed to close, saying it is no longer relevant: we moved away from VMware Server to Xen.