Operating System - Linux

Re: Red Hat and Proliant lock up's

 
Ross Minkov
Esteemed Contributor

Re: Red Hat and Proliant lock up's

Andre,

What runlevel do you run these servers at?
If these are real servers make sure you have
id:3:initdefault:
in /etc/inittab. This way you eliminate X altogether.
Also make sure that you have the latest firmware and PSP.
Can you login through the iLO/RILO card when these lock-ups happen? Is there anything to indicate problems in the IML?

Regards,
Ross
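
For anyone checking a batch of servers, here is a minimal sketch of how the default runlevel could be read out of inittab-style text. The function name `default_runlevel` is made up for illustration; the only thing taken from the post above is the `id:3:initdefault:` entry format (id:runlevels:action:process).

```python
def default_runlevel(inittab_text):
    """Return the runlevel of the initdefault entry, or None if there is none."""
    for raw in inittab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # inittab entries look like id:runlevels:action:process
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == "initdefault":
            return fields[1]
    return None


# Illustrative sample only, not taken from a real server.
sample = "# default runlevel\nid:3:initdefault:\nsi::sysinit:/etc/rc.d/rc.sysinit\n"
print(default_runlevel(sample))
```

On a real machine you would pass in the contents of /etc/inittab instead of the sample string. A result of 3 means the box boots to multi-user text mode without X.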

Vitaly Karasik_1
Honored Contributor

Re: Red Hat and Proliant lock up's

Danny, start by upgrading your RHEL to the latest update level.
Andre ten Bohmer
Occasional Advisor

Re: Red Hat and Proliant lock up's

Danny, the problem is not solved yet, sorry. It has been an official call at HP Europe since December 2004. All firmware is at the latest level and we're using the HP Broadcom driver and the latest PSP, but to no avail. Netdump and SysRq are not functioning when the server hangs, so there is still no crash dump for the support people to work on. One server is running at runlevel 3 (all the others are indeed at 5) after it hung (again) last week. iLO is not configured. The memory of one server was swapped by HP to make sure no imitation memory is in use (also last week).
Cheers,
Andre
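
Since SysRq came up: a small sketch, assuming the kernel exposes the setting in /proc/sys/kernel/sysrq, of how the value could be interpreted before the next hang. The helper name `describe_sysrq` is hypothetical. Note that even with SysRq enabled, a hard lock with interrupts disabled may still not respond to the key combination.

```python
def describe_sysrq(raw):
    """Interpret the text read from /proc/sys/kernel/sysrq."""
    value = int(raw.strip())
    if value == 0:
        return "disabled"
    if value == 1:
        return "all functions enabled"
    # Newer kernels treat values above 1 as a bitmask of allowed functions.
    return "bitmask 0x%x (subset of functions enabled)" % value


# Illustrative values only; on a live system read the file instead, e.g.
#   with open("/proc/sys/kernel/sysrq") as fh: print(describe_sysrq(fh.read()))
for sample in ("0\n", "1\n"):
    print(describe_sysrq(sample))
```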
danny_76
New Member

Re: Red Hat and Proliant lock up's

The servers are running the latest firmware and the latest PSP. Red Hat has been updated to latest software and kernel.

They are being run at run level 3.
Not sure how to connect using the iLO/RILO card. There is no connection through the network or KVM, and all logs stop at the time the hard lock appears to happen.

When talking to HP they had us install the insight manager agents. This changed the behaviour from lockups to rebooting. We were able to get some dumps, but they have now come back and said it is a software issue. No problem with the hardware.

We are now talking with Red Hat and have sent them a vmcore dump, and they have requested another dump to do some comparisons but we have not got another successful dump yet.

Do your servers run with the SMP kernel? Have you tried to run them with the non-SMP kernel? Have you tried disabling hyper-threading?

Thanks.
Dan
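
To compare SMP/HT setups across boxes, here is a rough sketch that summarises /proc/cpuinfo-style text. The function name `cpu_summary` is an assumption for illustration. Also note that the `ht` flag only reports what the CPU advertises, so it does not prove hyper-threading is actually switched on in the BIOS.

```python
def cpu_summary(cpuinfo_text):
    """Count logical CPUs and report whether any CPU advertises the 'ht' flag."""
    logical = 0
    ht_flag = False
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU entries
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "flags" and "ht" in value.split():
            ht_flag = True
    return {"logical_cpus": logical, "ht_flag": ht_flag}


# Illustrative sample only; on a live system pass open("/proc/cpuinfo").read().
sample = "processor\t: 0\nflags\t\t: fpu vme ht\n\nprocessor\t: 1\nflags\t\t: fpu vme ht\n"
print(cpu_summary(sample))
```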

Andre ten Bohmer
Occasional Advisor

Re: Red Hat and Proliant lock up's

All servers are indeed running the SMP kernel. We once tried without HT, but to no avail. Running without SMP is not an option because these are all production servers which depend on enough processing power. But we have dozens of servers with the same configuration that are perfectly stable. Last week an HP technician witnessed a server "hang up" and he was stunned. He modified some BIOS settings: he disabled USB and set the interrupt for both NICs to the same value, so let's see what happens now. HP is now in contact with Red Hat regarding this problem, so let's be patient for a few days. This problem is a show stopper for moving Oracle from OpenVMS to Linux on HP, so the pressure is on.
Andre
danny_76
New Member

Re: Red Hat and Proliant lock up's

Were the problem servers bought near the same time? Similarly, we have the same servers bought in Nov 2004, without any issues, but the servers we bought in ~ March 2004 are having this issue.
Andre ten Bohmer
Occasional Advisor

Re: Red Hat and Proliant lock up's

No, sorry, there is no connection to any particular batch of servers. We have experienced hang-ups on dl-380-g2, dl-380-g3, dl-360-g3 and ml-530 servers. Some servers became stable after a firmware upgrade (like the ml-530 and some dl-380-g2's), others still go down.
Rob Leadbeater
Honored Contributor

Re: Red Hat and Proliant lock up's

Hi Andre,

Are all the machines that are locking up running Oracle? What version? App Server or Database?

We've seen similar issues on a number of DL380 G3's all running RHEL 3, and Oracle Application Server 10g (9.0.4.0.0)

Every now and then the machines will just lock up - they'll normally drop off the network, but occasionally they'll stay on the network but they can't be logged into, either on the console or via SSH.

If I use iLO to look at the X console, I see the time at which the lock up happened, but there's no response from the keyboard or mouse.

It's very frustrating, to say the least!

Cheers,

Rob
Andre ten Bohmer
Occasional Advisor

Re: Red Hat and Proliant lock up's

Hi Rob,
Some of them are running Oracle Database Server (8.1.7) but others only run Apache or Amavis/SpamAssassin. Some are connected to an MSA1000 SAN, others just use DAS, so for us there is no clear lead. When a hang-up occurs, the console is totally black and there is no network connectivity (no ssh, although sometimes a ping is possible).
Thanks and cheers,
Andre
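
The "pings but no ssh" symptom could be logged from another machine with something like the sketch below. It assumes sshd is on port 22 and relies on the fact that an SSH server sends an identification line starting with "SSH-" as soon as a client connects. A hung box may still complete the TCP handshake in the kernel, which is why the sketch waits for the banner rather than just connecting. The function names are made up for illustration.

```python
import socket


def read_ssh_banner(host, port=22, timeout=5.0):
    """Return the first line the server sends, or None if nothing arrives."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            data = sock.recv(256)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return None
    return data.split(b"\n", 1)[0].strip() or None


def classify(ping_ok, banner):
    """Turn a ping result and an ssh banner into a rough host state."""
    if banner is not None and banner.startswith(b"SSH-"):
        return "up"
    if ping_ok:
        return "hung: answers ping, but sshd is not responding"
    return "down: no ping reply and no ssh banner"
```

Run from cron on a monitoring host together with a ping check, this would at least give a timestamp for when userspace stopped responding, which can be compared with the point where the server's own logs stop.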
Don_89
Trusted Contributor

Re: Red Hat and Proliant lock up's

We've had the exact same problems. The server locks up and doesn't respond to anything, except that you can still ping it. All of the servers are running Oracle.

The cause turned out to be the HP Insight Manager agents. I disabled the agents and haven't had a lockup since.

I have had an open case with HP for the past 3 months, but the technician basically gave up trying to figure out the problem.