10-17-2005 09:08 PM
rhel as 4 update 1 or 2 random crashes
Random crashes on a few Linux x86_64 ProLiant DL380G4 servers running RHEL AS 4 Update 1 or Update 2, with no messages in syslog and no crash dumps (a crash dump is not possible).
On both machines the latest ProLiant Support Pack is installed. I'm using the bcm5700 network driver instead of tg3.
See the crash log from the iLO console in the attachment.
While investigating this issue I found:
1) A strange entry in dmesg when hpasm starts:
(service hpasm start or service hpasm start hpasmd)
Losing some ticks... checking if CPU frequency changed.
warning: many lost ticks.
Your time source seems to be instable or some driver is hogging interupts
rip __do_softirq+0x4d/0xd0
It looks similar to the do_softirq entry in the saved iLO crash log.
These messages appear in dmesg after the hpasmd daemon starts, with or without the "notaint" option in /opt/compaq/cma.conf.
2) It seems strange to me that a user-level daemon could do such damage, but when I run "file" on it, it looks very old and is not a 64-bit binary:
file /opt/compaq/hpasmd/bin/hpasmd
/opt/compaq/hpasmd/bin/hpasmd: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), stripped
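The 32-bit vs 64-bit question can be answered directly from the ELF header, which is what `file` reads. Below is a minimal sketch (the function names are hypothetical) that decodes the class byte, the byte order, and the `e_machine` field; only a few common machine codes are mapped.

```python
import struct

# Values from the ELF specification (e_ident bytes and the e_machine field).
ELF_MAGIC = b"\x7fELF"
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "LSB", 2: "MSB"}
E_MACHINE = {3: "Intel 80386", 62: "x86-64"}


def describe_elf(header: bytes) -> str:
    """Summarise the first 20 bytes of an ELF header, similar to `file`."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class = EI_CLASS.get(header[4], "unknown class")
    byte_order = EI_DATA.get(header[5], "unknown byte order")
    # e_machine is a 16-bit field at offset 18, stored in the file's byte order.
    fmt = "<H" if header[5] == 1 else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)
    arch = E_MACHINE.get(machine, f"machine {machine}")
    return f"ELF {elf_class} {byte_order}, {arch}"


def describe_elf_file(path: str) -> str:
    """Read just enough of `path` to describe its ELF header."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

Given the `file` output above, `describe_elf_file("/opt/compaq/hpasmd/bin/hpasmd")` should report an "ELF 32-bit LSB, Intel 80386" binary. On x86_64 such a binary runs through the kernel's 32-bit compatibility layer, so being 32-bit is not by itself evidence of a fault.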
This issue must be investigated and fixed.
Best regards,
Boris
10-26-2005 06:19 AM
Re: rhel as 4 update 1 or 2 random crashes
Your point 2 is correct. Even on x86_64 systems, hpasmd and all the other agents in the hpasm package are 32-bit applications.
As for the random crashes, please provide a little more information. It might be helpful to provide the output of "hplog -v".
10-30-2005 06:44 PM
Re: rhel as 4 update 1 or 2 random crashes
ASR Detected by System ROM
10-31-2005 12:54 AM
Re: rhel as 4 update 1 or 2 random crashes
My guess is that the APIC is screwing up your system.
10-31-2005 02:25 AM
Re: rhel as 4 update 1 or 2 random crashes
Try turning ASR off and see whether you notice any system hangs or hpasmd dying. If you see neither, then try increasing the ASR timeout to ten minutes. It is possible that another process is using so much CPU time that hpasmd doesn't get a chance to update the counter.
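The reasoning here is a watchdog race: the agent must reset the ASR timer before it expires, otherwise the System ROM resets the server and logs "ASR Detected by System ROM". The sketch below is a simplified model, not HP's implementation; `asr_fire_time` is a hypothetical helper, and the timer is assumed to be armed at time 0 and reset at each heartbeat.

```python
from typing import Optional, Sequence


def asr_fire_time(heartbeats: Sequence[float], timeout: float,
                  end: float) -> Optional[float]:
    """Return when a watchdog with `timeout` seconds would expire, or None.

    heartbeats: sorted times (seconds) at which the agent reset the timer.
    end: time at which observation stops.
    The watchdog fires if any gap between resets, including the final gap
    up to `end`, exceeds `timeout`.
    """
    last = 0.0
    for t in heartbeats:
        if t - last > timeout:
            return last + timeout  # timer ran out before this reset arrived
        last = t
    if end - last > timeout:
        return last + timeout
    return None


# A CPU hog delays one reset by 700 s (resets at 60, 120, then 820 s).
print(asr_fire_time([60, 120, 820, 880], timeout=600, end=900))  # 720.0
print(asr_fire_time([60, 120, 820, 880], timeout=900, end=900))  # None
```

In this model a longer timeout only hides starvation up to its own length; if the server still resets with ASR disabled, the watchdog is not the cause.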
11-02-2005 01:50 AM
Re: rhel as 4 update 1 or 2 random crashes
The server crashed with or without ASR enabled.
hpasmd works fine in all cases.
11-02-2005 02:46 AM
Re: rhel as 4 update 1 or 2 random crashes
thanks
Shannon
11-02-2005 09:50 PM
Re: rhel as 4 update 1 or 2 random crashes
- Yes (when not running hpasm packages).
- What HW devices do you have in the system (i.e. HBA, NICs, etc.)?
The standard DL380G4 devices and an FCA2214 HBA. Two CPUs, 4 GB of memory.
- What SW are you running?
RHEL 4 AS Update 1
HP Proliant Support Pack 7.40
Cyrus IMAP server
DHCP server
Apache WEB server
SQUID proxy server
OpenAFS client and server
- Is this a standard Red Hat kernel?
Yes.
06-06-2006 11:20 PM
Re: rhel as 4 update 1 or 2 random crashes
Details:
Server: Proliant DL385 (G1)
CPU: 1x Opteron 275 dual-core
Memory: 16 GB
OS: RedHat ES4 Update 3 (64-bit x86_64 version)
Kernel version: 2.6.9-34.0.1.ELsmp
After a restart, the server generally runs fine for a day, and then it is usually found hung the next morning. (Actually, not quite hung: it still responds to pings but does not accept network connections. The console is frozen too.)
Our application development guys seem to be running performance tests at night-time, so server load might be a factor.
The 32-bit versions of RedHat ES4 don't suffer from this problem: for application support reasons, we are running some DL385s with 64-bit RHES4 and some with 32-bit RHES4.
I did some googling on this: based on comments on the Linux-kernel mailing list, it looks like the problem might be with AMD 8111 chipset (inaccurate timer implementation?) and/or the fact that Opteron dynamically changes the CPU frequency according to the needs. The hpasmd daemon might be only marginally related.
In the 32-bit Linux kernel, there are several options for the OS real-time clock, a legacy of the PC hardware's evolution, and there are fallback mechanisms in case the "best" available method is found unreliable.
In the x86_64 Linux kernel, some of these fallback mechanisms are different or not implemented... probably because it was assumed that a server with 64-bit CPU would not need to fall all the way back to original IBM PC/AT timer technology :-)
The code in question can be found in Linux kernel source:
and
Some of those timing methods need to be aware of CPU speed changes; others use a hardware timer in the Power Management subsystem or something similar.
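To make the "lost ticks" idea concrete, here is a heavily simplified model, not the kernel's actual code: on each timer interrupt the elapsed time is estimated from a cycle counter divided by the frequency the kernel believes it runs at, and any whole ticks beyond the expected one are counted as lost. The function name, HZ=1000 and the clock frequencies are illustrative assumptions.

```python
def lost_ticks(cycles_since_last_irq: int, assumed_counter_hz: int,
               tick_hz: int) -> int:
    """Simplified model of lost-tick accounting between two timer interrupts.

    One tick is expected per interrupt; every additional whole tick's worth
    of counter cycles is reported as "lost". Integer arithmetic avoids
    floating-point rounding at exact tick boundaries.
    """
    cycles_per_tick = assumed_counter_hz // tick_hz
    return max(0, cycles_since_last_irq // cycles_per_tick - 1)


HZ = 1000  # illustrative tick rate

# Healthy case: counter really runs at 2.2 GHz, interrupt arrives on time.
print(lost_ticks(2_200_000, 2_200_000_000, HZ))   # 0
# Interrupt delayed ~5 ms (e.g. interrupts held off by a busy driver).
print(lost_ticks(11_000_000, 2_200_000_000, HZ))  # 4
# Counter calibrated at 1 GHz but CPU now runs at 2.2 GHz: spurious loss.
print(lost_ticks(2_200_000, 1_000_000_000, HZ))   # 1
```

The last case is why the warning says "checking if CPU frequency changed": a wrong frequency assumption and genuinely delayed interrupts produce the same symptom in this model.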
06-08-2006 12:37 AM
Re: rhel as 4 update 1 or 2 random crashes
However, my DL385 still keeps crashing, and the crashes seem to be getting more frequent, regardless of which kernel I use.
Now I'm beginning to suspect a faulty CPU.
One of our ProLiant DL385s with a 32-bit RHES4 had a similar situation: the CPU seemed fine under low load, but running a performance test consistently brought the machine down within 10 minutes of starting the test. There was no message at all in the hardware log or on the console; the system just froze. Replacing the CPU fixed the problem.
I'm beginning to think that at least in my case the "many lost ticks" message and the crashes are two separate problems, perhaps completely unrelated to each other.
Based on the similarity with the 32-bit RHES4 case, I've opened a hardware call for my troubled 64-bit RHES4 server. Tomorrow I should have the server up with a new CPU, and then we'll see whether it helps or not...
06-20-2006 06:42 AM
Re: rhel as 4 update 1 or 2 random crashes
The server has been working for almost a week now. Our developers ran some stress tests on it (while I was busy on other projects), including running a "cpuburn" utility over a weekend.
06-20-2006 12:59 PM
Re: rhel as 4 update 1 or 2 random crashes
I've checked your attachment and here are some suggestions:
1. Work with HP service to double-check the server's hardware status, especially the processors, memory, and System ROM settings.
2. You didn't mention whether your Linux is x86-64. If not, use it instead. RHEL4 U2 (x86-64) is very stable on our HP servers; it should be the same on yours.
3. Don't install anything except the Linux OS: no HP PSP, no additional drivers or applications.
4. Stress-test the above "clean" server with tools such as the LTP kit, or run stress tests with individual tools (processor, I/O, memory subsystem, disk system...).
I've handled lots of similar cases over the past two years and use the above steps to quickly isolate potential issues and troubleshoot the problem.
Finally, before you test the OS on those servers, verifying hardware health is always the first priority. Most issues are hardware-related.
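Step 4 relies on the idea that a faulty CPU eventually computes a wrong answer under sustained load. The hypothetical sketch below illustrates that idea: it recomputes a deterministic SHA-256 chain and counts mismatches. It is not a substitute for LTP or cpuburn; it mostly exercises one core and little memory.

```python
import hashlib
import time


def reference_digest(block: bytes, rounds: int) -> bytes:
    """Hash `block` repeatedly; on healthy hardware the result never changes."""
    digest = block
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def cpu_consistency_test(duration_s: float, rounds: int = 10_000) -> int:
    """Recompute the same digest chain until `duration_s` seconds elapse.

    Returns the number of mismatches; 0 is expected. A non-zero count
    suggests the CPU (or cache/memory) produced a wrong result under load.
    """
    block = b"stress-test-seed"
    expected = reference_digest(block, rounds)
    mismatches = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if reference_digest(block, rounds) != expected:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    print(cpu_consistency_test(duration_s=10.0))
```

Python runs this on a single core, so to load a whole machine you would start one copy per core (for example with a shell loop over `nproc`) and run it for hours, as was done with cpuburn above.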
08-09-2006 08:40 AM
Re: rhel as 4 update 1 or 2 random crashes
0002 Critical 15:03 08/09/2006 15:03 08/09/2006 0001
LOG: ASR Detected by System ROM
Redhat ES4 / HP320.
Thanks.
08-11-2006 12:43 AM
Re: rhel as 4 update 1 or 2 random crashes
Sorry I could not respond earlier: I had a very busy time and then my summer vacation immediately after that.