- Re: memory error question
05-31-2006 05:11 AM
The server is a D370 running HP-UX 11.0 and OpenMail. OpenMail is failing to start after a power failure.
Thank you
Gary
05-31-2006 05:56 AM
The errors may be caused by bad DIMMs or even a bad memory controller. Some sources also name "electronic smog" as a cause.
DIMM 3A has a high error count.
Some areas of your memory are already marked bad and are no longer used (PDT). Normally the DIMMs need to be replaced.
Has this system really been up since 2003 without a reboot? Please check "uptime".
Hope this helps!
Regards
Torsten.
__________________________________________________
There are only 10 types of people in the world -
those who understand binary, and those who don't.
__________________________________________________
No support by private messages. Please ask the forum!
If you feel this was helpful please click the KUDOS! thumb below!

05-31-2006 06:02 AM
Hope this helps!
Regards
Torsten.
05-31-2006 06:03 AM
Two things I notice about the report:
1) The counts on *all* the multi-bit errors are zero.
2) None of the addresses for those errors are in the PDT.
This could indicate that they're old errors and that those DIMMs have been replaced and the PDT reset.
Or those are bogus messages.
I suspect the former, since the Memory Error Log History contains no dates for those errors.
Remember that *any* multi-bit error while the system is up & running will cause a panic. If a multi-bit is detected during POST then the address will be added to the PDT & bootup will proceed.
Rgds,
Jeff
05-31-2006 06:09 AM
Try
# echo "selclass qualifier cpu;info;wait;il"|cstm|grep "PDC Firmware"
Hope this helps!
Regards
Torsten.
05-31-2006 07:14 AM
PDC Firmware Revision: 42.11 IODC Revision: 0
PDC Firmware Revision: 42.11 IODC Revision: 0
I've reseated the memory that was causing the errors and my problem remains. I'm going to remove the DIMMs that were throwing the errors and see if that helps.
Something else that may be an issue is this message in the syslog.log file:
May 31 13:47:57 gomail vmunix: SCSI: Unexpected Disconnect -- lbolt: 23946, dev: cb05f002, io_id: 500002d
There are dozens of these. I know it's a SCSI device, but how do I find out which one is causing the messages? This is a D-class server with 3 SCSI disks, and it's connected to a VA7410 disk array. All of the lbolts read the same. I have noticed that one of the disks, 8/4.4.0, has had a number of retries as listed by cstm in the information display.
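One way to narrow down "which one" is to tally the dev values across all of these messages. The following is only a sketch: it assumes each message sits on a single line in the log file, in the format quoted in this post, and tally_disconnects is a made-up helper name, not an HP-UX tool.

```shell
# Hypothetical helper: count "Unexpected Disconnect" messages per dev value.
# Assumes one message per line, formatted like the syslog entry quoted above.
tally_disconnects() {
    # Pull out the hex value that follows "dev:", then count how often
    # each value occurs, most frequent first.
    sed -n 's/.*Unexpected Disconnect.*dev: *\([0-9a-fA-F][0-9a-fA-F]*\),.*/\1/p' "$1" |
        sort | uniq -c | sort -rn
}

# Example call (standard HP-UX syslog location):
#   tally_disconnects /var/adm/syslog/syslog.log
```

If only one dev value shows up, a single device or path is involved; several different values would point more towards a shared bus, cable or controller.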
05-31-2006 07:26 AM
Regarding the SCSI errors: this looks like a bad disk, but more information is needed to be sure.
Hope this helps!
Regards
Torsten.
05-31-2006 07:44 AM
With regard to the lbolt messages, any ideas on what to look for? Here's what I've tried:
1. All disks come up clean in ioscan.
2. Ran echo 2400?20X | adb /dev/dsk/c0txd0 against the drives, and they return normally.
3. Ran stm info against the disks, and they all come back clean, except that the disk at 8/4.4.0 displays a total of 66 retries.
I'm not sure what else to check.
While the system is down I'll reseat the disks and double-check the cabling.
05-31-2006 07:56 AM
Solution
# dd if=/dev/rdsk/cxtydz of=/dev/null
against it. If you get I/O errors or see errors in stm, consider replacing this drive.
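The dd check can be wrapped so that each disk gets a clear pass/fail line. This is a minimal sketch: read_test is a made-up name, and the 64k block size is only a choice to speed up the read, not something from this thread. dd only reads here (of=/dev/null), so nothing is written to the disk.

```shell
# Hypothetical wrapper around the dd read test.
# $1: raw device file to read from start to end.
read_test() {
    dev=$1
    # dd exits non-zero if it cannot open the device or hits a read error.
    if dd if="$dev" of=/dev/null bs=64k 2>/dev/null; then
        echo "OK $dev"
    else
        echo "FAIL $dev"
        return 1
    fi
}

# Example call, substituting the real device file for cxtydz:
#   read_test /dev/rdsk/cxtydz
```

A FAIL line only says that the read did not complete; the syslog and stm logs are still the place to look for the actual error.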
If you remove a pair of DIMMs, re-sort the others. There must not be any gaps: slots 0, 1, 2, ... have to be filled.
I would clear the PDT in the service menu in BCH after re-sorting the DIMMs, then run several loops of the exercise test from stm. Be aware that this can crash your system if the DIMMs are really bad, so do it only while no production software is running. After the tests, have a look at the stm logs.
Hope this helps!
Regards
Torsten.
05-31-2006 08:02 AM
Have you tested the memory with STM and run the exerciser tests? This would be a good place to start. You could also clear the PDT, run the exercise tests in STM, and then check the logtool to see what comes up.
Also, to see where the lbolt error has come from, check with
# ll /dev | grep 05f002
and see what this comes back with. It looks more likely to be a controller timing out.
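The dev value itself can also be split up by hex digit. This is a sketch under an assumption: that the value is a 32-bit HP-UX 11.x device number with the major number in the first two hex digits, followed by a minor number laid out as card instance (two digits), target, LUN and two option digits. decode_dev is a made-up helper name.

```shell
# Hypothetical helper: split an 8-digit hex dev value into its fields.
# Assumed layout (not confirmed in this thread): MM II T L OO
#   MM = major number, II = card instance, T = target, L = LUN, OO = options
decode_dev() {
    dev=$1
    major_hex=$(echo "$dev" | cut -c1-2)
    inst_hex=$(echo "$dev" | cut -c3-4)
    tgt_hex=$(echo "$dev" | cut -c5)
    lun_hex=$(echo "$dev" | cut -c6)
    # printf converts the 0x-prefixed hex strings to decimal.
    printf 'major=%d card_instance=%d target=%d lun=%d\n' \
        "0x$major_hex" "0x$inst_hex" "0x$tgt_hex" "0x$lun_hex"
}
```

For the value in the log, decode_dev cb05f002 prints major=203 card_instance=5 target=15 lun=0. If the assumed layout holds, lsdev shows which driver owns that major number, and ioscan -fnC ext_bus lists the SCSI buses with their instance numbers.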
Andy
05-31-2006 01:05 PM
I don't find any issue with the memory in the logs collected:
System start: Thu Nov 13 17:52:25 2003.
Last error check: Wed May 31 11:40:44 2006.
Logging interval: 3600 seconds.
15 address(es) with errors logged by memory logging daemo
Note that the last error check was on May 31, yet the logging interval is still 3600 seconds, which is the default. When there really is a memory fault, the logging interval keeps decreasing, for example down to 15 seconds.
Not every memory error means a hardware fault; it could also be induced by an application.
As Jeff has already confirmed, the PDT entries don't match the addresses that are reporting errors, which implies that none of those memory pages are marked bad. That in itself indicates that it is safe to leave the machine as it is and clear the PDT at the next available opportunity.
regards
Albert
06-02-2006 06:36 AM
The original problem was that OpenMail appeared to be unresponsive after startup following a power failure. I was trying to find a problem. As it turns out, leaving it alone overnight while awaiting a support tech resolved the problem.
So it appears that there was no problem to begin with, other than that the system was trying to catch up with its processing. This seems to have been a recurring theme of my questions lately. :-)
Thanks to everyone for their help
Gary