How does the kernel detect a bus check HPMC?
12-07-2010 06:08 AM
So this got me wondering how the kernel detects a bus check HPMC - what is the HPMC handler watching for?
12-07-2010 02:45 PM
Solution
http://www.dectrader.com/docs/set3/emr_na-c01037168-2.pdf
"A hardware crash event can be High Priority Machine Check (HPMC), Low Priority Machine Check (LPMC) or Transfer of Control (TOC). The machine checks are typically caused by hardware malfunctions or certain classes of bus errors. TOC on the other hand is usually initiated by the operator in response to system software being stuck in an error state.
When a hardware crash event occurs, the processor immediately branch to PDC entry point; PDCE_CHECK for HPMC and LPMC faults, and PDCE_TOC for TOC. *The implementation details of these PDC entry points are processor dependant.* Fundamentally they save the processor's state (general, control, space and interruption registers) into Processor Internal Memory (PIM). The processor then vectors back into the operating system entry points; HPMC_Vector or TOC_Vector. These entry points are defined in the IVA (Interruption Vector Table) and MEM_TOC in Page Zero respectively.
On entry into the kernel, a crash event entry is created. The operating system makes a pdc call (PDC_PIM) to read the processor's state information from PIM into a Restart Parameter Block (RPB). As such the RPB structure contains information pertinent to the understanding of the crash. For example, the Program Counter (PC) in the RPB would indicate what routine was executing at the time of HPMC/TOC event. Once the state has been saved, the operating system continues to dump physical memory to the dump device."
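The flow in the quote (state saved to PIM, kernel entered through a vector, PDC_PIM used to copy the state into an RPB, PC used to find the running routine) can be pictured with a toy model. This is only a sketch: the entry-point names come from the quote, but the classes, the symbol table and the register layout are hypothetical and do not reflect real PDC or HP-UX interfaces.

```python
import bisect
import copy
from dataclasses import dataclass


@dataclass
class ProcessorState:
    """Illustrative subset of the state that gets saved (layout is assumed)."""
    general: list
    control: list
    space: list
    pc: int  # program counter at the time of the event


# Hypothetical symbol table: (start address, routine name), sorted by address.
SYMBOLS = [(0x1000, "idle_loop"), (0x2000, "driver_read"), (0x3000, "vm_fault")]


def routine_at(pc):
    """Return the routine whose start address is the closest one at or below pc."""
    starts = [start for start, _ in SYMBOLS]
    i = bisect.bisect_right(starts, pc) - 1
    return SYMBOLS[i][1] if i >= 0 else "unknown"


class ToyPDC:
    """Stand-in for the firmware; not a real PDC implementation."""

    def __init__(self):
        self.pim = None       # "Processor Internal Memory"
        self.os_vectors = {}  # entry points registered by the kernel

    def pdce_check(self, state):
        # Save the processor state, then vector back into the OS.
        self.pim = copy.deepcopy(state)
        return self.os_vectors["HPMC_Vector"](self, "HPMC")

    def pdce_toc(self, state):
        self.pim = copy.deepcopy(state)
        return self.os_vectors["TOC_Vector"](self, "TOC")

    def pdc_pim(self):
        # The kernel calls this to read the saved state back out of PIM.
        return copy.deepcopy(self.pim)


def kernel_crash_entry(pdc, event):
    """Create a crash event entry: copy PIM into an RPB and look up the PC."""
    rpb = pdc.pdc_pim()
    return {"event": event, "rpb": rpb, "routine": routine_at(rpb.pc)}


pdc = ToyPDC()
pdc.os_vectors = {"HPMC_Vector": kernel_crash_entry, "TOC_Vector": kernel_crash_entry}
state = ProcessorState(general=[0] * 32, control=[0] * 32, space=[0] * 8, pc=0x2040)
crash = pdc.pdce_check(state)
print(crash["event"], crash["routine"])  # HPMC driver_read
```

The point of the sketch is the ordering: the kernel never inspects the hardware itself, it only reads what the firmware saved for it.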
http://book.soundonair.ru/hall2/ch10lev1sec3.html
http://sequoia.ict.pwr.wroc.pl/~iro/RISC/sm/www.hp.com/acd-38.html
http://ftp.parisc-linux.org/docs/arch/pa11_acd.pdf
http://ftp.parisc-linux.org/docs/arch/parisc2.0.pdf
Rgds,
12-07-2010 11:44 PM
I'll try to reduce the jargon level of cnb's quote a bit:
The HPMC handler is not "watching for" anything: the actual hardware chips on the system board do the watching. They check the ECC bits on any data that is being read from RAM, and also watch for similar data transmission error detection signals on other system buses.
If an error is detected, the piece of hardware activates a signal that triggers a "Group 1 interrupt" on the CPU(s). On the PA-RISC architecture, this is the most serious interrupt signal that exists. On Itanium, the terminology may be different but the event is equally severe.
The interrupt signal makes the CPU immediately stop what it's doing and mark its place in the CPU's internal registers designed for that purpose, and then check for instructions in a memory address defined at CPU design time. (Think of it as like a pre-arranged location for storing a building's evacuation and disaster recovery plans, as might be required in large office buildings by the National Building Code.)
When the system was booted up, the firmware initialized that memory address to point to the HPMC handler routine. First, the firmware HPMC handler does whatever model-specific things are required to get the error information from the system board chips and store it in standard format in the location where the kernel expects to find it.
Then the firmware checks a table of jump vectors the kernel has prepared in advance: "In case HPMC, TOC or other major sh*t happens". In the case of HPMC, this tells the firmware to run the kernel's HPMC handler. The HPMC handler will then output a message to the system console and execute the system panic procedures.
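As a rough illustration of that chain (hardware check, signal, firmware handler, kernel jump vector), here is a toy model. Everything in it is an assumption made for the sketch: a single parity bit stands in for real ECC (which can also correct some errors), a Python exception stands in for the interrupt signal, and the handler and table names are invented.

```python
class MachineCheck(Exception):
    """Stands in for the hardware error signal; not a real HP-UX construct."""


def parity(word):
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


class ToyMemory:
    """Memory that stores a check bit with each word, like a very weak ECC."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, word):
        self.cells[addr] = (word, parity(word))

    def corrupt(self, addr, bit):
        # Flip one data bit without updating the check bit.
        word, check = self.cells[addr]
        self.cells[addr] = (word ^ (1 << bit), check)

    def read(self, addr):
        # The "hardware" does the watching: every read is verified.
        word, check = self.cells[addr]
        if parity(word) != check:
            raise MachineCheck(f"parity error at {addr:#x}")
        return word


console = []


def kernel_hpmc_handler(info):
    """Kernel side: print a console message and start the panic procedure."""
    console.append(f"HPMC: {info}")
    return "panic"


# Jump vectors the kernel prepared in advance (hypothetical layout).
KERNEL_VECTORS = {"HPMC": kernel_hpmc_handler}


def firmware_hpmc_handler(error):
    """Firmware side: collect the error info, then jump via the kernel's table."""
    info = str(error)
    return KERNEL_VECTORS["HPMC"](info)


def cpu_read(mem, addr):
    """A read that is interrupted if the hardware flags an error."""
    try:
        return mem.read(addr)
    except MachineCheck as err:
        return firmware_hpmc_handler(err)


mem = ToyMemory()
mem.write(0x100, 0b1011)
clean = cpu_read(mem, 0x100)   # normal read, no error
mem.corrupt(0x100, bit=2)
result = cpu_read(mem, 0x100)  # the check fails and the handlers run
print(result, console)
```

Note that nothing in the kernel polls for the error; the handler only runs because the hardware raised the signal.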
MK
12-08-2010 01:39 AM
1) It may be a timing problem with the interfaces.
2) It can also be a driver problem: a driver that accesses an address it should not can cause the HPMC.
12-08-2010 07:40 AM
This possibility definitely lends weight to my suspicion that a driver problem is at the root of the issue we've been having, since we've actually been able to reproduce a "Bus Check" HPMC on a different system, which suggests something other than a hardware problem.
The HPMCs we've seen will point to seemingly random LBA numbers, all of which have been third-party cards with either dynamic or static driver modules.
This has been extremely baffling for quite a while since we were always focused on some sort of hardware issue, and we had a lot of trouble reproducing or characterizing it.
12-08-2010 09:16 AM
> Simplified answer!
The kernel has its own view of memory, called the virtual address space. Virtual addresses have to be mapped to physical addresses, and the TLB is a cache of those mappings. The TLB is finite, so a TLB miss can occur, but a miss is handled normally and is not an error in itself. When a device or driver tries to access a physical address that is actually not there, it triggers a high priority machine check, called a machine check abort on other architectures.
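A toy model of that last case, an access to a physical address that nothing responds to, might look like the sketch below. The address map, page table and names are all hypothetical, and real bus checks have other causes as well.

```python
class BusCheck(Exception):
    """Stands in for the machine check raised when no device answers."""


PAGE = 4096

# Hypothetical physical address map: (start, end exclusive, responder).
PHYS_MAP = [
    (0x0000_0000, 0x4000_0000, "RAM"),
    (0xF000_0000, 0xF010_0000, "PCI card"),
]

# Hypothetical page table: virtual page number -> physical page number.
PAGE_TABLE = {0x10: 0x20, 0x11: 0xF0000, 0x12: 0x80000}
TLB = {}  # small cache of page-table entries, filled on demand


def translate(vaddr):
    """Map a virtual address to a physical one, refilling the TLB on a miss."""
    vpage, offset = divmod(vaddr, PAGE)
    if vpage not in TLB:
        # TLB miss: an ordinary refill from the page table, not an error.
        TLB[vpage] = PAGE_TABLE[vpage]
    return TLB[vpage] * PAGE + offset


def bus_read(paddr):
    """Return the name of whatever responds at paddr, or raise BusCheck."""
    for start, end, name in PHYS_MAP:
        if start <= paddr < end:
            return name
    raise BusCheck(f"no responder at {paddr:#x}")


ram_hit = bus_read(translate(0x10004))  # maps into RAM
pci_hit = bus_read(translate(0x11000))  # maps onto the PCI card

try:
    bus_read(translate(0x12000))        # valid mapping, but nothing lives there
    outcome = "ok"
except BusCheck as err:
    outcome = str(err)

print(ram_hit, pci_hit, outcome)
```

All three accesses miss the TLB on first use and are refilled without trouble; only the one that lands on an unpopulated physical address ends in a bus check.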
Regards
Ismail Azad
12-08-2010 09:38 AM
Yes, they can be caused by software, as per the crashinfo statement:
"Note: This appears to be a BUS check hpmc. BUS Checks are often caused by
hardware problems, but there are many software causes as well.
To progress a BUS Check HPMC you will normally need to obtain the
hardware TOMBSTONE and analyse it."
Rgds,
12-08-2010 09:44 AM
12-08-2010 10:04 AM
Reading your first post again, I understand that you have been experiencing multiple HPMC panics, and you could well have similar kernel configurations on the various servers (if you've used Ignite). The root cause could be a parameter that controls PCI recovery. Please check the value of pci_eh_enable, as this MIGHT be the cause of your problem. Since you were talking about "kernel detection": if it is configured wrongly, pci_eh_enable is the parameter that can cause this kind of disaster in most cases.
Regards
Ismail Azad
12-08-2010 10:45 AM