HP 9000

Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

mvpel
Trusted Contributor

Under HP-UX 11.11, we've been having some PCI cards time out occasionally.

According to the HPMC, they apparently fail to assert DEVSEL# when their address is posted to the PCI bus. Since it's 11.11, this of course results in an HPMC.

In the trace, I'm able to see where it identifies the "Requestor ID" and the "PCI_BUS_TARGET_ID," but I'm wondering if it's possible to find out those PCI target addresses on a system that's up and running, or even one that's been TOCed.

The one on the timed-out card looks a bit odd (0xec000174) since it's not aligned, so I'm wondering if that may have something to do with the timeouts.

Suppose the PCI bus is sending 0xec000174 but the card is expecting, say, a doubleword aligned address of 0xec174000, for example, and so is not responding to that address.

This unaligned target ID seems to be a common thread through a few different tombstone files, so I'd like a way to check some known-good cards to help figure out whether or not an unaligned target address is normal.
17 REPLIES
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

What type of card are you referring to? There is an issue with some U320 cards not handling device failures well and causing an HPMC to occur. I believe this was for 11.11.

That could be related to what you are asking about. The fix is to update the "mpt u320" software (located at software.hp.com).

-Bob
----------------
Was this helpful? Like this post by giving me a thumbs up below!
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

After many, many hours of study I was able to drill down into this and figure it out - the HPMC occurs as a result of an invalid DMA access via a page mapped for the PCI card.

The PCI Local Bus Specification 2.1 allows a card to ignore PCI bus address assertions that are beyond the range of its registers.

When a card's registers are mapped to host memory, the minimum is one 4k page. If a memory access is made to any part of this page other than the range of addresses which the card accepts, the card ignores it as it is permitted to and does not assert DEVSEL#, and then the bus times out.
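
To make the failure mode concrete, here is a minimal Python sketch of the decode rule described above. The 0x200-byte register window, the assumption that the registers start at offset 0 of the page, and the function name are all hypothetical, chosen only for illustration; a real card's decoded range comes from its documentation or driver.

```python
PAGE_SIZE = 4096  # minimum mapping granularity: one 4k page


def classify_access(offset, reg_size):
    """Classify a byte offset into a card's memory-mapped page.

    reg_size is the number of bytes the card actually decodes, assumed
    here to start at offset 0 (a hypothetical layout, not a real card).
    """
    if not 0 <= offset < PAGE_SIZE:
        return "outside the mapped page"
    if offset < reg_size:
        return "claimed: card asserts DEVSEL#"
    # Inside the mapped page, but beyond what the card decodes.
    return "ignored: no DEVSEL#, bus times out"


# Hypothetical card that decodes only the first 0x200 bytes of its page.
print(classify_access(0x100, reg_size=0x200))
print(classify_access(0x800, reg_size=0x200))
```

The second call is the interesting one: the access is perfectly legal as far as the host's page mapping is concerned, yet nothing on the bus claims it.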

Now we're trying to figure out where the bad access is coming from.
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Did you get any further with this? I ask as I am seeing occasional issues as well with 11.11 systems (rp8420 vpars specifically).

The string I see is similar to the following:
LBA received no DEVSEL# when mastering the I/O bus.

This is causing a timeout as you describe and an HPMC occurs. Replacing the associated card did not resolve this for us.

-Bob
----------------
Was this helpful? Like this post by giving me a thumbs up below!
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

The problem can arise when the card's registers are memory-mapped, and something (driver, or an application for public IO maps) pokes the map in the wrong place.

You can see if there's anything iomapped on your system by examining the kernel's iomappers linked list.

With Q4, I think the syntax is:

q4> load struct iomapper from iomappers next next max 50
q4> print

This will show you all the IO mappers associated with active processes on the system. The p_vaddr address is the virtual address of the map in quadrant 4. The "iomappers" address contains a pointer to the head of the iomappers list.

A tip o' the hat goes to WTEC S.A. for this information having soaked deep into my brain during our six months' worth of work leading to PHKL_41910.

A more comprehensive approach using P4 instead of Q4 is:

p4> Pregions -A
p4> keep p_type==PT_IO
p4> pview -v

For example:

p4> pview -v
Pregion     p_type  p_space    p_vaddr             p_count  p_reg
0x41280240  IO      0x845e400  0xfffffffffa200000  2560     0x4123f040
0x41280140  IO      0x845e400  0xfffffffffb000000  4096     0x4123f040
0x41280040  IO      0x845e400  0xfffffffffa000000  32       0x4123f040
0x4acd5cc0  IO      0x9e6dc00  0xfb000000          4096     0x4123f200
0x4ac17e80  IO      0x9e6dc00  0xfa200000          2560     0x4123f200
0x4ac5b2c0  IO      0x9e6dc00  0xfa000000          32       0x4123f200
0x41280540  IO      0x9e6dc00  0xfa200000          2560     0x4123f200
0x41280440  IO      0x9e6dc00  0xfb000000          4096     0x4123f200
0x41280340  IO      0x9e6dc00  0xfa000000          32       0x4123f200
p4>

This list shows memory maps for this B2600 workstation's video card. The first three are 64-bit maps of one region (referenced by three 64-bit pregions), and the next six are 32-bit maps of a second region (see the p_reg column), in the same three sizes:

p4> Region 0x4123f200
Loaded 1 reg_t entries in 'DefaultView'
p4> pview
Region      r_type     refcnt  r_pgsz  r_nvalid  r_fstore  r_bstore  r_flags
0x4123f200  RT_SHARED  6       0       0         0         0         RF_ALLOC
p4>

Note the reference count of 6, as we'd expect.
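
If you save the pregion listing to a file, the expected reference count can be cross-checked by counting pregions per p_reg value. The following is only a rough Python sketch: it assumes the whitespace-separated six-column layout of the pview -v output pasted above, and the function name is my own.

```python
from collections import Counter

# The pview -v output from above, pasted verbatim.
PVIEW_OUTPUT = """\
Pregion p_type p_space p_vaddr p_count p_reg
0x41280240 IO 0x845e400 0xfffffffffa200000 2560 0x4123f040
0x41280140 IO 0x845e400 0xfffffffffb000000 4096 0x4123f040
0x41280040 IO 0x845e400 0xfffffffffa000000 32 0x4123f040
0x4acd5cc0 IO 0x9e6dc00 0xfb000000 4096 0x4123f200
0x4ac17e80 IO 0x9e6dc00 0xfa200000 2560 0x4123f200
0x4ac5b2c0 IO 0x9e6dc00 0xfa000000 32 0x4123f200
0x41280540 IO 0x9e6dc00 0xfa200000 2560 0x4123f200
0x41280440 IO 0x9e6dc00 0xfb000000 4096 0x4123f200
0x41280340 IO 0x9e6dc00 0xfa000000 32 0x4123f200
"""


def refs_per_region(text):
    """Count how many pregions point at each region (last column, p_reg)."""
    rows = [line.split() for line in text.splitlines()[1:] if line.strip()]
    return Counter(row[5] for row in rows)


counts = refs_per_region(PVIEW_OUTPUT)
for region, n in counts.items():
    print(region, n)
# 0x4123f040 3
# 0x4123f200 6
```

The six pregions pointing at 0x4123f200 line up with the refcnt of 6 that the Region view reports.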

In the above example, we have three IO maps in 32-bit land, one at 0xfa200000, another at 0xfa000000, and a third at 0xfb000000. The 32-bit addresses above 0xf0000000 in quadrant 4 are reserved for IO maps.

The space ID is the same because all 32-bit processes share a global space for quadrant 4. In addition, all IO map shared memory segments have a common protection key which is determined at boot.

So, a process which is allowed to access virtual address 0x9e6dc00.0xfa000000, for example, can cause PCI bus activity, and if that activity is not acknowledged by the card, you'll get the bus timeout and HPMC.

Take a close look at the raw tombstone file once you have the mapped address - searching down to an ERR_TIMEOUT in the ts99 file will show you the target PCI bus address which timed out, such as 0x8c001704.

This address will depend on which bus and slot the card is in. It is not a system memory address, but rather a 32-bit value used by the PCI bus to select a given card by asserting that value on the PCI bus address pins at the right time.

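This PCI bus address will correspond to an address within one of the memory maps. Let's say we found out that the map in question is the one at 0xfa000000, based on research about how the card functions and how the driver maps it. In this example the offset within the map carries over, so 0x8c001704 on the bus corresponds to 0xfa001704 in the map.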

So if you search the ts99 file for 0xfa001704, you may very well find it in one or more of the registers on one or more of the CPUs listed in the tombstone. Knowing what thread was running on that CPU might lead you to the culprit which made the invalid access to the card.
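
The translate-and-search step can be sketched in Python as follows. This is only an illustration under stated assumptions: the PCI-side base of 0x8c000000 is inferred from the example addresses rather than read from any hardware, the mapping is assumed to be linear, the function names are my own, and the sample text is made up and does not reproduce the real ts99 format.

```python
import re


def pci_to_mapped(pci_addr, pci_base, map_base):
    """Translate a PCI bus address to the matching mapped virtual address,
    assuming a linear mapping (same offset on both sides)."""
    return map_base + (pci_addr - pci_base)


def find_address(text, addr):
    """Return (line_number, line) pairs whose text contains the low
    32 bits of addr as eight hex digits, in either case."""
    pattern = re.compile("%08x" % (addr & 0xFFFFFFFF), re.IGNORECASE)
    return [(n, line)
            for n, line in enumerate(text.splitlines(), 1)
            if pattern.search(line)]


# Made-up register dump lines standing in for a tombstone file.
SAMPLE = """\
cpu0 regA = 0x0000000041280040
cpu1 regB = 0x00000000fa001704
"""

target = pci_to_mapped(0x8c001704, pci_base=0x8c000000, map_base=0xfa000000)
print(hex(target))  # 0xfa001704
for lineno, line in find_address(SAMPLE, target):
    print(lineno, line)
```

Matching on the eight low-order hex digits catches the value whether a register is printed as 32 or 64 bits wide. To run it against a real tombstone, read the file's contents into a string and pass that in place of SAMPLE.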
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

That's great info. Thanks. Do you know where I can get more info on using P4 by chance? I have a basic understanding, but documentation is scarce.
----------------
Was this helpful? Like this post by giving me a thumbs up below!
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Since P4 is an HP-confidential internal tool, I suppose it's not surprising docs are scarce outside of HP. The NDAed copy WTEC sent me for use on our air-gapped secure systems during PHKL_41910 and PHKL_42072 included man pages... but it still can get brain-bruisingly arcane even if you do have the man pages, a copy of "HP-UX 11i Internals," and a WTEC holding your hand.

I found this a few months ago, which may be helpful:

www.dectrader.com/docs/set3/emr_na-c01037168-2.pdf

But your best bet will probably be to open a software support case for some debugging guidance. Unless it's a non-HP card/driver in which case they'll probably tell you to talk to the vendor, who may be able to find someone who knows Q4.
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Robert, if you can disclose it I'd be interested to know the type of card and driver on which you're getting the HPMC.
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Sure....rp8420 running the mpt driver on AB290A cards. We updated mpt to the latest published version (B.11.11.0911) on each of the vPars (11.11 on all of them).

I did find an issue documented stating that the U320 cards should be forced to run at U160 when using ds2120 storage systems (which we are). This has been done and we are now waiting to see if things stabilize.

----------------
Was this helpful? Like this post by giving me a thumbs up below!
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Thanks! And good luck! How long does it usually take to manifest?

And a rather arcane set of precipitating conditions coupled with certain not-universally-valid assumptions in the LIBCL stack unwinding library appears to be at the root of our own HPMC problem.
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Oof! A bit of work for you there, eh?

Our problem seems to manifest itself in droves...a series of HPMCs and then we were good for a month, then more crash events. Hopefully we are good to go at this point though with the changes as advised.

Thanks,
Bob
----------------
Was this helpful? Like this post by giving me a thumbs up below!
Dennis Handly
Acclaimed Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

>not-universally-valid assumptions in the LIBCL stack unwinding library appears to be at the root of our own HPMC problem.

(So you are that customer with that PROBE problem.)
This is either a kernel or a driver issue since a non-privileged program shouldn't be able to crash the system.
That's the assumption of the Trap/Unwind lib. It uses PROBEW so it doesn't crash the application.
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Gene B. (WTEC) and one of my engineering-team colleagues here have the problem by the tail, so at this point I think it's just a matter of determining the proper strategy to correct it.
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Let's just say it's going to be a VERY interesting white paper. Hopefully we'll be permitted to release it outside of the company.
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Gene added one line of code to the U_get_previous_frame() call (I think), in the LIBCL library, to initialize the prev_r19 value in the prev_frame structure to zero, and the problem disappeared.

SW engineering is testing, and so far, so good. Look for an upcoming patch to supersede PHSS_40802.
Michael_Pelleti
Occasional Advisor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

One key piece of information I omitted above is that the PCI Local Bus Specification v2.1 says that a PCI card only has to respond to addresses which it supports, not the entire 4096-byte page that may be mapped. The timeout occurs when the access falls within the mapped 4k page but outside of the range of the card's actual registers.

  • “Devices that do consume more address space than they use are not required to respond to the unused portion of that address space.”

The LIBCL patch including this fix is due out in the November 2011 11i v1 patch bundle.

Certain compilers bundle the libcl.a in the compiler install directory, however, and static-link it into their executables, so simply applying the patch won't fix the problem - any executable built with the old library would need to be rebuilt with the new one.

Dennis Handly
Acclaimed Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

>Certain compilers bundle the libcl.a in the compiler install directory, however, and static-link it into their executables,

This is a core patch, nothing to do with compiler paths. And it is the user who can choose to link in the archive version.

mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

One last update - the fix for our issue turned out to adversely affect the stack-unwinding functionality of Pascal programs, and as a result it was not incorporated into the official LIBCL patch tree but released as a customer-specific patch instead.