HPE 9000 and HPE e3000 Servers

mvpel
Trusted Contributor

Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Under HP-UX 11.11, we've been having some PCI cards time out occasionally.

According to the HPMC, they apparently fail to assert DEVSEL# when their address is posted to the PCI bus. Since it's 11.11, this of course results in an HPMC.

In the trace, I'm able to see where it identifies the "Requestor ID" and the "PCI_BUS_TARGET_ID," but I'm wondering if it's possible to find out those PCI target addresses on a system that's up and running, or even one that's been TOCed.

The one on the timed-out card looks a bit odd (0xec000174) since it's not aligned, so I'm wondering if that may have something to do with the timeouts.

Suppose the PCI bus is sending 0xec000174 but the card is expecting, say, a doubleword aligned address of 0xec174000, for example, and so is not responding to that address.

This unaligned target ID seems to be a common thread through a few different tombstone files, so I'd like a way to check some known-good cards to help figure out whether or not an unaligned target address is normal.
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

What type of card are you referring to? There is an issue with some U320 cards not handling device failures well and causing an HPMC to occur. I believe this was for 11.11.

That could be related to what you are asking about. The fix is to update the "mpt u320" software (located at software.hp.com).

-Bob
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

After many, many hours of study I was able to drill down into this and figure it out - the HPMC occurs as a result of an invalid DMA access via a page mapped for the PCI card.

The PCI Local Bus Specification 2.1 allows a card to ignore PCI bus address assertions that are beyond the range of its registers.

When a card's registers are mapped to host memory, the minimum is one 4k page. If a memory access is made to any part of this page other than the range of addresses which the card accepts, the card ignores it as it is permitted to and does not assert DEVSEL#, and then the bus times out.
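
To make that concrete, here is a minimal Python sketch of the decode decision. The 4k page size comes from the description above; the 256-byte register window (`REG_WINDOW`) and the function name are purely hypothetical values chosen for illustration, since a real card decodes whatever its own register layout specifies.

```python
PAGE_SIZE = 0x1000   # minimum mapping granularity: one 4k page
REG_WINDOW = 0x100   # HYPOTHETICAL: bytes of register space the card decodes


def card_claims_access(page_offset, reg_window=REG_WINDOW):
    """Return True if a card decoding `reg_window` bytes would assert DEVSEL#
    for an access at `page_offset` within its mapped page."""
    if not 0 <= page_offset < PAGE_SIZE:
        raise ValueError("offset lies outside the mapped page")
    # Anything in the page but past the register window is ignored by the card.
    return page_offset < reg_window


for off in (0x010, 0x0fc, 0x174, 0xff0):
    if card_claims_access(off):
        outcome = "DEVSEL# asserted"
    else:
        outcome = "no DEVSEL#, bus times out"
    print(f"offset {off:#05x}: {outcome}")
```

The point is only that the host mapping is coarser than the card's decode range, so a valid virtual address can still land on a PCI address nobody answers.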

Now we're trying to figure out where the bad access is coming from.
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Did you get any further with this? I ask as I am seeing occasional issues as well with 11.11 systems (rp8420 vpars specifically).

The string I see is similar to the following:
LBA received no DEVSEL# when mastering the I/O bus.

This is causing a timeout as you describe and an HPMC occurs. Replacing the associated card did not resolve this for us.

-Bob
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

The problem can arise when the card's registers are memory-mapped, and something (driver, or an application for public IO maps) pokes the map in the wrong place.

You can see if there's anything iomapped on your system by examining the kernel's iomappers linked list.

With Q4, I think the syntax is:

q4> load struct iomapper from iomappers next next max 50
q4> print

This will show you all the IO mappers associated with active processes on the system. The p_vaddr address is the virtual address of the map in quadrant 4. The "iomapper" address contains a pointer to the head of the iomappers list.

A tip o' the hat goes to WTEC S.A.; this information soaked deep into my brain during our six months' worth of work leading to PHKL_41910.

A more comprehensive approach using P4 instead of Q4 is:

p4> Pregions -A
p4> keep p_type==PT_IO
p4> pview -v

For example:

p4> pview -v
Pregion     p_type  p_space    p_vaddr             p_count  p_reg
0x41280240  IO      0x845e400  0xfffffffffa200000  2560     0x4123f040
0x41280140  IO      0x845e400  0xfffffffffb000000  4096     0x4123f040
0x41280040  IO      0x845e400  0xfffffffffa000000  32       0x4123f040
0x4acd5cc0  IO      0x9e6dc00  0xfb000000          4096     0x4123f200
0x4ac17e80  IO      0x9e6dc00  0xfa200000          2560     0x4123f200
0x4ac5b2c0  IO      0x9e6dc00  0xfa000000          32       0x4123f200
0x41280540  IO      0x9e6dc00  0xfa200000          2560     0x4123f200
0x41280440  IO      0x9e6dc00  0xfb000000          4096     0x4123f200
0x41280340  IO      0x9e6dc00  0xfa000000          32       0x4123f200
p4>

This list shows the memory maps for this B2600 workstation's video card. The first three are 64-bit maps of one region, referenced by three 64-bit pregions. The next six are 32-bit maps of a second region (see the p_reg column), in three different sizes. Loading that second region:

p4> Region 0x4123f200
Loaded 1 reg_t entries in 'DefaultView'
p4> pview
Region      r_type     refcnt  r_pgsz  r_nvalid  r_fstore  r_bstore  r_flags
0x4123f200  RT_SHARED  6       0       0         0         0         RF_ALLOC
p4>

Note the reference count of 6, as we'd expect.
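
If you want to cross-check that count without eyeballing it, you can tally the p_reg column of the pview listing. This is just a rough Python sketch run outside P4 against pasted output; the function name is my own invention and nothing here is a P4 feature.

```python
from collections import Counter

# The pview -v listing from above, pasted verbatim.
PVIEW_OUTPUT = """\
Pregion p_type p_space p_vaddr p_count p_reg
0x41280240 IO 0x845e400 0xfffffffffa200000 2560 0x4123f040
0x41280140 IO 0x845e400 0xfffffffffb000000 4096 0x4123f040
0x41280040 IO 0x845e400 0xfffffffffa000000 32 0x4123f040
0x4acd5cc0 IO 0x9e6dc00 0xfb000000 4096 0x4123f200
0x4ac17e80 IO 0x9e6dc00 0xfa200000 2560 0x4123f200
0x4ac5b2c0 IO 0x9e6dc00 0xfa000000 32 0x4123f200
0x41280540 IO 0x9e6dc00 0xfa200000 2560 0x4123f200
0x41280440 IO 0x9e6dc00 0xfb000000 4096 0x4123f200
0x41280340 IO 0x9e6dc00 0xfa000000 32 0x4123f200
"""


def count_pregions_per_region(text):
    """Count how many listed pregions point at each region (p_reg column)."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    reg_col = header.index("p_reg")
    return Counter(row[reg_col] for row in body)


counts = count_pregions_per_region(PVIEW_OUTPUT)
for region, n in counts.items():
    print(f"region {region}: {n} IO pregions")
```

The tally for 0x4123f200 should agree with the refcnt that P4 reports for that region.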

In the above example, we have three IO maps in 32-bit land, one at 0xfa200000, another at 0xfa000000, and a third at 0xfb000000. The 32-bit addresses above 0xf0000000 in quadrant 4 are reserved for IO maps.
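
As a quick sanity check on addresses you pull out of a dump, something like the following Python sketch classifies a 32-bit virtual address. It assumes the usual 32-bit layout where the top two address bits select the quadrant, and it treats the 0xf0000000 boundary mentioned above as inclusive, which is my assumption; the function names are made up for illustration.

```python
IO_MAP_FLOOR_32 = 0xF0000000  # ASSUMPTION: boundary treated as inclusive


def quadrant(vaddr32):
    """Return the 1-based quadrant (1..4) of a 32-bit virtual address."""
    if not 0 <= vaddr32 <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    return (vaddr32 >> 30) + 1  # top two bits select the quadrant


def in_io_map_window(vaddr32):
    """True if the address falls in the quadrant 4 range reserved for IO maps."""
    return quadrant(vaddr32) == 4 and vaddr32 >= IO_MAP_FLOOR_32


for addr in (0xFA000000, 0xFA200000, 0xFB000000, 0xC0000000):
    print(f"{addr:#010x}: quadrant {quadrant(addr)}, IO window: {in_io_map_window(addr)}")
```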

The space ID is the same because all 32-bit processes share a global space for quadrant 4. In addition, all IO map shared memory segments have a common protection key which is determined at boot.

So, a process which is allowed to access virtual address 0x9e6dc00.0xfa000000, for example, can cause PCI bus activity, and if that activity is not acknowledged by the card, you'll get the bus timeout and HPMC.

Take a close look at the raw tombstone file once you have the mapped address - searching down to an ERR_TIMEOUT in the ts99 file will show you the target PCI bus address which timed out, such as 0x8c001704.

This address will depend on which bus and slot the card is in. It is not a system memory address, but rather a 32-bit value used by the PCI bus to select a given card by asserting that value on the PCI bus address pins at the right time.

This PCI bus address will correspond to a memory-map address. Let's say we found out that it's 0xfa000000, based on research about how the card functions and how the driver maps it.

So if you search the ts99 file for 0xfa001704, you may very well find it in one or more of the registers on one or more of the CPUs listed in the tombstone. Knowing what thread was running on that CPU might lead you to the culprit which made the invalid access to the card.
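
The arithmetic behind that search can be sketched in a few lines of Python. Note the assumptions: the PCI base of 0x8c000000 is simply what you get by assuming the card's window starts there (the real base comes from how the card is configured), the map length assumes the p_count of 32 is in 4k pages, and both function names are hypothetical helpers, not part of any HP tool.

```python
PCI_BASE = 0x8C000000   # ASSUMPTION: PCI bus address where the card's window starts
MAP_BASE = 0xFA000000   # virtual address of the IO map (p_vaddr)
MAP_LEN = 32 * 0x1000   # ASSUMPTION: p_count of 32, counted in 4k pages


def pci_to_mapped_vaddr(pci_addr, pci_base=PCI_BASE, map_base=MAP_BASE, map_len=MAP_LEN):
    """Translate a timed-out PCI bus address to the matching mapped virtual address."""
    offset = pci_addr - pci_base
    if not 0 <= offset < map_len:
        raise ValueError("PCI address falls outside this map")
    return map_base + offset


def lines_mentioning(lines, value):
    """Return 1-based numbers of the lines whose text contains `value` in hex."""
    needle = f"{value:08x}"
    return [n for n, line in enumerate(lines, start=1) if needle in line.lower()]


vaddr = pci_to_mapped_vaddr(0x8C001704)
print(f"search the tombstone for {vaddr:#010x}")
```

To use it on a real dump, read the ts99 file into a list of lines and pass it to `lines_mentioning` along with `vaddr`. The match is a plain substring test, so it also catches the value when a register is printed as a zero-padded 64-bit quantity.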
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

That's great info. Thanks. Do you by chance know where I can get more info on using P4? I have a basic understanding, but documentation is scarce.
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Since P4 is an HP-confidential internal tool, I suppose it's not surprising that docs are scarce outside of HP. The NDAed copy WTEC sent me for use on our air-gapped secure systems during PHKL_41910 and PHKL_42072 included man pages... but it can still get brain-bruisingly arcane even if you do have the man pages, a copy of "HP-UX 11i Internals," and a WTEC engineer holding your hand.

I found this a few months ago, which may be helpful:

www.dectrader.com/docs/set3/emr_na-c01037168-2.pdf

But your best bet will probably be to open a software support case for some debugging guidance, unless it's a non-HP card/driver, in which case they'll probably tell you to talk to the vendor, who may be able to find someone who knows Q4.
mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Robert, if you can disclose it I'd be interested to know the type of card and driver on which you're getting the HPMC.
Robert_Jewell
Honored Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Sure... it's an rp8420 running the mpt driver on AB290A cards. We updated mpt to the latest published version (B.11.11.0911) on each of the vPars (11.11 on all of them).

I did find an issue documented stating that the U320 cards should be forced to run at U160 when using ds2120 storage systems (which we are). This has been done and we are now waiting to see if things stabilize.

mvpel
Trusted Contributor

Re: Possible to obtain PCI_BUS_TARGET_ID prior to HPMC?

Thanks! And good luck! How long does it usually take to manifest?

And a rather arcane set of precipitating conditions coupled with certain not-universally-valid assumptions in the LIBCL stack unwinding library appears to be at the root of our own HPMC problem.