
Re: High load

 
Rob Leadbeater
Honored Contributor

Re: High load

The only application installed was the management agents for 10gR2.

These aren't essential to the database running, so as a first port of call, uninstall or disable them and see if things improve...

Cheers,

Rob
Consty
Frequent Advisor

Re: High load

Hi all,

Rob,
We disabled everything, but there was no improvement. I don't know if the kernel has changed.

Ivan,
-ecallprog is a program processing mobile telephone calls.
-Yes, the programs are accessing the database information frequently
-Output of drdmgr attached

Thanks
Regards
Consty
Hein van den Heuvel
Honored Contributor

Re: High load

Consty,

Thanks for the collect output in text format as well as the statspack.

There is heavy Oracle load, but not excessive, it seems. Oracle and its usage can probably be improved:
- review/recode the select count(*) queries
- double the SGA buffer space, as it can use it, and the memory is there.

But that would not have changed with a reboot, nor would it cause the high system time. A common cause of high system time is paging and swapping, but that does not seem to be an issue here.

Be sure to scan the boot records (UERF -R? /var/adm/messages?) for errors during the boot. Did you keep a (virtual) console log? Maybe some sysconfigtab setting was edited erroneously and did not take?

I would recommend diving into that, using tools to see exactly where the system time is used.

For example, kprofile:

http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN1/0658____.HTM

Or better still, DCPI:
http://h30097.www3.hp.com/dcpi

I would probably also use 'truss' (SYS V extensions CD) or (s)trace to get a system call trace for one of those 'sl' processes.

Finally, my WAG is that something is amiss with the network settings.

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting

Rob Leadbeater
Honored Contributor

Re: High load

Hi Consty,

> I don't know if the kernel has changed.

Take a look at the time stamp on /vmunix to see when the kernel was built. Note that on a cluster this should be a CDSL (context-dependent symbolic link) to the boot partition, so you'll have to follow that link to get to the actual vmunix file...
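
If a Python interpreter happens to be available on the node (an assumption; it is not part of a default Tru64 install), following the link and reading the time stamp can be sketched as below. The helper name `kernel_build_info` is made up for illustration, and the file's modification time is only an approximation of the build time.

```python
import os
import time


def kernel_build_info(path="/vmunix"):
    """Follow any symbolic links (such as a CDSL) and return the
    resolved path plus the file's last-modification time as text."""
    real_path = os.path.realpath(path)      # resolves every link in the chain
    mtime = os.stat(real_path).st_mtime     # modification time of the real file
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
    return real_path, stamp


if __name__ == "__main__":
    # os.path.exists follows links, so a dangling link is reported as missing.
    if os.path.exists("/vmunix"):
        real_path, stamp = kernel_build_info("/vmunix")
        print("%s last modified %s" % (real_path, stamp))
    else:
        print("/vmunix not found on this host")
```

Run it on each cluster member, since the link resolves to a different boot partition on each node.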

Cheers,

Rob
Consty
Frequent Advisor

Re: High load

Thanks,
In addition, do you advise me to restore the
system in case we do not see anything? I know other Unix systems but not Tru64. How do you restore a system on a TruCluster?
Regards
Consty
Consty
Frequent Advisor

Re: High load

In addition,
- I checked /vmunix and the link date; they are all from October 2004.
- There is traffic on the interconnect because the filesystems are mounted on node B and the programs (Oracle processes) are running on node A.
- The "sysconfigtab" was restored to the version from before the last software installation.

Thanks

Consty
Frequent Advisor

Re: High load

Hein van den Heuvel,

I have attached the content of /var/adm/messages; please help me analyse what is inside.

Regarding the interconnect, I have just been told that when you start nodeA first and then nodeB, the system halts; we have to start nodeB first and then nodeA (not normal). In addition, as nodeB cannot support all the load, the filesystems are held by nodeB and the applications run on nodeA. This does not seem good for Oracle if SqlNet is not set. I do not know how TruCluster manages it.

Taking all this into account, can you help me explain what is going on?

Thanks
Consty
Hein van den Heuvel
Honored Contributor

Re: High load

The last boot entry for nodea shows:

Nov 21 10:39:27
emx0: Using console topology setting of : Loop
***** HARDWARE ERROR *****
status 0x80000000, 0xA8=0x000025bc, 0xAC=0x00000000: HW ERR:EBUS Parity Error
EMX DRIVER ERR: emx0: emx_log_adap_err - status 0x80000000, 0xA8=0x000025bc, 0xAC=0x00000000: HW ERR:EBUS Parity Error


Seems to me that node A lost connection to the SAN. The disk will be transparently served through node B... at a price though.

It would explain why you have to boot B first, and considering that Oracle is running on A and is likely to do the bulk of the disk I/O, this is bad.

You can explore further with the cfsmgr and drdmgr (device request dispatcher) tools, but I'd go after the hardware asap, calling in support as needed.
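
To see how often this error shows up across the whole messages file, a rough scan along these lines may help. It is only a sketch: it assumes Python is available, that each log line starts with a syslog-style "Mon DD HH:MM:SS" stamp, and that the marker text "HW ERR:" identifies the lines of interest. The function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed syslog-style prefix, e.g. "Nov 21 10:39:27".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}")
MARKER = "HW ERR:"


def scan_hw_errors(lines):
    """Return (timestamp, message) pairs for every line containing MARKER.
    The timestamp is None when the line has no recognisable prefix."""
    hits = []
    for line in lines:
        if MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        stamp = match.group(0) if match else None
        message = line.split(MARKER, 1)[1].strip()
        hits.append((stamp, message))
    return hits


def summarize(hits):
    """Count how many matching lines carry each error message."""
    return Counter(message for _, message in hits)


if __name__ == "__main__":
    # Sample input modelled on the boot entry quoted above; the
    # "nodea vmunix:" prefix is an assumption about the log layout.
    sample = [
        "Nov 21 10:39:27 nodea vmunix: emx0: Using console topology setting of : Loop",
        "Nov 21 10:39:27 nodea vmunix: status 0x80000000, 0xA8=0x000025bc, "
        "0xAC=0x00000000: HW ERR:EBUS Parity Error",
    ]
    for message, count in summarize(scan_hw_errors(sample)).items():
        print(count, message)
```

To run it against the real log, pass an open file handle for /var/adm/messages to `scan_hw_errors`. Note that one event can be logged on two lines (the raw status line and the EMX DRIVER ERR line), so the counts are per line, not per event.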

In the meantime, you may want to consider swapping the node assignments. Boot what always was node B as node A and vice versa, bringing the storage close to its main usage.

hth,
Hein.

Consty
Frequent Advisor

Re: High load

Thanks Hein, Thanks all of you,
I'll check everything and let you know.
Regards
Consty
Rob Leadbeater
Honored Contributor

Re: High load

Hi Consty,

You would also be advised to check the various firmware revisions. If you do get HP Support involved, one of the first things they'll ask you to do is to get everything up to supported revisions.

From a quick flick through the messages files, you would need to update the firmware on your fibre HBAs and the MSA1000.

The firmware flashing of the FCA2354 *may* solve the hardware error that Hein pointed out, although if you go back through that messages file, you'll see that this error has been happening for a *long* time - at least a couple of years if not longer. Swap the HBA.

The MSA1000 is also running rather old firmware, 4.32. You should probably upgrade this to at least 4.48, if not 5.20, which should give some performance boosts.
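
If you end up checking several devices, comparing revision strings by hand gets error-prone. A minimal sketch of the comparison, assuming the revisions are plain dotted numbers like the ones above (the function names are made up for illustration):

```python
def parse_version(text):
    """Turn a dotted revision such as '4.32' into a tuple of ints, (4, 32)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(current, minimum):
    """True when the current revision is older than the minimum wanted."""
    return parse_version(current) < parse_version(minimum)


if __name__ == "__main__":
    # Revisions taken from this thread: 4.32 installed, 4.48 or 5.20 suggested.
    for wanted in ("4.48", "5.20"):
        print("4.32 older than", wanted, "->", needs_upgrade("4.32", wanted))
```

Comparing as integer tuples rather than as strings or floats avoids mistakes such as treating 4.9 as newer than 4.48.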

Cheers,

Rob

P.S. Don't forget to assign points