Operating System - HP-UX

Diagnostic log file grows huge

 
Stuart Powell
Super Advisor

Diagnostic log file grows huge

When I run "/sbin/init.d/diagnostic start", the log file diaglogd_activity_log in /var/stm/logs/sys starts growing rapidly and soon becomes very large. The head of the file is shown below.

th9kkux.thcvnet.hercfilm
root
A.03.00
diaglogd
driver_name
A.01.02
tllibio
driver_name
A.03.00
diaglogd
driver_name
A.01.02
tllibio
driver_name

This pattern continues throughout the file. Any ideas?

Stuart
Sometimes the best answer is another question
Bill Hassell
Honored Contributor

Re: Diagnostic log file grows huge

A runaway diagnostic log seems to occur right after installing the diags -- they need some time to 'cook', to prepare internal files. If this is interrupted, runaways seem to occur. Download the latest version of the diags (not from the CD) and reinstall. Once installed, don't reboot for a few hours.


Bill Hassell, sysadmin
Stuart Powell
Super Advisor

Re: Diagnostic log file grows huge

I have downloaded the Support tools and have installed version A.52.00. I'm still seeing the same problem with these new entries in the /var/stm/logs/sys/diaglogd_activity_log file every minute:
A.01.02
tllibio
driver_name
A.03.00
diaglogd
driver_name
There are no patches listed for HP-UX 11.11.

Any ideas on what I might be missing?
Sometimes the best answer is another question
Andrew Merritt_2
Honored Contributor

Re: Diagnostic log file grows huge

Hi Stuart,
That's not a plain text file. To see what message is being logged (which you'll need to do to know how to stop it), run STM and view the diaglogd activity log.

If you use 'cstm', use 'dacl' and select 'diaglogd'. In 'mstm' or 'xstm', select "System | Daemons | Daemon Activity Log ...| diaglogd".

Andrew
Stuart Powell
Super Advisor

Re: Diagnostic log file grows huge

I used cstm to view the diaglogd activity log and got the following entry repeating endlessly:

Tue Aug 29 13:04:27 2006: Diaglogd daemon failed to get hardware path for I/O error entry to be logged. Entry will be logged with NULL hardware path.

Tue Aug 29 13:04:27 2006: The io_query call failed with io_errno (15) when attempting to get the Context Dependent I/O module name for the device. An STM_KEY_TOKEN_DEF (15) io_errno indicates that the specified item does not exist in the I/O tree node.

Possible Causes/Recommended Action:

The driver is not placing the data in the I/O tree node. This could be a defect or it may merely mean the the driver does not implement this feature.

The device is misconfigured. Reconfigure the device making sure that the correct device driver is utilized.

However, this doesn't help me solve the problem. Any ideas?

Stuart
Sometimes the best answer is another question
Andrew Merritt_2
Honored Contributor

Re: Diagnostic log file grows huge

Hi Stuart,
I appreciate that you don't yet have the answer, but we can make some progress now we know what the messages are.

Check you have a recent version of OnlineDiags installed; if not, upgrade.

If you have a support contract with HP, I would recommend thinking about opening a call with them at this point.

The other thing to do is to work out which devices are referred to. The lines immediately before the ones you've quoted will include the device paths which are causing the problem. You'll need to correlate these with an ioscan output to see if there's something odd about them. What are they for, when were they created, does rerunning 'ioscan' (without the -k option) clear things? When did the problem start? Has new hardware been added?
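One kind of oddness that is easy to check for is an entry that ioscan reports in the NO_HW software state. As a rough sketch of the correlation step above, the function below pulls those entries out of ioscan output so the paths can be compared with the ones named in the diaglogd messages. The name list_no_hw is made up for illustration, and the sketch assumes the usual 'ioscan -f' column order (class, instance, hardware path first).

```shell
# List entries that ioscan reports in the NO_HW software state.
# Reads `ioscan -f`-style output on stdin and prints "class instance path".
# list_no_hw is a hypothetical helper name, not part of HP-UX.
list_no_hw() {
    awk '{
        # The state column can shift if a driver name is missing, so scan
        # every field after the hardware path instead of trusting $5.
        for (i = 4; i <= NF; i++)
            if ($i == "NO_HW") { print $1, $2, $3; break }
    }'
}

# Usage on the affected host (run as root):
#   ioscan -fk | list_no_hw
```

Header lines and wrapped description lines are skipped, because neither contains a NO_HW field.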

Sorry I can't give a definite answer at this stage, but basically STM is trying to access the hardware, and is getting an unexpected response when it does so. It should cope with any supported devices, so the most likely explanations are either you've got an old version of OnlineDiags and some new hardware, or there's something bogus about the device paths.

Andrew
Andrew Merritt_2
Honored Contributor

Re: Diagnostic log file grows huge

Ok, just realised you do have A.52.00, so ignore the comments about upgrading.

Andrew
Stuart Powell
Super Advisor

Re: Diagnostic log file grows huge

Andrew;
Your thoughts were very helpful in troubleshooting. A few weeks ago I had to change the PID setting on the switches in our SAN to accommodate a newer switch. When I changed that value and reset the switches, all of the SAN hardware was assigned new instance numbers. I had not cleaned everything up on the systems that are experiencing the hardware problem, so I went in and got rid of most of the special files and addresses associated with the NO_HW entries.
Now I have four entries for each switch, one for each port that did have an array device:
ext_bus 26 8/4/1/0.3.30.0.0 fcparray NO_HW INTERFACE FCP Array Interface
ext_bus 16 8/4/1/0.3.30.255.0 fcpdev NO_HW INTERFACE FCP Device Interface
ext_bus 0 8/4/1/0.3.31.0.0 fcparray NO_HW INTERFACE FCP Array Interface
ext_bus 2 8/4/1/0.3.31.255.0 fcpdev NO_HW INTERFACE FCP Device Interface
I will start looking into how to get rid of these ioscan entries. I can't reboot the systems at this time.
The diaglogd_activity_log is still growing.
Stuart
Sometimes the best answer is another question
Andrew Merritt_2
Honored Contributor

Re: Diagnostic log file grows huge

Hi Stuart,

Just running 'ioscan -fn' should do what you need.

If it doesn't, it's safe to delete the diaglogd_activity_log file if it's getting too large; that shouldn't break anything.
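If you want to keep the file from filling /var until the underlying problem is fixed, something along these lines could be run periodically. This is only a sketch: the function name trim_log_if_large and the size limit are made up for illustration, and it empties the file in place rather than deleting it. That choice is an assumption, made so that a daemon holding the file open keeps writing to the same file; I have not checked how STM displays the log afterwards.

```shell
# Empty a log file in place once it exceeds a size limit (in kilobytes).
# Usage: trim_log_if_large FILE MAX_KB
# trim_log_if_large is a hypothetical helper name, not an STM tool.
trim_log_if_large() {
    log=$1
    max_kb=$2
    [ -f "$log" ] || return 0
    # wc -c reports the size in bytes; convert to whole kilobytes.
    size_kb=$(( $(wc -c < "$log") / 1024 ))
    if [ "$size_kb" -gt "$max_kb" ]; then
        : > "$log"    # truncate to zero length, keeping the same file
        echo "truncated $log (was ${size_kb} KB)"
    fi
}

# Example (the 10240 KB limit is an arbitrary choice):
#   trim_log_if_large /var/stm/logs/sys/diaglogd_activity_log 10240
```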

Andrew
Andrew Merritt_2
Honored Contributor

Re: Diagnostic log file grows huge

The other point I just realised I haven't mentioned is that the reason diaglogd is logging the errors is because it is dealing with events passed by the device driver, and doesn't recognise the path name. The point being that one of the drivers is detecting some sort of hardware problem, most likely when a device is being accessed. You may be able to see what these errors are by using 'logtool' in STM (I'm not sure exactly whether the problem with the unrecognised paths would prevent the events appearing in logtool or not).

It could be a problem due to the ioscan data being out of sync, so could resolve itself when that is sorted.

Andrew
Stuart Powell
Super Advisor

Re: Diagnostic log file grows huge

Andrew,

Here is the output of a new ioscan:
$ sudo ioscan -fnC ext_bus
Class     I  H/W Path            Driver    S/W State H/W Type   Description
==============================================================================
ext_bus  35  8/4/1/0.3.14.0.0    fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  36  8/4/1/0.3.14.255.0  fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus  37  8/4/1/0.3.15.0.0    fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  38  8/4/1/0.3.15.255.0  fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus  26  8/4/1/0.3.30.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus  16  8/4/1/0.3.30.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus   0  8/4/1/0.3.31.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus   2  8/4/1/0.3.31.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus  39  8/4/1/0.4.11.0.0    fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  40  8/4/1/0.4.11.255.0  fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus  41  8/4/1/0.4.14.0.0    fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  42  8/4/1/0.4.14.255.0  fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus   3  8/4/1/0.4.27.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus   6  8/4/1/0.4.27.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus  20  8/4/1/0.4.30.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus  18  8/4/1/0.4.30.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus  31  8/8/1/0.1.11.0.0    fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  32  8/8/1/0.1.11.255.0  fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus  33  8/8/1/0.1.14.0.0    fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  34  8/8/1/0.1.14.255.0  fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus   7  8/8/1/0.1.27.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus   8  8/8/1/0.1.27.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus  24  8/8/1/0.1.30.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus  14  8/8/1/0.1.30.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus  27  8/8/1/0.2.3.0.0     fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  28  8/8/1/0.2.3.255.0   fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus  29  8/8/1/0.2.15.0.0    fcparray  CLAIMED   INTERFACE  FCP Array Interface
ext_bus  30  8/8/1/0.2.15.255.0  fcpdev    CLAIMED   INTERFACE  FCP Device Interface
ext_bus  21  8/8/1/0.2.19.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus  11  8/8/1/0.2.19.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus   9  8/8/1/0.2.31.0.0    fcparray  NO_HW     INTERFACE  FCP Array Interface
ext_bus  10  8/8/1/0.2.31.255.0  fcpdev    NO_HW     INTERFACE  FCP Device Interface
ext_bus   1  10/0                c720      CLAIMED   INTERFACE  GSC built-in Fast/Wide SCSI Interface
ext_bus   5  10/12/0             CentIf    CLAIMED   INTERFACE  Built-in Parallel Interface
                                 /dev/c5t0d0_lp
ext_bus   4  10/12/5             c720      CLAIMED   INTERFACE  Built-in SCSI

I realize that this output is very busy, but it best shows my situation. The top of the ioscan output shows the connections for switch domain 3 on our SAN. Notice that there are two claimed entries on port 14 and two on port 15. Immediately below them are the entries for ports 30 and 31, which show NO_HW. The port 14 and 15 entries are the current, valid SAN devices with the switch PID value set to 1. The port 30 and 31 entries are the same physical ports from when the PID on the switch was set to 0.
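To make that old-versus-new pattern easier to see, the listing can be boiled down to HBA, switch domain, port and state. The function below is a sketch only: summarise_fc_paths is a made-up name, and it assumes the hardware paths have the form HBA.domain.port.x.y as in the output above (for example 8/4/1/0.3.14.0.0 is HBA 8/4/1/0, domain 3, port 14).

```shell
# Summarise Fibre Channel ext_bus entries from `ioscan -fnC ext_bus` output.
# Assumes paths look like HBA.domain.port.x.y, e.g. 8/4/1/0.3.14.0.0.
# summarise_fc_paths is a hypothetical helper name.
summarise_fc_paths() {
    awk '
        $1 == "ext_bus" && ($4 == "fcparray" || $4 == "fcpdev") {
            n = split($3, part, ".")
            if (n >= 3)
                printf "%s domain=%s port=%s %s %s\n",
                       part[1], part[2], part[3], $4, $5
        }
    '
}

# Usage on the affected host:
#   ioscan -fnC ext_bus | summarise_fc_paths
```

The SCSI and parallel interfaces are left out because their drivers are not fcparray or fcpdev, and wrapped description lines are ignored because they do not start with ext_bus.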

I don't have a problem on my production servers, but they were rebooted after I made the SAN change and these development servers have not been rebooted.

Any other thoughts?

Stuart
Sometimes the best answer is another question
Stuart Powell
Super Advisor

Re: Diagnostic log file grows huge

I found this explanation in the Technical Knowledge Base, Document ID 8606422040:
CR# JAGaf81864 problem
Even with rmsf(1M) from PHCO_32202 it is not possible to remove ext_bus entries for interfaces that no longer exist (those that show "NO_HW" in ioscan(1M) output), so one cannot get rid of them without rebooting.

It would be useful to be able to remove ext_bus entries for interfaces that no longer exist without rebooting, for instance on a superdome which is being maxed out of instance numbers.

Here is what was tried:

# rmsf -k -H 5/0/14/0/0.42.5.255.0
WARNING: The specified hardware path is BUS_NEXUS/INTERFACE type.
This will remove all the devices connected to it.
rmsf: Specified hardware path has no devices

# ioscan -fk -H 5/0/14/0/0.42.5.255.0
Class     I  H/W Path                 Driver  S/W State H/W Type   Description
=======================================================================
ext_bus 228  5/0/14/0/0.42.5.255.0    fcpdev  NO_HW     INTERFACE  FCP Device Interface
target  530  5/0/14/0/0.42.5.255.0.0  tgt     NO_HW     DEVICE

rmsf(1M) does not remove the reference to the bus structures if they are not currently installed.

So it looks like a reboot is the correct next step.

Stuart
Sometimes the best answer is another question
Stuart Powell
Super Advisor

Re: Diagnostic log file grows huge

Looks like a reboot is the correct solution.
Sometimes the best answer is another question