Insight Remote Support

ISEE is confused

SOLVED
Darrell Tschakert
Regular Advisor

ISEE is confused

We have four rx4640's running 11.23 and two rx2620's running 11.23. Each of the HP's runs ISEE A.03.95.026. ISEE ran fine until a few weeks ago. At that time my co-worker wanted to upgrade to JDK 1.4.2.12. She did the upgrade, but then she didn't like the way the output of swlist looked, so she swremoved JDK 1.4.2.12. She also swremoved JDK14 1.4.2.09.02 and JDK14 1.4.3.03.04. JDK 1.4.2.12.00 was removed last and it gave her some warnings, but I do not have the error messages at this time. After the swremoves, she swinstalled JDK 1.4.2.12.00 again. Things went along okay until the next Monday at midnight, when cron restarted the ISEE runner.
This is the line from root's crontab, which seems to reset the ISEE software:

# Entry(s) in /opt/hpservices/RemoteSupport are for HP Instant Support Enterprise Edition
0 0 * * 1 /opt/hpservices/RemoteSupport/config/pruneIncidents.sh > /dev/null 2>&1
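For reference, the five schedule fields "0 0 * * 1" mean minute 0, hour 0, any day of the month, any month, weekday 1, i.e. every Monday at midnight. The minimal Python sketch below checks whether a given time matches such a schedule. It is a simplification under stated assumptions: each field must be "*" or a single number, and it ignores cron's special rule for combining day-of-month with day-of-week.

```python
from datetime import datetime


def cron_matches(entry: str, when: datetime) -> bool:
    """Return True if 'when' matches the first five fields of a crontab entry.

    Simplified sketch: each field must be '*' or a single integer. Ranges,
    lists and steps ('1-5', '0,30', '*/15') are not supported, and neither
    is the day-of-month/day-of-week OR rule that real cron applies.
    """
    minute, hour, dom, month, dow = entry.split()[:5]
    # cron numbers weekdays from Sunday == 0; Python's weekday() has Monday == 0.
    cron_dow = (when.weekday() + 1) % 7
    actual = (when.minute, when.hour, when.day, when.month, cron_dow)
    for field, value in zip((minute, hour, dom, month, dow), actual):
        if field != "*" and int(field) != value:
            return False
    return True
```

For example, passing the crontab line above together with datetime(2007, 3, 26, 0, 0) returns True, since 26 March 2007 was a Monday.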


From that point on, the logs in /var/opt/runner/logs started to grow until /var ran out of space a couple of days later.
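A runaway log directory like this can be caught before /var fills up. Here is a minimal Python sketch that totals the bytes under a directory and compares them with a limit. The only thing taken from this thread is the directory path; the 500 MB limit is an arbitrary placeholder, not an HP recommendation.

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under 'path'."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file rotated or removed while we were walking
    return total


def logs_over_limit(path: str = "/var/opt/runner/logs",
                    limit_bytes: int = 500 * 1024 * 1024) -> bool:
    """True if the directory holds more than limit_bytes (placeholder limit)."""
    return dir_size(path) > limit_bytes
```

Run from cron or a monitoring script, a check like this would raise the alarm days before the file system actually runs out of space.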

I shut the runner down by entering "/sbin/init.d/runner stop". Then I saved the logs elsewhere to get my /var file system back.

I still have runner shut down on the rx4640's. On the rx2620's she did a straight update of JDK to 1.4.2.12.00, and they still work fine.

Before I start learning more than I want to know about ISEE, I wanted to ask if anyone knows of some simple obvious thing that I am overlooking. Like the press of a button or running something like "isee on". Does anyone have such a silver bullet for me to shoot?

I can still use a browser to access the rx4640's over the LAN to update my contact info etc., but I can't start up runner.

Thanks.

Darrell Tschakert


I'll add a quote when I think of one.
12 REPLIES
Liem Nguyen
Honored Contributor

Re: ISEE is confused

Darrell,

Take a look at JVM_LOC in /opt/hpservices/etc/motprefs.

Does it still point to a valid Java path? If not, you may want to change it to where you had Java 1.4.2.12, i.e. /opt/java1.4/jre/bin/java.

Then stop/start hpservices (/sbin/init.d/hpservices stop/start), then start Runner.
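The first check (does JVM_LOC point to a valid Java?) can be scripted. The Python sketch below reads JVM_LOC and tests that it names an existing executable. It assumes motprefs holds simple KEY=VALUE lines; the real file layout may differ, so treat the parser as hypothetical.

```python
import os


def read_prefs(path: str) -> dict:
    """Parse KEY=VALUE lines (assumed format), skipping blanks and '#' comments."""
    prefs = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            prefs[key.strip()] = value.strip()
    return prefs


def jvm_loc_is_valid(prefs_path: str = "/opt/hpservices/etc/motprefs") -> bool:
    """True if JVM_LOC is set and names an existing, executable file."""
    jvm = read_prefs(prefs_path).get("JVM_LOC")
    return bool(jvm) and os.path.isfile(jvm) and os.access(jvm, os.X_OK)
```

This only confirms that the path exists and is executable. Running that Java with -version is still the way to confirm which version it actually is.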

Are these Mission Critical Servers? If they are not, you can disable Runner (/etc/rc.config.d/runner.conf).

If these are MC Servers and you are still seeing the same problem with Runner filling up /var, please contact your Account Support Team and ask for Runner upgrade.

Regards,


Darrell Tschakert
Regular Advisor

Re: ISEE is confused

Liem,
JVM_LOC points to the new Java. Changing the value of JVM_LOC in motprefs doesn't really get you anything anyway, because the first time you reboot, or just restart the hpservices, the motprefs file is remade and the old values are back in place. We tried this on a server that pointed to Java 1.3 when it should have pointed to 1.4, but every time we rebooted motprefs went back to 1.3 again.

These are mission critical systems, so we would like to correct the problem. I don't think that the problem is with runner, but with some config file or pointer that runner uses. It got disconnected when she removed all of the Java's and never got put back together properly.

Thanks,

Darrell Tschakert


Liem Nguyen
Honored Contributor

Re: ISEE is confused

Darrell,

That's odd.
I've never seen the JVM_LOC entry go back to the default value unless:

1. You have incorrect value (wrong path, wrong version, typo).
2. The Server can't communicate with its Server (either HP Backend or SPOP).

Please copy the path and run it with -version at the end, i.e. /opt/java1.4/jre/bin/java -version

Send me the output of the above command.

Regards,
Darrell Tschakert
Regular Advisor

Re: ISEE is confused

Liem,
Sorry, it is valuePairs, not motprefs, that gets changed every time we start runner. Motprefs points to 1.4. The Java version is 1.4.2.12.

java version "1.4.2.12"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2.12-061213-02:03)
Java HotSpot(TM) Server VM (build 1.4.2 1.4.2.12-061213-07:34-IA64N IA64, mixed mode)
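When several hosts need checking, the same test can be scripted. The minimal Python sketch below runs a given java binary with -version and pulls out the quoted version string. Note that java prints this banner on stderr, not stdout, so the sketch reads both streams. The function names are illustrative only.

```python
import re
import subprocess
from typing import Optional


def parse_java_version(banner: str) -> Optional[str]:
    """Return the quoted version from a 'java -version' banner, or None."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def java_version(java_path: str) -> Optional[str]:
    """Run '<java_path> -version' and parse its banner."""
    # java writes its version banner to stderr rather than stdout.
    result = subprocess.run([java_path, "-version"],
                            capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)
```

Applied to the banner above, parse_java_version returns "1.4.2.12".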

I don't know if upgrading runner is going to get me anything. We have six HP-UX 11.23 computers. They all have exactly the same runner.jar file, as indicated by md5sum. Two of the six servers are working fine. Four of them have problems. The four with problems are the ones from which my supervisor removed all Java versions and then reinstalled Java.

md5sum /opt/runner/java/runner.jar
b2978a61c3efe03be290d2d27ec1c348 /opt/runner/java/runner.jar
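The same comparison can be done with Python's standard hashlib module. This sketch reads the file in chunks and returns the hex digest, which can then be compared with the value above on each server. A matching digest only shows that runner.jar itself is identical. It says nothing about runner's data or configuration files.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage: md5_of("/opt/runner/java/runner.jar") == "b2978a61c3efe03be290d2d27ec1c348".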


I still think that there is some link between runner and whatever runner is supposed to get its info from, like EMS or some such thing.

Below are some of the errors that were being generated while runner was up on the problem computers:

Mon Mar 26 16:24:53 EDT 2007
tails>ERROR: sleep error: java.lang.IllegalArgumentException: timeout value is negative timeout value is negative


Mon Mar 26 16:24:53 EDT 2007
ERROR: ps -ef java.io.IOException: Too many open files (errno:24) Too many open files (errno:24)


Mon Mar 26 16:24:53 EDT 2007
ERROR: error returned from Heartbeat: java.lang.NullPointerException null


Mon Mar 26 16:24:53 EDT 2007
tails>ERROR: sleep error: java.lang.IllegalArgumentException: timeout value is negative timeout value is negative


Mon Mar 26 16:24:53 EDT 2007
tails>ERROR: ps -ef java.io.IOException: Too many open files (errno:24) Too many open files (errno:24)

Mon Mar 26 16:24:53 EDT 2007
And some more errors from the same runnerError log:

ERROR: readLine error returned: java.io.IOException: Bad file number (errno:9) Bad file number (errno:9)
Thu Jul 13 00:07:52 EDT 2006
tails>ERROR: readLine error returned: java.io.IOException: Bad file number (errno:9) Bad file number (errno:9)

ERROR: /opt/hpsmc/common/bin/hpssidgen -outdir /var/opt/runner/logs Command Timeout
ERROR: /opt/hpsmc/common/bin/hpssidgen -outdir /var/opt/runner/logs Command Timeout

Then there are these types of errors:

ERROR: /bin/ps -fu oracle java.io.IOException: Too many open files (errno:24) Too many open files (errno:24)
ERROR: /bin/ps -fu oracle java.io.IOException: Too many open files (errno:24) Too many open files (errno:24)
Sat Mar 24 23:53:13 EDT 2007
tails>ERROR: /bin/ps -fu oracle java.io.IOException: Too many open files (errno:24) Too many open files (errno:24)

ERROR: /bin/ps -fu oracle java.io.IOException: Too many open files (errno:24) Too many open files (errno:24)
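With logs this large, a tally by error type is easier to read than the raw log. On common Unix systems errno 24 is EMFILE (the process has reached its open-file limit) and errno 9 is EBADF (an invalid file descriptor). That is consistent with runner running out of file descriptors, though a tally alone cannot prove the cause. The minimal Python sketch below keys on the message formats in the excerpts above. Lines without an errno are grouped by Java exception class, and anything else counts as "other".

```python
import re
from collections import Counter
from typing import Iterable

ERRNO_RE = re.compile(r"\(errno:(\d+)\)")
EXCEPTION_RE = re.compile(r"(java\.[\w.]+(?:Exception|Error))")


def tally_errors(lines: Iterable[str]) -> Counter:
    """Count ERROR lines by errno, else by Java exception class, else 'other'."""
    counts = Counter()
    for line in lines:
        if "ERROR" not in line:
            continue  # timestamps and other non-error lines
        m = ERRNO_RE.search(line)
        if m:
            counts["errno " + m.group(1)] += 1
            continue
        m = EXCEPTION_RE.search(line)
        counts[m.group(1) if m else "other"] += 1
    return counts
```

Feeding it the lines of the runnerError file (for example, an open file object) gives one count per error kind instead of thousands of repeated lines.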

If need be, we will start reloading ISEE software, but I am hoping that there is an easier way to fix this.

Just for the record, I was not for removing all of the Java software and then reloading. Things were okay except that the output of swlist was not perfect, or completely clear.

Thanks,

Darrell T.


Liem Nguyen
Honored Contributor

Re: ISEE is confused

Darrell,

Going back to your first post, let me say a few things about ISEE and Runner.

1. /opt/hpservices/RemoteSupport/config/pruneIncidents.sh is a cron job that runs once/week to clean up old incidents, this cron job is created when you installed ISEE. This has no relationship to Runner.

2. Runner is an independent application to collect system up & down time, it sends the reports to HP for the Account Team to create reports. Only data from Mission Critical Servers are being evaluated.

3. Both ISEE and Runner use the same Java as configured in /opt/hpservices/etc/motprefs.

My recommendation is to download ISEE A.03.95.500.xxx and install it on your Servers. If you're still seeing the problem, please contact the HP Response Center or your ASE.

Regards,




Darrell Tschakert
Regular Advisor

Re: ISEE is confused

Liem,
I am having trouble following. If ISEE and runner are independent and it appears that it is runner that is having problems, then why would I reload ISEE? It is runner that has the problems since the huge log files are found in /var/opt/runner/logs. Also, running "/sbin/init.d/runner stop" causes the logs to stop growing out of control. Perhaps you want us to load the latest ISEE just to eliminate any problems that ISEE A.03.95.026 might have. But we have ISEE A.03.95.026 running on two 11.23 systems with the same runner and these two servers run fine.

The reason that I thought pruneIncidents.sh was involved is that it runs at 00:00, and certain of the logs appeared to indicate that the problem started at exactly 00:00. I am no longer convinced of this, however. I am currently at home and don't have all of my notes with me, so I cannot say exactly how I tied the pruneIncidents.sh cron job into the whole thing. In any event, I will forget about pruneIncidents.sh for now.

I am reluctant to simply replace runner since:
1. We got into the problem in the first place by removing and reloading software. I don't want to dig in deeper unless it is absolutely necessary.
2. The same version of runner is working fine on two of our HP's.

I still think that there is some pointer or config file that is out of alignment. Can you point me to a document that details how runner works? I seem to remember trying in the past and not finding anything.

Thanks,

Darrell Tschakert



Jody Jones
Occasional Visitor

Re: ISEE is confused

It looks like if you simply upgrade runner to the latest release, it may solve your problems. The download page is only available on the HP network, so you have to ask your HP rep to get it for you.
Liem Nguyen
Honored Contributor

Re: ISEE is confused

As Jody said and I've said at the very beginning:

"If these are MC Servers and you are still seeing the same problem with Runner filling up /var, please contact your Account Support Team and ask for Runner upgrade."

NOTE: The new Runner Version: 0.00.01039 is NOT bundled with ISEE A.03.95.500, you will need to work with the Account Team to get it and it will correct your problem.

The reason I suggested ISEE upgrade is:
1. Your version is not the latest
2. Conflicting info about pruneIncidents.sh and motprefs.
3. You should do a clean ISEE install after Java is removed and reinstalled. During ISEE installation, it searches for a suitable Java version on your Server and uses this copy by making the reference in motprefs.


Hope this helps,
Darrell Tschakert
Regular Advisor

Re: ISEE is confused

I have fixed the runner problem. It was not necessary to reload ISEE or runner. I stopped and started runner a couple of times while monitoring it with HealthCheck.sh. It finally rebuilt a few files, including /var/opt/runner/data/hardware. Runner has been working without generating errors for about 18 hours on the four broken HP's.

In my original posting, I asked if anyone knew where I could find documentation on runner. I have googled and searched the HP web site for this documentation, but without luck. If anyone can point me toward this documentation I would be very appreciative. If not, then I will call support and open a ticket.

I did not reload runner or ISEE for a number of reasons - some of which I have already indicated in my postings. The main reason follows:

Some time back, we had trouble getting the GUI screen which is connected directly to the PCI Graphics card to work on some of our rx2620 and rx4640's running 11.23. Two rx2620's were identical and were delivered at the same time from HP, yet one GUI screen worked and one did not. We had four rx4640's with identical hardware which were clones of each other - The GUI screens on two did not work. I opened a ticket with HP support. The HP support person who helped me had me loading patches and software packages until I finally closed the ticket without fixing the problem. I was convinced that if I had gotten someone who really knew DT and other GUI software, they might have seen this problem before and have a silver bullet for fixing it - we were not the only agency with Itaniums. Finally, my supervisor opened a new ticket and insisted that she speak with someone who knew the software area in question. That HP support person was familiar with the problem and thought that a particular parameter or variable (the name I have forgotten) had gotten unset when the computer was booted without a screen connected to the Graphics card. Once this variable was unset it never got reset again. Resetting the variable by hand would fix things. He was right. I spent about four weeks working on this problem in my spare time. My supervisor fixed it in ten minutes - but not by reloading software. I don't mean to suggest that patching and reloading software is never required. But when I have identical computers with one working and the other not, it is probably some simple thing that needs resetting.

Thank you,

Darrell Tschakert
Jody Jones
Occasional Visitor
Solution

Re: ISEE is confused

Runner documentation is also available only on the internal HP network, so you have to ask your HP rep to get you a copy.
Darrell Tschakert
Regular Advisor

Re: ISEE is confused

It appears as though runner got confused when Java was updated, although I would not bet heavily on this. In any case, stopping and starting runner (in some cases, twice) appeared to fix the problem. Of six 11.23 systems, five had problems. They all appear to be running properly now. One thing to look for is the size of the /var/opt/runner/data/hardware file. The file was only a few bytes when the problems existed but was about 6K when things got straightened out. The runnerError file has not increased in size on any of the six servers since runner corrected itself.
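Both signs described here can be checked from a script. The Python sketch below flags a hardware file that is suspiciously small and tells whether a file has grown since an earlier size reading. The hardware file path comes from this thread. The 1024-byte threshold is an assumption, chosen to sit between "a few bytes" and "about 6K", and should be adjusted to whatever the working servers show.

```python
import os

HARDWARE_FILE = "/var/opt/runner/data/hardware"  # path from this thread


def hardware_file_looks_healthy(path: str = HARDWARE_FILE,
                                min_bytes: int = 1024) -> bool:
    """True if the file exists and is at least min_bytes (assumed threshold)."""
    try:
        return os.path.getsize(path) >= min_bytes
    except FileNotFoundError:
        return False


def has_grown(path: str, previous_size: int) -> bool:
    """True if the file is now larger than a previously recorded size.

    Raises FileNotFoundError if the file does not exist.
    """
    return os.path.getsize(path) > previous_size
```

Recording the size of the runnerError file once a day and passing it to has_grown the next day would show whether runner has started logging errors again.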

I also ran HealthCheck.sh to see what the script thought of runner's health.

Thanks,

Darrell Tschakert
Darrell Tschakert
Regular Advisor

Re: ISEE is confused

I have since discovered, by using glance, that runner has thousands of FIFO's open. It would appear that runner keeps opening these FIFO's until it has about 4,000 open and then the program starts to generate error messages to the logs which eventually fill up. I still think that it is some simple thing causing this to happen. However, I don't think that I am going to figure it out, nor is anyone going to give me the answer. So I will have to try loading a new copy of runner.
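On a system with a Linux-style /proc, the same count can be scripted, which makes it easy to see whether the number keeps climbing. HP-UX does not provide a /proc file system of this kind, so there glance (as used above) or a tool such as lsof remains the way to do it. The Python sketch below is a Linux-only illustration. When a process's open descriptors reach its open-file limit, further open or pipe calls fail with errno 24 (EMFILE), matching the "Too many open files" errors in the logs.

```python
import os
import stat


def count_open_fifos(pid="self") -> int:
    """Count FIFO/pipe descriptors held by a process (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            # os.stat follows the /proc symlink to the object behind the fd.
            mode = os.stat(os.path.join(fd_dir, name)).st_mode
        except FileNotFoundError:
            continue  # descriptor closed while we were looking
        if stat.S_ISFIFO(mode):
            count += 1
    return count
```

Passing another process's PID requires running as that process's owner or as root.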

Darrell Tschakert