
pfmstat-tkt SEP was restarting

dawn_jose85
Frequent Advisor

pfmstat-tkt SEP was restarting

Hi,
On my HP ProLiant server I tried to install a package, but I am getting an error.
I noticed that all SEPs were restarted this morning at 05:19 India time.

The reason is that smf1_pkg was switched to ips122 at 05:04 India time and switched back to ips222 at 05:19 India time.

Jan 21 05:03:37 ips122 cmcld[26440]: (ips222) Halted package smf1_pkg on node ips222.
Jan 21 05:04:19 ips122 cmcld[26440]: Request from node ips122 to start package smf1_pkg on node ips122.
Jan 21 05:04:19 ips122 cmcld[26440]: Executing '/usr/local/cmcluster/conf/alcatel_osp_packages/smf1_pkg.sh start' for package smf1_pkg, as service PKG*5377.
...
Jan 21 05:19:07 ips122 CM-CMD[21144]: cmrunpkg -v -n ips222 smf1_pkg
Jan 21 05:19:07 ips122 CM-CMD[21144]: Request from root on node ips122 to start package

I'm attaching a log. Can anyone suggest a cause?
12 REPLIES
Alzhy
Honored Contributor

Re: pfmstat-tkt SEP was restarting

What is your issue?
This is ServiceGuard on Linux, right?

What are you trying to do?
Hakuna Matata.
dawn_jose85
Frequent Advisor

Re: pfmstat-tkt SEP was restarting

Hi,

Our psmfstat-tkt SEP (a process on the server) was restarting.
dawn_jose85
Frequent Advisor

Re: pfmstat-tkt SEP was restarting

Hi, can anyone help me find the root cause of this issue? It is becoming critical.
Matti_Kurkela
Honored Contributor

Re: pfmstat-tkt SEP was restarting

When a package is moved from one node to another, it is first halted on the original node and then restarted on the new node. This obviously means all the processes that belong to the package must restart when a package is moved.

Your log snippet indicates someone was logged in as root on Jan 21 05:19:07 and gave the "cmrunpkg" command to restart the package smf1_pkg on node ips222.

Your log snippet is too short to indicate why the package was halted at 05:03:37. Maybe a "cmhaltpkg" command was run before that? If so, then you must find who was logged on as root at that time, and ask him/her why s/he moved the package.
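
To widen the log snippet, one option is to filter the full syslog for ServiceGuard entries that mention the package. The following is only a sketch: the function name pkg_events is made up for illustration, and it assumes the cluster daemon and command entries are tagged "cmcld[" and "CM-CMD[" as in the lines quoted in this thread.

```shell
# Hypothetical helper: read syslog lines on stdin and print only the
# ServiceGuard entries (cmcld or CM-CMD) that mention the given package.
pkg_events() {
  pkg=$1
  # First keep lines logged by cmcld or CM-CMD, then keep those naming the package.
  grep -E 'cmcld\[|CM-CMD\[' | grep -F -- "$pkg"
}
```

It could be used as "pkg_events smf1_pkg < /var/log/messages" (the log path is an assumption and may differ on your system). The "last" command can then help to see which users were logged in around the reported times.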

The gdb dump in your attachment is not useful without knowing a lot more about the program that was dumped. You would need a solid understanding of the structure of the program to find out what it was doing at the moment of the dump, and you would need the debugging symbols specific for that particular program version to convert the "??"s to meaningful function/method names. Since the program seems to be multi-threaded, understanding it is likely to be extra difficult.

If the application produces its own log, it might be a lot more useful than gdb dumps.

MK
dawn_jose85
Frequent Advisor

Re: pfmstat-tkt SEP was restarting

Hi

I think the issue was not with the cluster switchover or the cluster package. The problem is that once the cluster switches over, the application crashes on ips222.

I'm not able to identify the exact issue.
Alzhy
Honored Contributor

Re: pfmstat-tkt SEP was restarting

Well, surely your package's apps etc. have logs that could give you a clue as to what's going on?
Hakuna Matata.
dawn_jose85
Frequent Advisor

Re: pfmstat-tkt SEP was restarting

Yes, I installed the package.
It is showing about 76% memory usage daily.
Is that good?

This is the output:
07:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
07:10:01 AM 3900956 12404988 76.08 648736 6490124 24002632 0 0.00 0
07:20:01 AM 3896948 12408996 76.10 648968 6492152 24002632 0 0.00 0
07:30:01 AM 3893212 12412732 76.12 649176 6494456 24002632 0 0.00 0
07:40:01 AM 3887252 12418692 76.16 649416 6497012 24002632 0 0.00 0
07:50:01 AM 3882824 12423120 76.19 649540 6499772 24002632 0 0.00 0
08:00:01 AM 3875664 12430280 76.23 649716 6502772 24002632 0 0.00 0
08:10:01 AM 3878772 12427172 76.21 649892 6495804 24002632 0 0.00 0
08:20:01 AM 3871248 12434696 76.26 650092 6499236 24002632 0 0.00 0
08:30:01 AM 3860652 12445292 76.32 650312 6502908 24002632 0 0.00 0
08:40:01 AM 3850408 12455536 76.39 650504 6506808 24002632 0 0.00 0
08:50:01 AM 3840728 12465216 76.45 650720 6510748 24002632 0 0.00 0
09:00:01 AM 3841020 12464924 76.44 650980 6504504 24002632 0 0.00 0
09:10:01 AM 3831572 12474372 76.50 651196 6508772 24002632 0 0.00 0
09:20:01 AM 3831408 12474536 76.50 651368 6502804 24002632 0 0.00 0
09:30:01 AM 3818400 12487544 76.58 651468 6507152 24002632 0 0.00 0
09:40:01 AM 3812732 12493212 76.62 651604 6511600 24002632 0 0.00 0
09:50:01 AM 3813576 12492368 76.61 651796 6505884 24002632 0 0.00 0
10:00:01 AM 3800340 12505604 76.69 651992 6510592 24002632 0 0.00 0
10:10:01 AM 3791032 12514912 76.75 652252 6515464 24002632 0 0.00 0
10:20:01 AM 3800464 12505480 76.69 652476 6499532 24002632 0 0.00 0
10:30:01 AM 3790792 12515152 76.75 652684 6504592 24002632 0 0.00 0
10:40:01 AM 3784460 12521484 76.79 652916 6509652 24002632 0 0.00 0
10:50:01 AM 3772928 12533016 76.86 653140 6514412 24002632 0 0.00 0
11:00:01 AM 3769224 12536720 76.88 653400 6509264 24002632 0 0.00 0
11:10:01 AM 3769000 12536944 76.89 653644 6503828 24002632 0 0.00 0
11:20:01 AM 3751744 12554200 76.99 653856 6508924 24002632 0 0.00 0
11:30:01 AM 3745576 12560368 77.03 654064 6513828 24002632 0 0.00 0
11:40:01 AM 3743044 12562900 77.04 654244 6508532 24002632 0 0.00 0
11:50:01 AM 3731016 12574928 77.12 654360 6513632 24002632 0 0.00 0
12:00:01 PM 3717600 12588344 77.20 654616 6518784 24002632 0 0.00 0
Average: 3854224 12451720 76.36 647041 6505996 24002632 0 0.00 0
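
A note on reading these figures: %memused in this output counts buffers and page cache as used memory, so 76% does not by itself mean the applications need that much. As a rough sketch, the share of memory used excluding buffers and cache can be estimated as (kbmemused - kbbuffers - kbcached) / (kbmemfree + kbmemused). The function name real_mem_used below is made up for illustration, and it assumes the nine-column layout shown above.

```shell
# Hypothetical helper: read "sar -r" style lines on stdin and print, per line,
# the first field and the percentage of memory used excluding buffers and cache.
# Columns are counted from the end so that both "HH:MM:SS AM" rows and the
# "Average:" row are handled; the header row is skipped because it is not numeric.
real_mem_used() {
  awk 'NF >= 9 && $(NF-8) ~ /^[0-9]+$/ {
    free = $(NF-8); used = $(NF-7); buf = $(NF-5); cache = $(NF-4)
    total = free + used
    printf "%s %.2f\n", $1, 100 * (used - buf - cache) / total
  }'
}
```

Applied to the Average line above, this comes out at roughly one third of memory, the rest of the "used" figure being buffers and cache.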
dawn_jose85
Frequent Advisor

Re: pfmstat-tkt SEP was restarting

I suspect an issue with the RAM, even though the server is not reporting anything. On the other member of the cluster the same application works properly. After we switch over to the other member, the same package crashes.
Matti_Kurkela
Honored Contributor

Re: pfmstat-tkt SEP was restarting

> I suspect an issue with the RAM, even though the server is not reporting anything.

That's possible.

One possible way to diagnose it would be to switch the package to the server that does not crash, and then run memtest86 or some other rigorous memory test in a continuous loop on the crashy server for a few days or so.

> On the other member of the cluster the same application works properly. After we switch over to the other member, the same package crashes.

Are both servers at exactly the same patch level? Perhaps one member has a patched version of some library, and the other has a buggy, unpatched version?

On both nodes, run this command (all on one line):

rpm -qa --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' | sort > packages.$(hostname -s).txt

You'll get one file on each node: packages.ips122.txt and packages.ips222.txt. Move both files to the same node, and then run:

diff -u packages.ips122.txt packages.ips222.txt

If the nodes don't have exactly the same versions of RPM packages installed, this command will tell you the differences. If it outputs nothing, the lists are identical.
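
If the diff output is long, a per-node summary can be easier to read. The following is a sketch only: the function name compare_pkg_lists is made up for illustration, and it assumes both files were sorted with the same locale settings, since "comm" expects sorted input.

```shell
# Hypothetical helper: show which entries appear in only one of two sorted lists.
compare_pkg_lists() {
  echo "Only in $1:"
  comm -23 "$1" "$2"   # lines unique to the first file
  echo "Only in $2:"
  comm -13 "$1" "$2"   # lines unique to the second file
}
```

For example: compare_pkg_lists packages.ips122.txt packages.ips222.txt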

MK
dawn_jose85
Frequent Advisor

Re: pfmstat-tkt SEP was restarting

The package versions are the same.
The package does not crash if we run it on the non-problematic server.
dawn_jose85
Frequent Advisor

Re: pfmstat-tkt SEP was restarting

I reduced the RAM to 16GB in the problematic server. Now the package is working fine on that server.
But I have a question.
We have two servers in the cluster. One server had 16GB and the other had 32GB.
Is it required that both nodes in the cluster have the same amount of RAM?
That is, must both nodes be either 32GB or 16GB?