
Serious Sys Mem Problem

 
Justin Willoughby
Regular Advisor

Serious Sys Mem Problem


I have two L3000 boxes with MC/SG. One of the two has an issue. The boxes should be pretty close to being the same.

On one of the boxes, sys memory slowly climbs until the system is unusable, i.e. there is no real or swap memory left for user processes.

The other box does not have this issue. On the box with the problem it takes several months for all the memory to get used up and then we have to reboot.

Right now glance shows that sys mem is 4.5GB, user mem is 207MB and phys mem is 5GB.

I have used the following command, but it does not come close to accounting for all the sys memory that is in use.
UNIX95= ps -eo vsz,ruser,pid,args | sort -rn | head -50
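
A rough cross-check, as a sketch only: totalling the vsz column gives an upper bound on what processes could be holding (shared regions are counted once per process, so it overstates). If even that total is far below the sys mem figure in glance, the memory is probably not in user processes at all. The function name sum_vsz_mb is made up for illustration, and the sketch assumes vsz is reported in KB.

```sh
# Sketch: total the vsz column (assumed to be KB) and print whole MB.
# sum_vsz_mb is a hypothetical helper name, not an HP-UX tool.
sum_vsz_mb() {
    # Reads "ps -eo vsz" output on stdin; the non-numeric header row is skipped.
    awk '$1 ~ /^[0-9]+$/ { kb += $1 } END { printf "%d\n", kb / 1024 }'
}

# Usage on the affected box:
#   UNIX95= ps -eo vsz | sum_vsz_mb
```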

I have attached a screen print from glance and the output of ipcs.

I have tried different commands to see what is using all the system memory, but I can't figure out what is going on. Any ideas on how to track this down?

Thanks so much,

- Justin
16 REPLIES
Jeff Schussele
Honored Contributor

Re: Serious Sys Mem Problem

Hi Justin,

Can you say memory leak?
I knew you could.
The best cure I've seen for this malady is a Louisville Slugger liberally waved at the developers.
What you need to do is scan the PIDs for memory "growth" over time.
Short of that, you're not going to be able to do much about it.
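
One way such a scan could look, sketched under assumptions: save a ps snapshot now and another one hours or days later, then list the PIDs whose vsz grew in between. The function name vsz_growth and the snapshot file names are hypothetical; the field list (vsz, pid, comm) relies on the UNIX95 behaviour of ps already used earlier in this thread.

```sh
# Sketch: report processes whose vsz (KB) grew between two snapshots.
# Take each snapshot with:  UNIX95= ps -eo vsz,pid,comm > snapshot_file
# vsz_growth is a hypothetical helper name.
vsz_growth() {
    # $1 = older snapshot, $2 = newer snapshot
    awk '
        NR == FNR {                       # first file: remember vsz by PID
            if ($1 ~ /^[0-9]+$/) old[$2] = $1
            next
        }
        ($1 ~ /^[0-9]+$/) && ($2 in old) && ($1 > old[$2]) {
            # growth in KB, PID, command name
            printf "%d %d %s\n", $1 - old[$2], $2, $3
        }
    ' "$1" "$2" | sort -rn                # biggest growth first
}

# Usage (file names are examples only):
#   vsz_growth /tmp/ps.monday /tmp/ps.friday
```

A PID that was reused by a different program between snapshots would show up as a false hit, so check the command name before blaming anything.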

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Devender Khatana
Honored Contributor

Re: Serious Sys Mem Problem

Hi,

The difference in load could be due to different packages running on the two nodes. How many packages are there?

Have you tried swapping the packages between the nodes to see how the utilization changes?

Is there anything else running outside of MC/ServiceGuard?

Also "swapinfo -atm" will give memory utilization details.

HTH,
Devender
Impossible itself mentions "I m possible"
Justin Willoughby
Regular Advisor

Re: Serious Sys Mem Problem

No PIDs seem to be using any more memory than usual, at least from what I can see from using

UNIX95= ps -eo vsz,ruser,pid,args | sort -rn | head -50

I am assuming there is some sort of shared memory usage that is not being released but I don't know for sure or how to track it down if that is what is happening.

- Justin
Justin Willoughby
Regular Advisor

Re: Serious Sys Mem Problem

Devender,

There are only 2 packages running on this box and 1 package running on the box that's ok. The applications are very similar, one is for production and two are semi-production.

I have not swapped which servers they run on long enough to see if it's the packages or the node, but I might have to.

The swapinfo command does not really tell me much:


(ifback)root# swapinfo -atm
              Mb      Mb      Mb   PCT  START/      Mb
TYPE       AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev         4096    1097    2999   27%       0       -    1  /dev/vg00/lvol2
reserve        -    1036   -1036
memory      3881    3881       0  100%
total       7977    6014    1963   75%       -       0    -
Jeff Schussele
Honored Contributor

Re: Serious Sys Mem Problem

Hi Justin,

The "virtual" (vsz) can be misleading.
Try using ps -efl and sorting on the 10th column (SZ). That's the "resident" size, which is the "truer" usage size.
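
A minimal sketch of that sort, assuming the ps -efl layout where the 10th field is SZ, the 4th is the PID and the 3rd is the UID. The helper name top_by_sz is made up for illustration. Rows whose 10th field is not a plain number (such as the header) are dropped.

```sh
# Sketch: show the largest processes by the 10th column of "ps -efl".
# top_by_sz is a hypothetical helper name.
top_by_sz() {
    # $1 = number of rows to show (default 20); reads ps -efl output on stdin
    awk '$10 ~ /^[0-9]+$/ { print $10, $4, $3 }' |   # size, PID, UID
        sort -rn |
        head -n "${1:-20}"
}

# Usage:
#   ps -efl | top_by_sz 20
```

Check the ps man page on the box for the unit of this column before comparing the numbers with glance.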

HTH,
Jeff
Steven E. Protter
Exalted Contributor

Re: Serious Sys Mem Problem

I'm with the bad application memory leak crowd here.

These issues are almost always caused by poor programming, though I advise against any kind of violence. :=)
Here is a link to a good set of data collection tools:

http://www.hpux.ws/system.perf.sh

SEP
Steven E Protter
Jeff Schussele
Honored Contributor

Re: Serious Sys Mem Problem

Hey Steven,

Notice I said wave - not thump. ;~))
Shalom, my friend.

Cheers,
Jeff
Bill Hassell
Honored Contributor

Re: Serious Sys Mem Problem

According to Glance, the memory in use is not user memory (a paltry 200 MB), so ipcs and ps won't be useful. This looks like a horrible kernel module memory leak. Unfortunately, none of the normal Unix tools can map out what in kernel space is consuming this RAM. I would strongly suspect any non-HP driver or subsystem such as SAN, commercial security and monitoring tools, or anything else that requires code to be added to the kernel. There is a slight chance that kernel patches are needed, but without a kernel map it will be difficult to see what's happening. Can you get a copy of kmeminfo from your sales/support rep?


Bill Hassell, sysadmin
Justin Willoughby
Regular Advisor

Re: Serious Sys Mem Problem

Hi Bill,

I was hoping not to open a case with HP (trying not to bother them), but it sounds like I need to so I can get a copy of kmeminfo. I will do that shortly. Thanks,

- Justin
Patrick Wallek
Honored Contributor

Re: Serious Sys Mem Problem

Justin,

Never ever think twice about opening a support call with HP. If you have a support contract, that is what you pay for. You might as well get your money's worth.

Bill Hassell
Honored Contributor

Re: Serious Sys Mem Problem

Since I was on the HP support team for several years answering calls like this, please do open a call. There are many sysadmins who do not have the luxury of emailing or calling HP for HP-UX support. And the problem you're seeing will likely take a senior engineer to resolve. It's pretty serious since growth is without bounds and your Glance memory listing shows significant paging (swap) which is severely impacting (slowing down) your applications.


Bill Hassell, sysadmin
Justin Willoughby
Regular Advisor

Re: Serious Sys Mem Problem

Well, I had to reboot the box a week and a half ago because it ran out of memory and became unusable. I have been checking the memory since the reboot. It seems that system memory goes up when my backups run. I have seen this happen with the TSM backup that we run nightly and with our weekly Ignite backup. I have not seen memory go up the last few nights with TSM, but I just did an Ignite backup, and system memory increased by 60MB at the time the Ignite backup started and never went down after the backup completed.

I have not placed a call to HP yet. I don't see any patches related to Ignite and memory leaks on the itrc. I am thinking I might just upgrade Ignite and see what happens. I am not sure how that could be related to the TSM nightly backup using system memory and not giving it back.

Very strange as the same version of Ignite is loaded on our other box and it's not having the same issue.

- Justin
CAS_2
Valued Contributor

Re: Serious Sys Mem Problem

Which HP-UX release is this?

Run

/usr/contrib/Q4/bin/kmeminfo

and post the output.

Q: I assume the swapinfo output you posted was taken when the box's memory was exhausted, is that right?
Please also post the swapinfo output from when the box has no memory problem.
Justin Willoughby
Regular Advisor

Re: Serious Sys Mem Problem

CAS,

We are running HP-UX 11.11 on rp4440s.

Attached is the output from the swapinfo and kmeminfo commands. There is no memory problem other than that each day, or at least every week, more system memory is used up and not released.

- Justin
CAS_2
Valued Contributor

Re: Serious Sys Mem Problem

I'd suggest adding an entry to root's crontab to save the output of the swapinfo and kmeminfo commands.
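
A sketch of what such a job might look like, with every path other than the kmeminfo one quoted in this thread being an assumption: the script location /usr/local/bin/memsnap.sh, the log file /var/adm/memsnap.log and the function name snapshot are all hypothetical. Each run appends a timestamped block so the growth can be lined up against the backup schedule later.

```sh
# Sketch: append timestamped output of a few commands to a log file.
# snapshot, memsnap.sh and memsnap.log are hypothetical names.
snapshot() {
    # $1 = log file; remaining arguments = commands to run, one per argument
    log=$1
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        for cmd in "$@"; do
            echo "--- $cmd"
            sh -c "$cmd" 2>&1
        done
    } >> "$log"
    return 0
}

# Only run the real commands where they exist (i.e. on the HP-UX box).
if [ -x /usr/sbin/swapinfo ] && [ -x /usr/contrib/Q4/bin/kmeminfo ]; then
    snapshot /var/adm/memsnap.log "/usr/sbin/swapinfo -atm" "/usr/contrib/Q4/bin/kmeminfo"
fi

# Example crontab line for root, once an hour on the hour:
#   0 * * * * /usr/local/bin/memsnap.sh
```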

If the backup might be the problem, I'd run the command

vmstat 2 count

replacing 'count' with the number of lines vmstat should print, which must be enough to cover the time the backup is running. For example, if the backup session lasts 1 hour, count should be 60 x 60 / 2 = 1800.

The goal is to monitor the values of "avm", "free", "po" and "sr" columns.
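
The arithmetic and the column watch can be sketched as two small helpers. Both names (vmstat_count, vmstat_cols) and the log path in the usage line are hypothetical. The column filter looks the four names up in the vmstat header row instead of hard-coding positions, on the assumption that the header contains avm, free, po and sr as described above.

```sh
# Sketch: helpers for watching vmstat during a backup window.
# vmstat_count and vmstat_cols are hypothetical helper names.

vmstat_count() {
    # $1 = duration in seconds, $2 = vmstat interval in seconds
    # Rounds up so the samples cover the whole duration.
    echo $(( ($1 + $2 - 1) / $2 ))
}

vmstat_cols() {
    # Reads vmstat output on stdin; prints only the avm, free, po and sr columns.
    awk '
        /avm/ && /free/ {                       # header row: remember positions
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!printed++) print "avm", "free", "po", "sr"
            next
        }
        ("avm" in col) && $1 ~ /^[0-9]+$/ {     # data rows seen after a header
            print $(col["avm"]), $(col["free"]), $(col["po"]), $(col["sr"])
        }
    '
}

# Usage for a backup expected to last about 2 hours (log path is an example):
#   vmstat 2 "$(vmstat_count 7200 2)" | vmstat_cols > /tmp/vmstat.backup.log
```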
Justin Willoughby
Regular Advisor

Re: Serious Sys Mem Problem

I'll run vmstat during an Ignite backup tomorrow. What I just noticed as well is that the Ignite backup takes over 2 hours. On the other rp4440 in the cluster it takes about an hour to run. Very strange....

- Justin