
2-second delays in fsync/msync/munmap

 
Kris Kelley
Frequent Advisor

2-second delays in fsync/msync/munmap

Hi,

I'm running our proprietary DBMS on HP-UX 11.0 and occasionally observe very large elapsed times for my fsync, msync, and munmap calls. When writing to a VxFS or RAID device, they are always near a multiple of 2 seconds. (HFS also occasionally gives me large times, but not in multiples of 2 seconds; possibly multiples of 0.5 seconds, but the pattern is less obvious.) I was able to reduce the problem somewhat in one test variation of our software by avoiding mixing real I/O with writes to the corresponding mmap'd pages of the file. However, another variation of our code doesn't use mmap (or msync or munmap) at all, just regular writes and fsync, and it has the same problem (e.g. in one test where an index file has many (>100) areas modified, the fsync takes 14-24 seconds).
I've tried to find patches or ITRC posts that describe this problem and haven't found a clear match. We've loaded several patches that sounded close, but they haven't solved it.
Alzhy
Honored Contributor

Re: 2-second delays in fsync/msync/munmap

How large (memory-wise) is your server? I'm thinking you may have a lot of filesystem buffer cache getting between your application and your physical disk storage. Have you tried disabling dynamic caching (setting dbc_max_pct = dbc_min_pct, say 5% of memory)? Or have you tried the directIO mount option on those VxFS filesystems?
Hakuna Matata.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

Of all the system calls, munmap() is the most puzzling --- it should be all but instantaneous. This makes me believe you actually have a performance problem rather than an issue directly related to these system calls especially if you have installed the latest I/O/LVM/JFS patches.

One thing that occurs to me is that you may be running an extremely large buffer cache; 11.0 tends to degrade in performance when the buffer cache exceeds about 800MB. You really need to use Glance to see what the system is doing when these delays occur.

One more thing to try: recreate your VxFS filesystems with a larger log size; that may be the logjam.
If it ain't broke, I can fix that.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

top tells me we have 424M of real memory.
dbc_max_pct = 60% (min = 5%)
From various posts, it sounded like around 300M was a good size for a fixed buffer cache size, so I figured 60% would be OK(?).
We haven't tried directIO on the mount. It's currently configured with delaylog and nodatainlog.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

60% is absolutely terrible and is almost certainly your problem. Top is absolutely meaningless as an indication of the amount of physical memory in your machine. It only concerns itself with process memory and knows nothing about other memory like kernel data structures. You need to find out how much memory you really have and then set buffer cache to a reasonable value but certainly much less than 60%. The default 50% was a terrible choice for HP; typical maximum values are 25% but usually even lower. I actually prefer to pin the value to a known amount by setting bufpages to a non-zero value. This will turn off dynamic buffer cache. Leave nbuf at zero -- the default behavior is almost always best.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

OK, I tried setting bufpages to 75000. It actually made the problem slightly worse.
I also tried a bufpages of 75 (by accident) and this actually helped quite a bit (i.e. the times were around what I'd expect, about 30 sec vs. 200 sec). Since our application test does 10 fsyncs each of all the modified files, it may be that we don't really need that much buffer cache?
BTW, the machine has 4GB of real memory.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

Well, 75 (307KB) is way too small; typically about 100MB is considered a reasonable minimum. Your setting of 75000 (307MB) should have been reasonable and in any event should have been better than 60%. Bear in mind that there are many other processes that need buffer cache. I suppose that you are terribly concerned about integrity, but you are taking a huge performance hit by doing so much synchronous I/O. By any chance, have you "improved" the setting of timeslice? It should be left at 10. It's really time to use Glance to find out where the bottlenecks are. You could have huge disk bottlenecks.

Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

The fsyncs are necessary for our DBMS COMMIT operation (to ensure data integrity).
It appears that timeslice has been set to 1.
Alzhy
Honored Contributor

Re: 2-second delays in fsync/msync/munmap

If you're using VxFS, have you tried the following mount options:
.
log,largefiles,mincache=direct,convosync=direct
.
on your "DBMS" filesystems? Say, is this DBMS homegrown? By any chance is it Ingres or Postgres?
Jeff Schussele
Honored Contributor

Re: 2-second delays in fsync/msync/munmap

Bingo!!!

Set that timeslice back to 10 ASAP.
Having it at 1 could (probably would) be killing this system.
And as Clay has mentioned ~300MB buffer space should be OK.

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

As you have probably guessed, I came to suspect a low timeslice setting --- that is what prompted my question. When set to 1 the system under even very moderate loads will do context switching and very little else. I'm all but positive that setting it back to 10 will cure your problem. Because it appears that your box has been tuned by someone with somewhat less than expert knowledge of HP-UX, it would probably be a good idea to post the output of the kmtune command.

Regards, Clay
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

I'm sad to say the timeslice change didn't help. Since the busiest of the 4 CPUs is still usually >90% idle, the inefficiency probably wasn't a problem. I've attached the kmtune output.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

Well, I see a few things that look suspicious but I'm not all that confident; you really need metrics that only a performance tool like Glance can supply. You can install the fully-functional trial version from an Applications CD set.

You could also compile with -p and -g and profile the code to see where this guy is really spending its time.

vnode_cd_hash_locks and vnode_hash_locks are at 2048 while the default is 128. These are related to spinlocks and it is very unusual to set these to other than default values.
I would also return scsi_max_qdepth to 8 -- simply because this parameter is i/o related.

The other thing that I note is that you have fs_async enabled while your application clearly wants synchronous I/O.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

I tried changing some of the parameters you mentioned (with kmtune) but there was no change (BTW, when would you need to use sam vs. kmtune?).
FYI, I put some debug code in the SW and find that - for the msync/munmap version of the SW - the 2-second delay happens periodically (e.g. often about every 110th or 140th msync, and more than half of the munmaps).
I got a trial license for Glance, and it tells me that I'm waiting on VM 90-95% of the time (and no I/O wait). Seems a bit high.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

So what do I look at now? I'm not sure what would cause a large Virtual Memory wait.
FYI, for the fsync version, the waits were around 50% for both IO and System.

More info on the msync/munmap version:
There is a distinct pattern of 2-second msync delays. In each COMMIT where the delay appears, it's frequently about the same "n-th" msync on the file (e.g. 18th-22nd out of usually 22-30 total, 1st-4th out of around 170) in the 2 tests I looked at. It also seems interesting that there is only ever ONE appearance of the delay in each commit (which syncs all modified pages), even if a different test does many more (and bigger) msyncs.
There was a less frequent set of fairly consistent "anomalies" where the 2-second delay appeared to be split (i.e. the times totaled ~2s) between an msync on one file and an msync a couple of calls later on another file or, less frequently and less certainly, between 2 adjacent msyncs on the same file.
BTW, some of the msyncs cover more than one contiguous region of modified pages (if this matters).
...and the munmaps generally exhibited the delay about every other call.

In the fsync version, the delay usually occurs every time, and was around 2 or 4 seconds for one test and 14-40 seconds (but always around a multiple of 2) for a larger test.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

I also determined that (for the msync/munmap version), the VM waits do coincide with the 2-sec delays AND appear to be page-ins.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

...for a while. The page-ins have now subsided to near-0 and the VM is still spiking.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

Perhaps a critical clue (msync version)?:
The "Logical Volume Detail" screen (1-sec interval) shows the Writes/sec regularly spiking to around 1500 Writes/sec with Write kb/sec values showing around 4-8KB per write. Is this normal?
Bill Hassell
Honored Contributor

Re: 2-second delays in fsync/msync/munmap

It sounds as if you have a VERY small amount of RAM, perhaps 1Gb? You can verify how bad this is with:

swapinfo -tm

and to see how fast paging (swapping) is going, use vmstat and look at the po column. Single digits are fine, 2 digits marginal, and 3 or more digits means that your system is only useful about 10-20% of the time. The rest of the time is wasted getting processes in and out of memory. Add 2Gb or 3Gb to your system and set the maximum buffer cache size to about 400-600 megs.

If you really want to see what is happening on the system, get a copy of Glance installed.


Bill Hassell, sysadmin
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

I had already determined that the system has 4GB of memory.
I have obtained a trial license for Glance. Are there some specific metrics I haven't already reported that you'd like to see?
As far as paging:
With the fsync version of the software, the po column shows 0.
With the mmap/msync version of the software, the po can get into the triple digits, but this would be expected for mmap usage, yes?
The buffer cache is currently set to 5-15%.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

This is a tough one but I'm guessing that your spikes are the result of coincident fsync()'s and another competing process.

I suggest that you download and read this paper (I actually remembered a reference to a problem similar to yours):
http://docs.hp.com/hpux/onlinedocs/os/11.0/tuningwp.html#know

Look under the "Points of Interest" section. There is a topic that deals with some strange behavior involving memory-mapped files. The "workaround", which you have already accidentally discovered, is to reduce buffer cache. I would set bufpages to 16384 (64MB) and see if the situation improves.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

PHKL_28602 appears to address part of the problem described by Stephen in his paper.
Kris Kelley
Frequent Advisor

Re: 2-second delays in fsync/msync/munmap

I have noticed that PHKL_28602 might address the problem. However, it appears to be only for JFS 3.3 and I don't think we're running that.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

I seem to have made a fundamental assumption when I read your initial posting, namely that you were current on patches.
If you have not installed a recent SupportPlus patchset then that is certainly the place to start. There have been a large number of VxFS, I/O, mmap, NFS, and SCSI patches released and a number of them are related to problems in your area. In HP-UX, far more problems are caused by not patching regularly than are avoided by not patching at all.
A. Clay Stephenson
Acclaimed Contributor

Re: 2-second delays in fsync/msync/munmap

Look at PHKL_28105 and specifically at the section containing 'VX_NOTHROTTLE'.