Operating System - HP-UX

Re: Strange performance issue

 
Elmar P. Kolkman
Honored Contributor

Strange performance issue

Hi gurus,
we are trying to implement a backup with nearline storage and have run into a strange issue. When we back up to an NFS filesystem, we get a speed of about 30Mb/s. But once the file reaches 2Gb, the speed drops to about 3Mb/s. So at first we were looking at the backup software. But I've now done some more testing, and the problem is not limited to the backup software.

If I run a cp of a 10Gb file (from a local disk to an NFS-mounted disk) and monitor the growth of the copy, it grows at a steady rate of about 40Mb/s. But if I copy the file using 'cat > ', the speed drops from about 37Mb/s before the 2Gb limit to 1-7Mb/s after that limit.
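For anyone who wants to reproduce the measurement, here is a minimal sketch of one way to watch a file grow and print the rate per interval. The function names are made up for illustration and this is not necessarily how the numbers above were obtained. Sizes come from `ls -ln` so the file itself is not read, and all arithmetic is done in awk because the shell's own integer arithmetic may be 32-bit on older systems and would overflow right at the 2Gb mark.

```shell
# Sketch only: helper names are hypothetical, 1 MB is taken as 1048576 bytes.

# file_size FILE -> size in bytes as reported by ls (0 if FILE does not exist yet)
file_size() (
    if [ -f "$1" ]; then
        ls -ln "$1" | awk '{ print $5 }'
    else
        echo 0
    fi
)

# rate_mb PREV_BYTES CUR_BYTES SECONDS -> average MB/s over the interval, one decimal
rate_mb() (
    awk -v p="$1" -v c="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (c - p) / s / 1048576 }'
)

# watch_growth FILE INTERVAL SAMPLES -> one line per interval: total size and rate
watch_growth() (
    file=$1
    interval=$2
    samples=$3
    prev=$(file_size "$file")
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        cur=$(file_size "$file")
        total=$(awk -v c="$cur" 'BEGIN { printf "%.0f", c / 1048576 }')
        echo "$total MB written, $(rate_mb "$prev" "$cur" "$interval") MB/s"
        prev=$cur
        i=$((i + 1))
    done
)
```

Typical use would be to start the copy in the background, e.g. `cat /local/bigfile > /nfs/bigfile &`, and then run `watch_growth /nfs/bigfile 10 60` (both paths are placeholders) to get a sample every 10 seconds for 10 minutes.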

The OS: HP-UX 11.00
Patches: QPK march 2003 + PHNE_29210 + its requirements.

Backup software: Legato Networker 7.1

NFS server: NetApp filer, but same issue with other NFS servers.

Has anyone seen this before? Any thoughts on a solution? HP and Legato haven't come up with a solution in the last 2 weeks...
Every problem has at least one solution. Only some solutions are harder to find.
19 REPLIES 19
David Burgess
Esteemed Contributor

Re: Strange performance issue

Hi Elmar,

I had an issue with Networker 6.1.3 recently. I was trying to restore an AIX client from an HP server running 11.11. The recovery took hours. I had the network cards at both ends of a crossover cable set to auto-negotiate. They were both showing 100 full duplex. It turned out they weren't. When I forced them both to 100 full duplex, the recovery sped up and all was OK.

It may not be related, but may help.

I assume you have all the latest patches from Legato and HP.

Regards,

Dave.
Elena Leontieva
Esteemed Contributor

Re: Strange performance issue

Elmar,

Make sure that NFS_CLIENT=1 is set in

/etc/rc.config.d/nfsconf
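
A quick way to check this from the command line could look like the sketch below. The function name is made up for illustration; it only looks for an uncommented NFS_CLIENT=1 line and defaults to the file named above.

```shell
# Sketch only: nfs_client_enabled is a hypothetical helper name.
# nfs_client_enabled [FILE] -> exit status 0 if FILE has an uncommented NFS_CLIENT=1 line
nfs_client_enabled() (
    conf=${1:-/etc/rc.config.d/nfsconf}
    grep -E '^[[:space:]]*NFS_CLIENT=1([^0-9]|$)' "$conf" >/dev/null 2>&1
)
```

For example: `nfs_client_enabled && echo "NFS client enabled"`.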

Elena.

Elmar P. Kolkman
Honored Contributor

Re: Strange performance issue

Thanks for the responses, but:

David, this is a known issue to us. If that was the cause, the copy action would drop in performance too.

Elena, if we didn't have NFS_CLIENT=1 in the config file, we wouldn't be able to nfsmount the directory at all...

Better luck next time.
Every problem has at least one solution. Only some solutions are harder to find.
Elmar P. Kolkman
Honored Contributor

Re: Strange performance issue

Something else we tested: ftp of the 10Gb file to the NFS disk. Monitoring the growth of the file shows the same behaviour.
If we start another ftp in parallel, to the same NFS disk but with a different filename, the behaviour is the same, even though there is still an ftp session writing...

The only difference from cat and Networker: the performance drops to less than 1 Mb/s in total.
Every problem has at least one solution. Only some solutions are harder to find.
James Murtagh
Honored Contributor

Re: Strange performance issue

Hi Elmar,

Sounds like an interesting one. Once the file reaches 2GB the filesystem will need to use indirection to manage the extra blocks, although I would be surprised if it caused that much performance degradation. Certainly 30MB/s seems good before this, though. It's probably easiest if I try to replicate this and trace it at the various levels. A few things that would help, though:

> the nfsstat's from the server and client, also the connection protocol
> size of nfs server memory and what is dedicated to the buffer cache
> type of filesystem used on the nfs server, also state the patches for this fs type
> have you tried any other OS's?
> have you changed any network parameters using ndd on the systems?

Also, what tool are you using to monitor the throughput?
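
The client-side part of the list above could be gathered in one go with something like the following sketch. The wrapper functions are made up for illustration, and the two ndd parameters are only examples of TCP tunables one might look at, so adjust to taste. Commands that are not installed are skipped rather than treated as errors, and the same idea applies on the server with `nfsstat -s` where a shell is available.

```shell
# Sketch only: run_if_present and collect_nfs_info are hypothetical helper names.

# run_if_present CMD [ARGS...] -> print a header, then run CMD if it is on the PATH
run_if_present() (
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(skipped: $1 not found)"
    fi
)

# collect_nfs_info -> dump OS level, NFS client statistics and a few TCP settings
collect_nfs_info() (
    run_if_present uname -a
    run_if_present nfsstat -c    # client-side RPC and NFS call counters
    run_if_present nfsstat -m    # per-mount options (protocol, rsize/wsize, timeouts)
    # Example tunables only; the parameter list differs between releases.
    run_if_present ndd -get /dev/tcp tcp_xmit_hiwater_def
    run_if_present ndd -get /dev/tcp tcp_recv_hiwater_def
)
```

Running `collect_nfs_info > /tmp/nfsinfo.before` ahead of a test and again afterwards (the output path is a placeholder) gives two snapshots that can be compared with diff.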

Cheers,

James.
Ralph Haefner
Frequent Advisor

Re: Strange performance issue

Could it have something to do with caching on your disk array (if you're using one)?

I've seen before where a copy slows down once you've transferred enough data to fill up the cache.

Just a wild guess but maybe that will give you an idea to pursue.
Jeff Schussele
Honored Contributor

Re: Strange performance issue

Hi Elmar,

Sounds like a patch issue to me. I see you already have the NFS perf patch. Do you have the latest LVM cumulative - PHCO_24437?

http://www2.itrc.hp.com/service/patch/patchDetail.do?BC=patch.breadcrumb.search|&patchid=PHCO_24437&context=hpux:800:11:00

Since the network is involved here, I'd also look at the latest ARPA cumulative - PHNE_26771:

http://www2.itrc.hp.com/service/patch/patchDetail.do?BC=patch.breadcrumb.search|&patchid=PHNE_26771&context=hpux:800:11:00

And while you're at it - the latest JFS 3.3 cumulative - PHCO_29258:

http://www2.itrc.hp.com/service/patch/patchDetail.do?BC=patch.breadcrumb.search|&patchid=PHCO_29258&context=hpux:800:11:00

And if fibre channel is involved here, the latest fibre channel cumulative - PHKL_23939:

http://www2.itrc.hp.com/service/patch/patchDetail.do?BC=patch.breadcrumb.search|&patchid=PHKL_23939&context=hpux:800:11:00

And probably should also look at the latest streams cumulative - PHNE_27902:

http://www2.itrc.hp.com/service/patch/patchDetail.do?BC=patch.breadcrumb.search|&patchid=PHNE_27902&context=hpux:800:11:00

I guess the point is if you're not current on patches, now would be a good time to do so as a significant portion of patches are performance related.

HTH,
Jeff

PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
A. Clay Stephenson
Acclaimed Contributor

Re: Strange performance issue

I suggest that you read over this paper carefully -- hopefully something will "click" inside your head. Dave did an outstanding job of presenting NFS performance issues and techniques for diagnosing problems.

http://docs.hp.com/hpux/onlinedocs/1435/NFSPerformanceTuninginHP-UX11.0and11iSystems.pdf
If it ain't broke, I can fix that.
Sridhar Bhaskarla
Honored Contributor

Re: Strange performance issue

Hi Elmar,

You can see if this is with NFS in fact by probably doing an ftp or scp. If cache on the system or on the storage is playing a part, then you would see that behaviour through the other tools too.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
James Murtagh
Honored Contributor

Re: Strange performance issue

Well, I couldn't replicate this, although I didn't have an 11.00 system. I tested using 11i (well patched) and Solaris 2.8 (unpatched from base) and the throughput was steady before and after 2GB in any configuration. I suspect it is a patch issue; make sure you are on the latest patches for the relevant subsystems, especially vxfs if that is what you are using. Also, as mentioned, Dave Olker's online guide is excellent, his book even better. I don't know much about the NetApp product but it may be worth looking into their software too. If the software is on a unix host you can use tusc to trace the server process; use the timestamp option to check for a delay in a system call. If there is a delay here, HP can trace the process stack to find the kernel function that is causing the problem.

- James.
Elmar P. Kolkman
Honored Contributor

Re: Strange performance issue

I'm still looking at the given info... A. Clay Stephenson's document is proving to be useful, but it takes time to apply it to our problem. But I have now found that I can get the same performance after 2Gb as before 2Gb by running without the biod. The performance does drop to 17Mb/s the whole way, though. Running with 1 biod gives about the same speed; running with 2 biods gives the old results: speed drops after 2Gb. Haven't tried concurrent backups yet. I'm afraid they will drop in speed with 1 biod...

Some info I forgot that was asked for: we don't have this issue on 11i systems (used as storage node). James, thanks for testing, but we had done this too.

One thing to bear in mind: the problem does not exist when using cp, so it can/should not be directly network related !

Jeff, I will look into the patches.
Every problem has at least one solution. Only some solutions are harder to find.
Elena Leontieva
Esteemed Contributor

Re: Strange performance issue

Just a quick note. Normally your biod processes should not take a lot of CPU - 1-3% depending on the NFS activity. When we had a problem with an application being very slow (the server, on HP-UX 11.00, was running a financial application with the Oracle DB on a Network Appliance filer), our four biods consumed almost all the time of all four CPUs.
You may want to check biods and run nfsstat -m during your tests.
I'd do the latest NFS patches first.
A. Clay Stephenson
Acclaimed Contributor

Re: Strange performance issue

I would try running with 0 biods. I would also bump the number up to about 16 and get that data point. I assume you started with the default 4. Boxes running NFS also typically benefit from a larger than normal buffer cache (maybe as much as 800-1200MB). So that you are not altering too many things at once, I would disable dynamic buffer cache by setting bufpages to a non-zero value --- that's my preference anyway for almost all HP-UX boxes. Finally, even though it is not obvious, especially for an NFS client, I would make sure that all VxFS/hfs performance patches are installed, because many of those affect buffer cache usage -- and that in turn affects even NFS clients -- and biods.
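
To turn the 800-1200MB figure above into a bufpages value, the arithmetic can be sketched as below, assuming bufpages counts 4096-byte pages (so 256 pages per MB). The function name is made up for illustration.

```shell
# Sketch only: assumes one bufpages unit is a 4096-byte page, i.e. 256 pages per MB.
# mb_to_bufpages MB -> number of 4 KB pages needed for a fixed buffer cache of MB megabytes
mb_to_bufpages() (
    echo $(( $1 * 256 ))
)

mb_to_bufpages 800     # prints 204800
mb_to_bufpages 1200    # prints 307200
```

On 11.x the current setting can usually be read with `kmtune -q bufpages`; changing it means building a new kernel and rebooting, so it is worth picking the value before the maintenance window.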



If it ain't broke, I can fix that.
Elmar P. Kolkman
Honored Contributor

Re: Strange performance issue

We have tested with 0, 1, 2, 4 and 16 biods.

With 16 biods (the situation I started with), the performance dropped after 2 Gb.

With 4 biods and 2 biods, the same.
With 1 biod, the speed is a bit better with 1 stream, but totally unacceptable with multiple streams.

With 0 biods, we have the most stable solution, right now.
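
For anyone repeating this series, switching the biod count between runs could be scripted roughly as in the sketch below. It assumes the count is taken from a NUM_NFSIOD= line in /etc/rc.config.d/nfsconf and that the NFS client is restarted afterwards; the function names are made up for illustration, so try it on a copy of the file first.

```shell
# Sketch only: set_num_nfsiod and get_num_nfsiod are hypothetical helper names.

# set_num_nfsiod FILE N -> rewrite the NUM_NFSIOD= line in FILE to N (append it if absent)
set_num_nfsiod() (
    conf=$1
    n=$2
    tmp="$conf.tmp.$$"
    if grep '^NUM_NFSIOD=' "$conf" >/dev/null 2>&1; then
        sed "s/^NUM_NFSIOD=.*/NUM_NFSIOD=$n/" "$conf" > "$tmp"
    else
        { cat "$conf"; echo "NUM_NFSIOD=$n"; } > "$tmp"
    fi
    # Copy back over the original so its ownership and permissions stay as they were.
    cat "$tmp" > "$conf" && rm -f "$tmp"
)

# get_num_nfsiod FILE -> print the number currently configured in FILE
get_num_nfsiod() (
    sed -n 's/^NUM_NFSIOD=\([0-9]*\).*/\1/p' "$1" | tail -1
)
```

A test run would then be `set_num_nfsiod /etc/rc.config.d/nfsconf 4`, followed by `/sbin/init.d/nfs.client stop` and `/sbin/init.d/nfs.client start`. Note that stopping the client unmounts the NFS filesystems, so nothing should be writing to them at that moment.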

I'm going to test patch 28995 (GE-LAN patch) now. Haven't yet had time to look into Jeff's patch list.

Since I don't know yet what solves the problem, I can not yet assign points.
Every problem has at least one solution. Only some solutions are harder to find.
Elmar P. Kolkman
Honored Contributor

Re: Strange performance issue

Of the list of patches in Jeff's post, only the LVM and JFS patches are not installed. I know that the parent directories of the NFS mounts are on those, but for now I will not install those patches. The machine needs to be available as a failover node for other software too, so I don't want the patch levels of the two environments to drift too far apart.
Every problem has at least one solution. Only some solutions are harder to find.
James Murtagh
Honored Contributor

Re: Strange performance issue

It would be interesting if you could provide the server model, physical memory installed and buffer cache kernel settings. We know that only 25% of the buffer cache can be used with biods running, but that doesn't explain why having one gives you steady performance while with more than that it drops... unless they are pulling data too quickly over your gigabit LAN to the server and saturating the cache, causing constant invalidations/flushes. From your posts it appears(?) that you are always testing the write case to the NFS-mounted filesystem - if you haven't already tried it, a read test may throw up some interesting results, i.e. copy a file from the NFS mount to the local disk. I'm not quite sure if the buffers marked as writeable to the local disk will still fall under the NFS window, but I think it's worth testing if you haven't already. Even if they do, the writes to the local disks should hopefully be a lot quicker and you should see different results.

- James.
Stefan Farrelly
Honored Contributor

Re: Strange performance issue

Not sure if this will help, as you may have already seen this doc, but we had a few problems with our NetApp filers too until we followed their documentation exactly:

http://www.netapp.com/tech_library/3146.html

Do everything - hidden kernel parameters etc. Read it thoroughly.
Im from Palmerston North, New Zealand, but somehow ended up in London...
Elmar P. Kolkman
Honored Contributor

Re: Strange performance issue

Stefan, this document had already been brought to my attention. It was the reason for the 29210 patch. But thanks.

James, I also tried to read from the disk when copying to another NFS disk. But since I'm going to use those disks for backup, write performance is the most important. When restoring, the network to the client trying to restore will be the bottleneck... So it may be a bit slower, though I assume using biods will improve performance then.

We're still looking into it... And HP and Legato are too. But no solution yet...
Every problem has at least one solution. Only some solutions are harder to find.
Elmar P. Kolkman
Honored Contributor

Re: Strange performance issue

Problem still exists... What I find strange about it, is that apparently no one on this forum has run into this problem... Or nobody noticed it.

HP still has not come up with a solution. Neither has Legato.
Every problem has at least one solution. Only some solutions are harder to find.