Operating System - HP-UX

Socket or Disk performance

 
MohitAnchlia
Frequent Advisor

Socket or Disk performance

I have a Java client application that reads files from a directory "A" and sends that data to a C++ application over a socket connection. The Java client application then moves these files to directory "B". Directory "A" has subdirectories based on the day and time when those files were received. What I am seeing is that when we have 1M+ files the process really slows down. When I take a stack trace I see the following:

"0" prio=10 tid=00055ac0 nid=36 lwp_id=4707656 runnable [21240000..21240738]
at java.io.UnixFileSystem.rename0(Native Method)
at java.io.UnixFileSystem.rename(UnixFileSystem.java:318)
at java.io.File.renameTo(File.java:1212)
at com.i.e.R.moveFile(Unknown Source)

It looks like it's taking time to move files. I am not sure why it would be slow in moving files if there are a lot of files. I am also not sure if writing on the socket is slow. How can I tell which resource is slowing this app? I tried to look at the sar data, but I didn't know how to relate the device names to the file systems. I've done basic analysis; I am looking for something more advanced that would tell me about the resources that are slowing this app.
Hein van den Heuvel
Honored Contributor

Re: Socket or Disk performance

>> What I am seeing is that when we have 1M+ files the process really slows down.

Sure! Those directories have an on-disk structure which needs to be maintained. As they grow, that's more work and there will be less cache to go around.

Check out the tail end of:
http://docs.hp.com/en/5576/JFS_Tuning.pdf
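
One quick way to see the effect, as a minimal sketch with a hypothetical directory name: on most filesystems the directory file itself grows as entries are added and does not shrink again when files are moved out (until it is reorganized), so a once-huge directory stays expensive to search.

ls -ld /data/A/2010_03_10        # the directory's own size reflects how many entries it holds or has held
ls /data/A/2010_03_10 | wc -l    # versus the number of entries actually in it right now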

>> It looks like it's taking time to move files.

May we assume the moves are just between directories on the same underlying volume?

>> I am not sure why it would be slow in moving files if there are a lot of files.

Because there is more directory data to trawl through?

>> I am also not sure if writing on the socket is slow. How can I tell which resource is slowing this app?

Isolate and measure! Try 1000 moves outside the application context, at the shell level.
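
For example, a rough sketch from the shell (the paths and scratch directory below are made up for illustration): time the same 1000 renames into a nearly empty directory and into the directory that already holds 1M+ entries, and compare.

# create 1000 small test files in a scratch directory
i=0
while [ $i -lt 1000 ]; do touch /data/scratch/f$i; i=$((i + 1)); done

# time renaming them into a nearly empty directory ...
time sh -c 'for f in /data/scratch/f*; do mv $f /data/empty_dir/; done'

# ... then time moving them again, into the directory with 1M+ entries
time sh -c 'for f in /data/empty_dir/f*; do mv $f /data/A/2010_03_10/; done'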

What else is the application doing besides a 'rename'? Could it be doing readdirs galore? Or doing a stat on all files before moving one? Maybe it is blowing the inode cache?

You might want to try 'truss' or another system call trace tool.
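
A hedged sketch of what that could look like; on HP-UX the usual system call tracer is 'tusc' rather than truss, the exact flags vary by tool and release (check the man pages), and the PID and mount point here are placeholders.

# attach to the running Java client and log its system calls to a file
tusc -o /tmp/client.tusc <pid_of_java_client>

# with a Solaris-style truss, a per-syscall count summary would be
truss -c -p <pid_of_java_client>

# VxFS also keeps lookup and inode-cache counters per mounted filesystem, if vxfsstat is available
vxfsstat /data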

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

- I haven't looked at the JFS tuning doc yet, but I'll look at it. Thanks.
- Yes, it's moving files within the same volume.
- So when a file is moved, does it slow down if that directory has a lot of files?
- Yes, this application does read files from the directory. How can I tell if it is blowing the inode cache? Also, what is the role of the inode cache, and how can I tell if that's the problem?
- How can I use truss, and how do I interpret the output in this context?
- If I find that writing to the socket is a problem, how can I tell whether it's a network-related issue? Could it be a network-related issue if the client and server are on the same box?
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

It's as easy (* for the problem file systems *) as looking for a disk bottleneck with 'sar -d'. Refer to the monthly data (* if you have it *) in /var/adm/sa/sa##, and use:

sar -d -f /var/adm/sa/sa10 (* for 3/10 *)

-or-

run 'sar -d 5 5' from cron every fifteen minutes, with the output appended to a file. Look for disk entries where avwait is greater than avserv. Use 'pvdisplay' to identify the file system on the disk.

avwait Average time (in milliseconds) that transfer requests waited idly on queue for the device;

avserv Average time (in milliseconds) to service each transfer request (includes seek, rotational latency, and data transfer times) for the device.
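
A sketch of both steps (adjust the paths, and use the full path to sar in the crontab entry on your system); the awk filter simply prints lines whose next-to-last column (avwait) is larger than the last one (avserv):

# crontab entry: append a 5x5-second sar -d sample every fifteen minutes
0,15,30,45 * * * * sar -d 5 5 >> /var/tmp/sar_d.out 2>&1

# then flag the device lines where avwait exceeds avserv
awk '$NF + 0 > 0 && $(NF - 1) + 0 > $NF + 0' /var/tmp/sar_d.out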

Look into fragmentation. If you have online JFS you should be defragging once a week.

fsadm -F vxfs -d -D -e -E /filesystem

Note: once started, don't interrupt or kill it. If it takes too long, read the man page section on the number of passes and how to reduce it.

As for sockets, 'lsof' (* list of open files *) provides you with part of the information. You'll also need 'netstat' to count collisions, etc. Or an open-source HP-UX equivalent of the Sun command 'snoop'.

'lsof' is also a free, open-source tool. Here's 'lsof':

http://hpux.cs.utah.edu/hppd/hpux/Sysadmin/

As for a 'snoop' equivalent, look into 'tcpdump', 'tcptrace', and 'tcpflow', which can be found here:

http://hpux.cs.utah.edu/hppd/auto/ia64-11.31-T.html
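
For instance (a sketch only; the port number is a placeholder for whatever port the C++ server listens on, and since client and server are on the same box the traffic will be on the loopback interface):

# protocol counters: look for retransmissions, drops and errors
netstat -s | more

# watch the loopback traffic between the Java client and the C++ server
tcpdump -i lo0 -n port 9000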


MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

I am trying to relate my file system to the devices that I get in the sar output. How can I use pvdisplay or lvdisplay to list all the devices that appear in sar? I tried pvdisplay and also lvdisplay, but I don't get that information.

I was reading about the "Directory Name Lookup Cache" in http://docs.hp.com/en/5576/JFS_Tuning.pdf. How can I verify whether this is the problem? And can it be increased? Is it advisable to increase this cache? Since the client application reads the file name and then opens it and writes it to the socket, I was thinking that faster lookups from the DNLC could help.
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

Could somebody reply to my questions?
Tim Nelson
Honored Contributor

Re: Socket or Disk performance

lvdisplay will show you the device files used for each logical volume.

e.g.
lvdisplay -v /dev/vgabc/lvol1

or the reverse.

pvdisplay /dev/dsk/cxtxdx will show you what lvols the device is used in.

OldSchool
Honored Contributor

Re: Socket or Disk performance

pvdisplay /dev/dsk/cXtYdZ -or-
pvdisplay -v /dev/dsk/cXtYdZ

where X, Y and Z come from the sar output.

However, I suggest that you start with:

"bdf /your/filesystem" ... which will tell you what logical volume the filesystem resides in.

Then do an lvdisplay of that logical volume:

"lvdisplay -v /dev/vgXXX/lvolXXX" where the logical volume is taken from the output of the prior "bdf" above. That should give you a list of the physical volumes used by the LV.

You can then match the PV names listed by lvdisplay to the device names in the sar output.
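
Put together, the chain looks something like this (all of the names below are hypothetical, for illustration only):

bdf /data/A                                 # reports e.g. /dev/vgdata/lvol3 as the logical volume
lvdisplay -v /dev/vgdata/lvol3 | grep dsk   # lists the physical volumes (/dev/dsk/c#t#d#) behind it
sar -d 5 5 | grep c2t0d0                    # then watch one of those same devices in sar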



Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

Sorry.

'pvdisplay -v /dev/dsk/c#t#d# | more'

This will list several pages; you only need the first and second. The other pages display PE (* physical extent *) data, which isn't needed for this.
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

When I do lvdisplay I get

$ /usr/sbin/lvdisplay /dev/vx/dsk/app1dg/app1vol02
lvdisplay: Illegal path "/dev/vx/dsk/app1dg".
lvdisplay: Cannot display logical volume "/dev/vx/dsk/app1dg/app1vol02".

I took "/dev/vx/dsk/app1dg/app1vol02" this from bdf.

Also, if someone can help me understand how to diagnose if DNLC is a problem. And if increasing this will help.
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

"/dev/vx/dsk/app1dg/app1vol02".

This is a VxVM volume, not an LVM logical volume.
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

Then how can I measure performance for VxVM? Could somebody help me understand how I can see how the disks are performing under VxVM? It looks like the previous examples apply to LVM.
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

I'll give you the VxVM opposite numbers for the LVM command set. Here's a great URL:

http://www.bhami.com/rosetta.html

pvdisplay -> vxdisk list
vgdisplay -> vxdg list / vxprint
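
For I/O statistics, VxVM also has its own counterpart; a sketch using the disk group name from your bdf output (check the vxstat man page for the exact options on your release):

# per-volume I/O statistics for disk group app1dg, sampled every 5 seconds
vxstat -g app1dg -i 5

# the same, broken down per disk instead of per volume
vxstat -g app1dg -d -i 5
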
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

So I finally found a way, using "vxdisk list", to get the device names for the volume, and then grepped for them in the sar data. I do see those devices having more avwait than avserv. I am not sure what to do next. How can I tell what needs to be tuned?

Snapshot of sar (I've cut the device names)
--
device %busy avque r+w/s blks/s avwait avserv

d0 59.69 44.89 515 4094 38.55 3.46
d1 59.24 42.16 507 3990 36.60 3.43
d2 59.86 40.40 509 4014 34.65 3.44
d3 59.43 41.09 509 4029 35.16 3.45
d4 60.90 44.55 517 4126 37.97 3.50
d5 60.74 40.59 507 3984 34.69 3.50
d6 59.01 43.72 513 4117 36.65 3.46
--
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

Finding a disk bottleneck is not as clear-cut in VxVM as it is in LVM. You have to go through the /etc/path_to_inst file. Here's a Solaris disk bottleneck doc that refers to DiskSuite and metastat. It's related to your situation because /etc/path_to_inst is involved and because slices are used instead of logical volumes.

Paste in your vxdisk list and vxprint data if you can't get what you need.
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

The attached document tells how to identify the associated file system. I already know which file system these disks point to. I am trying to understand how much of a problem this is and how it can be tuned.
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

Oh, well, in that case you have a number of options, beginning with your hardware and RAID level. If this is RAID 5, consider mirroring only, or striping plus mirroring: RAID 0, 1, 0+1 or 1+0. RAID 0+1 and 1+0 will be striped across disk groups, and disk groups can be tricky when one disk in the group fails. It depends on the type of disk array that you have.

Check the rotation speed of your disks and consider getting faster disks.

From an O/S level, consider an additional file system on additional disks to load-balance better. This is probably your best choice.

But reviewing your sar -d report, all of your avwait times are 10 times greater than your avserv times. That is a significant bottleneck.

I'd throw more disks at the problem. Adding more spindles is usually what Oracle and other database vendors recommend as well.

Consult your DBAs for advice on what the database and application recommend for optimal performance and compatibility. For example, they may not handle a new file system well, especially if code changes are involved.
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

One thing I don't understand: writing files is fast, but moving files slows down. What could explain that? Is there a way to balance I/O within the existing resources?
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

If you are using online JFS then you can defrag your file systems to increase performance.

To test for online JFS try

fsadm -F vxfs -D -E /filesystem

What does this return?
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

I don't even see -D as an option.

This is what I get for fsadm -h:

usage: fsadm [-F FStype] [-V] [-o specific_options] special
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

These are preview/report options; no harm.
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

We recently cleaned out the file system that is slow, in order to perform another test. We started a test that writes to this file system 3 hours ago, so the fsadm output below is from only a few files in that file system. We plan to run this test for another 6 hours; I'll run fsadm again and send the output then.

--

Directory Fragmentation Report
Dirs Total Immed Immeds Dirs to Blocks to
Searched Blocks Dirs to Add Reduce Reduce
total 1224 45616 6 0 485 2735

Extent Fragmentation Report
Total Average Average Total
Files File Blks # Extents Free Blks
1543980 44 1 595439040
blocks used for indirects: 1696
% Free blocks in extents smaller than 64 blks: 0.66
% Free blocks in extents smaller than 8 blks: 0.01
% blks allocated to extents 64 blks or larger: 87.36
Free Extents By Size
1: 4018 2: 11403 4: 1134
8: 110718 16: 61461 32: 63118
64: 113601 128: 29814 256: 1
512: 2058 1024: 14989 2048: 14527
4096: 9963 8192: 804 16384: 1
32768: 10 65536: 10 131072: 7
262144: 6 524288: 4 1048576: 3
2097152: 4 4194304: 4 8388608: 4
16777216: 1 33554432: 0 67108864: 0
134217728: 3 268435456: 0 536870912: 0
1073741824: 0 2147483648: 0
---
Michael Steele_2
Honored Contributor
Solution

Re: Socket or Disk performance

So it looks like you've got Online JFS. Now run the defrag. But note: once started, NEVER kill -9 or stop the procedure early.

fsadm -F vxfs -d -D -e -E /filesystem

Also, refer to the fsadm man page on how to change the default of 5 passes to something less if this is going to take too long.
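
For reference, a sketch of how that could look; the -p and -t options here are my reading of the fsadm_vxfs man page, so verify them on your release before relying on this:

# reorganize, but make at most 2 passes instead of the default 5
fsadm -F vxfs -d -D -e -E -p 2 /filesystem

# or cap the run at roughly an hour (time limit in seconds)
fsadm -F vxfs -d -D -e -E -t 3600 /filesystem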

And the first time will be long and slow.
MohitAnchlia
Frequent Advisor

Re: Socket or Disk performance

Here are the results from this morning. Do you think defragmenting would help? "Dirs to Reduce" seems high compared to "Dirs Searched".

--
Directory Fragmentation Report
Dirs Total Immed Immeds Dirs to Blocks to
Searched Blocks Dirs to Add Reduce Reduce
total 1954 154823 13 2 1897 5086

Extent Fragmentation Report
Total Average Average Total
Files File Blks # Extents Free Blks
5389324 32 1 489126803
blocks used for indirects: 4256
% Free blocks in extents smaller than 64 blks: 2.11
% Free blocks in extents smaller than 8 blks: 0.15
% blks allocated to extents 64 blks or larger: 86.16
Free Extents By Size
1: 6587 2: 191242 4: 89499
8: 210269 16: 159417 32: 167483
64: 203406 128: 31834 256: 1
512: 1 1024: 1 2048: 0
4096: 0 8192: 1 16384: 1
32768: 9 65536: 10 131072: 7
262144: 6 524288: 2 1048576: 2
2097152: 3 4194304: 3 8388608: 2
16777216: 1 33554432: 0 67108864: 0
134217728: 3 268435456: 0 536870912: 0
1073741824: 0 2147483648: 0

--
Michael Steele_2
Honored Contributor

Re: Socket or Disk performance

You should be defragging periodically, like once a week or once a month. The more often you defrag, the better your performance will be and the shorter each run will take.