
VxFSD Physical IO Rate

 
Luis Ernesto Angulo
Occasional Advisor

VxFSD Physical IO Rate

Hi everyone,

We have an issue with the vxfsd process on an HP-UX box that is running an SAP Central Instance with an Oracle database. Basically we have a bottleneck on our internal disks, with avwait around 60 ms and avserv around 30 ms (sar -d averages below):

          device   %busy   avque   r+w/s   blks/s   avwait   avserv
Average   c1t1d0  100.00   16.54     271     5684    58.69    26.57
Average   c1t0d0   68.53   12.88     260     5640    30.94    17.28

We don't have our Oracle files on those disks; the server is attached to an EVA8000, which is where they live. The internal disks hold only the OS filesystems (/home, /opt, /stand, etc.) and swap.
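
For reference, this is roughly how the vg00 contents can be double-checked (a sketch; mount points will vary from box to box):

bdf | grep vg00                            # filesystems mounted from vg00
vgdisplay -v /dev/vg00 | grep "LV Name"    # all logical volumes in the group, including swap/dump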

We are not seeing page-outs on this system, but the swapinfo output looks like this:

             Mb       Mb       Mb   PCT  START/       Mb
TYPE      AVAIL     USED     FREE  USED   LIMIT  RESERVE  PRI  NAME
dev        4096     3957      139   97%       0        -    1  /dev/vg00/lvol2
dev       40960     4192    36768   10%       0        -    1  /dev/vg00/swap2
reserve       -    13593   -13593
memory    16353     3045    13308   19%
total     61409    24787    36622   40%       -        0       -

- vmstat sample:

 procs        memory              page                             faults            cpu
 r  b  w      avm    free   re  at  pi  po  fr  de  sr    in     sy     cs   us  sy  id
 1  5  0  2371675   73705   19   4   4   0   0   0  12  2407  38196   1080    3   4  93
 3  2  0  1822742   73628   34   2   2   0   0   0   0  1820  16363    501   16   2  82
 3  2  0  1822742   73628   27   1   1   0   0   0   0  2020  17412    677    3  11  86
 3  2  0  1822742   73628   25   2   0   0   0   0   0  1964  33100    607    0   1  99
 3  2  0  1822742   73628   20   1   0   0   0   0   0  1967  27591    588    1   2  97
 3  2  0  1822742   73628   16   0   0   0   0   0   0  1917  23275    544    0   1  99
 3  2  0  1822742   73628   12   0   0   0   0   0   0  1852  19620    488    0   1  99
 3  2  0  1822742   73628    9   0   0   0   0   0   0  1862  17392    507    3   1  96
 3  2  0  1822742   73628    7   0   0   0   0   0   0  1959  17825    697    2   2  96
 3  2  0  1822742   73628    5   0   0   0   0   0   0  1883  14771    609    6   1  93

Given all this, we would like to know why the vxfsd process shows a sustained physical I/O rate of around 500 (I'm attaching the GlancePlus screenshot)...

Regards,
Luis Angulo
8 REPLIES
Tim Nelson
Honored Contributor
Solution

Re: VxFSD Physical IO Rate

Two things, as you have two separate questions here.

1) Internal OS disks are typically slow, but maybe not that slow. Something on your OS disks is certainly generating a lot of I/O. What else is on there besides just the OS, which log files? I assume your sample was taken over a period of time? Review your per-process I/O stats (other than vxfsd), then review the open files for that process (see the sketches below).


2) My first guess, and this is only a guess: the sustained vxfsd I/O is buffer-cache flushing. What does kmtune (or kctune) | grep dbc_max show?
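
For 1), a minimal sketch of how the per-process digging could start, assuming fuser is available (it ships with HP-UX) and lsof has been installed (it is an optional/ported tool):

fuser -cu /var        # PIDs and owners with files open on a vg00 filesystem
fuser -cu /usr
lsof /var             # if lsof is installed: the actual open files on that filesystem

For 2), roughly how the buffer-cache limits and flush activity can be checked (kmtune on 11.11, kctune on 11.23/11.31); sar -b only hints that the writes are cache flushes, it does not prove it:

kmtune | grep dbc     # HP-UX 11.11
kctune | grep dbc     # HP-UX 11.23 / 11.31
sar -b 5 12           # buffer cache read/write and flush rates every 5 seconds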

Luis Ernesto Angulo
Occasional Advisor

Re: VxFSD Physical IO Rate

Thanks for your reply, Tim.

1. The internal disks hold, as I said, only OS-related files, nothing out of the ordinary. The SAP binaries, Oracle binaries, datafiles, control files and log files are all located outside that volume group (on the EVA array). I've been monitoring this environment for the last two weeks (I joined this company three weeks ago) and this behavior has been consistent.

2. Here are the requested parameters:
dbc_max_pct 8 8 Immed
dbc_min_pct 5 Default Immed
These settings are OK according to the SAP installation guide; since Oracle and SAP each maintain their own buffers, the OS buffer cache maximum is set at 8%.

What I am wondering is: if our server isn't paging out to disk, why does the swapinfo output show about 4 GB used under the dev type?
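
One rough way I could tell whether those 4 GB on device swap are still being written to, or are just left over from an earlier paging episode, is to watch swapinfo and vmstat side by side (a sketch; option support can vary by release):

swapinfo -tam         # totals in MB; repeat and see whether the dev USED figures grow
vmstat 5 12           # po column = page-outs per second
vmstat -S 5 12        # si/so (swap-ins/swap-outs) instead of re/at, if supported
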
Emil Velez
Honored Contributor

Re: VxFSD Physical IO Rate

You have 2 swap partitions on vg00 and they are probably using the same disks. You could create swap on a different disk that is not on the EVA.

Evidently you do not seem to have enough memory and you are doing a significant amount of page-ins and page-outs.
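
Roughly what I mean, assuming a spare non-vg00 volume group exists (vgXX and the size are placeholders); swap LVs are normally created contiguous and without bad-block relocation:

lvcreate -L 8192 -C y -r n -n lvswap3 /dev/vgXX    # 8 GB contiguous LV for swap
swapon -p 1 /dev/vgXX/lvswap3                      # enable it at priority 1
swapinfo -tam                                      # verify it shows up as dev swap
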
Luis Ernesto Angulo
Occasional Advisor

Re: VxFSD Physical IO Rate

That's what is strange: I've been monitoring page-outs over the same period and I haven't seen them happening. If I run vmstat over long intervals, the "po" column is consistently 0, yet the internal disks are experiencing higher average wait times over the same period...

The logical volumes assigned to swap are located in vg00 (the internal disks), which are the ones experiencing contention.

Again, if I were having swap problems I would expect the process doing a lot of I/O to be "swapper" or "vhand" or something like that, and I would see page-outs at some point... Since my issue is the I/O caused by the vxfsd process, I've been looking at the kernel parameters related to JFS, but I don't know whether changing any of them would benefit us:

vx_ninode 131072
vxfs_ifree_timelag -1

I tried changing the vx_ninode parameter to 40000, but it didn't help. This behavior is consistent across the PRD and QAS environments (same configuration). We are using JFS 3.5 and the kernel parameters are configured according to the SAP installation guide (ECC 6.0).
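
For reference, this is roughly how those JFS tunables can be checked and changed with kctune on 11.23/11.31 (kmtune on older releases); whether a change is dynamic or needs a reboot depends on the OS/JFS version, so treat it as a sketch:

kctune | grep -e vx_ninode -e vxfs_ifree_timelag   # current values
kctune vx_ninode=40000                             # example change (value is only an illustration)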

I don't know what else I could check to get rid of this bottleneck in our environments...

Regards,
Tim Nelson
Honored Contributor

Re: VxFSD Physical IO Rate

Looks like you are already investigating some of this. This doc may help.
http://docs.hp.com/en/5992-0732/5992-0732.pdf


One thought.

vxfsd manages all VxFS filesystems. Just because its I/O is high and your root disks are busy, it may not be accurate to blame all of the OS I/O on vxfsd.

Something else still has to be reading from or writing to the OS disks to make vxfsd this active, unless there is some huge bug that needs to be patched? What?

I see no deactivations taking place in your vmstat, unless I am misaligning the columns. Yes, at some point in the past 4 GB was placed on your second device swap area, but without po activity it is just sitting there.

8% dbc_max is OK; that is still about 1.2 GB. For a database server I would reduce it to 600-800 MB max and give the memory to the application.
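
For reference, 8% of the roughly 16 GB in this box is about 1.3 GB of buffer cache, so dropping dbc_max_pct to around 4% would land in that 600-800 MB range. A sketch, assuming the parameter is dynamic on this release (the "Immed" flag above suggests it is):

# 16353 MB * 0.08 ~= 1308 MB ; 16353 MB * 0.04 ~= 654 MB
kctune dbc_max_pct=4
kctune | grep dbc      # confirm the new limit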

A queue of 16+ on the OS disk and 200+ I/Os per second? Something has to be reading/writing at a pretty good clip, especially if this is sustained; maybe we need some more sar stats over a longer period? Even on my busiest server I typically only see 10-20 I/Os on the root disk, if that.
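
Something like this could gather the longer-term sar data, sampling disk activity every 5 minutes for a day (intervals and file names are arbitrary):

sar -d -o /tmp/sar_disk.bin 300 288 > /dev/null &    # collect 5-minute disk samples for 24 hours
sar -d -f /tmp/sar_disk.bin                          # read the collected data back later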

Got any log files that you have noticed growing like crazy?
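
A crude way to hunt for runaway files on the vg00 filesystems (paths and size thresholds are just examples; -xdev keeps find from wandering onto the EVA mounts):

find / /stand /var /usr /opt /home -xdev -type f -size +102400 -exec ls -l {} \;   # files over ~50 MB
find /var /usr /opt -xdev -type f -mtime -1 -exec ls -l {} \;                      # files written in the last day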

Hate to ask this... when was the last time you rebooted?
Luis Ernesto Angulo
Occasional Advisor

Re: VxFSD Physical IO Rate

Hi,

The server has been up for 73 days, which isn't much time in my book. I'm blaming the vxfsd process because I've been watching the physical I/O rate per process closely (through GlancePlus) and it is the only one that maintains a sustained rate above 450. The I/O activity on the volume group where the database resides is very low, so besides that process there is no other one with that much I/O... Also, as I said, all of the application log files are outside the internal disks, so I really don't understand why that process is hammering the internal disks all day long...

I'm going to search the bug database to see whether I'm hitting a known VxFS issue, but honestly I've been researching these last few days and I haven't found a similar situation described by anyone else...

Any other guess or idea is welcome...

Thanks,
Tingli
Esteemed Contributor

Re: VxFSD Physical IO Rate

Please post your solution, as I have a similar issue with high vxfsd I/O. My sar shows:

          device   %busy   avque   r+w/s   blks/s   avwait   avserv
Average   c1t1d0   16.48    9.15      41      570    27.81    20.06
Average   c1t0d0   16.88    9.56      37      551    25.78    21.72
Luis Ernesto Angulo
Occasional Advisor

Re: VxFSD Physical IO Rate

I haven't found the solution yet...

At least now I know that the vxfsd process is only writing to our /usr logical volume (raw writes), but I'm still not able to find what is triggering this behavior...
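
One crude way I may try to catch whatever is being written under /usr while vxfsd is busy: drop a timestamp file, wait through a busy interval, then list anything newer than it (a sketch; metadata-only activity such as intent-log flushing will not show up this way):

touch /tmp/usr_marker
sleep 300            # let a busy interval pass
find /usr -xdev -type f -newer /tmp/usr_marker -exec ls -l {} \;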