Operating System - Tru64 Unix

Re: vfast on clusters

 
Han Pilmeyer
Esteemed Contributor

Re: vfast on clusters

vFast can use as much CPU and I/O resources as it wants, provided that there is no useful "user" work that needs to be done (subject to the percent_ios_when_busy limit).
Orjan Petersson
Frequent Advisor

Re: vfast on clusters

(To put some of the figures into context: the machines have 4 CPUs each)

I restarted vfast (activate, defragment=enable) on the 3 domains yesterday at 18h30.
The system was pretty idle (load average between 1 and 3; "kernel idle" %cpu 10%-25%) until about 22h30. Then the load average increased to 4-5, and "kernel idle" %cpu increased to 105%-115%.

It stayed like that until the morning, and at around 8 o'clock when people started using the system (Sunday is a normal working day here), the load average increased to 9-10, but the "kernel idle" %cpu stayed at 110%. At that time the CPU definitely had other useful "user" work to do.
At around 9, I stopped vfast (defragment=disable, deactivate) on the eppix_apps domain which made everything go back to normal.

When the change at 22h30 happened, "collect" data shows that CPU Idle goes down from around 60% to 15%-30%, and CPU System increases from 15% to 45%.
The eppix_apps domain was defragmented (with defragment) a few days ago without any errors, and verify on the active domain reported only minor problems ("probably due to file system activity").
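
To put the two sets of figures side by side, here is a rough sanity check in Python. It assumes that the per-process %cpu for "kernel idle" is relative to a single CPU (so 4 CPUs would be 400% in total) and that the "collect" CPU System figure is a share of the whole machine. Both are assumptions about how the tools report, not something confirmed in this thread, and the function names are made up for illustration.

```python
# Rough sanity check of the figures quoted in this post.
# ASSUMPTION: per-process %cpu is relative to one CPU (4 CPUs = 400% total).
# ASSUMPTION: "collect" CPU System is a percentage of the whole machine.

N_CPUS = 4  # "the machines have 4 CPUs each"


def share_of_machine(per_cpu_pct, n_cpus=N_CPUS):
    """Convert a single-CPU-relative percentage into a whole-machine percentage."""
    return per_cpu_pct / n_cpus


def cpus_equivalent(machine_pct, n_cpus=N_CPUS):
    """Convert a whole-machine percentage into an equivalent number of busy CPUs."""
    return machine_pct * n_cpus / 100


# "kernel idle" %cpu after 22h30: 105%-115%
kernel_idle_after = (105, 115)
# "kernel idle" %cpu before 22h30: 10%-25%
kernel_idle_before = (10, 25)

# Share of total capacity used by "kernel idle" after the change
kernel_idle_share = [share_of_machine(p) for p in kernel_idle_after]

# Smallest and largest possible rise in "kernel idle", expressed in CPUs
rise_min_cpus = (kernel_idle_after[0] - kernel_idle_before[1]) / 100
rise_max_cpus = (kernel_idle_after[1] - kernel_idle_before[0]) / 100

# "collect": CPU System rose from 15% to 45% of the machine
system_rise_cpus = cpus_equivalent(45 - 15)

print("kernel idle share of machine (%):", kernel_idle_share)
print("kernel idle rise (CPUs):", rise_min_cpus, "to", rise_max_cpus)
print("CPU System rise (CPUs):", system_rise_cpus)
```

Under those assumptions, the rise in "kernel idle" (roughly 0.8 to 1.05 CPUs) is of the same order as the rise in CPU System seen by "collect" (about 1.2 CPUs), which would fit with the vfast kernel threads accounting for most of the extra system time.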

This is the 2nd night in a row that I see this behaviour. I only started to run vfast a few days ago, so I cannot tell if it has been like that before.

Any ideas of other things I can check to get an understanding of what is really going on?

I will let vfast continue on the two domains to see if the %cpu for kernel idle will increase during the night or not.
Han Pilmeyer
Esteemed Contributor

Re: vfast on clusters

I just checked and the last performance fix for vFast is in BL26 (PK5). So this would most likely be an unknown issue at the moment.
Orjan Petersson
Frequent Advisor

Re: vfast on clusters

Thanks Han,
The result from running vfast during the night on the two domains was no increase in %cpu for "kernel idle" so the problem seems to be related to the specific domain "eppix_apps".

I will raise this issue through our local support company but I will leave this site in a few days so I am not sure how much will come out of that.
Joris Denayer
Respected Contributor

Re: vfast on clusters

Han, Orjan

I've seen similar behaviour on an internal server and also on one of our customers' systems.

With vfast enabled, we observed kernel_idle CPU percentages up to 200%.
The vfast kernel threads took ~180% of the total. At the same time the I/O load was considerable, high enough to impact the applications' throughput.

Disabling balance and topIObalance didn't change much.

On our local system, large files (up to 200GB) are created, while on the customer system a couple of million small files/month are created.

The first file creation pattern is completely different from the second one.

In the end, vfast was disabled, and the customer is now running defragment again: a couple of hours during the night and a complete day during the weekend.

I guess that the algorithms that must calculate the permitted CPU/IO load are not working correctly under certain conditions.

Possible conditions might be:
- creation of a very large number of files/day
- creation of very large files with very many extents

To err is human, but to really foul things up requires a computer
Orjan Petersson
Frequent Advisor

Re: vfast on clusters

Good to know that we are not the only ones seeing this behaviour.

We do not create any huge files on the domain, but are more in the "large number of files/day" case (on the order of 10000/day).

I would like to avoid running defragment as I experienced the same kind of cluster service check timeouts as described here.

http://www.ornl.gov/lists/mailing-lists/tru64-unix-managers/2005/04/msg00021.html

(That happened even though I ran defragment on the same cluster member as the one serving the domain)