
Re: Task to task

 
labadie_1
Honored Contributor

Re: Task to task

I see

2) The 10 cluster stations start booting. Because the system disk is in shadow merge, they all have to do their reading twice. I checked and found that they each transfer 350 MByte over their 10 Mbit network interface (normally 200). The last boot finished at 8:00!

You should definitely use BOOTSYNC
http://h71000.www7.hp.com/freeware/freeware50/bootsync/

I guess you know that you can shut down a cluster station to the >>> prompt (rather than leaving it at the "system shutdown complete..." message), and then boot the stations one by one from the boot node with a
$ mc ncl trigger node service password 16char ...
Of course, on the station you must first do
>>> set trig 1
>>> set mop 1
>>> set pswd 16char

I did that long ago

:-)

regards

gerard
Wim Van den Wyngaert
Honored Contributor

Re: Task to task

Labadie

I don't need BOOTSYNC because I have a home-made one in use. And again, why doesn't it work?
Wim
Stanley F Quayle
Valued Contributor

Re: Task to task

Priority 1 or not, the shadow set rebuild of the system disk greatly impacts process creation, including FAL.

Can't you get a small UPS and shut down the central node cleanly when the power goes away?

http://www.stanq.com/charon-vax.html
Wim Van den Wyngaert
Honored Contributor

Re: Task to task

I am not looking for a solution because I have one (retry later), but I want to know why it doesn't work, why the CPU isn't given to the processes, etc.
And power cuts are not always planned, and a UPS cannot be used for every station.

Wim
Jan van den Ende
Honored Contributor

Re: Task to task

Wim,
There WAS an announcement for a shadow patch that enabled MINImerge. It was to become available for V7.3-1 (and only later for 7.3-2) at the end of 2003 or early 2004. I am not aware if it HAS been released, but that should not be too hard to find out.
If you implement that, your shadow merge time will be down to centiseconds, and I guess that will effectively solve your problem. Get the patch, or find out how long you still have to wait. I guess waiting a few (?) days WILL be better than a tweaked workaround.... ;-)

jan
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: Task to task

Jan,

We don't install patches over here and we just arrived at 7.3 and are not going to upgrade.

See my numbered questions.
Wim
Uwe Zessin
Honored Contributor

Re: Task to task

2) Your processes very likely don't get CPU because they are waiting for the disks! There are multiple I/O streams going on:

- The shadow merge is a purely sequential operation. It reads data from both disks, compares them and corrects any differences.

- Then you have all the user I/O. Any read beyond the 'merge fence' (the area not yet merged) does an implicit merge: it reads from _both_ disks and corrects any differences. (During normal operation the shadow driver distributes the I/O requests over both members.) These reads are synchronous operations, so the shadow driver must wait for both reads before it can continue. Also, the two disks are not synchronized, so they don't hit sector 1 at the same time. This results in an additional delay, because the system must wait for the second I/O to complete.

So your disks' read/write heads spend _much_ more time moving. All I/O operations take longer.
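
As a rough illustration of the point about reads beyond the merge fence, here is a toy Python sketch (it is not anything OpenVMS itself does). It assumes the rotational delay on each member is uniformly random, uses a hypothetical 8.3 ms rotation time, and ignores seeks and the merge's own sequential I/O. It only shows that waiting for the slower of two unsynchronized disks costs more on average than waiting for one.

```python
import random

def read_latency_ms(rng, rotation_ms=8.3, beyond_fence=False):
    """Rotational delay for one read on a two-member shadow set (toy model)."""
    first = rng.uniform(0.0, rotation_ms)
    if not beyond_fence:
        return first  # normal case: a single member serves the read
    second = rng.uniform(0.0, rotation_ms)
    return max(first, second)  # merge case: wait for the slower member

def mean_latency_ms(beyond_fence, samples=100_000, seed=1):
    """Average the toy latency over many simulated reads."""
    rng = random.Random(seed)
    total = sum(read_latency_ms(rng, beyond_fence=beyond_fence)
                for _ in range(samples))
    return total / samples

normal = mean_latency_ms(beyond_fence=False)
merging = mean_latency_ms(beyond_fence=True)
print(f"normal read: {normal:.2f} ms, beyond merge fence: {merging:.2f} ms")
```

In this model the average wait goes from about half a rotation to about two thirds of a rotation, before any extra head movement is counted.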

4) It is my understanding that they are coming in through the MSCP server, which runs on the 'interrupt stack', so there simply is no process to which the I/Os could be charged.
.
Wim Van den Wyngaert
Honored Contributor

Re: Task to task

Uwe,

OK, but that doesn't explain why the CPU is not given to the process for at least 1 minute.
Suppose the same thing happens in production: every time a shadow merge is done, processes no longer receive CPU. Or is it the invisible I/O being done that runs in real time?
Wim
labadie_1
Honored Contributor

Re: Task to task

Maybe your cluster needs some tuning?

Check that MSCP is OK on all the nodes serving MSCP disks, using

MSCP Disk Server, Usage And Performance Tuning

http://h18000.www1.hp.com/support/asktima/operating_systems/009324D2-0E9A6D00-1C01E7.html
Wim Van den Wyngaert
Honored Contributor

Re: Task to task

Labadie,

Nope. Rebooted a station and the 3 rates are almost zero.
Wim
Uwe Zessin
Honored Contributor

Re: Task to task

Wim,
How did you measure that a process didn't receive any CPU time for a minute? Note that a process can receive CPU for a very short time, and then it is not accounted for in its performance counters. Sorry, I don't understand the network problem and can't help you there, but I will try to give some background information and hope you find it useful.

The problem is not giving CPU to a process. The CPU is simply idle. The problem is that the processes are waiting for their I/Os to complete. They cannot continue, because they need the data from the disk(s) to continue processing it. The disks are so busy moving their heads that an I/O queue has built up.

Imagine lots of people going to the newsstand to buy something. Nobody can leave and continue until he has been served at the newsstand. The owner is busy walking around. The longer the queue, the longer the customers wait. OK, that doesn't fully reflect the situation of a disk, but I hope you get the idea.
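
The newsstand picture can be put into numbers with a textbook single-server queue (M/M/1). This is only a sketch under that assumption: real disk I/O is not an M/M/1 queue, and the I/O rate and service times below are made up for illustration.

```python
def mean_requests_in_system(arrivals_per_s, service_time_s):
    """Mean number of requests waiting or in service in an M/M/1 queue."""
    utilization = arrivals_per_s * service_time_s
    if utilization >= 1.0:
        return float("inf")  # requests arrive faster than they are served
    return utilization / (1.0 - utilization)

# Hypothetical numbers: the I/O rate stays the same, but each I/O takes
# longer because the heads spend more time moving during the merge.
for service_ms in (10, 15, 19):
    n = mean_requests_in_system(arrivals_per_s=50,
                                service_time_s=service_ms / 1000)
    print(f"{service_ms} ms per I/O: about {n:.1f} requests queued or in service")
```

With these made-up numbers, not even doubling the time per I/O takes the average from about 1 request to about 19, which is why a modest slowdown per I/O can look like a stalled system.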
.
Wim Van den Wyngaert
Honored Contributor

Re: Task to task

Correct Uwe.

CPU is charged 8%. But is MSCP included in it?

Compute q 0.4

VPA reports q of 4 average and peak at 15 (non virtual IO)
Wim
Uwe Zessin
Honored Contributor

Re: Task to task

I haven't used VPA in years; I last saw it in 1989 at the latest. If time on the 'interrupt stack' is included, then I think yes. I hope somebody else can explain.
.