
Re: System slow

 
SOLVED
John Russell_13
New Member

System slow

Our VMS guy is on leave. I'm a Windows guy and have been left with the task of watching over our Alpha for a few days. Today the system is crawling. We have 3 CPUs and 2 GB of RAM. I'm attaching a file with some screen scrapes. Can anyone tell me what's going on here? Any assistance would be greatly appreciated.

Thanks in advance.

John
20 REPLIES
Ian Miller.
Honored Contributor
Solution

Re: System slow

Your system appears a bit short of free memory, but as the paging rate is not excessive I expect that's not the problem. I wonder if you have a bottleneck on one disk. Could you post the result of
MONITOR DISK/ITEM=Q/INT=1/VIEW=10

I wonder if some disk will show an excessive queue length.

Can you post more detail on your disk subsystem (SCSI or FC, RAID or not, and so on)?
____________________
Purely Personal Opinion
Robert Gezelter
Honored Contributor

Re: System slow

John,

Please also do the following commands and post the output:

$ SHOW SYSTEM
$ SHO QUEUE/BATCH/ALL

One possibility I would like to eliminate is that you have a batch queue at an interactive priority, with a large, resource-hungry batch job competing for resources.

- Bob Gezelter, http://www.rlgsc.com
John Russell_13
New Member

Re: System slow

Hello,

Thanks for your replies. It's late in the day and everything is back to normal at this point. But here is the info you both asked for.

2 HSG80s running dual redundant fiber paths. We are letting VMS do the shadowing. No RAID whatsoever. I'm attaching a .txt as well.

Thanks,

John
John Russell_13
New Member

Re: System slow

Also, Ian, what would be considered an excessive queue length? I will check this tomorrow during peak time, but I have no idea how this would be measured. I do know that one of our shadow sets is accessed quite frequently by many end users.
Travis Craig
Frequent Advisor

Re: System slow

John,

I haven't dealt with disk queue length much, but I ran a test on my machine that is pretty heavily loaded, with a disk I/O rate of 50-100, and the average queue length is well under 1. I suspect that an average of 2 or 3 would mean at least one process is quite disk bound, and other users of the same disk will run slowly. I don't know whether that effect would apply to the whole affected controller or not.

I notice you have that one job that has consumed most of a CPU and has done a lot of I/Os, but it has done them over a long period of time (12-18 days). That would be between 700000 and 1000000 per day. That seems like quite a few if they are all disk I/Os, so it might be hogging quite a lot of one disk's throughput. Whether that would slow down everyone would depend on whether they are using the same disk or controller. I guess the process's CPU use isn't a problem because you have 3 CPUs. I assume its priority is not a problem, for the same reason.
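The per-day estimate above is simply the job's total I/O count divided by the number of days it has been running. A rough Python sketch of that arithmetic follows. The total of 12,500,000 I/Os is a made-up round number, chosen only because it roughly reproduces the range quoted above (the real count is in the attached output), and `ios_per_day` is a hypothetical helper name.

```python
def ios_per_day(total_ios: int, days_running: float) -> float:
    """Average number of I/Os per day over the job's lifetime."""
    if days_running <= 0:
        raise ValueError("days_running must be positive")
    return total_ios / days_running

# ASSUMPTION: illustrative total only, not taken from the posted output.
TOTAL_IOS = 12_500_000

# The job has been running somewhere between 12 and 18 days.
low_estimate = ios_per_day(TOTAL_IOS, 18)   # longer lifetime, lower rate
high_estimate = ios_per_day(TOTAL_IOS, 12)  # shorter lifetime, higher rate

print(f"between {low_estimate:,.0f} and {high_estimate:,.0f} I/Os per day")
```

Dividing further by 86,400 seconds per day gives an average rate per second, which is easier to compare against what MONITOR reports for the disk.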

I don't see anything else in your outputs that stands out to me.

--Travis Craig
My head is cold.
Ian Miller.
Honored Contributor

Re: System slow

Shadow set DATA4 has an average queue length of 4 in the displays you posted. This is worth investigating. The other disks have a minimal queue length. Either lots of I/O is directed to that disk and it is overloaded, or there is a problem and it is not responding as fast as the other disks.

$ MONITOR DISK/ITEM=OP

will tell you the operation rate to each disk.
If it's found that a lot of I/O is going to DSA4 then
$ SHOW DEVICES/FILES DSA4
will tell you the files open on that disk - talk to the application people about what the files are.

Has the workload changed recently?
____________________
Purely Personal Opinion
Petr Spisek
Regular Advisor

Re: System slow

Hi,
A queue length of 4 (if permanent) is not good for performance. Try to find the hot files and separate them.
What do the interrupts look like on your system (MONITOR MODE)?

Petr
Robert Gezelter
Honored Contributor

Re: System slow

John,

Ok, first things first. The queue length of 4 is a potential problem, if it persists for an extended period. If it is a momentary thing, it is not as much of a problem.
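One way to tell a momentary spike from a persistent queue is to look at how many consecutive samples stay at or above some level. A minimal Python sketch of that check follows. The sample readings are invented for illustration, the threshold of 3.0 is an assumption rather than an official limit, and `longest_run_at_or_above` is a hypothetical helper, not part of any VMS tool.

```python
def longest_run_at_or_above(samples, threshold):
    """Length of the longest consecutive run of samples >= threshold."""
    longest = current = 0
    for value in samples:
        if value >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the run is broken, start counting again
    return longest

# ASSUMPTION: hypothetical queue-length readings, one per monitoring interval.
momentary = [0.3, 0.5, 4.2, 0.4, 0.2, 0.6]
sustained = [3.8, 4.1, 4.4, 4.0, 3.9, 4.2]

print(longest_run_at_or_above(momentary, 3.0))  # a single isolated spike
print(longest_run_at_or_above(sustained, 3.0))  # every interval is high
```

A long run relative to the number of samples suggests a persistent queue that deserves investigation, while a run of one or two points to a brief burst.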

My curiosity is piqued by JOB0f UZ02DRV, however. In the 18 days since the system was last booted, it has accumulated 12 days of CPU time. In other words, one of the CPUs has effectively been 66% occupied by this job since bootstrap, presuming that the job started at boot time; if it was started later, it is more suspicious.
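The occupancy figure is just accumulated CPU time divided by elapsed time. A minimal Python sketch of that arithmetic follows, using the 12 and 18 day figures quoted above. The function name `cpu_share` is made up for illustration, and the 15-day case is an assumed example of a job that started after boot, not something taken from the posted output.

```python
def cpu_share(cpu_days: float, elapsed_days: float) -> float:
    """Fraction of one CPU consumed over the elapsed period."""
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    return cpu_days / elapsed_days

# Figures quoted above: 12 days of CPU time over 18 days of uptime.
since_boot = cpu_share(12, 18)
print(f"{since_boot:.0%} of one CPU if the job started at boot")

# ASSUMPTION: hypothetical case where the job started 3 days after boot,
# so the same 12 CPU-days were accumulated in only 15 days.
later_start = cpu_share(12, 15)
print(f"{later_start:.0%} of one CPU if the job started 3 days later")
```

The shorter the job's actual lifetime, the closer it comes to saturating one CPU outright.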

Working from here, it is hard to diagnose, but I wonder what that job is doing, and would suggest checking whether the I/O is originating with that job.

I hope that the above is helpful.

- Bob Gezelter, http://www.rlgsc.com
Jeff Chisholm
Valued Contributor

Re: System slow

Is this the appropriate time to bring up our System Performance Analysis offer? In exchange for some minor $$'s we will locate and tell you just how to work around performance bottlenecks. See the details on the web page. Regards, Jeff

http://www.hp.com/hps/perevent/valupack/openvmssysadmin/vp002.html
le plus ca change...
John Russell_13
New Member

Re: System slow

Hello Gentlemen,

The job that people are suspicious about is a real-time interface between our patient billing/scheduling system and the patient/billing system that is used by the hospitals in our healthcare network. It runs 24/7 and used to be a problem until we upgraded from 1 to 3 CPUs.

About a month ago we had a major software upgrade and I've been told that physical memory is an issue. Being a financially poor healthcare system, we've been told that we cannot purchase memory at this time and must make do with what the Alpha system has. As I mentioned before, our VMS guru is on leave and now we're not sure when he's coming back, so I'm having to learn OpenVMS very quickly.

I'll attach another .txt of what's happening with the system as we speak. It's actually flying today. Some days it's very sluggish. All of the information I've gotten in this string has been very helpful in knowing what to look for. I really appreciate all of you giving me this information. I think I want to be an OpenVMS guru when I grow up!! I love the monitor utility. It's fun!!

Anyway, DSA4 is our heavy hitter. I'm going to have someone from HP look into our HSG controller configuration and make sure it's behaving efficiently/correctly.

I'm just going to assume that a few large jobs were running yesterday and killing our Finance database and I'm going to do what I can to prevent this from happening again.

Thank you all!!! Any other suggestions would be greatly appreciated!!

John
Hein van den Heuvel
Honored Contributor

Re: System slow

John,
For a Windows guy you seem to be doing an admirable job trying to make sense of this VMS system! There could be a career here :-).

Bob brings up a good point in highlighting this JOB0f UZ02DRV process. That's an awful lot of CPU and a good chunk of the total I/Os. Is that job trying to accomplish a single task, or is it a slave waiting for work items?

Still, that would have been a problem all along, and best dealt with by the regular staff.

I suspect (WAG) that for your immediate (now past) incident you had some shadowing event, something where the system decided to read entire disks and tried to compare and resync them. Dunno. Another good chunk of the I/Os has indeed come from the SHADOW_SERVER.

The XFC hit rate is abysmal. That is not normal. It would be good to try to understand that, but deep(er) application knowledge is needed for that. Large file scans? A private (non-VMS-BACKUP) backup technique, such as a simple copy?

Good luck,
Hein.
Hein van den Heuvel
Honored Contributor

Re: System slow


Looks like you were replying while I posted mine.

Nice to see the MONI MODE. It excludes, for example, an RMS or Oracle problem (no exec mode).

Looks like an InterSystems Cache application.
You really should try to work with that DB vendor to see if the caches are set up correctly, the execution plans are behaving, and so forth. You may want to give more memory to its cache and not have the XFC cache, which is ineffective for you, waste time on blocks that are not going to be re-requested.

Check with InterSystems whether this low XFC hit rate is typical for their VMS solution. (Maybe they should learn to issue 'bypass cache' style I/Os for the main DB work.)

Hein.
Lawrence Czlapinski
Trusted Contributor

Re: System slow

John,
1. Can you attach the MONITOR CLUSTER and MONITOR PROC/TOPCPU output when you have this situation again? It would give an overview of the situation.
2. If you can get AVAIL_MAN installed, it would give you a continuous look at % CPU usage and CPU queues on all the nodes. It would also show you % memory usage, etc. I often notice runaway CPU processes that way. It would show which jobs are getting CPU time and which are waiting for CPU time.
3. If some large jobs are taking up a lot of the CPU time, perhaps they should be running as batch jobs at a lower priority than interactive jobs. Or perhaps something is running at a higher priority and keeping the rest of the jobs from getting a quantum of time.
4. I would also recommend looking to see if you have a "Guide to OpenVMS Performance Management" manual around. It has some troubleshooting trees.
Lawrence
Lawrence Czlapinski
Trusted Contributor

Re: System slow

John, after rereading your question it looks like you have one system with 3 CPUs. So disregard the MON CLUSTER.
Lawrence
Robert Gezelter
Honored Contributor

Re: System slow

John,

When you increased the number of CPUs, the CPU hog job basically got its own CPU. This has the potential to cause other problems, as may be the case here.

The last time a client presented a similar situation to me, we were able to identify and eliminate the source of the high CPU consumption, with a substantial savings in resources.

The SHOW MEMORY command will show the memory utilization. Even if you see a low free page count, you may not need additional memory; it may be a straightforward tuning issue.

I hope that the preceding is helpful. It is difficult to "get a handle" on the situation when one is this remote from it.

- Bob Gezelter, http://www.rlgsc.com
Ian Miller.
Honored Contributor

Re: System slow

Look in SYS$MANAGER:OPERATOR.LOG to see if there are messages from the shadowing software, particularly about DSA4.
What is CACHE.DAT ?

Does

$ SHOW MEMORY/CACHE=(TOPQIO,VOLUME=DSA4)

show any particular file?
____________________
Purely Personal Opinion
Jeff Chisholm
Valued Contributor

Re: System slow

The InterSystems Cache product uses its own global sections and maps chunks of physical memory for improving disk I/O performance. The VIOC and XFC are not recommended by them, since both of these take memory resources that could be assigned to the application.

It's possible to turn on XFC, but you'd want to limit the max size and exclude the Cache database volumes.

Take care what you recommend, and what recommendations you implement, unless you've got some experience tuning Cache.
le plus ca change...
Antoniov.
Honored Contributor

Re: System slow

John,
welcome to the VMS world :-)

Because you are new to VMS, you could run $ @SYS$UPDATE:AUTOGEN SAVPARAMS GETDATA
This command writes two files, SYS$SYSTEM:AGEN$FEEDBACK.DAT and SYS$SYSTEM:PARAMS.DAT.
Looking at AGEN$FEEDBACK.DAT you can see how the OS evaluates system parameters.
The above command doesn't make any changes, so it's not dangerous, and it can help you understand more about the VMS system parameters.

Antonio Vigliotti
Antonio Maria Vigliotti
Antoniov.
Honored Contributor

Re: System slow

John,
I forgot to mention: you can find all the technical VMS documentation here
http://h71000.www7.hp.com/doc/os73_index.html

Antonio Vigliotti
Antonio Maria Vigliotti
Jan van den Ende
Honored Contributor

Re: System slow

Re Antonio:

AUTOGEN is a very useful tool, but in this case I would be very careful with interpreting its advice, because of the "3rd party" cache. It really requires someone with experience with that stuff to look at the advice and consider what is valid, and what is not, in this case!

Jan
Don't rust yours pelled jacker to fine doll missed aches.