nfs directory problem - whose fault?

 
Jim Strehlow
Advisor

We had a problem with a NFS directory.
I wonder whose software is at fault.

An Oracle 8.1.7.4 application on a Windows 2003 server writes a text file into a local directory.
Third-party software on that server exports the directory as an NFS share.

On OpenVMS V7.3-1, a timer-driven job runs
$ DIRECTORY nfsDirectoryName:*.TO_DO
and processes any TO_DO files it finds.
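For readers less familiar with this setup, the polling step can be sketched roughly as follows. This is a minimal illustration in Python rather than DCL. The `process` callback and the rename to *.DONE are assumptions, since the post does not say how processed files are disposed of.

```python
import glob
import os
from typing import Callable, List


def poll_to_do(directory: str, process: Callable[[str], None]) -> List[str]:
    """Handle every *.TO_DO file in `directory` once, in name order.

    `process` is a hypothetical stand-in for whatever the real job does
    with each file. After it returns, the file is renamed to *.DONE
    (an assumed convention) so the next timer tick skips it.
    """
    handled = []
    for path in sorted(glob.glob(os.path.join(directory, "*.TO_DO"))):
        process(path)
        done = os.path.splitext(path)[0] + ".DONE"
        os.replace(path, done)
        handled.append(done)
    return handled
```

Renaming after processing, rather than before, means a crash mid-process leaves the file to be retried on the next tick.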

That had worked for several years, until a file was written that caused a problem. The problematic file looked normal in the Windows directory.
The OpenVMS application process hung.

We noticed that the NFS$ACP1 process used a lot of resources, taking 70% of the CPU.

When I ran the OpenVMS $ DIRECTORY command manually to investigate the problem, it listed the problem file over and over again until I interrupted it.
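Because the symptom is a listing that returns the same file name endlessly, a consuming job could refuse to trust a listing once it repeats. The following is a hedged Python sketch. `guarded_listing`, `RepeatedEntryError`, and the 10,000-entry cap are hypothetical names and values. It only stops the consumer from looping; it does not fix whatever makes the listing repeat.

```python
from typing import Iterable, Iterator


class RepeatedEntryError(RuntimeError):
    """Raised when a directory listing misbehaves (repeats or runs too long)."""


def guarded_listing(entries: Iterable[str], max_entries: int = 10_000) -> Iterator[str]:
    """Yield names from `entries`, aborting on the first repeated name
    or once more than `max_entries` names have been produced."""
    seen = set()
    for count, name in enumerate(entries, start=1):
        if name in seen:
            raise RepeatedEntryError(f"listing repeated {name!r}")
        if count > max_entries:
            raise RepeatedEntryError(f"listing exceeded {max_entries} entries")
        seen.add(name)
        yield name
```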

I had the problem file deleted on the Windows side. The problem went away.

We have applied all patches for OpenVMS 7.3-1 and TCP/IP v5.3-184 (V0503-184-4).

We cannot reproduce the problem at will.

Do we open a problem ticket with HP?
- is it a problem with the DIRECTORY command?
- is the problem with the Compaq TCP/IP version 5.3-184 stack that mounts the disk?

Is it a Windows third-party NFS problem?
Is it a problem with Windows 2003 Server?


Anyone else had such a dilemma?

(The problem does not occur at sites where the Oracle database runs on OpenVMS, since no NFS directory is needed there.)

Thanks.

<---- Jim ---->
Uwe Zessin
Honored Contributor

Re: nfs directory problem - whose fault?

If the same file is shown over and over again, it looks like a directory corruption. Have you tried an '$ ANALYZE /DISK_STRUCTURE'?
Martin P.J. Zinser
Honored Contributor

Re: nfs directory problem - whose fault?

Hi,

You write that you cannot reproduce the problem, i.e. if you create a file with the same name on your Windows server now, the problem does not happen?

If so, it will be very tough for hp to track this one down. Generally speaking, NFS had a number of issues in previous TCP/IP versions, and hp is working to correct them. 5.4 (+ ECOs) might help, but if the problem does not recur, there is also not much pressure to upgrade.

Greetings, Martin
Jim Strehlow
Advisor

Re: nfs directory problem - whose fault?

Uwe wrote: "If the same file is shown over and over again, it looks like a directory corruption. Have you tried an '$ ANALYZE /DISK_STRUCTURE'?"

The files are on the Windows server, not on OpenVMS, so I cannot use that command on them.



Something fails within the file itself to confuse OpenVMS's DIRECTORY command or to confuse the NFS$ACP1 process that provides the information. But where?

What a nightmare for me, for H.P., or for a third-party vendor to try to sort out.

The only recent difference was to install the application on a Windows 2003 Server instead of on a Windows 2000 server.

The application is a 24x7 one that has been in production for years. When the problem occurs, we must delete the file immediately so that the application can continue and the OpenVMS CPU is not overloaded. We cannot simply "wait for it to happen" again, and even if it did, which vendor would need to debug which component?

<---- Jim ---->
Martin P.J. Zinser
Honored Contributor

Re: nfs directory problem - whose fault?

Hello Jim,

What type of systems are we talking about here? If you have servers with EV7 chips, OCLA might be of help, as it allows you to capture the instructions executed on the CPU. Support might be able to figure something out from that (as you certainly know, the current problem description does not provide much to start a focused investigation).

Greetings, Martin
Jan van den Ende
Honored Contributor

Re: nfs directory problem - whose fault?

Jim,

your title question is an easy one: who is at fault?
Well, of course the one who decided to run Oracle on M$ware! :-(

I am afraid I have too little expertise in Windoze to 'solve' your problem, but I can point out what I would do in the way of damage control:

From your question I gather that this is a rather vital 7 * 24 application, and (part of?) it gets interrupted when this problem occurs.
The symptoms are: a DIRECTORY command on an NFS-mounted directory that goes into a stampede, accompanied by NFS$ACP1 becoming a heavy CPU burner.

I don't immediately see a way to trap your derailed DIR command, but it should be relatively easy to create a small DCL procedure that checks every 'interval' for the amount of CPU used by NFS$ACP1, and if that explodes, clean away the offending file.
By clean away I would first try to just RENAME it away, so as to keep it for investigation, but be prepared that maybe it somehow even then keeps trashing things, so you should deal with that as well.

Of course you should signal the event!
In our case I would issue a pager call. I guess that if you have a serious 7 * 24 operation, you either have something like that in place or your computer centre is staffed around the clock. In the latter case, send MAIL to that staff and instruct them to warn you immediately at all hours. (We DO have an interesting job, after all!)
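The watchdog idea above (sample NFS$ACP1's CPU use every interval, rename the offending file aside, and raise an alert) can be sketched roughly as follows. This is a Python illustration, not DCL. It assumes the caller already collects cumulative CPU-time samples for NFS$ACP1 and converts them to seconds. On OpenVMS these could come from the F$GETJPI lexical's CPUTIM item, which counts 10 ms ticks. The 50% threshold, the three-interval run, the .HOLD suffix, and the `alert` callback (pager, MAIL, ...) are all hypothetical choices.

```python
import os
from typing import Callable, Sequence


def cpu_runaway(samples: Sequence[float], interval: float,
                threshold: float = 0.5, consecutive: int = 3) -> bool:
    """Return True if CPU use exceeded `threshold` (fraction of one CPU)
    for `consecutive` intervals in a row.

    `samples` are cumulative CPU seconds of the watched process, taken
    every `interval` wall-clock seconds. Defaults are illustrative only.
    """
    run = 0
    for before, after in zip(samples, samples[1:]):
        if (after - before) / interval > threshold:
            run += 1
            if run >= consecutive:
                return True
        else:
            run = 0
    return False


def quarantine(path: str, alert: Callable[[str], None]) -> str:
    """Rename `path` to *.HOLD so *.TO_DO no longer matches it.

    The file is kept for investigation rather than deleted, and an
    alert is raised through the caller-supplied callback.
    """
    target = os.path.splitext(path)[0] + ".HOLD"  # hypothetical suffix
    os.replace(path, target)
    alert(f"NFS watchdog: renamed {path} to {target}")
    return target
```

Requiring several consecutive busy intervals is meant to avoid paging someone for a single legitimate burst of NFS activity.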


hth


Jan
Don't rust yours pelled jacker to fine doll missed aches.
Willem Grooters
Honored Contributor

Re: nfs directory problem - whose fault?


Jim wrote: "The only recent difference was to install the application on a Windows 2003 Server instead of on a Windows 2000 server."

... and after that, there was this problem.

OK, culprit found: the one who decided to install Windows 2003 _without_ proper testing.
As far as I know, Windows 2003 has quite a few internal changes compared to Windows 2000, and it IS known that some software (even from Microsoft!) does not work well, or at all, on Windows 2003 because of this.
If your third-party software runs fine on Windows 2000, it might still have a problem on Windows 2003. Either revert to Windows 2000, or get a new (Windows 2003-compatible) release of your third-party software.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Anton van Ruitenbeek
Trusted Contributor

Re: nfs directory problem - whose fault?

Jim,

I've got a client who is also running Oracle 24*7*365.24. They are (of course) running on VMS, but have printers that (God forbid) use Windows to print. On VMS we have Advanced Server running, and the Windows PCs look into VMS instead of VMS looking into Windows. So we get all the stability of VMS, and the PCs may reboot at will.
Is there an option to use FTP to move the file to VMS? Then you would not need the third-party software anymore.
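If the FTP route were taken, a common precaution is to upload under a temporary name and rename only once the transfer is complete, so that the *.TO_DO poller never sees a half-written file. The following is a minimal sketch assuming Python's standard ftplib on the Windows side. `push_to_vms` and the .PART suffix are hypothetical. Whether the VMS FTP server handles the rename exactly as expected would need to be checked.

```python
from typing import Any, BinaryIO


def push_to_vms(ftp: Any, local_file: BinaryIO, name: str) -> None:
    """Upload `local_file` so that `name` appears on the server only when complete.

    `ftp` is expected to behave like ftplib.FTP (storbinary + rename).
    The temporary name must not match the poller's *.TO_DO wildcard.
    """
    tmp = name.rsplit(".", 1)[0] + ".PART"  # hypothetical temporary suffix
    ftp.storbinary(f"STOR {tmp}", local_file)  # transfer under the temp name
    ftp.rename(tmp, name)  # RNFR/RNTO: file becomes visible as *.TO_DO
```

With a real connection, something like `push_to_vms(ftplib.FTP(host, user, passwd), f, "ORDER.TO_DO")` would be used, where `host`, `user`, and `passwd` are placeholders and `f` is a file opened in binary mode.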

Another question: how do you back up your Oracle database if it is up 24*7?

AvR
NL: Meten is weten, maar je moet weten hoe te meten! - UK: Measuring is knowing, but you need to know how to measure!
Jim Strehlow
Advisor

Re: nfs directory problem - whose fault?

Our application works fine with Oracle on OpenVMS. Some of our customers "feel more comfortable" with Windows and even insist on Windows 2003, so that is what gets installed. (Some customers have I.T. staff who are not familiar with OpenVMS; when they cannot figure out a problem on Windows, they blame OpenVMS.)

It is probably the third-party NFS server software that is the problem, but we cannot currently prove that, because the problem cannot be reproduced at will. On the other hand, forum postings indicate that the OpenVMS TCP/IP stack has "had its share of historical problems", so it cannot be totally ruled out.

As for backing up Oracle, there are several hot backup solutions for exporting data and copying files, so there is no problem there. However, each of our customers chooses its own Windows backup software vendor.

As for the comment about a 24x7 operations center: some companies conduct business 24x7 without 24x7 I.T. staff on site. With pagers, cell phones, and budget limitations, some of our customers do not have I.T. staff on site at all. We have a 24x7 (800) phone number for remote support.

Anton, our Oracle database writes files to the NFS directory so that OpenVMS can process the information. The application writes to an NFS directory, and an OpenVMS application polls that directory for new files and prints to TCP/IP printers using Northlake Software's PRINT KIT.
We "could" rewrite the application to use FTP, but it has run for years in this multiplatform setup without a problem. FTP would require a new Windows service (a new piece that could break) to perform the transfers, checking, etc. I would rather insist on going back to Windows 2000.

One goal of posting this problem was to show others what multiplatform solutions some companies insist on installing (out of a misguided sense of comfort with Windows).

The more moving parts there are, the easier it is for something to break.

<--- Jim --->
(My opinions are my own and not necessarily those of the company for whom I work.)
Willem Grooters
Honored Contributor

Re: nfs directory problem - whose fault?


Jim wrote: "... have I.T. staff who are not familiar with OpenVMS; but they can not figure out the problem on Windows and they blame OpenVMS.)"


As long as certification is about knowing which key or button to press, these people will NEVER understand what it's all about.
These are the 'specialists' that companies rely on for their 24*7*365 operations. "System administrator" looks good on your resume, but it's just an empty shell. The biggest problem is that these people are hired by CIOs who are barely better :-((((

(Sorry, I did have bad experiences before on this....)

Willem