Operating System - OpenVMS

Hard drive problem

SOLVED
Scc_3
Advisor

Hard drive problem

Hello All,
When our programs run on the VAX, they usually create a data file and then carry on writing data into it. Last Monday, for some reason, the system or the hard drive (DKA400:) would not create that file. This causes an error and the program cannot continue. I moved all the data to another drive, and it works fine there.
A few days later I moved all the data back to DKA400: and it worked fine, until yesterday, when it happened again.
I wonder whether that hard drive has a problem?
Also, for now I have copied all the data (the data directory only) to DKA100: and let the users carry on. After I log in, I type

$ MOUNT DKA100: label

The device status is "mounted alloc", and when I log out, DKA100: is dismounted.
How can I set it so that it does not dismount after I log out?

Thanks !
Scc




Something funny has been happening for the last 2 weeks. When users run programs to create quotations, the program creates a file and writes information into it. Since Monday two weeks ago, an error shows up every time it creates a file.
Example: the program tries to create a file named Q123456.300 (300 is the date; Q123456 is the quotation number). It looks to me as though I cannot write or create this file name on that hard drive. I copied all the data to another di
32 REPLIES
Joseph Huber_1
Honored Contributor

Re: Hard drive problem

Of course, it can be a hardware error caused by the disk developing hardware defects.
What error message does your program/procedure get when file creation fails?

Have you looked at the error log of this system: does it contain entries for this device (DKA400:)? Go to SYS$SPECIFIC:[SYSERR] and run ANALYZE/ERROR_LOG.
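A minimal DCL sketch of that check, assuming the error log is the default ERRLOG.SYS in SYS$SPECIFIC:[SYSERR] and that your account may read it; the listing file name ERRLOG.LIS is hypothetical:

```dcl
$! Error counts for every device that has logged errors
$ SHOW ERROR
$! Full status of the suspect disk, including its "Error count" line
$ SHOW DEVICE DKA400:/FULL
$! Formatted error-log report, written to a (hypothetical) listing file
$ ANALYZE/ERROR_LOG/OUTPUT=SYS$LOGIN:ERRLOG.LIS SYS$SPECIFIC:[SYSERR]ERRLOG.SYS
```

Entries for DKA400: in that report (media errors, retries) would point towards hardware rather than the application.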


>>How to set it so it will not to dismount after logout.

You are mounting the disk "private", i.e. only for your current process/login. You have to mount the disk either /GROUP or /SYSTEM to have it available to other users (of your group, or system-wide).
http://www.mpp.mpg.de/~huber
Ian Miller.
Honored Contributor
Solution

Re: Hard drive problem

$ MOUNT/SYSTEM DKA100: label

will mount the drive and keep it mounted when you log out.
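If the disk is currently mounted privately from your own login (the "mounted alloc" status), a sketch of switching it to a system-wide mount, assuming your account holds SYSNAM and that "label" stands for the disk's actual volume label:

```dcl
$! Undo the private mount made from this login
$ DISMOUNT DKA100:
$! Remount system-wide; needs SYSNAM ("label" = the real volume label)
$ MOUNT/SYSTEM DKA100: label
$! Check the status: it should now show as mounted, without "alloc"
$ SHOW DEVICE DKA100:
```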

Try
$ ANAL/DISK/READ DKA100:

to check the file structure of the disk and read every file. This may show some errors.
There may be a problem in an area of the disk that is currently not part of any file. You can create a file that fills the disk and then delete it, forcing all the free blocks to be written.

$ COPY NL: DKA100:[000000]TMP.TMP/ALLOC=x
$ SET FILE/END DKA100:[000000]TMP.TMP
$ DELETE/ERASE DKA100:[000000]TMP.TMP;

where x is the number of free blocks. This will test the drive and show up any bad blocks in the free area. Bad blocks will be revectored or otherwise made unavailable.
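A sketch of the same procedure with x filled in automatically, assuming the standard F$GETDVI items FREEBLOCKS and CLUSTER, and keeping back one cluster as a safety margin in case the allocation is rounded up (those few blocks are then not exercised). DKA400: is used because that is the suspect disk:

```dcl
$ DISK = "DKA400:"
$ FREE = F$GETDVI(DISK, "FREEBLOCKS")
$ CLU  = F$GETDVI(DISK, "CLUSTER")
$ SIZE = FREE - CLU                ! leave one cluster spare (assumption)
$ WRITE SYS$OUTPUT "Allocating ''SIZE' of ''FREE' free blocks on ''DISK'"
$ COPY NL: 'DISK'[000000]TMP.TMP/ALLOCATION='SIZE'
$ SET FILE/END_OF_FILE 'DISK'[000000]TMP.TMP
$ DELETE/ERASE 'DISK'[000000]TMP.TMP;
```

The file is only a vehicle for writing the blocks; there is no need to TYPE it.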
____________________
Purely Personal Opinion
Scc_3
Advisor

Re: Hard drive problem

Hello,
The error message comes from my in-house programs. For example, program "a" gets information and creates a file called q123456.300, formatted to a preset size. It then calls program "b", and "b" tries to open this file and write information into it. Normally the file gets created, but now it is not created at all. The error message from my in-house program says "Can't open q123456.300, file not found", which of course is because the file was never created.

Last night I tried to create a file through EDIT,
for example:
$ EDIT q123456.318
and the system did not allow me to do it. The messages went by too quickly for me to catch them; it was something like "ACP file access ...".
Today I created the same file and it was OK.
That is very strange.

About mounting the disk: so you think I can't just use
$ MOUNT DKA100: label
since this disk is for everyone to use? Maybe
I should mount this disk from the SYSTEM account instead of my own account, but I don't think there is any difference between them.
All I need is that, after I mount the disk, it does not dismount when I log out.
Thanks !
Scc
Joseph Huber_1
Honored Contributor

Re: Hard drive problem

Since the file is not created by program "a":
what error is this program receiving, or does it do no error/success checking at all?
If so, make program "a" log its errors.

Since you apparently are not the system manager, ask whoever is responsible to look at the error log once you know the error received by the file-creating program.

To mount a disk /GROUP or /SYSTEM, the account needs the GRPNAM privilege (for /GROUP) or the SYSNAM privilege (for /SYSTEM), and, depending on the device protection, the SYSPRV, OPER or GROUP privilege.
If it's a disk usually accessible by all users, ask the system manager to add MOUNT/SYSTEM DKA400: to the system startup procedure.
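A sketch of what the system manager might add, assuming VMS V5.x, where the site-specific startup file is SYS$MANAGER:SYSTARTUP_V5.COM, and with "label" again standing for the real volume label. The F$GETDVI test skips the MOUNT if the disk is already mounted:

```dcl
$! Excerpt for SYS$MANAGER:SYSTARTUP_V5.COM (site startup on VMS V5.x)
$ IF .NOT. F$GETDVI("DKA400:", "MNT") THEN -
        MOUNT/SYSTEM/NOASSIST DKA400: label
```

/NOASSIST keeps the startup from waiting for operator assistance if the mount fails.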
Scc_3
Advisor

Re: Hard drive problem

Hello Ian,
$ COPY NL: DKA100:[000000]TMP.TMP/ALLOC=x
$ SET FILE/END DKA100:[000000]TMP.TMP
$ DELETE/ERASE DKA100:[000000]TMP.TMP;

I am using DKA400: instead of DKA100:, since DKA400: is the one with the problem. I followed your steps and then typed TMP.TMP out, and got the following:

%TYPE-F-WRITEERR, error writing SYS$OUTPUT:
-RMS-F-SYS, QIO system service request failed
-SYSTEM-F-EXQUOTA, exceeded quota

Do I have to do something about the quota stuff?
Thanks

Uwe Zessin
Honored Contributor

Re: Hard drive problem

No, you can ignore the quota stuff - you are not supposed to
$ TYPE TMP.TMP
anyway. The only purpose of Ian's commands is to write over all unused blocks of the disk in order to test them.
.
Mike Reznak
Trusted Contributor

Re: Hard drive problem

VMS Help for your message.

EXQUOTA, process quota exceeded

Facility: SYSTEM, System Services

Explanation: An image could not continue executing or a command could
not execute because the process exceeded one of its resource
quotas or limits.

If this message is associated with a status code returned
by a request to a DR32 interface, a DR11-W interface, or an
LPA11-K driver, the AST quota for the requesting process is
exceeded. In the latter case, an AST cannot be queued for a
buffer full/empty AST. Normally, a start data transfer request
can require no more than three AST blocks at a time.

If this message is associated with a vector disabled (VECDIS)
status code, the process's paging file quota prohibits the
allocation of sufficient process memory for storing its
mainline vector state.

If this message is associated with a status code returned by
a request to a DUP11 interface, a request cannot be queued
because the buffered I/O quota is exceeded.

This message can indicate failure to create a subprocess
because deductible quotas, when subtracted from the current
quotas of the creator, would not leave the minimum required
quotas for the creator.

This message may also occur if the size of a buffered I/O
request exceeds the value of the SYSGEN parameter MAXBUF.

User Action: Use the DCL command SHOW PROCESS/QUOTAS to determine the
current quotas and to determine which quota is exceeded.
Determine whether any subprocesses are hibernating and
are no longer performing useful functions; delete any such
subprocesses.

If a program fails consistently because of insufficient
quotas, ask the system manager to increase your quotas.

Mike
...and I think to myself, what a wonderful world ;o)
Scc_3
Advisor

Re: Hard drive problem

Hello,
I have already done the TMP.TMP procedure. When I point to DKA400:[TRS.DATA] to run a quotation, I get the same error: it just doesn't create the file for me.
When I point to DKA100:[TRS.DATA], there is no problem.
That is very strange: same program, and the data was copied over from DKA400: to DKA100:.

Maybe I have to look into something else to solve this.

Thanks !
Scc
Mike Reznak
Trusted Contributor

Re: Hard drive problem

A question not asked yet: which VMS version are you running, and have the latest patches been installed?
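One way to read the version from DCL, using the standard VERSION item of F$GETSYI:

```dcl
$ WRITE SYS$OUTPUT "VMS version: ", F$GETSYI("VERSION")
```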

M
Mike Reznak
Trusted Contributor

Re: Hard drive problem

...and do you see the error count on DKA400: increasing?
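A small sketch for watching that, assuming the standard ERRCNT item of F$GETDVI; the procedure name is hypothetical, and it runs until you interrupt it with Ctrl/Y:

```dcl
$! WATCH_DKA400.COM (hypothetical name): log the DKA400: error count hourly
$ LOOP:
$   ERRS = F$GETDVI("DKA400:", "ERRCNT")
$   WRITE SYS$OUTPUT F$TIME(), "  DKA400: error count = ''ERRS'"
$   WAIT 01:00:00
$   GOTO LOOP
```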
Scc_3
Advisor

Re: Hard drive problem

Hello,
VMS 5.5. There are no more errors on DKA400:.
There are 4 - 5 files that were deleted a long time ago,
but ANA/DISK_STRUCTURE reports "incorrect in VBN 2313 of directory trs.data (3043,1,1)".
Scc
Mike Reznak
Trusted Contributor

Re: Hard drive problem

Have you tried:

$ anal/disk/repair
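If you want to approve each change before it is made, ANALYZE/DISK_STRUCTURE also accepts /CONFIRM together with /REPAIR. A sketch for the disk in question, best run while nobody is writing to it:

```dcl
$ ANALYZE/DISK_STRUCTURE/REPAIR/CONFIRM DKA400:
```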

Mike
Ian Miller.
Honored Contributor

Re: Hard drive problem

Can you post the entire output from ANAL/DISK?
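A sketch for capturing the report in a file that can then be posted. The listing file name is hypothetical, and /OUTPUT should send the results of the run to that file:

```dcl
$ ANALYZE/DISK_STRUCTURE/OUTPUT=SYS$LOGIN:DKA400_ANALYZE.LIS DKA400:
$ TYPE SYS$LOGIN:DKA400_ANALYZE.LIS
```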
Scc_3
Advisor

Re: Hard drive problem

Hello,
ana/disk_structure dka400:
%ANALDISK-I-OPENQUOTA, error opening QUOTA.SYS
-SYSTEM-W-NOSUCHFILE, no such file
%ANALDISK-W-BAD_NAMEORDER, filename ordering incorrect in VBN 2312 of directory trs.data (3043,1,1)
filenames are q162296.094 and q162288.094


P.S. I found 2 files with the same name, q162296.094;1, created on 04-APR-2005, and I can't delete it.
Thanks !
Scc
Scc_3
Advisor

Re: Hard drive problem

Hello All,

Wow, that is very strange. I ran the same program again just a few minutes ago, pointing at the same data directory that gave me problems before, and now it is working fine.

Scc
Uwe Zessin
Honored Contributor

Re: Hard drive problem

If you really used the command:

> ana/disk_structure dka400:

without the /REPAIR qualifier, then ANALYZE did not temporarily lock the disk against changes. If some job is still running, there is a certain chance that ANALYZE does not see a consistent picture of the disk and prints out false warnings.
Ian Miller.
Honored Contributor

Re: Hard drive problem

Recommended action for bad_nameorder errors is to delete the directory so that the files become lost (SET FILE/NODIR name.dir; DELETE name.dir), repair the disk (ANAL/DISK/REP) to recover the files, create a new empty directory and rename the files into it.
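The same recovery written out as DCL, under several assumptions: the damaged directory is [TRS.DATA] on DKA400: (so its directory file is DKA400:[TRS]DATA.DIR;1), you hold the privileges needed to change directory files, users are off the disk, and you have a current BACKUP of it. Directory files normally lack delete access, hence the SET PROTECTION step. ANAL/DISK/REP places recovered lost files in [SYSLOST], so check what else is in there before the final RENAME:

```dcl
$! 1. Turn the directory into an ordinary file and delete it;
$!    the files it listed become lost but are not deleted
$ SET FILE/NODIRECTORY DKA400:[TRS]DATA.DIR;1
$ SET PROTECTION=(O:RWED) DKA400:[TRS]DATA.DIR;1
$ DELETE DKA400:[TRS]DATA.DIR;1
$! 2. Let ANALYZE recover the lost files into DKA400:[SYSLOST]
$ ANALYZE/DISK_STRUCTURE/REPAIR DKA400:
$ DIRECTORY DKA400:[SYSLOST]
$! 3. Create a fresh, empty directory and move the files into it
$ CREATE/DIRECTORY DKA400:[TRS.DATA]
$ RENAME DKA400:[SYSLOST]*.*;* DKA400:[TRS.DATA]*.*;*
```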

However, you said it's OK now?
Scc_3
Advisor

Re: Hard drive problem

Hello Ian,

I did a DIR and saw 2 copies of q162296.094;1.
I deleted one copy, but I can't delete the other one.

What I just did was go in and delete all the q*.094;* files. After that, only one copy is left (q162296.094), and I can't delete it.
I ran ANA/DISK_STRUCTURE DKA400:/REPAIR (twice).
Now all the errors are gone, except the QUOTA.SYS "no such file".

I hope this was the problem. But I will try your way: delete all the files from this directory, since they are old and no longer needed, then create a new directory and copy the current data over. Or I will just let the users keep running on DKA100:.

Thanks !
SCC
Uwe Zessin
Honored Contributor

Re: Hard drive problem

> Now all error is gone, except the quota.sys , no such file

I'm pretty sure the message looks like:
-SYSTEM-W-...

That is just a warning and you can safely ignore it, because you are apparently not using disk quotas on this disk, so the file QUOTA.SYS is absent. (Don't try to create it - again, just ignore the message).
Scc_3
Advisor

Re: Hard drive problem

Hello All,

I ran ANA/DISK_STRUCTURE on all drives, and all are good except for the QUOTA.SYS message. I believe that is fine, since we are not using QUOTA.SYS here.

I have a question. I set up drive DKA100: to back up all the data from DKA200:, DKA300: and DKA400:. It acts as a backup drive, so that if a user loses files by deleting them, I can recover them more quickly than from tape.

I ran ANA/DISK_STRUCTURE on that drive too.
There are 3 errors saying that PAGEFILE.SYS, SWAPFILE.SYS and a7353ccc.txt;3 have an inconsistent highwater mark and EFBLK. After a repair they are still there. Are these error messages OK, or can I go in and delete these 3 files?

Thanks !
Scc

Uwe Zessin
Honored Contributor

Re: Hard drive problem

You can ignore that, too.

This seems to be an inconsistency created by BACKUP which, as far as I can tell, disappears once the file is accessed by the operating system.
Scc_3
Advisor

Re: Hard drive problem

Hello again.
I have another question, since I don't run ANA/DISK_STRUCTURE often.

I ran ANA/DISK_STRUCTURE while other users were writing/creating new files on DKA100:.

An error message showed up:
%ANALDISK-W-BADDIRENT, invalid file identification in directory entry [TRS.DATA]INV008002.DOC;1
-ANALDISK-I-BAD_DIRHEADER, no valid file header for directory

Is this OK?

Scc
Uwe Zessin
Honored Contributor

Re: Hard drive problem

SCC,
are the users working while you run ANALYZE?

Again, if you don't use the /REPAIR qualifier it means that the disk is not locked and any change can produce inconsistencies and it is impossible to tell if this is a real or a temporary problem.
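One way to get a /REPAIR run while nobody is working is a batch job in the evening. A sketch, assuming a hypothetical procedure SYS$LOGIN:CHECK_DKA100.COM and an account with the privileges ANALYZE/REPAIR needs:

```dcl
$! CHECK_DKA100.COM (hypothetical): repair pass with a report file
$ SET NOON
$ ANALYZE/DISK_STRUCTURE/REPAIR/OUTPUT=SYS$LOGIN:DKA100_REPAIR.LIS DKA100:
$ EXIT
```

It could then be queued with `SUBMIT/AFTER=22:00 SYS$LOGIN:CHECK_DKA100.COM`; SHOW USERS only tells you who is logged in at the moment you check, not at 22:00.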
Scc_3
Advisor

Re: Hard drive problem

Hello,
Yes, I was just checking it, with no repair. The data looks OK.
I can run it again with /REPAIR at lunchtime, and again tonight when everyone is gone, and see whether there are any new errors.
Thanks!
Scc