filesystem full mystery

 
bernjvg
Visitor

It's been a while since I was last here. I've got a familiar issue with a twist. I have an old HP-UX 11.11 server running a WANG emulator program, and its 12GB filesystem keeps filling up. du shows only about 100MB of files in the filesystem, so there is nothing to delete. This looks like the classic "someone deleted an open file" issue, but both lsof and fuser show no open files. In fact, when this occurs we are able to unmount the filesystem and mount it again, and usage drops back down to 1%. This filesystem has a tmp area used by the emulator, so I suspect it is writing files and not closing them properly, but there doesn't appear to be an open process tied to it. After we reset the mount, the filesystem fills up again within a couple of hours. This whole thing started yesterday, and we are trying to find out what was changed in the app.

Any ideas on tracking down the source? I'm also wondering whether there is another way to clear this without remounting, to buy us some time to track this down. Thanks for any advice!
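
For anyone trying to reproduce the symptom, the mismatch described above (filesystem reported full while du finds only about 100MB) can be put into numbers with something like the following. This is only a rough sketch: the function name space_gap is made up for illustration, and it assumes that df accepts -P and -k and that du accepts -s, -k and -x on the system in question.

```shell
# Print used KB according to df and according to du for one mount point,
# plus the difference between the two.
# A large gap means space is allocated that is not visible in the directory tree.
# space_gap is a hypothetical helper name, not a standard command.
space_gap() {
    mnt=$1
    # df -P gives one line per filesystem; column 3 is "Used" in KB with -k
    df_kb=$(df -Pk "$mnt" | awk 'NR==2 {print $3}')
    # du -x stays on this filesystem; -s prints only the grand total
    du_kb=$(du -skx "$mnt" 2>/dev/null | awk '{print $1}')
    echo "df_used_kb=$df_kb du_used_kb=$du_kb gap_kb=$((df_kb - du_kb))"
}
```

Calling it with the affected mount point as the only argument prints one line; saving that line before and after a remount shows how much space the remount gave back.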

Matti_Kurkela
Honored Contributor

Re: filesystem full mystery

Does "lsof +L1" show anything at all?

 

Is this filesystem exported with NFS? (check /etc/exports)

 

If an NFS client causes a "deleted an open file" issue, it can be hard to detect on the NFS server, since there is no userspace process associated with the open file on the server: the NFS server works within the kernel. But on the NFS client, it should be detectable as normal.

 

When you reset the mount, did you run fsck on it? Did it report the filesystem is clean?

After cleanly unmounting the filesystem, you might run "fsck -F vxfs -n -o full,nolog" on it to make a full filesystem check. If it detects errors, you may want to take an extra backup, then run the fsck command again without the "-n" option to actually fix the errors.

MK
Dennis Handly
Acclaimed Contributor

Re: filesystem full mystery

>If an NFS client causes a "deleted an open file" issue, it can be hard to detect on the NFS server

 

Hmm, I thought every time that happens, it creates a .nfs* file.  And if you try to remove that, it will create another.

So I'm not sure you can ever delete an open file over NFS, it just turns it into a hidden file.

You should be able to find it with du(1) or find . -name ".nfs*".
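
The find suggestion above could be wrapped up roughly like this. It is only a sketch: the function name find_nfs_leftovers is made up, and it assumes find supports -xdev so the search stays on the one filesystem.

```shell
# List .nfs* leftover files under a directory, with sizes, without
# descending into other mounted filesystems.
# find_nfs_leftovers is a hypothetical helper name, not a standard command.
find_nfs_leftovers() {
    dir=${1:-.}
    # -xdev: do not cross mount points; -type f: regular files only
    find "$dir" -xdev -type f -name '.nfs*' -exec ls -l {} \;
}
```

Run it against the mount point of the filesystem that fills up. If it prints nothing, there are no such hidden files in the tree.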

bernjvg
Visitor

Re: filesystem full mystery

No NFS involved. lsof doesn't show anything. I ran fsck a couple of times and it came back clean, but I don't need to run fsck as part of the remount; the unmount and mount alone are all that is needed to recover the space.

 

Just an update on this and my current theory. First off, the suspected process/effort that we think was behind this is not running today, and the problem has ceased. Yesterday they stopped this migration effort mid-morning and the issue continued; however, we found out that there were several hung ftpd processes trying to send files to a certain directory (not the one with the issue), and those processes could not be killed until the server was reset. The directory those processes were writing to was inaccessible. It appears they were kicking off hundreds (if not over a thousand) of ftp processes to this location. On the back end, the application was likely trying to grab these ftp'd files and process them. Since that app uses the filesystem with the space issues as its "temp", I believe that was the cause behind this mystery. They will not be running those for at least a few days; once they start again, we will be asking them to run them scaled back or in a different manner.
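
To check this theory when the migration starts again, one option is to sample the filesystem usage alongside the number of ftpd processes and see whether they rise together. A minimal sketch, assuming df accepts -P and -k and that ps -e prints the command name in the last column; the function name sample_usage is made up for illustration.

```shell
# Print one sample line: timestamp, percent used on the mount point,
# and the number of running processes whose command name contains "ftpd".
# sample_usage is a hypothetical helper name, not a standard command.
sample_usage() {
    mnt=$1
    # df -P output: column 5 is the capacity (e.g. "42%")
    pct=$(df -Pk "$mnt" | awk 'NR==2 {print $5}')
    # count matching processes; n+0 makes awk print 0 when none match
    ftpd=$(ps -e | awk '$NF ~ /ftpd/ {n++} END {print n+0}')
    echo "$(date '+%Y-%m-%d %H:%M:%S') used=$pct ftpd=$ftpd"
}
```

Calling it every few minutes from a loop or from cron, and appending the output to a log file on a different filesystem, would give a timeline to compare against the application's activity.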