Operating System - HP-UX
06-19-2006 08:11 AM
inodes, deleted files, and phantom disk utilization
First, let me start off with some information about the configuration we are running:
HP-UX 11.23, VXFS filesystems
Here is the problem: we have a process that is writing to a file and filling up the filesystem (according to standard bdf output), yet when we check the filesystem we can't find any file that seems abnormally large. I have already seen several posts on this, so I know it is a fairly common problem. We used fuser to identify which processes were causing us problems, then downloaded and installed lsof to get more detail. We were able to identify two processes (parent/child) that were tying up enough disk space to account for the amount missing. Using lsof we were able to get the node number (the inode number, I assume?) and I tried to run a find (find /bad/filesystem -inum 63); of course this came back with nothing, because the file name was removed from the inode table.
Two questions:
1) Is there any place that stores what the last name entry was for an inode? (The reasoning is that the name might give us some clue as to what the purpose of the file is and how it was deleted).
2) From reading other threads it sounds as if the name entry was deleted from the inode table, but the inode was not marked as free because my running process still has an open handle on it and presumably is still writing data to it. If data is still being written to it, is there any way for me to read, redirect, or even intercept this data (for basically the same reason as above)?
We know how to fix the problem (stop/start the processes), but we are trying to do some forensics on this because it is a recurring issue and is causing production downtime (my bonus is in jeopardy). The applications are internally developed code, so if I can give the developers some clues they should be able to find and fix the problem.
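The behavior behind both questions can be reproduced in a few lines. This is a sketch in Python (the underlying semantics are plain POSIX, the same ones HP-UX follows): once a file is unlinked, its name is gone from the directory, its link count drops to 0, but an open descriptor can still read and write it, and the space is only freed when that descriptor closes.

```python
import os
import tempfile

# Demonstrate POSIX unlink semantics: an open file that has been
# unlinked keeps its data blocks until the last descriptor closes.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                     # name is gone; find -inum finds nothing
st = os.fstat(fd)
print("link count:", st.st_nlink)   # 0 -- no directory entry remains
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 10))              # data is still readable through the fd
os.close(fd)                        # only now is the space actually freed
```

This is also why `find -inum` comes back empty while `bdf` still shows the space consumed: `find` walks directory entries, and the file no longer has one.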
Thanks
- Tags:
- lsof
3 REPLIES
06-19-2006 08:39 AM
Re: inodes, deleted files, and phantom disk utilization
Hi Christian:
I would suspect that the code creates a temporary file with 'tmpfile()' or 'tmpnam()'. These routines generate files with "random" names that are immediately unlinked (removed). Thus, the file (space) is released when the birthing process terminates.
Aside from seeing the presence of an open file descriptor with 'lsof' or 'glance', there isn't going to be any record of the file's transient existence.
Since the applications are internally developed, you should be able to examine the code and look for 'tmpfile' / 'tmpnam' / 'mktemp' and 'unlink' calls.
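The idiom described above can be sketched directly. Python's `tempfile.TemporaryFile` wraps exactly this pattern (create, then immediately unlink), so it serves as a compact illustration of what `tmpfile()` does in C:

```python
import tempfile

# tempfile.TemporaryFile creates a file and immediately unlinks it,
# the same idiom as C's tmpfile(): the name vanishes at once, and the
# space is reclaimed when the process closes it or terminates.
with tempfile.TemporaryFile() as tf:
    tf.write(b"scratch data")
    tf.seek(0)
    print(tf.read())    # fully usable for I/O despite having no name
# leaving the with-block closes the descriptor and frees the space
```

While the block is open, tools like lsof or glance would show the descriptor with no corresponding directory entry, which matches the symptom in the original post.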
Regards!
...JRF...
06-19-2006 08:46 AM
Re: inodes, deleted files, and phantom disk utilization
Hi (again) Christian:
One other caveat. An inode number is only unique within a filesystem. That is, the same number can represent different files in different filesystems (mountpoints).
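A quick way to see this caveat: a file's real identity is the (device, inode) pair, not the inode number alone. Sketched in Python (the path is just an example):

```python
import os

# An inode number is only unique per filesystem; the (device, inode)
# pair is what identifies a file uniquely across mountpoints.
st = os.stat("/etc")                  # any existing path will do
print((st.st_dev, st.st_ino))         # unique together, not st_ino alone
```

Practically, this means `find -inum 63` must be run against the right mountpoint (and with `-xdev` to avoid crossing into other filesystems where inode 63 means something else entirely).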
Regards!
...JRF...
06-19-2006 09:01 AM
Re: inodes, deleted files, and phantom disk utilization
1) That is really not possible. The pathname is stored as a directory entry and the directory entry itself has an inode number. There can be n directory entries pointing to a common inode number; that is how (hard) links are implemented.
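The point about multiple directory entries sharing one inode is easy to demonstrate. A short sketch in Python (the file names are arbitrary):

```python
import os
import tempfile

# Hard links: two directory entries referring to one inode, so
# st_nlink counts the names, not the files.
d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")
with open(a, "w") as f:
    f.write("x")
os.link(a, b)                                  # second name, same inode
print(os.stat(a).st_ino == os.stat(b).st_ino)  # True: one inode
print(os.stat(a).st_nlink)                     # 2: two directory entries
```

Removing either name just decrements that count; the data survives until the count hits 0 and no process holds the file open.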
2) Not quite right. When a file is unlinked (rm'ed), the directory entry is removed (actually marked as available, because the directory itself does not shrink) and the corresponding inode's link count (st_nlink) is reduced by 1. If that value is now 0 AND no processes have the file open, the space is then actually returned to the filesystem's free list.
It's actually an extremely common UNIX idiom to open a new file and immediately unlink it. The file is still available for input and output, and child processes can access it even though there is now no directory entry. This is how the functions which create temporary files work. When the processes terminate, the space is automatically freed whether or not all the processes explicitly close() them.
See the stat(2) and unlink(2) man pages for details.
The way to "fix" this is to have a monitor in place that checks for filesystem free space and issues warnings before the matter is serious. You could also choose to launch the offending processes under a wrapper script that sets a ulimit. Another option is to use quotas.
Your developers already have all the data they need: their write() calls are setting errno to ENOSPC (no space left on filesystem) or EFBIG (quota or ulimit exceeded). What they could do before each write (or at least periodically) is call statfs() to check for free blocks, although normally this is not considered a programmer's responsibility. The program should detect the write() failure, log an explicit error message, and exit gracefully. Corrective action can then be taken (e.g. expanding the filesystem).
Normally, an applications programmer is not responsible for making certain that there is room left in the filesystem but he is responsible for clearly indicating the nature of the problem.
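The space-check-and-report pattern suggested above can be sketched as follows. This is a hedged illustration in Python using statvfs (the modern interface corresponding to statfs); the function names `free_bytes` and `careful_write` are illustrative, not part of any existing API:

```python
import errno
import os

# Sketch: check free space on a mountpoint, and turn a silent
# write() failure into an explicit, loggable ENOSPC report.
def free_bytes(mountpoint):
    st = os.statvfs(mountpoint)
    return st.f_bavail * st.f_frsize   # bytes available to non-root users

def careful_write(fd, data, mountpoint):
    try:
        return os.write(fd, data)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            raise SystemExit("no space left on %s (%d bytes free)"
                             % (mountpoint, free_bytes(mountpoint)))
        raise

print(free_bytes("/"))                 # free space on the root filesystem
```

The point is not the pre-check (which is inherently racy) but the explicit failure handling: a clear log line at the moment write() fails is what lets operators and developers find the culprit without the forensics described in the original post.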
If it ain't broke, I can fix that.