Operating System - HP-UX

vx_nospace how to find culprit

 
SOLVED
Doug_3
Frequent Advisor

Hello all, I have a problem on an N-class running HP-UX 11.00. For the past two days I have been getting this broadcast message for lvol8, which has /var mounted. I have never had this message before, and no changes have occurred on the system that I know of. How do I find the process that is attempting to write to this filesystem, or determine whether a problem in the OS or hardware is starting to occur?
Thanks in advance
Doug

vx_nospace - /dev/vg00/lvol8 file system full (1 block extent)
Sridhar Bhaskarla
Honored Contributor

Re: vx_nospace how to find culprit

Hi Doug,

How can you be sure that no changes were made to the system?

There are a lot of processes that write into the /var filesystem. You can easily find the size of each subdirectory with the following commands.

#cd /var
#du -sk * |sort -n

Now go to the directories that appear huge and repeat the same command in there. Below are some directories where you will find files growing.

/var/mail, /var/adm/syslog, /var/tmp, /var/stm, /var/adm/sw
/var/spool
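
The drill-down described above can be scripted. This is only a rough sketch: the function names largest_subdir and drill_down are made up for illustration, and it assumes a POSIX shell with the usual du, sort, head and cut. Like "du -sk *", it does not see dot files.

```shell
#!/usr/bin/sh
# Sketch only: largest_subdir and drill_down are hypothetical helper names.

# Print the largest immediate subdirectory of $1 (by du -sk), or nothing.
largest_subdir() {
    for d in "$1"/*; do
        # real directories only; skip symlinks
        if [ -d "$d" ] && [ ! -h "$d" ]; then
            du -sk "$d"
        fi
    done 2>/dev/null | sort -rn | head -n 1 | cut -f2-
}

# Show the biggest entries in $1, then descend into the largest
# subdirectory, repeating for at most $2 levels (default 3).
drill_down() {
    dir=$1
    levels=${2:-3}
    while [ "$levels" -gt 0 ] && [ -n "$dir" ]; do
        echo "== $dir"
        du -sk "$dir"/* 2>/dev/null | sort -rn | head -n 5
        dir=$(largest_subdir "$dir")
        levels=$((levels - 1))
    done
}
```

Something like "drill_down /var 4" would then print the five largest entries at each level while following the biggest directory downward.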

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Michael Steele_2
Honored Contributor

Re: vx_nospace how to find culprit

It's what it appears to be, probably in /var/tmp, which is used by vi and other system utilities. Use the following:

du -kx /var | sort -rn | more

-or-

cleanup -a 1

The latter, cleanup, is applied to /var/adm/sw, while the former, du, will list out your directories in sorted order, largest to smallest.
Support Fatherhood - Stop Family Law
Paul Sperry
Honored Contributor
Solution

Re: vx_nospace how to find culprit

Look in /var/adm/crash and delete anything in there.

Also delete anything in /var/preserve.



You can also zero out the login log, /var/adm/wtmp:
# cat /dev/null > /var/adm/wtmp

finally use the /usr/sbin/cleanup tool
# /usr/sbin/cleanup -c 2
is the command I use.


Also you can do a du in var


#cd /var
#du -sk *
Rajeev  Shukla
Honored Contributor

Re: vx_nospace how to find culprit

Hi,
This message really means that your /var is 100% full. But does it stay full only for a short time and then go back to normal, or does /var reach 100% and stay there until you clean up manually?

If you are cleaning up manually, then you probably know which file it is, and you can track down who created it and when.

If /var grows and then drops by itself, it could be a temporary file in /var/tmp, or someone opening a big file (I mean running vi on a big file); when that file is closed, /var drops. You'll really have to look at all your applications to see whether they use /var as temporary storage and delete their files after the process finishes.
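
One way to chase the "grows and drops" case is to list large files that changed recently, together with owner and timestamp. This is a sketch, not a tested HP-UX recipe: recent_big_files is a made-up name, and the 24-hour window and 1024 KB default are arbitrary assumptions.

```shell
#!/usr/bin/sh
# Sketch only: recent_big_files is a hypothetical helper name.
# List regular files under $1 that were modified in the last 24 hours
# and are larger than $2 kilobytes (default 1024), with owner and time.
recent_big_files() {
    dir=$1
    min_kb=${2:-1024}
    # find -size counts 512-byte blocks, so 1 KB = 2 blocks
    find "$dir" -xdev -type f -mtime -1 -size +$((min_kb * 2)) \
        -exec ls -l {} \;
}
```

For example, "recent_big_files /var 5000" would show files over roughly 5 MB touched in the last day. A temporary file that has already been removed will of course not show up, so this only helps while the file still exists.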


John J Read
Frequent Advisor

Re: vx_nospace how to find culprit

Here are a few tricks I've picked up to clean up /var.

run:
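find /var/tmp -type f -atime +3 | xargs ll
If there are large old files, then:

run:
find /var/tmp -type f -atime +3 | xargs rm
( -type f keeps directories, including /var/tmp itself, out of the list passed to ll and rm )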

I put the above line to run daily in cron.


If you don't care about current log files you can:

Zero out cron.log by:
/sbin/init.d/cron stop
/sbin/init.d/cron start
( moves cron.log to OLDlog )
then run stop/start again to remove OLDlog.


Zero out /var/adm/mail.log by: cat /dev/null >/var/adm/mail.log


Zero out syslog by doing:
/sbin/init.d/syslogd stop
/sbin/init.d/syslogd start
( this will copy /var/adm/syslog/syslog.log to /var/adm/syslog/OLDsyslog.log. Run stop/start again
to get rid of OLDsyslog.log. )


If bdf still says /var is near 100% full, then look for large files.

cd /var
du -sk *

Look for the largest directories and search down those for big log files. You should remove anything in /var/adm/crash unless you are analyzing a recent system crash. I recommend staying away from /var/adm/sw unless you are desperate. You will run into problems if you clean that out and later need to remove patches.

If /var suddenly jumps I would say your culprit is most likely in /var/tmp. If a user does a "vi" on a large file, you will notice it in /var/tmp.

If /var fills up slowly over time, it's most likely some of the log files I mentioned.

Check for large /var/adm/wtmp and /var/adm/btmp files. These are login logs. You can zero these out if you don't care about the logs ( or get them from backup tapes if you need them. )

cat /dev/null >/var/adm/wtmp
cat /dev/null >/var/adm/btmp
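
If you zero out several logs, a small wrapper that shows the size first can be handy. This is just a sketch and zero_log is a made-up name. It truncates in place instead of removing the file, because removing a file that a process still holds open does not free the space until that process closes it.

```shell
#!/usr/bin/sh
# Sketch only: zero_log is a hypothetical helper name.
# Show a log file's current size, then truncate it in place.
# Truncating (rather than rm) keeps the same file, so a process that
# already has it open is still pointing at the same file.
zero_log() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "zero_log: $f is not a regular file, skipping" >&2
        return 1
    fi
    ls -l "$f"
    : > "$f"
}
```

Usage would be along the lines of "zero_log /var/adm/wtmp" and "zero_log /var/adm/btmp", once you are sure you don't need the contents.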


Hope this helps. Somewhere I have a script that does all of this and I'll post it if I can find it.
Patrick Wallek
Honored Contributor

Re: vx_nospace how to find culprit

A big culprit that I have found can also be diagnostics. Check in /var/stm/logs/sys

The activity.log file in that directory can get quite large if diagnostics starts logging lots of errors for some reason.
Jeff Schussele
Honored Contributor

Re: vx_nospace how to find culprit

Hi Doug,

A lot of Application SW will write their tmp files to /var/tmp.

Another dir to check for large files is /var/opt/perf/datafiles IF you're running PerfView.

I'd suggest you set up a cron job to periodically look at /var usage to determine the culprit.
TIP: Don't write this log to anywhere in /var when you create this cron job.
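
A minimal sketch of such a cron job follows. The function name snapshot_usage, the script path and the log location are all assumptions for illustration; pick a log location that is on a filesystem other than /var on your system.

```shell
#!/usr/bin/sh
# Sketch only: snapshot_usage and the paths in the usage note are hypothetical.
# Append a timestamped list of the largest directories under $1 to log $2.
snapshot_usage() {
    fs=$1
    log=$2
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') $fs"
        # sizes in KB, staying on one filesystem, largest first
        du -kx "$fs" 2>/dev/null | sort -rn | head -n 15
    } >> "$log"
}
```

Put the function in a script (say /usr/local/bin/varwatch.sh, a hypothetical path) that ends with a call such as "snapshot_usage /var /tmp/varwatch.log", and run it from root's crontab with an entry like "0,15,30,45 * * * * /usr/local/bin/varwatch.sh". Comparing consecutive snapshots around the time of the next vx_nospace message should show which directory jumped.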

Rgds,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Doug_3
Frequent Advisor

Re: vx_nospace how to find culprit

Thanks to everyone. I am going to clean up /var, but it is actually at 60% full, so something is hammering that file system. It has only occurred yesterday and today. I haven't seen any other broadcast messages and syslog doesn't report anything unusual, so I don't think it is stm or ems or a fault.

I guess I'll wait for something to break or a user to complain; I have no other way of discerning what is causing the issue.

Thanks Again,
Doug
Sridhar Bhaskarla
Honored Contributor

Re: vx_nospace how to find culprit

Hi Doug,

So far the messages covered almost all the points.

Lemme give you one more hint. See if anyone is trying to edit large files. The editor session will keep a copy under /var/tmp thus filling up /var.

We had a developer who tried to open a very big file a few times, giving us a hard time figuring out what was happening.

-Sri
You may be disappointed if you fail, but you are doomed if you don't try
Bill Hassell
Honored Contributor

Re: vx_nospace how to find culprit

/var is always by far the busiest system directory in HP-UX. If this filesystem is getting hammered, it is due to an application, but the trick is to find the app (assuming it is causing a problem). A simple shell script can slam files and/or data records into /var, and a runaway series of scripts that are running du /var will definitely hit this filesystem.

The easiest way to track the culprit(s) is to use Glance. You can load this from your Application CDs and run the trial version for a few weeks. In Glance, use the 'o' key to jump to the 'interesting process threshold' screen. To keep the displayed list under control, I usually change the CPU threshold and I/Os per second from 1 to 5 to keep the quiet processes out of the list. Then change the sort method from CPU to IO so the processes creating the most I/O sort to the top.

Then use Glance's ability to examine a single process and display all the open files to see whether this process is using /var. The file pointer shown by Glance will bounce around a lot if a particular file is busy.


Bill Hassell, sysadmin