
BDF timing out

 
Josh Dalziel
Occasional Advisor

From time to time I have a machine that is maxed out on disk I/O. While it is maxed out, my file system monitoring script fails. Basically, bdf times out. I was wondering if anyone has a quick fix for this?
7 REPLIES
Dave Olker
Neighborhood Moderator

Re: BDF timing out

I don't believe there is a fix for this.

It's my understanding that when bdf is attempting to get the statistics on the filesystem it issues a "sync" on the filesystems to get a stable picture of the available disk space and inode usage. If the disks are completely I/O bound, then the sync issued by bdf will get queued and bdf will block waiting for the sync to complete.

Also, I know that if you have any NFS mounted filesystems on this system and the NFS server is not responding, that will also cause bdf to hang while it waits for a response from the NFS server, assuming the filesystem is mounted with the "hard" option (default).
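For a monitoring script, one way to avoid hanging along with bdf is to run it under a watchdog. The following is a rough sketch in POSIX shell, not a fix for the underlying blocking. run_with_timeout is a hypothetical helper name, not an HP-UX command. A process stuck in an uninterruptible I/O wait may ignore the signal, in which case the wait still blocks, so treat this as best effort only.

```shell
#!/bin/sh
# run_with_timeout SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and sends it
# SIGTERM if it is still running after SECONDS.
run_with_timeout() {
    limit=$1
    shift

    "$@" &
    cmd_pid=$!

    # Watchdog: sleeps, then signals the command. Its output is sent to
    # /dev/null so a leftover sleep never holds the caller's pipe open.
    (
        sleep "$limit"
        kill "$cmd_pid" 2>/dev/null
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Collect the command's exit status (non-zero if it was killed).
    status=0
    wait "$cmd_pid" || status=$?

    # The command is done, so the watchdog is no longer needed.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true

    return "$status"
}
```

In a monitoring script this might be called as `run_with_timeout 60 bdf -l`, with a non-zero return treated as "no data this round" rather than as a full filesystem. The 60 seconds is an arbitrary illustrative value.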

Regards,

Dave


I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
RAC_1
Honored Contributor

Re: BDF timing out

Restrict your bdf to file systems you want to check.

like bdf /stand etc.

Any nfs mount points??

bdf -l (local FSs)
There is no substitute to HARDWORK
Steven E. Protter
Exalted Contributor

Re: BDF timing out

Lower I/O use. That will fix it.

bdf usually returns something eventually if you are patient.

This issue can also be triggered by a stale NFS mount. If you have an NFS filesystem mounted and the other server has been rebooted, making the connection stale, bdf will take a long time to answer or will fail.

bdf will also sometimes fail when a disk fails while a filesystem on it is mounted.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Josh Dalziel
Occasional Advisor

Re: BDF timing out

I can't say much about checking each mount point individually, but that would be a long process in itself, seeing how I have around 150 mounted file systems. I have been thinking about using sar together with my fstab to tell me what I want to know.
harry d brown jr
Honored Contributor

Re: BDF timing out

What does

showmount -ade

show?

live free or die
harry
Dave Olker
Neighborhood Moderator

Re: BDF timing out

Are there certain types of filesystems you're interested in, where you could limit the output of bdf to just those types of filesystems? For example, to limit the output to only vxfs filesystems use "bdf -t vxfs". That would eliminate any hfs, cdfs, lofs, nfs, etc. filesystems from the report and possibly return the report faster.
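As a small sketch of the idea, a script could loop over just the filesystem types it cares about. The BDF_CMD variable and the bdf_by_type name are assumptions made for illustration (the variable only exists so the command can be swapped out when testing); the `-t` option is the one mentioned above.

```shell
#!/bin/sh
# BDF_CMD is a hypothetical override hook; it defaults to the real bdf.
BDF_CMD=${BDF_CMD:-bdf}

# bdf_by_type TYPE [TYPE...]
# Runs one report per filesystem type, so types that are not listed
# (nfs, cdfs, lofs, ...) are never queried.
bdf_by_type() {
    for fstype in "$@"; do
        "$BDF_CMD" -t "$fstype"
    done
}
```

For example, `bdf_by_type vxfs hfs` would produce two reports back to back, each with its own header line.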

Dave


Bill Hassell
Honored Contributor

Re: BDF timing out

bdf doesn't time out. It issues requests to the filesystem routines to sync and then summarizes the results. If bdf hangs, it is usually due to NFS (don't scan NFS filesystems) or a dead/dying disk. First, change your bdf to bdf -l or bdf -t vxfs (and perhaps run bdf -t hfs separately). You don't want to analyze NFS or CDFS in your space monitoring. Note also that running bdf on a really busy system (busy = creating and/or removing lots of inodes) will impact everything that uses the disks. That means your script should never run every minute; perhaps once every 10 minutes.
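A minimal sketch of the parsing side of such a script, assuming the usual six-column bdf layout (Filesystem, kbytes, used, avail, %used, Mounted on) and assuming that a long device name is printed on a line by itself with the numbers on the next line. over_threshold is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# over_threshold PERCENT
# Reads bdf-style output on stdin and prints "mountpoint percent" for
# every filesystem at or above PERCENT.
over_threshold() {
    awk -v limit="$1" '
        $1 == "Filesystem" { next }      # skip header lines
        NF == 1            { next }      # device name wrapped onto its own line
        {
            # A continuation line has no device column, so the
            # percentage and mount point shift left by one field.
            if (NF == 5) { pct = $4; mnt = $5 }
            else         { pct = $5; mnt = $6 }
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print mnt, pct
        }'
}
```

Usage would be along the lines of `bdf -l | over_threshold 90`, run from cron at the 10-minute interval suggested above rather than every minute.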

Also add a high-water mark to your script so that once it reports a problem, it will not report the same problem again unless it gets worse.
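One way the high-water mark could look, as a sketch only: keep the last reported percentage for each mount point in a small state file and report again only when the new value is higher. STATE_DIR, its default path, and the worse_than_last name are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical location for the per-filesystem marks.
STATE_DIR=${STATE_DIR:-/var/tmp/bdfmon}

# worse_than_last MOUNTPOINT PERCENT
# Returns 0 (and records PERCENT) if PERCENT is higher than the last
# value recorded for MOUNTPOINT, 1 if it is the same or lower.
worse_than_last() {
    mount=$1
    pct=$2

    # Flatten the mount point into a file name: /var/adm -> _var_adm
    key=$(printf '%s' "$mount" | tr '/' '_')
    mark="$STATE_DIR/$key"

    mkdir -p "$STATE_DIR" || return 2

    last=0
    [ -f "$mark" ] && last=$(cat "$mark")

    if [ "$pct" -gt "$last" ]; then
        printf '%s\n' "$pct" > "$mark"
        return 0
    fi
    return 1
}
```

A script would call `worse_than_last /var 95` before sending an alert and stay quiet on a non-zero return. This sketch never lowers a mark, so removing the state file once the filesystem drops back under the threshold is left to the caller.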

For dead/dying disks, you'll need to check in syslog for error messages as well as the word "stale" from vgdisplay -v (for all volume groups that have mirrors). For non-mirrored volumes, look for anything coming out of stderr in vgdisplay and lvlnboot -v. Put those tests in your diskspace checker since they are as much of a problem as full filesystems.
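The two disk checks could be sketched as a pair of small helpers. Both names are hypothetical, and the exact wording of the vgdisplay output is not assumed beyond the word "stale" mentioned above.

```shell
#!/bin/sh
# stale_lines
# Reads command output on stdin and prints every line that mentions
# "stale" (case-insensitive). Prints nothing if there are none.
stale_lines() {
    grep -i 'stale'
}

# stderr_of COMMAND [ARGS...]
# Runs COMMAND, discards its normal output, and prints only what it
# wrote to stderr.
stderr_of() {
    # Order matters: stderr is pointed at the current stdout first,
    # then stdout is sent to /dev/null.
    "$@" 2>&1 >/dev/null
}
```

In the disk space checker these might be used as `vgdisplay -v 2>/dev/null | stale_lines`, `stderr_of vgdisplay -v`, and `stderr_of lvlnboot -v`, with any non-empty output treated as something to report.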


Bill Hassell, sysadmin