Why is 'df -i' different from sar and /proc/sys/fs?

Michael Steele_2 · 05-29-2010

********* /proc/sys/fs/inode-nr
ALLOCATED INODES    FREE INODES
64689               256

********* df -i
Filesystem             Inodes    IUsed    IFree IUse% Mounted on
/dev/sda1              856480   109553   746927   13% /
none                   209419        1   209418    1% /dev/shm
/dev/sda5              146592      502   146090    1% /tmp
/dev/sda3              292608      857   291751    1% /var
/dev/sda6              147744       27   147717    1% /var/tmp
rcdn9-9u-filer01a:/vol/dfs1/usrcisco-linux-x86/usrcisco-linux-rhel3.0-x86-32
                      8800918   498125  8302793    6% /auto/usrcisco-linux-rhel3.0-x86-32
rcdn9-40b-filer14b:/vol/local3/apollo_dev1_opt
                     30515486        1 30515485    1% /apps

Support Fatherhood - Stop Family Law

Matti_Kurkela · 05-30-2010

Didn't you ask something like this before?

In short: "df -i" is counting apples, while sar and /proc/sys/fs/inode-nr are counting oranges. There is no direct connection between the two kinds.

First, an inode is a fundamental part of the structure of a Unix-style filesystem. If a filesystem has no free inodes (and no way to create more), no new files can be created on that filesystem, even if there is still free disk space. So running out of inodes severely restricts the usability of that filesystem.

Some filesystems can automatically create new inodes when they run out of existing ones, or the concept of a fixed number of inodes simply doesn't apply to them. VxFS on HP-UX and GFS on Linux belong to this category.

"df -i" shows the number of free and used inodes in each filesystem.

----------

Second, because inodes are such an essential component, the kernel needs to read and manipulate inodes quite often. Because the access patterns of inodes can be different from the access patterns of actual data, it makes sense to have a dedicated inode cache.

"sar -v" and /proc/sys/fs/inode-nr report the state of this cache.
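To make the "oranges" side concrete, here is a minimal Python sketch that reads /proc/sys/fs/inode-nr. It assumes the file holds two whitespace-separated integers, which the Linux sysctl documentation calls nr_inodes and nr_free_inodes; the helper names parse_inode_nr and read_inode_nr are hypothetical. Note that nothing here looks at any filesystem's on-disk inode table.

```python
from pathlib import Path
from typing import NamedTuple


class InodeNr(NamedTuple):
    """The two counters exposed by /proc/sys/fs/inode-nr."""
    nr_inodes: int       # inode structures currently allocated in kernel RAM
    nr_free_inodes: int  # allocated structures that are currently unused


def parse_inode_nr(text: str) -> InodeNr:
    """Parse the file's contents, e.g. "64689\t256\n"."""
    fields = text.split()
    if len(fields) < 2:
        raise ValueError(f"unexpected inode-nr contents: {text!r}")
    return InodeNr(int(fields[0]), int(fields[1]))


def read_inode_nr(path: str = "/proc/sys/fs/inode-nr") -> InodeNr:
    """Read the live counters (Linux only)."""
    return parse_inode_nr(Path(path).read_text())


if __name__ == "__main__":
    # The figures posted at the top of the thread; a live system will differ.
    print(parse_inode_nr("64689\t256\n"))
```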

If the inode cache gets full (and the kernel cannot expand it because there is no free RAM), it generally isn't a big problem: the kernel picks the least recently used slot in the cache, writes the inode data back to disk if it has changed since it was read, and recycles the slot for further use. If the old inode data is needed again later, it is simply reloaded from disk.
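The eviction behaviour described above can be modelled with a toy LRU cache. This Python sketch is a simplification for illustration, not the kernel's actual implementation: ToyInodeCache is a hypothetical name, and a plain dict stands in for the disk.

```python
from collections import OrderedDict


class ToyInodeCache:
    """Fixed-size LRU cache of inodes with write-back of changed entries.

    A teaching model only; the kernel's real inode handling is more complex.
    """

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk            # inode number -> inode data, standing in for the disk
        self.slots = OrderedDict()  # inode number -> [data, dirty]; least recently used first
        self.disk_reads = 0
        self.disk_writes = 0

    def get(self, ino):
        """Return inode data, loading it from 'disk' on a cache miss."""
        if ino in self.slots:
            self.slots.move_to_end(ino)  # now the most recently used
            return self.slots[ino][0]
        if len(self.slots) >= self.capacity:
            self._recycle_lru_slot()
        self.disk_reads += 1
        self.slots[ino] = [self.disk[ino], False]
        return self.slots[ino][0]

    def modify(self, ino, data):
        """Change an inode in RAM only; it is written back when evicted."""
        self.get(ino)                    # make sure it is cached and most recent
        self.slots[ino] = [data, True]   # same key, so its LRU position is kept

    def _recycle_lru_slot(self):
        ino, (data, dirty) = self.slots.popitem(last=False)
        if dirty:                        # write back only if changed since it was read
            self.disk[ino] = data
            self.disk_writes += 1


if __name__ == "__main__":
    disk = {1: "inode 1 v1", 2: "inode 2", 3: "inode 3"}
    cache = ToyInodeCache(capacity=2, disk=disk)
    cache.modify(1, "inode 1 v2")  # cached and dirty
    cache.get(2)
    cache.get(3)                   # cache full: inode 1 is written back, slot recycled
    print(disk[1], cache.disk_reads, cache.disk_writes)
    print(cache.get(1))            # miss: simply loaded again from "disk"
```

In the demo, inode 1 is written back only because it was modified; recycling a clean slot (inode 2, on the final lookup) costs no disk write at all.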

Of course this is less efficient than keeping both inodes in the cache, but the cache cannot grow arbitrarily large: RAM is finite, and the applications' RAM needs are usually more important than OS-level metadata caching. Since kernel version 2.4.18, the kernel automatically shrinks the inode cache if applications need the memory it occupies.

On my mostly-idle laptop, the inode numbers of "sar -v" and /proc/sys/fs/inode-nr are generally within about +/- 20 of each other, and the sar value tends to be the bigger one.

I don't know why this difference exists, and a true explanation might require someone with a deep understanding of the Linux kernel's internal workings.

But I guess part of the explanation might be an IT-world version of Heisenberg's Uncertainty Principle: sar needs to read quite a few files in /proc to gather the statistics it wants, and that necessarily changes the state of the inode cache. So sar's own activity may be the cause of the difference (or at least part of it).

MK

Michael Steele_2 · 05-30-2010

You seem to be saying that both the inode cache and the inode table are being counted, making some inodes counted twice.

I think you must be on the right track, because the number of allocated and free inodes reported in /proc/sys/fs is a fraction of what 'df -i' reports. For example, 'df -i' reports 109553 inodes used and 746927 free on /, while /proc/sys/fs/inode-nr shows 64689 allocated (similar to "used") and 256 free.

I don't think the 'df -i' report has any value. And I am looking for the source of this data.

Matti_Kurkela · 05-30-2010

> the difference seen in the number of allocated and free inodes reported in /proc/sys/fs is a fraction of what is being reported in 'df -i'.

/proc/sys/fs/inode-nr reports the state of the inode structures currently in system RAM. (Sorry for calling it a "cache" earlier: after further research, it seems to be more complicated than a simple cache. Nor is it exactly a table: it is a sort of dynamic table/list hybrid structure.)

The in-RAM inode structures are *not* a straight copy of every used inode on every filesystem or anything like that: they include the inode information of every file that is currently being accessed, plus other things such as pipes and network sockets.
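As a small illustration that inodes are not only an on-disk concept, this Python sketch (Unix only; the helper names pipe_inode and socket_inode are hypothetical) creates a pipe and a socket pair, neither of which lives on any disk filesystem, and asks fstat() about them. On Linux each one gets an inode number from the kernel's in-memory bookkeeping. It only shows what user space sees; it does not read the inode-nr counters.

```python
import os
import socket
import stat


def pipe_inode():
    """Return (is_fifo, inode number) for a freshly created pipe."""
    read_fd, write_fd = os.pipe()
    try:
        st = os.fstat(read_fd)
        return stat.S_ISFIFO(st.st_mode), st.st_ino
    finally:
        os.close(read_fd)
        os.close(write_fd)


def socket_inode():
    """Return (is_socket, inode number) for one end of a socket pair."""
    a, b = socket.socketpair()
    with a, b:
        st = os.fstat(a.fileno())
        return stat.S_ISSOCK(st.st_mode), st.st_ino


if __name__ == "__main__":
    print("pipe:  ", pipe_inode())
    print("socket:", socket_inode())
```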

"df -i" reports the inode allocation status of the various filesystems on disk, *if* the concept of a limited number of inodes is meaningful for that filesystem.

These are two completely separate types of things. There is no fixed relation between /proc/sys/fs/inode-* and "df -i": don't waste your time looking for one.

/proc/sys/fs/inode-* are important only if you suspect the kernel's inode structure management is failing somehow, or you're doing some really fine tuning to get the absolute maximum performance out of some Linux supercluster.

For a regular sysadmin, they're pretty useless: as long as you have enough RAM, the inode structure will automatically expand to whatever size required.

After kernel version 2.4.18, there is no tunable for a maximum size, because the kernel can automatically shrink the inode structures if there is a need for memory elsewhere and the structure has empty slots that could be reclaimed to free some RAM.

Stop worrying about /proc/sys/fs/inode-* parameters.

>I don't think the 'df -i' report has any value.

On the contrary, 'df -i' is the more important one. The lack of inodes in a filesystem can be a showstopper.

If you fill an ext3 filesystem (created with default settings) with tiny files, the filesystem is likely to run out of inodes before it runs out of disk space. At that point, you start getting "no space left on device" errors when trying to create new files. This might cause your applications to fail. Yet you can write more data into existing files just fine.

You will first run a regular "df", see that there is plenty of space available, and start scratching your head.

Only "df -i" will reveal what is going on.
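The check that "df -i" lets you do by eye can also be scripted, for example when an application gets "no space left on device" but plain df looks fine. A minimal Python sketch, assuming a Unix system where os.statvfs() is available; classify_space and check_path are hypothetical helper names, and the numbers in the demo are made up for illustration.

```python
import os
from types import SimpleNamespace


def classify_space(st):
    """Say which resource, if any, a statvfs result shows as exhausted.

    f_favail / f_bavail count what an unprivileged user can still use;
    f_files == 0 means the filesystem reports no fixed inode count.
    """
    out_of_inodes = st.f_files > 0 and st.f_favail == 0
    out_of_blocks = st.f_bavail == 0
    if out_of_inodes and out_of_blocks:
        return "both inodes and blocks exhausted"
    if out_of_inodes:
        return "inodes exhausted (plain df still shows free space)"
    if out_of_blocks:
        return "data blocks exhausted"
    return "ok"


def check_path(path):
    """Classify the filesystem that contains `path` (Unix only)."""
    return classify_space(os.statvfs(path))


if __name__ == "__main__":
    # Made-up numbers for a filesystem full of tiny files:
    # plenty of free blocks, but no free inodes.
    fake = SimpleNamespace(f_files=655360, f_favail=0, f_bavail=1_900_000)
    print(classify_space(fake))
    print(check_path("/"))
```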

The default mke2fs parameters are pretty cleverly chosen so this won't happen very often. But if your application stores e.g. SMS messages as individual files (less than 200 bytes apiece), you might run out of inodes if you didn't have the foresight to use customized inode allocation parameters when creating your SMS storage filesystem.

(Yes, such an application would be pretty stupid. There is a reason why databases have been invented, after all.)
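A back-of-the-envelope Python sketch of the SMS scenario. All numbers are assumptions for illustration: a 10 GiB ext3 filesystem, 4 KiB blocks, 200-byte files, and a bytes-per-inode ratio of 16384 (a common mke2fs default, but check /etc/mke2fs.conf on your own system) compared with 4096 as a customized value of the kind passed with mke2fs -i. Each small file is assumed to occupy one whole data block, and the function name is hypothetical.

```python
GIB = 2**30


def disk_use_when_inodes_run_out(fs_bytes, bytes_per_inode, block_size, file_size):
    """Rough model of a filesystem filled with equally sized small files.

    Returns (number of inodes, fraction of the disk holding file data at the
    moment the last inode is used). Ignores directories, the journal and
    other metadata, so real numbers will differ somewhat.
    """
    inodes = fs_bytes // bytes_per_inode
    # ceil(file_size / block_size): a 200-byte file still occupies a whole block
    blocks_per_file = -(-file_size // block_size)
    data_bytes = inodes * blocks_per_file * block_size
    return inodes, data_bytes / fs_bytes


if __name__ == "__main__":
    # Assumed: 10 GiB filesystem, 4 KiB blocks, 200-byte "SMS" files.
    for ratio in (16384, 4096):  # bytes-per-inode
        inodes, used = disk_use_when_inodes_run_out(10 * GIB, ratio, 4096, 200)
        print(f"ratio {ratio}: {inodes} inodes, disk {used:.0%} full when they run out")
```

Under these assumptions, the 16384 ratio runs out of its 655360 inodes with only 25% of the disk holding data, while the 4096 ratio has enough inodes to fill the whole disk.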

In a "df -i" report, "Inodes" is simply the total number of inodes available in the filesystem. For an ext2/ext3 filesystem, a ratio of "inodes per unit of disk space" is chosen at mkfs time, and then set in stone after that. If you expand an ext2/ext3 filesystem, the added part will have some new inodes, but the overall ratio will stay the same. This is because the ratio determines how the ext3 filesystem is laid out on disk, and the filesystem structure won't allow changing the ratio in the middle of the filesystem.

If "Inodes" is zero, that means the concept of inodes is not applicable to this filesystem (for example, VFAT).

"IUsed" is simply the total number of files, directories, sockets, device nodes, etc. currently on the filesystem.

Finally, "IFree" is simply Inodes - IUsed.
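These columns can be approximated from the POSIX statvfs() fields f_files (total inodes) and f_ffree (free inodes), with IUsed derived from the two. A minimal Python sketch under that assumption; inode_columns and df_inodes are hypothetical names. GNU df appears to round the percentage up (in the report at the top of the thread, 502 used of 146592 shows as 1%), and the sketch does the same, but exact rounding and the handling of reserved inodes may differ between df implementations.

```python
import os


def inode_columns(total, free):
    """Compute df -i style (Inodes, IUsed, IFree, IUse%) from two counters."""
    used = total - free
    if total == 0:
        return total, used, free, "-"  # no fixed inode count, e.g. VFAT
    pct = -(-used * 100 // total)      # integer ceiling of used/total * 100
    return total, used, free, f"{pct}%"


def df_inodes(path):
    """Approximate `df -i` for the filesystem containing `path` (Unix only)."""
    st = os.statvfs(path)
    return inode_columns(st.f_files, st.f_ffree)


if __name__ == "__main__":
    # /dev/sda1 from the report at the top of the thread:
    print(inode_columns(856480, 746927))  # (856480, 109553, 746927, '13%')
    print(df_inodes("/"))
```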

MK
