Operating System - HP-UX
SOLVED
Helen French
Honored Contributor

Disk write/cache and network

Hi All,

We have a cron process that uploads data from some real-time systems to the HP9K server every day. Data sizes range from 10 KB to 30 GB. After the upload, the script compares the checksums of the source and destination files. Intermittently this comparison fails with timeout errors and incorrect file size errors. One suspect is the network; the other is the SCSI channel. My questions:
1) During the transfer of big files, does the data first get cached and then written to disk? If yes, could a checksum test see the size of an incomplete file (one whose data is still in cache or on the network)?
2) Can anybody give a detailed description of the data transfer/write process when a file is transferred over the network?
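A minimal sketch of the comparison step described above, assuming Python and using SHA-256 from `hashlib` as a stand-in for `cksum` (which computes a CRC, so these digests will not match `cksum` output). It checks the cheap thing first, the file size, and only then reads both files in full. The names `file_digest` and `compare_copy` are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream the file so multi-GB inputs never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_copy(src, dst):
    """Return a short status string for a source/destination pair."""
    src_size = os.path.getsize(src)
    dst_size = os.path.getsize(dst)
    if src_size != dst_size:
        # A size mismatch is cheap to detect and already proves the copy is bad.
        return f"size mismatch: {src_size} != {dst_size}"
    if file_digest(src) != file_digest(dst):
        return "checksum mismatch"
    return "ok"
```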

I just need some input from you to test my assumptions.

Thanks!
Shiju
Life is a promise, fulfill it!
Steven E. Protter
Exalted Contributor

Re: Disk write/cache and network

I can't answer question 1 or 2.

I can say we had similar problems and they were caused by the network. We had files failing checksum after transfer and large transfers sometimes hanging or failing.

Specifically, a Cisco router set to autonegotiate for the HP port.

lanadmin -x 1 showed 100BaseT full duplex, autonegotiated.

The problem was eventually solved with two steps:

1) Set the router to explicit 100BaseT full duplex.
2) Set the expected speeds for the NIC in /etc/hpbtlanconf

I'm attaching our conf file.

I'm sure you know all of this, but it is interesting that we have the same symptoms.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Helen French
Honored Contributor

Re: Disk write/cache and network

Maybe I should add some more:

The OS is 11.0, fully patched. Also, when I said cache, I actually meant the disk array and controller cache, system memory, etc. My bad :((
Life is a promise, fulfill it!
A. Clay Stephenson
Acclaimed Contributor

Re: Disk write/cache and network

You amazingly want a detailed explanation of the data transfer when you don't even bother to identify the protocols. Is it TCP or UDP? Is it even IP? At a higher level, how are you transferring the files? NFS, ftp, rcp, homegrown?

If I assume this is TCP/IP, then the data has to be buffered, because the packets may not even arrive in order and have to be reassembled by the receiver.
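A toy model of what "reassembled by the receiver" means, assuming each segment is tagged with its byte offset (real TCP uses sequence numbers, acknowledgements and retransmission, none of which is modeled here). Out-of-order pieces are held back until the gap before them is filled, so the application only ever sees a contiguous prefix of the stream. `reassemble` is a hypothetical name.

```python
def reassemble(segments):
    """Return (in-order bytes delivered, number of segments still buffered).

    segments is an iterable of (byte_offset, payload) pairs in arrival order.
    Toy model only: no duplicates, overlaps, ACKs or retransmission.
    """
    pending = {}            # out-of-order segments waiting for the gap before them
    next_offset = 0         # first byte the application has not received yet
    delivered = bytearray()
    for offset, payload in segments:
        pending[offset] = payload
        # Release every buffered segment that now continues the in-order stream.
        while next_offset in pending:
            data = pending.pop(next_offset)
            delivered += data
            next_offset += len(data)
    return bytes(delivered), len(pending)
```

For example, `reassemble([(5, b"world"), (0, b"hello")])` delivers `b"helloworld"` even though the second half arrived first.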

I will say that transferring 30GB files over anything except a rock-solid network is asking for trouble. If this were me, I would split the files into smaller chunks, transfer them with error checking, and then cat them together. That way, if a transmission error occurs, error recovery is much faster/cheaper than for a huge file.
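A rough sketch of the split / transfer / cat approach, assuming Python; the transfer step itself is left out, and `split_file` and `join_chunks` are hypothetical helpers. Each chunk gets its own SHA-256 so that a bad transmission only costs a re-send of the chunks that failed.

```python
import hashlib
import os


def split_file(path, out_dir, chunk_size):
    """Split path into numbered chunk files; return [(chunk_path, sha256_hex), ...]."""
    manifest = []
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            chunk_path = os.path.join(out_dir, f"part{index:05d}")
            with open(chunk_path, "wb") as out:
                out.write(data)
            manifest.append((chunk_path, hashlib.sha256(data).hexdigest()))
            index += 1
    return manifest


def join_chunks(manifest, dest):
    """Verify every chunk, then concatenate them into dest (like cat).

    Returns the list of chunk paths that failed verification; dest is only
    written when that list is empty.
    """
    bad = []
    for chunk_path, expected in manifest:
        with open(chunk_path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != expected:
                bad.append(chunk_path)
    if bad:
        # Only these chunks need to be transferred again, not the whole file.
        return bad
    with open(dest, "wb") as out:
        for chunk_path, _ in manifest:
            with open(chunk_path, "rb") as f:
                out.write(f.read())
    return []
```

In practice the manifest would travel with the chunks, and `join_chunks` would run on the receiving side after the transfer.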
If it ain't broke, I can fix that.
Helen French
Honored Contributor

Re: Disk write/cache and network

Steven, thanks for your reply.

Clay, sorry, my bad! It happens when you have a bunch of issues on the Friday before a long weekend =))

Most of the RT systems use NFS mount points, and some of them use the rcp command.
Life is a promise, fulfill it!
Helen French
Honored Contributor

Re: Disk write/cache and network

Clay, to add: 30 GB is not the size of a single file; it is the size of directories of smaller files (a few MB each).
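Since the 30 GB is spread over many MB-sized files, one way to make the daily comparison report exactly which files are wrong is to build a manifest of relative path, size and digest on each side and diff the two. A sketch assuming Python and SHA-256 in place of `cksum`; `build_manifest` and `diff_manifests` are hypothetical names.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path relative to root to (size, sha256 hex digest)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 1 MB chunks so large files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(full, root)] = (os.path.getsize(full), h.hexdigest())
    return manifest


def diff_manifests(src, dst):
    """Return (missing_at_dst, extra_at_dst, differing) as sorted lists of relative paths."""
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    differing = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, extra, differing
```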

Have a great weekend to all, I will check the responses again on Tuesday.

Thanks!
Shiju
Life is a promise, fulfill it!
Helen French
Honored Contributor

Re: Disk write/cache and network

Anybody? Any more input? Descriptions?

Thanx!
Life is a promise, fulfill it!
Jeff Schussele
Honored Contributor

Re: Disk write/cache and network

Hi Shiju,

As I see it you have possibly two types of caches at work here.

1) Let's tackle the first cache type: HW (either on the HBA itself, on the disk's controller, or both). As far as the OS is concerned, as soon as the controller/HBA reports that the write is complete, the data is ON the disk, while in actuality it may still be in the HW cache. Most HW caches report write complete once the data is safely in ECC cache memory. You need to determine how often this cache is flushed so you can allow for the maximum delay before checking the disk. On some (not all) controllers this reporting & flush time is configurable. But you should at least know the absolute max time from HW cache to disk & adjust the program to allow for this latency.

2) The second cache is the SW (OS) buffer cache, and you need to check the filesystem mount options to see whether you're using this cache (async write) or not (sync write). There are *many* reasons why one would or would not want to use the buffer cache, and they mostly involve performance, integrity or memory usage. The normal OS flush rate is *roughly* 5-6 times a minute, but flushes can come sooner under heavy disk usage, when the buffer entries fill up before the next *normal* flush by sync.
But again you're in the position where the HW will tell the OS that the write is complete, even though the data sits in the HW cache.
So as I see it you need to determine:
A) What the max HW cache to actual disk write latency is
B) Whether you write async (using buf cache) or sync (NO buf cache) in those filesystems
and ensure the program has enough pause time for the sum of the max values of both.
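At the application level, the async/sync distinction looks roughly like this (a Python sketch, Unix only because of `os.O_SYNC`; the function names are hypothetical). Even `fsync` or `O_SYNC` only guarantees that the OS has handed the data to the device; as point 1) says, a write-back HW cache may still report completion before the data reaches the platters.

```python
import os


def write_buffered(path, data):
    """Async-style write: returns once the data is in the OS buffer cache."""
    with open(path, "wb") as f:
        f.write(data)


def write_durable(path, data):
    """Sync-style write: ask the OS to push the data to the device before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # move Python's user-space buffer into the kernel
        os.fsync(f.fileno())    # block until the kernel has handed it to the device


def write_o_sync(path, data):
    """Every write() is synchronous because the file is opened with O_SYNC."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than asked; loop until all are out.
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```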

I really don't think network latency affects you here, as the transfer mechanism probably will not report completion until all the data reaches the destination intact. IF you allow time for all the caches involved to flush, you are MUCH less likely to get a misreported file size due to incomplete cache flushes.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
A. Clay Stephenson
Acclaimed Contributor

Re: Disk write/cache and network

You are really looking in the wrong place. It should not be possible for cksum to timeout unless deliberately interrupted or unless cksum()'ing an NFS mounted file in which case NFS is timing out (if appropriate mount options are used).

Cksum, or any other command, really doesn't care whether the file is in cache, on disk or in any state in between; the buffer cache takes care of that and satisfies the read requests as needed.

If this were me, I would rethink the entire approach. I would much rather have error checking built into the transfer, so that you know immediately if a) the transfer timed out or b) any other errors were encountered. Then there is no need for a cksum, because error reporting is integral to the transfer. One method is to use the Net::FTP Perl module. You can easily script the transfer, and all the error checking you need is available simply by looking at a variable: if $status == 2, all is well and that's all you need to know.

If you do a boolean search for Perl and FTP, you should find examples. The Net::FTP module also uses the .netrc conventions so that the passwords do not have to be included in the source or passed as command line arguments.
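Net::FTP inherits `status` from Net::Cmd, which returns the most significant digit of the last server reply, so 2 means "positive completion". The same check, sketched in Python against raw reply text using the RFC 959 reply classes (`reply_status`, `transfer_ok` and `REPLY_CLASSES` are hypothetical names; Python's own `ftplib` additionally raises `error_temp`/`error_perm` for 4xx/5xx replies):

```python
# Reply classes from RFC 959, keyed by the first digit of an FTP reply code.
REPLY_CLASSES = {
    1: "positive preliminary",
    2: "positive completion",
    3: "positive intermediate",
    4: "transient negative completion",
    5: "permanent negative completion",
}


def reply_status(reply_line):
    """Return the first digit of a reply such as '226 Transfer complete'."""
    code = reply_line[:3]
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an FTP reply: {reply_line!r}")
    return int(code[0])


def transfer_ok(reply_line):
    """True only for a 2xx reply, i.e. the 'status == 2' check."""
    return reply_status(reply_line) == 2
```

For example, `transfer_ok("226 Transfer complete.")` is `True`, while a `550` reply yields `False`.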

If it ain't broke, I can fix that.
Helen French
Honored Contributor

Re: Disk write/cache and network

Jeff and all!

Thanks for the information Jeff, much appreciated. I will observe the parameters you said. I have some questions though:
a) What's the default FS mount (vxfs) value - sync or async?
b) If the data is still in cache during a cksum, will the command return "time out" errors instead of an incomplete value?
c) I compared checksum performance over the network and locally, using the same directory on the host. There was a huge performance drop over the network (NFS). Does that suggest anything?
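For question c), one way to put numbers on the local-vs-NFS difference is to checksum the same file through both paths and compare throughput. A sketch assuming Python and SHA-256 instead of `cksum`; `checksum_throughput` is a hypothetical name. A second read of the same file may be served from the buffer cache, so compare cold runs with cold runs.

```python
import hashlib
import time


def checksum_throughput(path, chunk_size=1 << 20):
    """Checksum one file and return (hex digest, MB/s) so paths can be compared."""
    h = hashlib.sha256()
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mb_per_s = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return h.hexdigest(), mb_per_s
```

Usage would be to call it once on the local path and once on the NFS-mounted path of the same file (both paths are site-specific); the digests should agree, and only the MB/s figure should differ.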

Thanks ..points will follow!
Life is a promise, fulfill it!
Helen French
Honored Contributor

Re: Disk write/cache and network

Clay, thanks for the reply. I was thinking the same about an NFS timeout, but wanted to confirm, since Jeff mentioned the time it takes data to flush from cache to disk.

Did you mean that the buffer cache will satisfy a cksum read with the correct data, even if the data is still only in cache?

Thanks for the idea about the Net::FTP Perl module, but unfortunately I cannot do anything with it, since the data transfer is invoked by the RT systems, which are administered by a different set of people. I will definitely pass the suggestion on.
Life is a promise, fulfill it!
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Disk write/cache and network

Yes, cksum or anything else that uses the read() system call doesn't have a clue (or care) that the data may only reside in buffer cache at this moment. "Where" the file "is" is completely hidden from the application.
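A small local-filesystem demonstration of this point, assuming Python on Unix: the data is written with no `fsync`, so it may exist only in the buffer cache, yet an independent `open`/`read` of the same file returns identical bytes. This does not model NFS client caching, which has its own consistency rules. `read_before_flush` is a hypothetical name.

```python
import hashlib
import os
import tempfile


def read_before_flush(data):
    """Write data without fsync, then read it back through a separate descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        view = memoryview(data)
        while view:
            # No fsync: the bytes may still be only in the kernel buffer cache.
            view = view[os.write(fd, view):]
        with open(path, "rb") as reader:    # second, independent open of the file
            seen = reader.read()
        return hashlib.sha256(seen).hexdigest() == hashlib.sha256(data).hexdigest()
    finally:
        os.close(fd)
        os.remove(path)
```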

If you want to completely eliminate buffer cache then mount the filesystem convosync=direct,mincache=direct - that will bypass buffer cache but you will see a performance hit for precisely that reason. (Even in this case, on-disk cache will still be in play but that's completely invisible to the OS.)
If it ain't broke, I can fix that.
Jeff Schussele
Honored Contributor

Re: Disk write/cache and network

Hi (again) Shiju,

Command line created default is log, which is a sync write.
SAM created default is delaylog, which is an async write.

You also need to remember that there are two types of writes - Data & Metadata (inode info). Metadata is almost always sync unless overruled by mount options.

mount -v
will show the FS mount options

mount -o remount,option1,option2 /dev/vg_name/lv_name /mnt_point
will allow you to change options on the fly w/o unmounting. Easy way to test performance & integrity implications.
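Alongside remounting with different options, a quick way to get a feel for what synchronous writes cost on a given filesystem is to time the same writes with and without an `fsync` after each block. A rough Python sketch; `time_writes` and its `directory` parameter are hypothetical, and `directory` should point at the filesystem under test (e.g. the mount point you are remounting).

```python
import os
import tempfile
import time


def time_writes(n_blocks=200, block_size=8192, fsync_each=False, directory=None):
    """Write n_blocks blocks to a scratch file in directory; return elapsed seconds."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(n_blocks):
            os.write(fd, block)
            if fsync_each:
                # Wait for each block to reach the device, roughly the cost a
                # synchronous write policy puts on every write.
                os.fsync(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
```

Comparing `time_writes()` with `time_writes(fsync_each=True)` on the same directory gives a rough ratio; absolute numbers depend heavily on the array and controller caches discussed above.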

BTW - Clay is correct that reads will be satisfied from the OS buffer cache whether or not it has been flushed yet, and I would *suppose* the same holds for the HW cache.

HTH,
Jeff
PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!
Helen French
Honored Contributor

Re: Disk write/cache and network

Thanks Clay and Jeff, that answered my questions! I will continue my testing and let you all know what the issue was. It might take a while if we need to purchase more hardware (network or disk controller?).

Any more input is welcome!

Shiju
Life is a promise, fulfill it!