Community Home > Servers and Operating Systems > Legacy > Operating System - Tru64 Unix > Collect I/O stats report
08-11-2005 04:17 PM
Collect I/O stats report
Attached is a graph for the I/O stats of a particular disk on my system, but something does not add up.
The graph shows a wait queue of nearly 6000, yet the number of writes per second is only about 600, and very little reads. My question is, where did the items in the queue come from?
There seems to be a much stronger correlation between the throughput (KB Written/Sec) and the wait queue, than there is between the Reads- or Writes/Sec and the wait queue.
Am I misinterpreting the stats? If so, your advice would be greatly appreciated.
Regards
08-11-2005 08:38 PM
Re: Collect I/O stats report
08-17-2005 02:51 AM
Re: Collect I/O stats report
After normalizing the graph, it is like counting as the Irish do - one, two, many, lots:-)
The graph now says that there were lots of writes, which resulted in lots of items in the wait queue, causing lots of throughput.
I am currently busy troubleshooting a performance issue that requires exact figures, but the stats don't add up.
Here is how I see it, but perhaps you can point out a flaw in my reasoning:
- Each read and write I/O is placed in the queue of a particular LUN for processing.
- If the reads and writes come in faster than the device can process them, the queue will build up, resulting in delays.
Therefore, if there are 6000 items in the queue, there must have been more than 6000 reads plus writes, because the LUN continues processing them as they come in.
However, this is not what the "un-normalized" graph says.
Hope you can help.
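The reasoning above (arrivals minus completions accumulating in the queue) can be sketched as a toy simulation. The numbers below are hypothetical, chosen to illustrate the point, not taken from the graph:

```python
# Toy queue sketch: queue depth accumulates only when arrivals outpace
# the service rate. Hypothetical numbers, not the actual figures from
# the graph.

def simulate(arrivals_per_sec, serviced_per_sec, seconds):
    """Return the queue depth at the end of each 1-second interval."""
    depth = 0
    history = []
    for _ in range(seconds):
        depth += arrivals_per_sec              # new I/Os enqueued
        depth -= min(depth, serviced_per_sec)  # device drains what it can
        history.append(depth)
    return history

# 600 writes/sec against a device that can only service 550/sec:
# the backlog grows by 50 per second, reaching 6000 only after 120 s.
print(simulate(600, 550, 120)[-1])   # -> 6000
```

This is why a queue of 6000 alongside only ~600 writes/sec looks contradictory: a sustained imbalance, not a single interval, is needed to build a queue that deep.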
08-17-2005 03:25 AM
Re: Collect I/O stats report
" Normalization of Data
Where appropriate, data is presented in units per second. For example, disk
data such as kilobytes transferred, or the number of transfers, is always
normalized for 1 second. This happens no matter what time interval is
chosen. The same is true for the following data items:
+ CPU interrupts, system calls, and context switches.
+ Memory pages out, pages in, pages zeroed, pages reactivated, and pages
copied on write.
+ Network packets in, packets out, and collisions.
+ Process user and system time consumed.
Other data is recorded as a snapshot value. Examples of this are: free
memory pages, CPU states, disk queue lengths, and process memory."
So: your I/O rates and throughput figures are one-second averages, but the queue depth is an instantaneous snapshot. What interval are you using to collect this data? 'collect' really isn't the ideal tool for short-interval data collection like this; I find 'iostat' or 'advfsstat' (assuming you're on AdvFS) more useful.
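The distinction can be illustrated with a small sketch (the function and variable names here are illustrative, not collect's actual record format): rate data is a counter delta divided by the sampling interval, while queue depth is a point-in-time sample.

```python
# Sketch of the two kinds of statistics the collect docs describe:
# rate data is normalized to per-second units regardless of the
# sampling interval, while queue depth is an instantaneous snapshot.

def normalize_rate(counter_start, counter_end, interval_secs):
    """Per-second rate from two cumulative counter readings."""
    return (counter_end - counter_start) / interval_secs

# 3000 writes counted over a 5-second interval is reported as 600/s,
# exactly as a 1-second interval with 600 writes would be.
print(normalize_rate(10_000, 13_000, 5))   # -> 600.0

# A snapshot value is just whatever the kernel held at sample time;
# a momentary 6000-deep queue can coexist with modest average rates.
queue_depth_snapshot = 6000
```

This is why comparing the two directly misleads: the averages smooth out bursts, while the snapshot can catch a burst at its peak.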
08-17-2005 03:26 AM
Re: Collect I/O stats report
Looking at the graph it's hard for me to tell which line represents what. I suggest you produce 3 graphs instead of one as follows:
Graph 1: Active Queue & Wait Queue
Graph 2: Reads/Sec & Writes/Sec
Graph 3: KB Read/Sec & KB Written/Sec
I suspect you may be interpreting the lines incorrectly. What you think is the wait queue may actually be I/O per sec or KBs per sec.
Vic
08-17-2005 03:29 AM
Re: Collect I/O stats report
Vic
09-08-2005 02:17 AM
Re: Collect I/O stats report
Did some further digging, but the thick only plottens:-(
Some background - the users sometimes complain that their actions take a long time to complete. This is also reflected in the Oracle database, which sometimes has to wait up to 20 seconds for a transaction to complete because it is waiting on an I/O.
So, I am trying to figure out what is happening on the I/O subsystem.
I saw queues forming on some disks, but no sudden burst of I/Os going to those disks that would cause the queue to build up. I wrote a little script that sends a single I/O to the disk at 1-second intervals, just to see how long each I/O takes to complete. What I found was that the I/O takes a fraction of a second to complete in most cases, but when there is a queue, it can take up to 20 seconds. That ties in with what Oracle is experiencing.
I used iostat to collect information about the load on that disk, and used monitor to get the queue length (if you know of a better way to get queue length information, please let me know).
My question is - if very few I/Os are going to a disk, what can cause a queue to build up so badly that a single I/O takes 20 seconds to complete?
Long story, I know, and I hope it makes sense. Any help will be most welcome.
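The probe described above might be sketched as follows. This is a minimal reconstruction under stated assumptions: the device path and threshold are placeholders, and on Tru64 the original script more likely used dd against the raw device than Python.

```python
# Minimal sketch of the latency probe described above: issue one small
# synchronous write per second and record how long each takes.
# The path argument and report_over threshold are placeholders.
import os
import time

def probe(path, seconds=10, report_over=1.0):
    """Time one O_SYNC write per second; return the slow completions."""
    slow = []
    for _ in range(seconds):
        start = time.monotonic()
        fd = os.open(path, os.O_WRONLY | os.O_SYNC)
        try:
            os.write(fd, b"\0" * 512)   # one sector-sized I/O
        finally:
            os.close(fd)
        elapsed = time.monotonic() - start
        if elapsed > report_over:
            slow.append(elapsed)
        time.sleep(max(0.0, 1.0 - elapsed))
    return slow

# e.g. probe on the affected raw device: an empty result means no I/O
# exceeded the reporting threshold during the run.
```

A probe like this measures end-to-end completion time as the application sees it, which is exactly the figure Oracle's 20-second waits reflect.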
09-08-2005 02:46 AM
Re: Collect I/O stats report
09-08-2005 06:13 PM
Re: Collect I/O stats report
All the disks in question contain raw volumes used by Oracle.
I'm not too comfortable with the queue stats, though. collect shows queues of up to 2000, where monitor only shows queues of 20 or so. Are there better ways of getting disk queue information?
09-08-2005 09:14 PM
Re: Collect I/O stats report
09-08-2005 09:34 PM
Re: Collect I/O stats report
The storage sits on an EMC DMX box, with the data SRDF'ed to a remote site.
We have run similar stats collections on other systems that have storage on the same DMX, but they all look fine.
We also failed the system over to the remote site, but we get the same problem.
You are right, it sounds all wrong. I'm starting to doubt my stats collectors (iostat, collect and monitor).
09-08-2005 09:55 PM
Re: Collect I/O stats report
I just happened to verify that for the BL24 (PK3) release and newer that the storage device statistics are correct for collect.
Is it possible that the SRDF link is "stalled" when you see those high I/O queues?
I'm not sure that I could match the colors in the graph to the statistics. Could you perhaps present cfilt output for a similar event?
What does the DDR entry for the EMC device look like? I assume EMC configured that correctly for you, right?
09-08-2005 10:22 PM
Re: Collect I/O stats report
A possible cause for such behaviour is the synchronization between the two DMXes, i.e. SRDF.
Check whether the synchronization between the two DMXes is sync or async.
We also have a GS80 connected to a Symmetrix box with SRDF to a remote site, and when we turn synchronization on, we see bad disk performance on the disks that are synchronizing.
The other systems on the DMX that work fine at your site may simply not be synchronizing with the remote site.
You could also try turning synchronization between the DMX boxes off completely for a while and then checking performance.
Another troubleshooting step would be to look at disk usage on the DMXes through the EMC Control Center software, rather than from the server side (i.e. monitor, iostat, collect...).
Hope this helps.
Regards
09-08-2005 10:48 PM
Re: Collect I/O stats report
These are brilliant ideas. I know, 'cause I tried them as well:-)
Even with SRDF completely out of the picture (split), the problem still occurs.
DDR entries are correct (verified that).
The EMC engineers are about to start a trace on the FA's that this system is connected to. I'll also be collecting stats with iostat, monitor and collect, at 1 second intervals. I'll be posting some graphs soon.
Thanks for your help so far.
09-09-2005 01:27 AM
Re: Collect I/O stats report
I wrote a little script that sends a single I/O to each disk and then measures the time it takes for the I/O to complete. In this short collection time, there was at least 1 I/O that took 8 seconds to complete. They sometimes take up to 20 seconds and will happen on each disk (independently) within 30 minutes. This I/O was issued at exactly 13:38:37 and completed 8 seconds later. Now look at the graphs...
There isn't much happening on the disk, but a queue length of 36 pops out of nowhere, the service times shoot up and the I/O takes forever to complete.
Oh the humanity!
09-09-2005 02:28 AM
Re: Collect I/O stats report
09-09-2005 02:35 AM
Re: Collect I/O stats report
Is the swap area OUT of the SAN? Swap devices should be located on local disks.
09-11-2005 07:00 PM
Re: Collect I/O stats report
Adapters:
# hwmgr -show fibr -adapt
ADAPTER LINK LINK FABRIC SCSI CARD
HWID: NAME STATE TYPE STATE BUS MODEL
--------------------------------------------------------------------------------
786: emx7 up point-to-point attached scsi11 FCA-2384
Revisions: driver 2.14 firmware 1.90A4
FC Address: 0x6a0070
TARGET: -1
WWPN/WWNN: 1000-0000-c93e-60ae 2000-0000-c93e-60ae
ADAPTER LINK LINK FABRIC SCSI CARD
HWID: NAME STATE TYPE STATE BUS MODEL
--------------------------------------------------------------------------------
51: emx0 up point-to-point attached scsi3 FCA-2384
Revisions: driver 2.14 firmware 1.90A4
FC Address: 0x650071
TARGET: -1
WWPN/WWNN: 1000-0000-c93e-ca10 2000-0000-c93e-ca10
ADAPTER LINK LINK FABRIC SCSI CARD
HWID: NAME STATE TYPE STATE BUS MODEL
--------------------------------------------------------------------------------
928: emx9 up point-to-point attached scsi12 FCA-2384
Revisions: driver 2.14 firmware 1.90A4
FC Address: 0x21300
TARGET: -1
WWPN/WWNN: 1000-0000-c93e-615a 2000-0000-c93e-615a
ADAPTER LINK LINK FABRIC SCSI CARD
HWID: NAME STATE TYPE STATE BUS MODEL
--------------------------------------------------------------------------------
955: emx11 down scsi13 FCA-2354
Revisions: driver 2.14 firmware 3.92A2
FC Address: 0x0
TARGET: -1
WWPN/WWNN: 1000-0000-c931-4bb4 2000-0000-c931-4bb4
ADAPTER LINK LINK FABRIC SCSI CARD
HWID: NAME STATE TYPE STATE BUS MODEL
--------------------------------------------------------------------------------
960: emx13 up point-to-point attached scsi14 FCA-2384
Revisions: driver 2.14 firmware 1.90A4
FC Address: 0x6b0002
TARGET: -1
WWPN/WWNN: 1000-0000-c93e-61c2 2000-0000-c93e-61c2
09-11-2005 07:11 PM
Re: Collect I/O stats report
http://h20000.www2.hp.com/bizsupport/TechSupport/DriverDownload.jsp?pnameOID=341798&locale=en_US&taskId=135&prodTypeId=12169&prodSeriesId=341796&swEnvOID=1048
There is NO issue at all having swap on SAN storage.
09-11-2005 10:45 PM
Re: Collect I/O stats report
Besides collect and monitor, is there another way to get disk queue information?
09-13-2005 02:09 AM
Re: Collect I/O stats report
falling behind with respect to Tru64.
Part of the problem with collecting I/O data is that it may not be properly maintained in the kernel, which means you're SOL no matter what tool you use.
Collect is the best tool for disk I/O.
cheers,
Rob Urban
09-18-2005 04:58 PM
Re: Collect I/O stats report
We ran the "I/O test" described above, and with the system totally idle, we recorded a worst case of 0.55 seconds during a 45 minute test period.
I would think that this is quite high for a system without any load?
09-18-2005 09:49 PM
Re: Collect I/O stats report
Could you give more details about the results of the test:
- You say that this is using the same test. Is this reproducible?
- Is it always on the same disk?
- Is the EMC also idle (during the maintenance window) or are other systems using it?
- Are the disks behind the LUN dedicated to the test?
- What did the EMC performance investigation reveal?
- etc.
10-02-2005 09:43 PM
Re: Collect I/O stats report
To answer some of your questions:
- Reproducible? Kind of. Under load, we see the 8- to 20-second delays every time. Without load, we don't see these delays.
- We see this on all the disks, at different times.
- No, the EMC is busy serving other systems all the time.
- No, the disks (spindles) serve other hosts as well.
- While our test reported an I/O that took about 10 seconds to complete, the EMC test didn't show any I/O that took more than a second to be served.
This issue has been escalated within HP. We are expecting some assistance with our investigation.
I'll be sure to post our findings.