Operating System - OpenVMS forum

09-07-2007 08:37 AM
Slow IO
Anyone have a clue where to look? I am thinking there is a hardware issue somewhere.
Just found something interesting from autogen:
ACP_DIRCACHE parameter information:
Feedback information.
Old value was 1050, New value is 1050
Hit percentage: 95%
Attempt rate: 4294966590 attempts per 10 sec.
That doesn't seem right. System was up 213 days. That is only on 1 of the 3 nodes. Maybe a field overflowed since it didn't increase the value.
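For what it's worth, 2^32 = 4,294,967,296, and 4,294,967,296 - 4,294,966,590 = 706. So that "attempt rate" looks like a small negative delta (-706) being printed as an unsigned 32-bit value, which is consistent with a counter that wrapped at some point during the 213 days of uptime.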
09-07-2007 09:39 AM
Re: Slow IO
I'd be looking first for processes slamming I/O at the disk; for a runaway process.
Fragmentation of key files or of available disk free-space would be another avenue of investigation.
Anything paging to disk heavily? Processes thrashing?
Deep queues are either a fluke of the monitoring (which has been occasionally known to crop up), or there's really a whole lot of I/O. MONITOR, process I/O counts, Availability Manager/AMDS, SHOW MEMORY/CACHE, and other such tools can be brought to bear.
I'd also ensure I was current on patches for OpenVMS Alpha, as various patches have been issued in the last 213 days, some apparently targeting quite serious issues.
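For a first pass on those questions, something along these lines (qualifiers from memory; check HELP MONITOR on your version):

$ MONITOR PROCESSES/TOPDIO        ! which processes are issuing the direct I/O
$ MONITOR DISK/ITEM=QUEUE_LENGTH  ! per-disk queue depths
$ MONITOR PAGE                    ! page fault and page I/O rates
$ SHOW MEMORY/CACHE               ! XFC summary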
09-07-2007 10:29 AM
Re: Slow IO
Presuming that your HSZs are using writeback cache modules, you should make sure the cache batteries are viable and that the caches haven't been disabled. You'll also want to ensure that all units you expect to be using the writeback cache have it enabled.
09-07-2007 10:55 AM
Re: Slow IO
An IO queue is an effect, not a cause.
Closer to the cause would be the IO/sec.
But those can only be interpreted knowing a little more about IO distribution (small and random vs. large and sequential) and IO capacity (number of spindles, speed of drives, RAID, cache levels available).
So tell us more about those quantities.
Simple stuff to start with.
During production:
$SHOW CACHE/FULL/OUT=XFC_ACCUMULATED.LOG
$SET CACHE /RESET ! Resets counters, not contents.
$WAIT 1:00:00
$SHOW CACHE/FULL/OUT=XFC_BUSY_HOUR.LOG
Give us a hint as to what the application is doing.
OLTP-ish? Datawarehousing? File & print? Web serving?
Mostly read? Mostly write?
Is this a database application? Oracle? RDB?
Is it an RMS (indexed file) application?
Home-grown DB?
What does the DB layer think about the IOs?
Hot-files? Hot-tables?
Do you have historical data? For example a T4 day or DecPS collection for a similar day last month and this month? Did the IO queues go up with a similar load, or did the load in fact increase?
Just copying files around gives a useful gut feel for performance, but it is fraught with perils in interpretation: Input cached? Output pre-allocated contiguous (+ COPY/OVER)? Disk fragmentation on one node? RAID levels?
A 3x difference is worth investigating though.
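If you do use a copy test, one way to make repeat runs comparable (file names hypothetical) is to let the first pass create the output file and have later passes rewrite it in place:

$ COPY BIGFILE.TMP $5$DKA101:[SCRATCH]BIGFILE.TMP          ! first pass allocates the file
$ COPY/OVERLAY BIGFILE.TMP $5$DKA101:[SCRATCH]BIGFILE.TMP  ! later passes reuse the same blocks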
Sounds like a fun problem. Contact me offline if you think you need help beyond some hints here.
Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
09-10-2007 03:53 AM
Re: Slow IO
Paging isn't too bad; it might be worse right now, as we had a hardware failure over the weekend on one node and the entire application is running on one system.
CUR AVE MIN MAX
Page Fault Rate 302.72 155.11 0.00 1091.01
Page Read Rate 73.93 42.72 0.00 305.72
Page Read I/O Rate 35.96 21.63 0.00 168.84
Page Write Rate 0.00 0.12 0.00 4.99
Page Write I/O Rate 0.00 0.02 0.00 0.99
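Reading those numbers: only the Page Read I/O Rate represents hard faults that actually go to disk, so roughly 36 of the ~303 faults/sec (about 12%) are hitting the disks; the rest are soft faults resolved in memory.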
This is a Codasyl DBMS application, recording production at a manufacturing site. The Oracle 9i server is on the third node and is also experiencing slowness. The one program I am testing for initial load speed is all reads.
I/O Operation Rate CUR AVE MIN MAX
$5$DKA101: (MAPPRA) MAPPR 473.57 552.64 0.00 1042.05
We took the entire cluster down Friday and the site cleaned all the cables and connections. After powering everything up I was able to get a test done before all the users got back on. The times were about the same as on my DR system. The times got longer as more users got on. Instead of 5 seconds it was 30-45 seconds. This morning it was 1:50.
The database disk is a striped mirrorset: four 18 GB mirrorsets striped together.
Originally I was thinking there was a hardware issue but I don't think so now. Since I was able to get similar times on an empty system, it seems that we have exceeded the capability of the storage.
Attached is a 20-minute cache log. The command was bad, so I don't have the "before" log.
show mem/cache
09-10-2007 04:04 AM
Re: Slow IO
The system is out of memory, or rather, its memory is likely to be overcommitted.
The XFC only has 9.7 MB in use.
That's so little it's almost a joke.
How much memory is there?
Roughly, how is it used? (free/modified/process/DB/...)
What are those 9-block IO's?
That's an 'odd' size (sic), well worth investigating, as they represent half of all the IOs.
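For scale, a 9-block transfer is 9 x 512 = 4608 bytes. Purely a guess, but a 4 KB database page plus one block of overhead would land at exactly 9 blocks; the DB layer would know for sure.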
Have you asked the HSZ what's happening?
It is serving 500+ IO/sec average over those 20 minutes. Not too low, but not too high either. It would be too high if the bulk of that goes out as random IOs to just 4 older drives and the HSZ cache is not kicking in.
In your situation I would want to study a VTDPY picture for a while.
Cheers,
Hein.
09-10-2007 04:19 AM
Re: Slow IO
Not sure what the 9 block IO's are. Probably database related.
The controller doesn't seem too busy.
About 90+% read operations.
(VTDPY screen dump; the columns were mangled in posting. The recoverable figures:)
Controller: 75% I/D Hit, 66.4% Idle, 3228 KB/S, 768 Rq/S
Unit         ASWC   KB/S   Rd%   Wr%   Cm%   HT%
D0100        o^ b     39    99     0     0    32
D0101        o^ b   3182    97     2     0    13
D0102        o^ b      6     7    92     0   100
D0200-D0205  x^ b      0     0     0     0     0
(The per-process table showed NULL at 66.4% CPU and HP_MAIN at 24.9%; the rest were near idle.)
09-10-2007 04:33 AM
Re: Slow IO
And you don't think that alone is reason enough for the slow IO? The XFC is squeezed down to a few MB and has to let more than 30% of the IO go through. That can be enough to go from happy to sad on the storage side.
IMHO your best bet is to work on relieving memory pressure. Everything installed shared? Squeeze some working sets? Reduce DB cache (when apps are combined)...
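The INSTALL step looks something like this (image name hypothetical; see HELP INSTALL for the qualifiers):

$ INSTALL ADD SYS$SYSTEM:BIGAPP.EXE /OPEN/HEADER_RESIDENT/SHARED

That lets all the processes share one copy of the code pages instead of each faulting in a private copy.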
>> The controller doesn't seem too busy.
Correct, the controller load is not too bad, but if those 500+ IO/sec go to just 4 drives, then that may well be too much.
So I was more interested in the SHOW DEV page (or whatever it was called; I haven't used that tool in a while).
Be sure to attach it as a text file again, like you did for the XFC output, and perhaps also include the VTDPY main page, as it is too hard to decipher as posted.
Hope this helps some more,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
09-10-2007 05:04 AM
Re: Slow IO
I did install the main image shared on Friday. I know it's gotten bigger; it's a 126,000-block executable. There are over 20 copies of it running now.
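Rough arithmetic on that: 126,000 blocks x 512 bytes is about 61 MB of image. With 20+ copies running, installing it shared could free hundreds of MB for XFC and working sets, depending on how much of the image is shareable code.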
Here's the devices and vtdpy page attached.
09-10-2007 05:30 AM
Re: Slow IO
I had a slightly different VTDPY page in mind (cache?) showing per-disk IO numbers, but this is fine, and you nicely broke down D0101, the busy disk. It is nicely split over the back-end SCSI buses.
There are 8 drives (4 mirrors) behind that really, so I would hope it could deliver more than 500 IO/sec readily, but that's still busy. It's more than 60 random IO/sec per spindle sustained... if all spindles are equally busy. That is NOT operating in a comfort zone, but it's hard to call it overloaded without even further detail. It might be too much.
Notice how the cache hits are minimal (3%), and the read-to-write ratio is near its maximum.
The write cache is pre-allocated, not need-based, but it is hardly used. So if this usage pattern is typical, you might want to disable the write cache to see whether the extra space for read cache helps.
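If you try that, the HSZ CLI commands are along these lines (a sketch; check your firmware's CLI reference before running them):

HSZ> SET D0101 NOWRITEBACK_CACHE   ! write-through for the busy unit
HSZ> SET D0101 WRITEBACK_CACHE     ! put it back afterwards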
btw... what does the 'o^ b' in the ASWC refer to? I forgot (or never knew :-).
Hein.
09-10-2007 06:35 AM
Re: Slow IO
^ - disk spinning
blank - no write protect
b - read and write caching enabled
Had the manual handy!
Everywhere else (16 sites) the RA10000 handles our I/O load. You may be right about the memory, though; the application is growing over time. It's possible that it grew enough to take enough memory away from the cache. The other sites with similar load have EMC Symms. The sites with similar configs don't have the same load.
Thanks for the help.
09-10-2007 08:59 AM
Re: Slow IO
We brought the 2nd application node back up.
I was able to get on it before any users. The benchmark program I am using took 1:40 to bring up the data. That's with no other users, plenty of memory, and it ran several times to utilize the cache. That should point to it being something with the overall storage subsystem rather than something else.
09-10-2007 10:39 AM
Re: Slow IO
OpenVMS Alpha and OpenVMS I64 make use of "spare" memory for I/O caching and related purposes, so memory in a busy system doesn't go to waste; throw some more memory at this configuration.
XFC is almost shut down here.
Check your process page fault rates; see if your boxes are in contention for memory. See if your freelist is dinky. (Given your description, I'd tend to expect the processes are competing for memory, you have a very small freelist, and probably a pretty good fault rate.)
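Concretely (DCL from memory; check HELP on your version):

$ SHOW MEMORY/PHYSICAL        ! free and modified page list sizes
$ MONITOR PAGE                ! system-wide fault rates
$ MONITOR PROCESSES/TOPFAULT  ! which processes are faulting hardest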
Again, perform a systemic evaluation of the system. This approach might seem wasteful, but bottlenecks are quite often not where you might expect. (Too little available memory for I/O caches means higher I/O rates, and can mean higher page fault rates, for instance.)
BTW, I picked up a nicely-configured used AlphaServer DS20e box for about US$1500 almost a year ago -- rather faster than your boxes here. That box had 4GB, too. Faster Alpha boxes are certainly available, and can often be an easy "go faster" option.