12-09-2004 07:45 PM
Disk Queue Length
I found that one disk in my cluster briefly had a queue length of 90: at 6:00 it was 30, at 6:02 it was 90, and at 6:04 it was 5.
This repeats every day at the same time.
The active image is dataserver (Sybase).
The top hot files report shows a high share of non-virtual QIO (about 30% of all IO, mainly writes). Does this indicate file system activity?
What could this be?
(VMS 7.3, GS160, HSG80, MA8000, FDDI)
12-09-2004 08:57 PM
Why not collect some information about those IOs with my favourite tool, SDA?
If this happens predictably at the same time every day, just submit a batch job to run at 06:00 and include the following commands:
$ ANAL/SYS
SDA> SET OUT/NOINDEX file1.lis
SDA> SHOW DEV
SDA> EXIT
This should give you the list of IOs in the queue for the device, so you can find out what they look like, who issued them, and so on.
Volker.
12-09-2004 10:26 PM
1) Both cluster nodes peak at the same moment: 70 on one and 20 on the other.
2) Throughput was about 4 MB/sec at the moment of the peak.
3) There was also a peak of 35 in "credit waits".
Wim
12-09-2004 10:38 PM
cluster_credits is at 128. Is that too low?
Since the disk is shadowed via FDDI: mscp_buffer is at 16384 and mscp_credits is at 128.
Wim
12-09-2004 10:46 PM
CLUSTER_CREDITS = 128 is the maximum value.
Is anything unusual happening with the disk, the paths to the disk, or the HSG80? Mount verification? Path switches?
Check the FC counters with SDA> FC STDT
(QF seen or Seq TMO > 0?).
Is there a specific job that always starts at 06:00?
Volker.
12-09-2004 11:25 PM
"Since the disk is shadowed via FDDI : mscp_buffer is at 16384 and mscp_credits at 128."
Maybe a superfluous question, but since you use FDDI, did you set NISCS_MAX_PKTSZ to 4468?
Proost.
Have one on me.
Jan
12-09-2004 11:33 PM
But I did mon clu and added cr_waits.
There are a few thousand of them, but every node I checked has them, even at other companies.
Volker: nothing special is active, just an application peak.
Another thing: the FDDI is shared with another cluster, which had a throughput of about 2-4 MB/sec at the moment of the problem.
Wim
12-09-2004 11:38 PM
Here is an article explaining non-virtual QIOs as reported by TNG/PSDC:
http://h18000.www1.hp.com/support/asktima/operating_systems/CHAMP_SRC931006004627.html
In your case, this would point to the database, probably doing logical IO to some of its files. And if it happens every day at 06:00, there must be some 'time-released' job in the application causing it.
Did you check the FC counters?
Volker.
12-09-2004 11:54 PM
It seems there are also peaks at other hours. And the peak at 6:00 appears lower because the sampling interval is now 30 minutes instead of 2.
Between 01:06 and 02:15 several disks have peak queue lengths of 10-70. At that time a backup is active, reading about 15 MB/sec. SCS traffic during this interval is almost 0. I also saw backup issuing lots of IO without any throughput for 20 minutes, generating a continuous queue length of 70.
So I guess it is normal behaviour when reading or writing too fast.
Wim
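A small sketch of why a longer sampling interval flattens the 6:00 peak. It assumes the monitoring tool reports the mean queue length per interval, which the thread does not confirm. The first three readings are the 2-minute values from the first post; the remaining twelve readings of 5 are invented filler to complete a 30-minute window.

```python
def interval_average(samples, group_size):
    """Average consecutive samples in groups of `group_size`."""
    averages = []
    for start in range(0, len(samples), group_size):
        group = samples[start:start + group_size]
        averages.append(sum(group) / len(group))
    return averages

# Queue lengths sampled every 2 minutes starting at 06:00.
# 30, 90, 5 come from the first post; the rest is hypothetical filler.
two_minute_samples = [30, 90, 5] + [5] * 12   # 15 samples = 30 minutes

peak_2min = max(two_minute_samples)
avg_30min = interval_average(two_minute_samples, 15)[0]

print(peak_2min)            # 90
print(round(avg_30min, 1))  # 12.3
```

Under that assumption, the same burst that shows as 90 with 2-minute sampling would show as roughly 12 in a single 30-minute sample, so a lower reading does not necessarily mean the burst got smaller.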
12-10-2004 12:03 AM
Are you running your BACKUP jobs from an account with a very high DIOLM? You may be overloading your HSG80.
The SDA> FC STDT/ALL counters would give an indication (QF seen).
Volker.
12-10-2004 12:36 AM
It is 7.3, so no /ALL.
The DIOLM is indeed high (4096).
But how do you tell that the HSG80 is overloaded? If throughput stays high, I don't have a problem with long queues, but I would like to know why they occur and who is causing them.
Wim
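One rough way to judge whether a long queue matters is Little's law, L = λ·W, which relates the average number of outstanding IOs (L), the IO completion rate (λ), and the average time each IO spends queued plus in service (W). The sketch below is only an illustration: the IO rates are hypothetical, since the thread reports throughput in MB/sec and not in IOs per second.

```python
def avg_io_time_ms(queue_length, ios_per_second):
    """Little's law: W = L / lambda, returned in milliseconds."""
    if ios_per_second <= 0:
        raise ValueError("IO rate must be positive; a stalled device has no finite W")
    return queue_length * 1000.0 / ios_per_second

# Hypothetical IO rates for the same queue length of 70.
print(avg_io_time_ms(70, 2000))  # 35.0
print(avg_io_time_ms(70, 100))   # 700.0
```

With these made-up rates, a queue of 70 costs each IO about 35 ms when the device completes 2000 IOs per second, but about 700 ms at 100 IOs per second, so the queue length is only meaningful together with the completion rate.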
12-10-2004 12:54 AM
The only indication of an 'overloaded' HSG80 that I know of is in the FC counters: QF seen ('Queue Full seen') or Seq Tmo (sequential timeouts). You need to check the counters on all your FC paths from the node to the HSG80:
SDA> FC SHOW FGA0
SDA> FC STDT
SDA> FC SHOW FGB0
SDA> FC STDT
...
If IOs to the disk/path/HSG80 are temporarily stalled, the queue length will stay high but no IOs will be processed; they will all be pending.
Volker.
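The distinction above, a long queue that is still draining versus a long queue with nothing completing, can be sketched as a simple check over monitoring samples. This is only an illustration: the function name, the threshold of 20, and the sample values are all hypothetical, and the real evidence would still be the FC counters.

```python
def classify_sample(queue_length, ios_completed, queue_threshold=20):
    """Label one monitoring sample.

    queue_length    -- outstanding IOs on the device at sample time
    ios_completed   -- IOs completed during the sample interval
    queue_threshold -- hypothetical cutoff for calling a queue "long"
    """
    if queue_length < queue_threshold:
        return "normal"
    if ios_completed == 0:
        return "stalled"   # long queue, nothing completing: IOs are pending
    return "busy"          # long queue, but IOs are still being processed

# Hypothetical (queue_length, ios_completed) samples.
samples = [(5, 400), (70, 1500), (70, 0)]
labels = [classify_sample(q, done) for q, done in samples]
print(labels)  # ['normal', 'busy', 'stalled']
```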
12-10-2004 09:39 AM
12-12-2004 12:33 AM
A side note, not particularly on topic.
Taken to its extreme, a short queue (queue length = 1) is not necessarily a good thing. For example, seek optimization algorithms cannot optimize without a queue to analyze. If the HSG has a problem with long queues, it is a BUG. The driver and the controller should jointly ensure that the queue length does not cause controller problems.
Short of exhausting non-paged dynamic memory on the host OpenVMS system, and the performance problems caused by a disproportionate queue length on one particular device, there should be no problems with long queues.
Of course, the individual performance of a particular process is a different question. Most analyses of overall system performance maximize the overall performance of the system.
Using DIOLM and other account quotas to manage the workload on the HSG is crude and imprecise at best, and self-defeating at worst. It is true that for a particular configuration, increasing quotas beyond a certain point yields dramatically decreasing benefits, but that is an entirely different issue.
I hope that the above is helpful.
- Bob Gezelter, http://www.rlgsc.com
12-12-2004 03:38 AM
Tom: yes, running vtdpy could give some extra data, but I don't have permission to come in at 6:00.
Wim@home
11-01-2005 10:01 PM
I now have a similar case. FAL was doing non-virtual IO with a throughput of 2-6 MB per second. This went on for 1 hour, so several GB in total. But the application guys don't know why.
Any ideas?
The cluster was rebooted and the problem is gone. But I would like to know what it was.
Wim
11-01-2005 11:41 PM
What Volker is suspecting but maybe not articulating strongly enough is that there is a serious performance problem once you hit a QFULL condition (credits) on the fibre channel driver. One spike, and you'll slow down after that.
The recovery for that is way too gentle/slow.
This is addressed by VMS732_FIBRE_SCSI-V0700.
The fact that it gets better after reboot kind of confirms this.
A potential alternative to this is supposedly:
SDA> FC SET WTID/WWID=wwid-number/QFTIMED=1
About the 6 am / hourly spikes possibly related to Sybase: could Sybase be doing something like what Oracle calls a checkpoint, that is, a great many IOs in one go to sync memory with disk?
Is there a knob in Sybase to limit this, like a 'max-write-io' perhaps?
Now about BACKUP. Running the backup process with a DIOLM of many thousands is 'old school'; it goes well beyond the point of diminishing returns. Please check the current process quota recommendations, or just set it back to 100 and try.
With kind regards,
Hein.
11-02-2005 12:12 AM
It is 7.3, patched up to the patches of 12-Mar-2003.
Note that the reboot solved the problem of FAL doing a lot of non-virtual QIO. But why?
The queue length problem is still present and, in my opinion, is caused by high activity (controller saturated). At that moment a backup is running, along with some big FTP transfers and some smaller Sybase dumps. The controller is also used by another cluster that is doing backups combined with heavy DWH activity.
I still see the peaks in non-virtual QIO, but currently not when the queue length is high.
And Sybase has no checkpoint activity.
