Operating System - OpenVMS

Disk trashing, DIOLM and Oracle on esa-12000

 
Miguel Ward
Advisor


Hi,
I have an OpenVMS 7.3-2 cluster (a DS25 and a DS20E, two CPUs each) with an ESA-12000 disk cabinet.

We now have four Oracle 9.0.1.4 databases on this cluster and are having numerous disk timeouts at different times of the day (disks go into mount verification, etc.).

Oracle states I should have a minimum of 100 for DIOLM, but from reading other threads (specifically the backup suggestions) this value seems way too high, and I should possibly lower it to 32 at most, or maybe even less.

Has anyone experimented with lower numbers? I don't want to change this value and PQL_MDIOLM if it entails any major risk.

If you need more info, please let me know.

Thanks
Marc Van den Broeck
Trusted Contributor

Re: Disk trashing, DIOLM and Oracle on esa-12000

Hi,

We have two DS25s (two CPUs each) in our cluster with a disk cabinet (HSZ80), and we use a value of 150 for DIOLM (which is the default on Alpha).
The two Oracle instances we have on this cluster behave nicely.

Rgds
Marc
Volker Halle
Honored Contributor

Re: Disk trashing, DIOLM and Oracle on esa-12000

Miguel,

please confirm that you are using HSG80 controllers to access your storage.

If so, you can check with the following commands, whether your systems are suffering from QF seen or Seq Tmo:

$ SET TERM/WID=132
$ ANAL/SYS
SDA> FC STDT/ALL
SDA> EXIT

If you see non-zero counters in the columns QF seen or Seq Tmo, you may be overloading your HSG80s. You need to issue the above commands on each of your OpenVMS systems.

Volker.
John Abbott_2
Esteemed Contributor

Re: Disk trashing, DIOLM and Oracle on esa-12000


Have you analyzed the error log file for the mount verifications? Any clues?

What's in the ESA12K: HSZ??. Any errors on the controller console(s)? What ACS version are they running? Check SHOW UNIT and SHOW xxx for anything reconstructing, and SHOW THIS and SHOW OTHER to confirm the controllers are OK. Do you record the HSZ console output? If not, I'd keep a recorded terminal attached and watch it when you get the slowdown.

J.
Don't do what Donny Dont does
Miguel Ward
Advisor

Re: Disk trashing, DIOLM and Oracle on esa-12000

Thanks for suggestions.

Running with hsz80,Software V83Z-0, Hardware E04 (is this ok?).

No disks reconstructing, all OK, except for repeated remount operations at certain times of the day for all disks involved (the times seem to repeat from day to day, so I imagine something really I/O demanding is running, which I am also trying to trace...).

Will try to connect console to see if any messages are reported.

Are there any counters I can read on VMS? (You suggested a command for HSG80; is there an equivalent for HSZ80?)

anal/err/elv summary gives, for example:

8128 Device Error 21-MAR-2007 06:12:48.71 PM5 DEVICE_ERRORS
8129 Device Error 21-MAR-2007 06:12:48.71 PM5 DEVICE_ERRORS
8130 Device Error 21-MAR-2007 06:12:48.72 PM5 DEVICE_ERRORS
8131 Asynchronous Device Attention 21-MAR-2007 06:17:25.89 PM5 ATTENTIONS
8132 Time Stamp 21-MAR-2007 06:28:40.78 PM5 CONTROL_ENTRIES
8133 Device Error 21-MAR-2007 06:30:24.57 PM5 DEVICE_ERRORS
8134 Asynchronous Device Attention 21-MAR-2007 06:35:28.98 PM5 ATTENTIONS

In just 8 hours I have:
ATTENTIONS 23
CONTROL_ENTRIES 27
DEVICE_ERRORS 44
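
For anyone wanting to check whether the entries really cluster at the same hours, a summary listing like the one above can be tallied by class and by hour. This is only a rough sketch in Python, not something that was run in this thread. The helper name `tally` is made up, and the field layout (class last, node before it, time third from the end) is an assumption inferred from the seven sample lines, so it may not hold for other ELV output.

```python
from collections import Counter

# Sample lines copied from the summary above; a real run would use the full listing.
SUMMARY = """\
8128 Device Error 21-MAR-2007 06:12:48.71 PM5 DEVICE_ERRORS
8129 Device Error 21-MAR-2007 06:12:48.71 PM5 DEVICE_ERRORS
8130 Device Error 21-MAR-2007 06:12:48.72 PM5 DEVICE_ERRORS
8131 Asynchronous Device Attention 21-MAR-2007 06:17:25.89 PM5 ATTENTIONS
8132 Time Stamp 21-MAR-2007 06:28:40.78 PM5 CONTROL_ENTRIES
8133 Device Error 21-MAR-2007 06:30:24.57 PM5 DEVICE_ERRORS
8134 Asynchronous Device Attention 21-MAR-2007 06:35:28.98 PM5 ATTENTIONS
"""


def tally(summary):
    """Count entries per class and per hour of day (hypothetical helper)."""
    by_class = Counter()
    by_hour = Counter()
    for line in summary.splitlines():
        fields = line.split()
        # Skip anything that does not look like "<entry number> ... <class>".
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # Assumed layout: the description can be several words, so index
        # from the end: ... <date> <time> <node> <class>
        entry_class = fields[-1]
        time_field = fields[-3]
        by_class[entry_class] += 1
        by_hour[time_field[:2]] += 1
    return by_class, by_hour


by_class, by_hour = tally(SUMMARY)
print(dict(by_class))
print(dict(by_hour))
```

Feeding it several days of output and comparing the per-hour counts would show whether the bursts line up with a scheduled job.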

Thanks
Volker Halle
Honored Contributor

Re: Disk trashing, DIOLM and Oracle on esa-12000

Miguel,

no equivalent commands for HSZ80.

You seem to be having some kind of HW problem. Consider using DECevent to translate your error log. You need to find out which adapters/devices these errors are being logged for.

Volker.
Bill Hall
Honored Contributor

Re: Disk trashing, DIOLM and Oracle on esa-12000

Miguel,

Make sure you have the latest fibre-scsi ECO for V7.3-2 installed on your cluster. Also, upgrade the firmware on your HSG80s to 8.8-4 (patches 3 and 4 on top of 8.8-2 were the latest I have seen).

If you are already running the latest firmware and VMS ECOs, you may be able to work around the problem by following the VMS V8.2 and above IO tuning recommendations.

Bill
Bill Hall
John Abbott_2
Esteemed Contributor

Re: Disk trashing, DIOLM and Oracle on esa-12000

Hi Miguel

re: Running with hsz80,Software V83Z-0, Hardware E04 (is this ok?).

V83Z-1 (one patch) I believe was the only patch for this version. I think V85Z-4 was the last HSZ ACS release (four patches).

I'll see if I can find out what the -1 patch did for V83.

Please run DECevent as Volker suggests and post the output. Do you know what HBAs you have (KZPCA-AA?)

Regards
John.
Don't do what Donny Dont does
Miguel Ward
Advisor

Re: Disk trashing, DIOLM and Oracle on esa-12000

Thanks to all for your help.

DIAG was not working, so I had been running anal/err/elv, which did not help at all.

Reinstalled DIAG and managed to get the error messages:

To make it short: the reason was that the esa-12000's cache battery had reached its end of life and was generating all these errors.

Certainly was not expecting THAT to be the reason for all these 'Mount verification is in progress' messages but that is the only recurring error I got from running DIAGNOSE.

I must say it's a weird way of getting my attention but it worked!!!

Best regards from Patagonia, Argentina

Summary of DIAG error below for reference:

**** V3.4 ********************* ENTRY 7541 ********************************


Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V7.3-2
Event sequence number 41774.
Timestamp of occurrence 14-MAR-2007 06:07:57
Time since reboot 6 Day(s) 12:29:59
Host name PM5

System Model COMPAQ AlphaServer DS20E 833 MH

Entry Type 1. Device Error


---- Device Profile ----
Unit $41$DKA2
Product Name HSZ80
Vendor COMPAQ

-- Driver Supplied Info -
Device Firmware Revision V83Z
VMS SCSI Error Type 5. Extended Sense Data from Device
SCSI ID x00
SCSI LUN x00
SCSI SUBLUN x02
Port Status x00000001 NORMAL - normal successful completion
SCSI Command Opcode x00 Test Unit Ready
Command Data
x00
x00
x00
x00
x00

SCSI Status x02 Check Condition
Remaining Byte Length 160.

------- HSx Data -------

Instance Code x028A2301 The CACHE backup battery covering the
mirror cache is near its end of life. The
Memory Address field contains the starting
physical address of the CACHEB1 memory.

Component ID = Value Added Services.
Event Number = x0000008A
Repair Action = x00000023
NR Threshold = x00000001

Template Type x12 Backup Battery Failure.
Template Flags x00 HCE = 0, Event did not occur during Host
Command Execution.
Ctrl Serial # ZG94709635
Ctrl Software Revision V83Z
RAIDSET State x00 NORMAL. All members present and
reconstructed, IF LUN is configured as a
RAIDSET.

Error Code x70 Current Error
Sense Key x06 Unit Attention
ASC & ASCQ xA002 ASC = x00A0
ASCQ = x0002
Backup battery event report.

Memory Address x48000000
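
As a side note on reading the entry above: the Event Number, Repair Action and NR Threshold values that DIAGNOSE prints match successive bytes of the Instance Code x028A2301, and the ASC/ASCQ pair matches the two bytes of xA002. The Python sketch below just splits the values that way. The byte layout is inferred from this single entry, treating the top byte as the Component ID is an assumption (the report prints only the component name, not its number), and both function names are made up for illustration.

```python
def decode_instance_code(code):
    """Split a 32-bit instance code into four byte-wide fields.

    Layout inferred from the entry above, not taken from controller
    documentation; the component_id position is an assumption.
    """
    return {
        "component_id": (code >> 24) & 0xFF,   # assumed: top byte
        "event_number": (code >> 16) & 0xFF,   # report shows x0000008A
        "repair_action": (code >> 8) & 0xFF,   # report shows x00000023
        "nr_threshold": code & 0xFF,           # report shows x00000001
    }


def decode_asc_ascq(word):
    """Split the 16-bit ASC & ASCQ word into (ASC, ASCQ)."""
    return (word >> 8) & 0xFF, word & 0xFF


fields = decode_instance_code(0x028A2301)
asc, ascq = decode_asc_ascq(0xA002)

for name, value in fields.items():
    print(f"{name:14s} x{value:02X}")
print(f"ASC x{asc:02X}, ASCQ x{ascq:02X}")
```

This only helps when grouping many entries by event number or repair action; the descriptive text still has to come from DIAGNOSE itself.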
Volker Halle
Honored Contributor
Solution

Re: Disk trashing, DIOLM and Oracle on esa-12000

Miguel,

I'm glad we were able to help you diagnose the problem so quickly.

This topic shows a couple of things:

- don't even start to think about changing system parameters, if you don't understand the problem.

- ANAL/ERR/ELV is mostly useless

- DIAGNOSE (DECevent) is still the tool to choose for analyzing disk and IO subsystem errors

- OpenVMS does not initiate mount-verifications without a reason

Volker.