
DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

 
Joe Trimble
Advisor

Hello, fellow VMS folks...

I just installed a new MSA1000 integrated system for use with OpenVMS. I created 4 new LUNs, gave them LUN ID numbers, and can see them on my system as $1$DGA1 thru $1$DGA4.

When I try to initialize or use the new drives with VMS, I get hundreds of errors, some seemingly random, during the INIT command or while copying files (with COPY or BACKUP) to the new devices.

Here are some notes:
- OpenVMS V8.2
- All relevant patches applied, including VMS82A_FIBRE_SCSI V6.0.
- MSA1000 upgraded to firmware version 7
- MSA1000 in active/active mode
- 3 luns created using ADG (RAID6)
- 1 lun created using RAID1
- 8 drives (out of 14) hosting the 4 luns
- luns are configured using disks 1-4 and 8-11 to optimize I/O across the scsi buses.
- each MSA1000 controller has 512MB cache, configured 50% read, 50% write.
- caching is enabled on the configured luns

When I initialize the drives, they will randomly fail as the following example shows:

$ init /system /share /headers=64000 /structure=5 $1$dga3: disk3n
%INIT-F-DATACHECK, write check error

When the drive does initialize correctly and I then mount it and copy files to it, errors like these are routinely seen:

%BACKUP-E-OPENOUT, error opening DISK3N:[IS$DISK.MISTY.SQL]PZVDEDN.SQL;3 as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-F-DATACHECK, write check error
%BACKUP-E-OPENOUT, error opening DISK3N:[IS$DISK.OPS.MB]MB901S.RPT;1 as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-F-DATACHECK, write check error
%BACKUP-E-OPENOUT, error opening DISK3N:[IS$DISK.OPS.MB]MB911S.RPT;2 as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-W-BADIRECTORY, bad directory file format
%BACKUP-E-CREDIRERR, error creating directory DISK3N:[IS$DISK.OPS.MB.STORED]
-SYSTEM-W-BADIRECTORY, bad directory file format

This is only a small sample.
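
With hundreds of such messages, it can help to tally them by status code to see which failure dominates. The following is a minimal sketch, not part of the original post: it assumes the BACKUP output has been captured to a text file and copied somewhere Python is available, and the function name `tally_messages` is hypothetical. The sample text is the excerpt shown above.

```python
import re
from collections import Counter

# Matches the start of an OpenVMS message line, e.g.
#   %BACKUP-E-OPENOUT, error opening ...
#   -SYSTEM-F-DATACHECK, write check error
# Groups: prefix (% or -), facility, severity letter, message ident.
MESSAGE_RE = re.compile(r"^([%-])([A-Z0-9$_]+)-([A-Z])-([A-Z0-9$_]+),")

def tally_messages(log_text):
    """Count messages by FACILITY-SEVERITY-IDENT (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MESSAGE_RE.match(line.strip())
        if match:
            _, facility, severity, ident = match.groups()
            counts[f"{facility}-{severity}-{ident}"] += 1
    return counts

# Sample taken from the error excerpt in the post above.
SAMPLE = """\
%BACKUP-E-OPENOUT, error opening DISK3N:[IS$DISK.MISTY.SQL]PZVDEDN.SQL;3 as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-F-DATACHECK, write check error
%BACKUP-E-OPENOUT, error opening DISK3N:[IS$DISK.OPS.MB]MB901S.RPT;1 as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-F-DATACHECK, write check error
%BACKUP-E-OPENOUT, error opening DISK3N:[IS$DISK.OPS.MB]MB911S.RPT;2 as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-W-BADIRECTORY, bad directory file format
%BACKUP-E-CREDIRERR, error creating directory DISK3N:[IS$DISK.OPS.MB.STORED]
-SYSTEM-W-BADIRECTORY, bad directory file format
"""

if __name__ == "__main__":
    for code, count in tally_messages(SAMPLE).most_common():
        print(f"{count:4d}  {code}")
```

Run against a full log, a tally like this would show whether DATACHECK or BADIRECTORY is the more common underlying status, which may be worth passing along to support.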

It seems that copying small files presents the problem more than copying large files. I have successfully copied large (20G) files to a device, and also restored a backup saveset containing several large files to the disk successfully. When I copy lots of small files, like moving user directories over to the new device, then I get hundreds of errors in a short time.
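
Since small files seem to trigger the problem, a repeatable small-file workload may make it easier to compare systems or settings. This is only a rough sketch under assumptions: it presumes a Python interpreter is available on a host that can write to the target directory, and the function name `write_and_verify`, the file naming, and the default count and size are all hypothetical. A read-back may be satisfied from a host or controller cache, so a clean run does not prove the data reached the disk intact.

```python
import hashlib
import os
import tempfile

def write_and_verify(target_dir, count=1000, size=2048):
    """Write `count` small files, read each back, and return a list of failures.

    Each failure is a (path, reason) tuple. Hypothetical helper for
    reproducing a many-small-files workload.
    """
    failures = []
    for i in range(count):
        path = os.path.join(target_dir, f"probe_{i:05d}.dat")
        # Deterministic, per-file payload so a mismatch is detectable.
        block = hashlib.sha256(str(i).encode("ascii")).digest()  # 32 bytes
        payload = block * max(1, size // len(block))
        try:
            with open(path, "wb") as handle:
                handle.write(payload)
            with open(path, "rb") as handle:
                if handle.read() != payload:
                    failures.append((path, "read-back mismatch"))
        except OSError as exc:
            # Create/write/read errors reported by the operating system.
            failures.append((path, str(exc)))
    return failures

if __name__ == "__main__":
    # Demonstration against a temporary directory; point target_dir at the
    # device under test instead (path left to the reader).
    with tempfile.TemporaryDirectory() as scratch:
        problems = write_and_verify(scratch, count=100, size=1024)
        print(f"{len(problems)} failures")
```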

I'm likely going to call HP on this, but wanted to hear about your experience and ask for your help.

Thank you!
Joe
29 REPLIES
Uwe Zessin
Honored Contributor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Joe,

check the connections on the MSA1000 and make sure they are working with the "OpenVMS" profile.

CLI> show connections
...

CLI> add connection ALPHA1_PGA0 wwpn=10000000-C9244321 profile=OpenVMS
.
Joe Trimble
Advisor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

I have double-checked. All connections are designated profile=OpenVMS

Here's a thought...

I may have switched the names of the two connections between the MSAs and the HBAs in my system since VMS was last booted 2 days ago. The connections are called flash-1 and flash-2. I switched them in the MSA configuration so they would match the order of the HBA cards on the PCI bus in my system. The LUNs and ACLs were created after that switch. Could that make a difference? Could the system be confused? Should I try reinitializing the AlphaServer?

Thanks.
Jon Pinkley
Honored Contributor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Joe,

Disclaimer: I have never used an MSA.

This is only a guess, but because you notice the problems with small files, I would suspect the cache. Does the MSA have any diagnostics it can run to test its memory?

Does the problem go away if you turn off caching?

Summary: My guess is that it is a hardware issue that needs to be fixed.

Jon
it depends
Joe Trimble
Advisor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Thanks, Jon.

I thought about the cache, and that is still on the table, as far as I'm concerned.

I ended up opening a call with HP OpenVMS Support this afternoon. I'm going to get some additional information for the support rep tomorrow morning, and power-cycle the MSA and my AlphaServer system. Unfortunately, today was a work-at-home day, so I'm not physically with the equipment today.

I mentioned the cache to the support rep. He indicated it was a potential problem, but needs additional information, thus my trip to the office early Saturday.

Thanks for the replies so far.. I'll update this thread again when I have more information or questions.

Joe
Rob Leadbeater
Honored Contributor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Hi Joe,

If you've just installed the MSA1000, it's possible that the RAID initialisation is still going on in the background...

This *shouldn't* affect the hosts connected to it, but I guess it might have an effect.

If you're talking to HP support, they'll probably have you hook up the CLI cable to the front of the MSA and do a "show techsupport" (IIRC). Looking through that output might highlight some issues.

You might also want to check that the firmware on the MSA is current. (5.20 or 7.00 depending on whether you're active/standby or active/active.)

Cheers,

Rob
Rinkens
Advisor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Maybe a stupid question, but are these disks write-enabled on the controller?

$ init /system /share /headers=64000 /structure=5 $1$dga3: disk3n
%INIT-F-DATACHECK, write check error

It already goes wrong here; you should solve this problem first.

Check the settings on your MSA1000:

writeback cache
read cache and so on



Joe Trimble
Advisor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Here is an update on this situation. No resolution yet, so it is still being worked by HP Support.

Currently, the Storage team is still examining several log files created over the weekend during several hours of testing. Today it will likely be passed over to the AlphaServer team.

The Storage team's initial conclusion is that the MSA1000 is set up and configured correctly. Firmware (V7.0) looks good on both MSA1000 controllers, and the unit is in active/active mode.

I found over the weekend that when I create and connect additional LUNs to another VMS server (DS25, identical VMS V8.2 and patches applied), that system can write files without errors. So the errors appear to be a problem with the ES40 connection to the MSA1000 only.

To answer recent suggestions:
- the initialization is complete on all LUNs; it made no difference in the errors.
- I have tried LUNs with cache both on and off. There seem to be fewer errors with cache turned off, but plenty of errors are still encountered. (From the DS25 test system, no DATACHECK errors occurred regardless of the cache setting.)

I now believe there is some configuration or compatibility issue between my ES40 system and the MSA1000 integrated box. The ES40 has FCA2684 HBA cards installed (same as the DS25), and the firmware is updated to the latest rev (TS1.91X6).

Thanks for your continued assistance and questions. I'll update again later.

Joe
Rob Leadbeater
Honored Contributor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Hi Joe,

Are you using the embedded 2/8 switch in the back of the MSA1000 ?

If so, has the version of FabricOS been checked ?


Cheers,

Rob
Joe Trimble
Advisor

Re: DATACHECK errors with MSA1000 DGA Devices on OpenVMSV8.2

Hi Rob,

Yes, we are using the embedded 2/8 switches.

No, I have not checked the firmware levels there. Do you think that might make a difference? The DS25 is newer hardware than the ES40. Could there be some incompatibility with the ES40?

The HP Storage team has not asked about the switches, at least not yet. I'll try to find out more information on this.

Thanks!
Joe