Operating System - OpenVMS

T4 FC collection/monitor issue

 
Peter Zeiszler
Trusted Contributor


We currently have multiple server clusters.  We run BL860c i2 blades in C7000 enclosures, running OpenVMS 8.4.  The cluster is split between 2 rooms with one room housing 2 blades and the other housing 1 blade.  Our SAN environment is EMC storage (DMX, CLARiiON, VNX).  For quite some time we have only been able to capture the FC details from the local SAN array.

 

We have one system now that was recently patched and rebooted, and it now can't even see the local SAN with the FC tool.  The same patches are applied on other servers and they still see their local SAN array.

 

Now that the system can't even see the local SAN, the T4 collection file T4$DATA:T4_(Nodename)_(datecode)_FCM.DAT is empty.

 

$ Set Command T4$Sys:T4$Fc

$ Define T4$Fc_Monitor    T4$Sys:T4$Fc_Monitor_New.Exe

$ T4Fc_Monitor monitor $1$DGA114:

%SYSTEM-F-IVDEVNAM, invalid device name

$ T4Fc_Monitor monitor $1$DGA4210:

%SYSTEM-F-IVDEVNAM, invalid device name

 

I have sent email a couple times to the T4 email address.  Hoping they might have an idea.

 

Peter

 

Our physical layer connections are blade -> enclosure switch -> layer switch -> SAN.  The layer switch has ISL connections to the layer switch in the other room and the other SAN in the other room.

 

I was wondering if anyone has seen this before or knows what might be a solution.

12 REPLIES 12
John Gillings
Honored Contributor

Re: T4 FC collection/monitor issue

Peter,

   We interrupt this assistance to bring you a short message to our sponsors... 

 

HP, you really have to do something about Lithium if you want people like me to continue to provide help and support for your customers! Every time I have to login, something goes wrong. "The web page cannot be found" or similar. I always have to play games with different browsers or clearing cookies or other nonsense. One day I'll just give up. (Not that anyone will do anything about it).

 

Back to your problem...

 

Peter, I don't have a T4 FC collection on Itanium, but I've had them successfully running on Alpha for eons. That you're getting a file at all suggests something is working. Here are a few things to check. First, look for your FCM monitor process

 

$ show sys/proc=*fcm*
OpenVMS V8.3 on node MYNODE  13-JUL-2015 11:16:29.12  Uptime  253 21:17:12
  Pid    Process Name    State  Pri      I/O       CPU       Page flts  Pages
229DCCCE T4229E52C0_FCM  HIB     15    12523   0 00:00:01.12       513    280   S

 

Now check what files it has open, in particular the process log file

 

analyze/system

OpenVMS system analyzer

SDA> show process/channel/index=229DCCCE
Process index: 00CE Name: T4229E52C0_FCM Extended PID: 229DCCCE
--------------------------------------------------------------------


Process active channels
-----------------------

Channel  CCB       Window    Status  Device/file accessed
-------  ---       ------    ------  --------------------
0010     7FF60000  00000000          DSA44:
0020     7FF60020  82ED0AC0          DSA100:[VMS$COMMON.T4$SYS]T4$FC_MON.COM;1
0030     7FF60040  8340B5C0          DSA44:[T4$DATA.20150713]T4_MYNODE_13JUL2015_0158_2300_SUBP_FCM.LOG;1
0040     7FF60060  00000000          NLA0:
0050     7FF60080  82AD3440          DSA100:[VMS$COMMON.SYSEXE]DCL.EXE;1 (section file)
0060     7FF600A0  82AC4D80          DSA100:[VMS$COMMON.SYSLIB]DCLTABLES.EXE;27 (section file)
0070     7FF600C0  8451C480          DSA100:[VMS$COMMON.T4$SYS]T4$FC_MONITOR_NEW.EXE;1
0080     7FF600E0  82AC6800          DSA100:[VMS$COMMON.SYSLIB]SMGSHR.EXE;1 (section file)
0090     7FF60100  82AC4E80          DSA100:[VMS$COMMON.SYSLIB]LIBOTS.EXE;1 (section file)
00A0     7FF60120  82AC4E00          DSA100:[VMS$COMMON.SYSLIB]LIBRTL.EXE;1 (section file)
00B0     7FF60140  82AC9200          DSA100:[VMS$COMMON.SYSLIB]DECC$SHR_EV56.EXE;1 (section file)
00C0     7FF60160  82AC8A80          DSA100:[VMS$COMMON.SYSLIB]DPML$SHR.EXE;1 (section file)
00D0     7FF60180  82AC7300          DSA100:[VMS$COMMON.SYSLIB]CMA$TIS_SHR.EXE;1 (section file)
00E0     7FF601A0  82C36B80          DSA44:[T4$DATA.20150713]T4_MYNODE_13JUL2015_0158_2300_FCM.DAT;1
00F0     7FF601C0  00000000          PGA0:

Total number of open channels : 15.
SDA>

 

 

Check the log file to see what the process found. It should look something like this

 

(01:58:02) $ On Warning Then Exit
(01:58:02) $ T4FcMon Collect /Begin="13-JUL-2015 01:58:02.50"/End="23:00" -
/Record=T4_MYNODE_13JUL2015_0158_2300_Fcm.Dat -
/Samp=60
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7141:, Unit:7141, DevType:54 (HSV210) Path: PGA0.5000-1FE1-5012-5648
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7142:, Unit:7142, DevType:54 (HSV210) Path: PGA0.5000-1FE1-5012-564A
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7150:, Unit:7150, DevType:54 (HSV210) Path: PGA0.5000-1FE1-5012-5648
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7151:, Unit:7151, DevType:54 (HSV210) Path: PGB0.5000-1FE1-5012-564B
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7152:, Unit:7152, DevType:54 (HSV210) Path: PGB0.5000-1FE1-5012-564B
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7162:, Unit:7162, DevType:54 (HSV210) Path: PGB0.5000-1FE1-5012-5649
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7164:, Unit:7164, DevType:54 (HSV210) Path: PGB0.5000-1FE1-5012-5649
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7165:, Unit:7165, DevType:54 (HSV210) Path: PGA0.5000-1FE1-5012-564A
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7166:, Unit:7166, DevType:54 (HSV210) Path: PGA0.5000-1FE1-5012-564A
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7250:, Unit:7250, DevType:54 (HSV210) Path: PGB0.5000-1FE1-500F-18DB
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7251:, Unit:7251, DevType:54 (HSV210) Path: PGB0.5000-1FE1-500F-18D9
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7252:, Unit:7252, DevType:54 (HSV210) Path: PGB0.5000-1FE1-500F-18DF
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7262:, Unit:7262, DevType:54 (HSV210) Path: PGA0.5000-1FE1-500F-18DE
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7264:, Unit:7264, DevType:54 (HSV210) Path: PGA0.5000-1FE1-500F-18D8
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7265:, Unit:7265, DevType:54 (HSV210) Path: PGA0.5000-1FE1-500F-18D8
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7266:, Unit:7266, DevType:54 (HSV210) Path: PGB0.5000-1FE1-500F-18DD
%T4FCMON-I-HIBER, Hibernating until 13-JUL-2015 01:58:02.50 before starting data collection for all mounted FC disks/tapes ...
%T4FCMON-I-STARTCOLL, Time is now 13-JUL-2015 01:58:02.76 - Starting to collect data for all mounted FC disks/tapes ...
%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 01:58:02.76 ...
%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 01:59:02.76 ...
%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 02:00:02.76 ...
%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 02:01:02.76 ...

 

Does it find any drives? Are there collection events?
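
For anyone who wants to answer those two questions without reading the whole log by eye, here is a minimal sketch in Python. It assumes the FCM log has been copied off the VMS system as plain text; the function name summarize_fcm_log is made up for illustration, and the only message identifiers it knows about are the two that appear in the log excerpt above.

```python
import re

# Message identifiers as they appear in the log excerpt in this thread.
DETECTED = re.compile(r"%T4FCMON-I-FCDEVDETECTED, FC device detected is (\S+?),")
COLLEVENT = "%T4FCMON-I-COLLEVENT"

def summarize_fcm_log(text):
    """Return (list of detected device names, number of collection events)."""
    devices = DETECTED.findall(text)
    events = sum(1 for line in text.splitlines() if COLLEVENT in line)
    return devices, events

# A few lines copied from the example log above.
sample = """\
%T4FCMON-I-FCDEVDETECTED, FC device detected is _$1$DGA7141:, Unit:7141, DevType:54 (HSV210) Path: PGA0.5000-1FE1-5012-5648
%T4FCMON-I-HIBER, Hibernating until 13-JUL-2015 01:58:02.50 before starting data collection for all mounted FC disks/tapes ...
%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 01:58:02.76 ...
%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 01:59:02.76 ...
"""

devices, events = summarize_fcm_log(sample)
print("devices detected:", devices)
print("collection events:", events)
```

A log with collection events but an empty device list would mean the collector is running and simply found nothing to monitor.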

 

If that doesn't help, please post any messages you find in that log file

A crucible of informative mistakes
Hoff
Honored Contributor

Re: T4 FC collection/monitor issue

JG: FWIW, HPE is looking for help with their web site.  There's a job posting presently open for a senior cross-functional social, CMS, web and SEO role in HPE Galactic HQ Cupertino.

 

JG: Passport logins work in Safari on OS X Yosemite, and I'm just (still) getting the usual "Your post has been changed because invalid HTML was found in the message body. The invalid HTML has been removed. Please review the message and submit the message when you are satisfied." hilariousness.

 

PZ: SET WATCH can also sometimes tell what the code is doing.   Or not doing.    SET WATCH/CLASS=ALL FILE or /CLASS=MAJOR and /CLASS=NONE are the usual knobs.  CMKRNL is required.   John's SDA stuff shows you what's open.  This (undocumented) command shows you what XQP operations were tried.

 

PZ: The DCL verb is T4Fc with the first parameter of Monitor, and not the name of the CLD syntax T4Fc_Monitor.  That probably won't adversely affect things here due to the four-character DCL parsing, but I'd tend to stick to what was defined, as DCL has more than its share of parsing quirks.  Mostly for grins, I'd also see if giving the command a few qualifiers that might reference devices will get things further along; /DISPLAY and /CSV_FILE.    It's odd that the examples in the T4$Fc.CLD file are themselves dependent on the quirks of DCL verbiage, too.

 

PZ:  Given the HPE roadmap for OpenVMS, you're probably better off directing this question toward the VSI folks, rather than HPE.   Whether the VSI folks will see a posting here?

Peter Zeiszler
Trusted Contributor

Re: T4 FC collection/monitor issue

Hi Guys,

 

I already checked the logfile and it looked ok.  The .dat file is empty.

 

Here is a snippet of the logfile:

(23:59:02) $ On Warning Then Exit

(23:59:02) $ T4FcMon Collect /Begin="13-JUL-2015 00:01:00.00"/End="13-JUL-2015 23:59:00.00" -

/Record=T4_SETTER_13JUL2015_0001_2359_Fcm.Dat -

/Samp=60

%T4FCMON-I-HIBER, Hibernating until 13-JUL-2015 00:01:00.00 before starting data collection for all mounted FC disks/tapes ...

%T4FCMON-I-STARTCOLL, Time is now 13-JUL-2015 00:01:00.00 - Starting to collect data for all mounted FC disks/tapes ...

%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 00:01:00.00 ...

%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 00:02:00.00 ...

%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 00:03:00.00 ...

%T4FCMON-I-COLLEVENT, Data Collection event timed at 13-JUL-2015 00:04:00.00 ...

 

The other two nodes in the cluster are in the same room and see the local SAN.  This is the only one not finding its local SAN array.  Other hosts only see the local SAN (Defining local SAN as SAN that resides in same room).

 

Hoff - Good idea of trying to take this to VSI.  We don't have a support contract with them yet since we are still on the HP version.  Will see.

 

Will try other syntax on the command to see if it makes a difference.

 

We have retired all of our SAN attached Alpha servers.  I noticed the FC details issue a while ago and sent the information to HP.  It happens on all of our clusters attached to the EMC storage.  Not sure if it is related to a setting on the SAN, SAN switches, or what.

 

John Gillings
Honored Contributor

Re: T4 FC collection/monitor issue

Peter,

 

   Your log file shows that the monitor didn't detect any devices, so I'd expect the monitor file to be empty. Note there are no %T4FCMON-I-FCDEVDETECTED messages between the T4FCMON command and %T4FCMON-I-HIBER.

 

That's reasonably consistent with your IVDEVNAM errors.

 

What does SHOW DEVICE DGA say?

 

Does the process running T4FCMON have sufficient privilege?

 

A crucible of informative mistakes
Peter Zeiszler
Trusted Contributor

Re: T4 FC collection/monitor issue

I am doing the manual check and the normal t4 collection from SYSTEM - so there is enough privs.

 

The DG devices all show up ok.  System used to see local array.  Was recently rebooted to discover 2 new EMC arrays. 

One other system in cluster was rebooted and it sees arrays.  There is one more yet to reboot and it still sees its local array.

 

I ran SET WATCH FILE and captured the information.  Captured one that worked.  Have to still compare the information and then will post for more eyes to see.

 

Sent query to VSI to see if they are going to take on the T4 tool.

 

Ran the command with /csv and with /display.  Still no difference on the one that won't see any devices now.

Dennis Handly
Acclaimed Contributor

Re: HP Passport login problems

>you really have to do something about Lithium ... Every time I have to login, something goes wrong. "The web page cannot be found" or similar. I always have to play games with different browsers or clearing cookies

 

I'm not sure it's Lithium, since it is the HP Passport login?

 

Yes, I have that cookie removal in firefox down to a science.  But no problems today.  :-)

This has been reported several times in the feedback forum:

http://h30499.www3.hp.com/t5/Community-Feedback-Suggestions/BAD-GATEWAY/m-p/5316785

http://h30499.www3.hp.com/t5/Community-Feedback-Suggestions/Passport-Login-Problem/m-p/6246353

http://h30499.www3.hp.com/t5/Community-Feedback-Suggestions/Login-giving-blank-page/m-p/6691849

 

>There's a job posting presently open ... in HPE Galactic HQ Cupertino.

 

Are you sure Cupertino?  HP sold that to Apple years ago.  HQ is still in Palo Alto.

Peter Zeiszler
Trusted Contributor

Re: T4 FC collection/monitor issue

I meant to attach the captures but got busy.

 

Data includes SETTER (which fails) and POODLE (which works on the local room's SAN).

 

Peter Zeiszler
Trusted Contributor

Re: T4 FC collection/monitor issue

Spent some time digging into the differences between the hosts.  There are two things I found.

 

The disks that I can't scan with the FC tool all indicate "shadow set virtual unit".

 

On SETTER the disks are:

Disk $1$DGA114:, device type EMC SYMMETRIX, is online, device has multiple I/O

    paths, member of shadow set DSA4000:, shadow set virtual unit, served to

    cluster via MSCP Server, error logging is enabled.

Disk $1$DGA4210:, device type EMC SYMMETRIX, is online, device has multiple I/O

    paths, member of shadow set DSA4000:, shadow set virtual unit, served to

    cluster via MSCP Server, error logging is enabled.

 

On Poodle the disks are:

Disk $1$DGA114:, device type EMC SYMMETRIX, is online, device has multiple I/O

    paths, member of shadow set DSA4000:, shadow set virtual unit, served to

    cluster via MSCP Server, error logging is enabled.

Disk $1$DGA4210:, device type EMC SYMMETRIX, is online, device has multiple I/O

    paths, member of shadow set DSA4000:, served to cluster via MSCP Server,

    error logging is enabled.

 

On DOBIE the disks are:

Disk $1$DGA114:, device type EMC SYMMETRIX, is online, device has multiple I/O

    paths, member of shadow set DSA4000:, served to cluster via MSCP Server,

    error logging is enabled.

Disk $1$DGA4210:, device type EMC SYMMETRIX, is online, device has multiple I/O

    paths, member of shadow set DSA4000:, served to cluster via MSCP Server,

    error logging is enabled.

 

On Setter they all say “%SYSTEM-F-IVDEVNAM, invalid device name”.  If I try a non-existing device it gives this error: “%SYSTEM-W-NOSUCHDEV, no such device available”.

On Poodle I can monitor DGA4210.

On Dobie I can monitor DGA114 and DGA4210.

 

Please note the "shadow set virtual unit" text in the disk descriptions above (highlighted in the original post).

 

I mount this disk during startup with the same command on all 3 systems.

    MOUNT/SYSTEM/NOREBUILD/NOASSIST/WIN=30 DSA4000  -

     /SHADOW=($1$DGA114:,$1$DGA4210:) Prom_backup2  Prom_backup2

 

The second thing I found is that the "primary path" on the disks that say "virtual unit" is always MSCP.  The current path is the direct I/O path to the SAN.

 

We enable MSCP because we had an instance where someone unplugged the wrong cables.  System only stayed up because the MSCP served disks.

 

Example:

Setter disk:

I/O paths to device 5

Path MSCP (DOBIE), primary

Error count 0 Operations completed 0

Last switched to time: Never Count 0

Last switched from time: 11-JUN-2015 10:46:15.50

Path FGA0.5006-048A-CAFE-410D (SETTER), current

Error count 0 Operations completed 145917506

Last switched to time: 11-JUN-2015 10:46:15.50 Count 1

Last switched from time: Never

Path FGB0.5006-048A-CAFE-4102 (SETTER)

Error count 0 Operations completed 844

Last switched to time: Never Count 0

Last switched from time: Never

Path FGC0.5006-048A-CAFE-410D (SETTER)

Error count 0 Operations completed 844

Last switched to time: Never Count 0

Last switched from time: Never

Path FGD0.5006-048A-CAFE-4102 (SETTER)

Error count 0 Operations completed 844

Last switched to time: Never Count 0

Last switched from time: Never
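
To compare primary and current paths across many disks without reading each listing, a rough sketch in Python follows. It assumes the SHOW DEVICE/FULL output has been saved as plain text and that path lines look like the ones in the example above ("Path name (node), flag"). The helper names parse_paths and find_path are made up for illustration and are not part of T4 or OpenVMS.

```python
import re

# Matches lines such as:
#   Path MSCP (DOBIE), primary
#   Path FGA0.5006-048A-CAFE-410D (SETTER), current
#   Path FGB0.5006-048A-CAFE-4102 (SETTER)
PATH_LINE = re.compile(r"^\s*Path\s+(\S+)\s+\(([^)]+)\)\s*(?:,\s*(.*))?$")

def parse_paths(show_dev_text):
    """Return one dict per path line: name, node, and a set of flags."""
    paths = []
    for line in show_dev_text.splitlines():
        m = PATH_LINE.match(line)
        if m:
            name, node, flags = m.groups()
            flag_set = {f.strip() for f in (flags or "").split(",") if f.strip()}
            paths.append({"name": name, "node": node, "flags": flag_set})
    return paths

def find_path(paths, flag):
    """Return the first path carrying the given flag, or None."""
    for p in paths:
        if flag in p["flags"]:
            return p
    return None

# Lines copied from the SETTER example above.
sample = """\
Path MSCP (DOBIE), primary
Error count 0 Operations completed 0
Path FGA0.5006-048A-CAFE-410D (SETTER), current
Error count 0 Operations completed 145917506
Path FGB0.5006-048A-CAFE-4102 (SETTER)
Error count 0 Operations completed 844
"""

paths = parse_paths(sample)
print("primary:", find_path(paths, "primary")["name"])
print("current:", find_path(paths, "current")["name"])
```

Disks where the primary path is MSCP but the current path is an FG path are the ones worth checking against the IVDEVNAM failures.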

 

 

 

I can't move the "primary" designation unless I disable MSCP on all nodes and reboot the cluster, and then I wouldn't have MSCP at all.  The odds of someone unplugging the wrong cable are slim now.

 

Does this look like something that has to be fixed at the executable level, or does anyone have an idea for a workaround?

 

Hoff
Honored Contributor

Re: T4 FC collection/monitor issue

T4 is broken.  Minimally, it is reporting a bogus error.  Whether there's a workaround or a fix beyond that depends greatly on what T4 is doing here.  T4 is heavily based on MONITOR, so try collecting data on these disks using that tool — see if that tool is (also) blowing up, and which would then give you a path to report a bug to HP.  Also see if it's TDC that's blowing up — or if it's not installed, then install TDC and see if that works (around this).

 

This assuming that neither HP nor VSI are particularly  willing to wade into T4.

 

Obvious work-around: find or write your own tool that can fetch the data, possibly compatible with the rest of the T4 giblets.

 

I've not seen much in the way of source code for T4 published.  Asking HP to open-source it might be another option, though that's obviously longer-term and a rather longer shot.

 
Peter Zeiszler
Trusted Contributor

Re: T4 FC collection/monitor issue

Hoff - I think you nailed it: Monitor shows the same bug.  The disk is now accessed through the Current path - we have to force the disks to specific paths for the EMC arrays.  But the first path detected during boot is the MSCP path on some of the disks - usually those on the remote room's SAN array.

 

From documentation on monitor:

http://h71000.www7.hp.com/doc/84final/6048/6048pro_047.html

HP OpenVMS System Management Utilities Reference Manual

An "R" following the device name indicates that the displayed statistics represent I/O operations requested by nodes using remote access.

If an "R" does not appear after the device name, the displayed statistics represent I/O operations issued by nodes with direct access. These I/O operations might include those issued by the MSCP server on behalf of remote requests.

 

From Monitor - Notice the R after the volume name.

 

OpenVMS Monitor Utility
DISK I/O STATISTICS
on node POODLE
17-JUL-2015 11:37:13.39

I/O Operation Rate                        CUR        AVE        MIN        MAX
$1$DGA60:    (SETTER) IA64COMMON     R   0.33       0.69       0.00       5.66

 

Show dev full on the disk (removed the other disk and shadow set info).  Notice Primary (first path found) is MSCP.  But Current is the direct path.

 

Disk $1$DGA60:, device type EMC SYMMETRIX, is online, device has multiple I/O
paths, member of shadow set DSA1000:, shadow set virtual unit, served to
cluster via MSCP Server, error logging is enabled.

Error count 0 Shadow member operation count 9668
Current preferred CPU Id 13 Fastpath 1
Host name "POODLE" Host type, avail HP BL860c i2 (1.60GHz/5.0MB), yes
Alternate host name "SETTER" Alt. type, avail HP BL860c i2 (1.60GHz/5.0MB), yes
Allocation class 1

I/O paths to device 5

Path MSCP (SETTER), primary
Error count 0 Operations completed 0
Last switched to time: Never Count 0
Last switched from time: 17-JUL-2015 10:45:36.48

Path FGA0.5006-048A-CAFE-4102 (POODLE)
Error count 0 Operations completed 53
Last switched to time: 17-JUL-2015 10:45:36.48 Count 1
Last switched from time: 17-JUL-2015 10:45:37.48

Path FGB0.5006-048A-CAFE-410D (POODLE), current
Error count 0 Operations completed 9509
Last switched to time: 17-JUL-2015 10:45:37.48 Count 1
Last switched from time: Never

Path FGC0.5006-048A-CAFE-4102 (POODLE)
Error count 0 Operations completed 53
Last switched to time: Never Count 0
Last switched from time: Never

Path FGD0.5006-048A-CAFE-410D (POODLE)
Error count 0 Operations completed 53
Last switched to time: Never Count 0
Last switched from time: Never

 

Volker Halle
Honored Contributor

Re: T4 FC collection/monitor issue

Peter,

 

T4$FC_MONITOR_NEW.EXE is in no way related to MONITOR.EXE, but it may be using the same or similar code to determine whether a device to be monitored is an FC device with a LOCAL ACCESS PATH!

 

You will get the same '%SYSTEM-F-IVDEVNAM, invalid device name' error message from t4fcmon if you try to monitor e.g. a local SCSI device. The code checking for a valid FC device with a local access path may be just checking the attributes of the 'primary' path instead of the attributes of the 'current' path to the device. If it thinks it found an FC device via an MSCP-served access path, it correctly rejects monitoring that device.

 

Something in your configuration is causing the MSCP-path to the SAN devices to be configured BEFORE the local SAN path during boot. This seems to 'confuse' both MONITOR and T4$FC_MONITOR_NEW in the same way.

 

 

Volker.

Peter Zeiszler
Trusted Contributor

Re: T4 FC collection/monitor issue

It is probably using the same code to determine remote vs local.  MSCP being the PRIMARY path is part of the timing we have had since day one using the EMC arrays.  Only now, after this latest reboot, it shows the primary path for the LUNs on ALL arrays as MSCP - which means it found the MSCP path first.  The timing also includes delays in our boot and rescanning via SYSMAN to see all paths for all disks prior to continuing to boot.

 

I tried adding a delay of 1 minute into the boot sequence in SYPAGSWPFILES.COM before we scan for disks, but that didn't help with the remote room's SAN path being discovered first.   Unfortunately my "test" system is a standalone system and never had this problem.

 

When I do a monitor - it is showing all of the disks are R (remote) on this one system.  On other systems it is showing the "remote" disks as those that have primary path as MSCP. 

 

I think it is using the "primary path" when trying to determine what to read, and I think it should be using "current path".
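
To make the difference between the two rules concrete, here is a small Python sketch. It is purely a hypothetical model: the T4 source is not published, so nothing here is T4's actual logic, and all function names are invented for illustration. It only contrasts "decide from the primary path" with "decide from the current path", using the path list from the SETTER disk shown earlier in the thread.

```python
# Hypothetical model of the two selection rules discussed in this thread.
# T4's real logic is not published; this only contrasts the rules.

def path_with_flag(paths, flag):
    """Return the name of the first path carrying the given flag, else None."""
    for name, flags in paths:
        if flag in flags:
            return name
    return None

def is_direct_fc(path_name):
    """Assumption: any path not named MSCP counts as a direct (local) path."""
    return path_name is not None and path_name != "MSCP"

def eligible_by_primary(paths):
    """Rule suspected in this thread: look only at the primary path."""
    return is_direct_fc(path_with_flag(paths, "primary"))

def eligible_by_current(paths):
    """Rule proposed in this thread: look at the path actually in use."""
    return is_direct_fc(path_with_flag(paths, "current"))

# Paths for a SETTER disk, as listed earlier in the thread.
setter_disk = [
    ("MSCP", {"primary"}),
    ("FGA0.5006-048A-CAFE-410D", {"current"}),
    ("FGB0.5006-048A-CAFE-4102", set()),
]

print(eligible_by_primary(setter_disk))  # False
print(eligible_by_current(setter_disk))  # True
```

Under the primary-path rule the disk is rejected even though all of its I/O goes over the direct path, which matches the behaviour seen on SETTER.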