%SCSI_CPU-W-RETRY errors on boot
07-14-2010 02:14 PM
The entire cluster (all Alphas running VMS 7.3-2 with patches, with EVA SAN storage) was rebooted last week and this node came up with no problems. But when I rebooted this node last night (and on two more attempts today), it failed to make a local connection to this RAID cabinet because of this error:
%SCSI_CPU-W-RETRY, port PKC0 alloclass 3 status 94 inconsistency
That error message shows up five times ("retry n/5") before the %MSCPLOAD-I-CONFIGSCAN message and then another five times after the %STDRV-I-STARTUP and %STDRV-I-LOG messages.
See the text attachment for the console output.
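For anyone scanning saved console logs for this warning, here is a minimal Python sketch that pulls the port, allocation class, and status fields out of each occurrence. The field layout is assumed from the single example line quoted above, and the function name `find_retry_warnings` is made up for illustration; other VMS versions may format the message differently.

```python
import re

# Pattern based only on the one example line quoted in this thread
# (assumption: port, alloclass and status always appear in this order).
SCSI_CPU_RETRY = re.compile(
    r"%SCSI_CPU-W-RETRY,\s+port\s+(?P<port>\S+)"
    r"\s+alloclass\s+(?P<alloclass>\d+)"
    r"\s+status\s+(?P<status>[0-9A-Fa-f]+)"
)


def find_retry_warnings(console_text):
    """Return (port, alloclass, status) for each %SCSI_CPU-W-RETRY line found.

    The status is kept as a string because the message does not say
    which radix it is printed in.
    """
    return [
        (m.group("port"), int(m.group("alloclass")), m.group("status"))
        for m in SCSI_CPU_RETRY.finditer(console_text)
    ]


sample = "%SCSI_CPU-W-RETRY, port PKC0 alloclass 3 status 94 inconsistency"
print(find_retry_warnings(sample))  # [('PKC0', 3, '94')]
```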
Relevant system parameters are:
ALLOCLASS=1
DEVICE_NAMING=1
In SYS$DEVICES.DAT (shown fully in the text attachment) this node's port PKC is set to allocation class 3.
A >>>SHOW DEVICE properly displays the PKC controller and all the devices that the HSZ70 cabinet presents.
On my last boot attempt I used -flags 0,30000 and these lines were displayed immediately before the first %SCSI_CPU message (I do not know if these are normal or not):
%LOADER-I-INIT, initializing SYS$PKQDRIVER.EXE
%PKQDRIVER-I- PKC0, loading firmware version 5.57 from console
%PKQDRIVER-I- PKC0, initialization complete; port online
Nothing relevant was changed since the last reboot. Thanks in advance for any help you can offer.
Jess
07-14-2010 03:51 PM
Re: %SCSI_CPU-W-RETRY errors on boot
We no longer have an HSZ70, but when we had a SCSI bus shared between two ES40s, when we did a >>> SHOW DEVICE, the SRM console would display the SCSI adapter of the other VMS system (with VMS in the description). We were not using Port Allocation classes, or new device naming, so that is one difference. We also did not use a SCSI hub.
I have never seen the SCSI_CPU message (that I remember), but the text suggests that the system may be checking whether the other systems sharing the bus have the same allocation class. Do the other 3 VMS nodes have allocation class 3 on the SCSI controllers connected to the SCSI hub?
What were the "non-relevant" changes since the last boot? :-)
Jon
07-14-2010 08:37 PM
Re: %SCSI_CPU-W-RETRY errors on boot
If you look at the message, it looks like the same port allocation class is used for different interfaces.
%SCSI_CPU-W-RETRY, port PKC0 alloclass 3 status 94 inconsistency
Please check your port allocation class settings and make sure the same port allocation class is not assigned to different interfaces.
Regards,
Ketan
07-15-2010 06:46 AM
Re: %SCSI_CPU-W-RETRY errors on boot
I've recently seen this message as well under the following circumstances:
ES40 configured with shared SCSI (HSZ70) and a common SCSI system disk (V7.3-1).
Two HBAs were then added to this ES40 and a new root was configured on the SAN system disk, from which the ES40 was then booted. The SAN system disk was actually a copy of the SCSI system disk (also V7.3-1).
While booting the ES40 from the SAN system disk, I saw those messages. I had checked the system parameters and port allocation classes and they looked o.k. There are 2 systems in this cluster still booted from the HSZ70 system disk, and they are still connected to the shared SCSI bus.
Volker.
07-15-2010 08:12 AM
Solution
Check the system parameters and SYS$DEVICES.DAT on each node in the cluster.
There could be a node that has the port allocation class configured but is not connected to the HSZ.
In [CLUSTER]SCCPUVER there is some code that compares the nodes that have a port allocation class configured against a list of nodes this node can find by looking at what's on the SCSI buses.
It will not surprise Volker that I wrote an SDA extension to display that internal list :-)
Purely Personal Opinion
07-15-2010 11:11 AM
Re: %SCSI_CPU-W-RETRY errors on boot
you tell us about it, but don't let us know where it is (download) or how to use it :-) :-)
Please, please ... share !!
p.s. how old or new does my VMS version have to be to use your SDA extension ??
Respectfully,
Verne
07-15-2010 12:36 PM
Re: %SCSI_CPU-W-RETRY errors on boot
You were right. I had, of course, carefully checked the systems that were connected to the HSZ. But I had not thought it necessary to check the other nodes in the cluster.
Sure enough, one of them used to be connected to this HSZ and it still had allocation class 3 defined for a SCSI port that currently has nothing connected to it.
I can't reboot that system until late tonight or tomorrow, so I won't know if this fixed my problem with the other node until then. I will post the results and award points after that.
Assuming it does fix it, I must say that I am quite surprised by this restriction. What is this software check protecting me against?
This would also mean that if my two other nodes that are still using the HSZ rebooted, they would not be able to access the HSZ either. That would leave the cluster with no path to access the HSZ until the unconnected node was reconfigured and rebooted.
07-15-2010 06:46 PM
Re: %SCSI_CPU-W-RETRY errors on boot
I then rebooted the problem system; there was no %SCSI_CPU error message and it used its connection to the HSZ70s.
Would be nice if that error message was fully documented.
$ help /message scsi_cpu ! VMS 7.3-2
%MSGHLP-F-NOTFOUND, message not found in Help Message database
07-16-2010 12:29 AM
Re: %SCSI_CPU-W-RETRY errors on boot
I've been caught myself by nodes having a port allocation class defined but not being connected to that shared scsi bus.
Each node takes a lock on a resource named after each non-zero port allocation class it has defined, e.g., for PAC=10 an EX lock on a resource called IOGEN$_10 is queued. Getting information about these locks allows a list to be created of the nodes that have that PAC defined. This can be seen using
SDA SHOW RESOURCE/NAME=IOGEN$_10
This list is compared against a list of nodes that can be seen on shared SCSI buses. If the two lists do not match, then SCSI_CPU-W-RETRY is output to the console, and only to the console.
Purely Personal Opinion
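For a PAC other than the 10 used in the example above, the resource name and the matching SDA command can be built mechanically. A small Python sketch, assuming the IOGEN$_<PAC> naming described in the post holds for every non-zero PAC (the helper names are made up for illustration):

```python
def pac_resource_name(pac):
    """Build the lock resource name for a port allocation class.

    Assumes the IOGEN$_<PAC> pattern from the post applies to any
    non-zero PAC; no lock is taken for a PAC of zero.
    """
    if pac == 0:
        raise ValueError("no lock is taken for a port allocation class of 0")
    return f"IOGEN$_{pac}"


def sda_show_resource(pac):
    """Return the SDA command text, in the same form as quoted in the post."""
    return f"SDA SHOW RESOURCE/NAME={pac_resource_name(pac)}"


# The node in this thread uses port allocation class 3.
print(sda_show_resource(3))  # SDA SHOW RESOURCE/NAME=IOGEN$_3
```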
07-16-2010 12:37 AM
Re: %SCSI_CPU-W-RETRY errors on boot
This check is to "protect" you against a misconfiguration that would allow two different physical devices to have the same ALLDEVNAM. It is a stricter check than the pre-PAC (Port Allocation Class) checks, which allowed multiple devices to have the same name.
If you had a device $3$DKC100: on the HSZ70, you disconnected from the HSZ70, and then plugged a SCSI disk with SCSI ID 1 into the "unused" SCSI adapter, it would also show up as $3$DKC100:. If $3$DKC100: was being MSCP served and was mounted on the system with the "unused" adapter, I am not sure what would happen to the node you just plugged the disk into. The disk may go into mount verification, or perhaps the node would crash. At any rate, it isn't something that I would want to try on a production system. Since the HSZ uses HV differential wide SCSI, you probably wouldn't have a disk that you could easily use with the adapter, but you get the point.
Here's what the code does (paraphrased).
When the SCSI initialization is done, inquiries are sent to every device on the bus, and for every "CPU" response, an entry is filled in that has the PAC, SCSSYSTEMID, controller ID (letter) and SCSI info. When all the devices have been configured, all CPUs that the system can directly see on the SCSI bus will be in the list. For each PAC in the list, the lock manager is used to determine which cluster nodes have configured an interface with the PAC. If there is a system (currently in the cluster) that has the PAC configured, but its SCSSYSTEMID isn't in the PAC_ID_LIST, then SS$_DUPLNAM status is returned (this is the status 94 in the cryptic message).
However, the routine that has determined the offending SCSSYSTEMID does not print the message, and therefore this useful piece of information is not used in the messages printed on the console. If the routine that was checking the locks printed the message for every SCSSYSTEMID that it did not find in the list, it would make the message much more meaningful, i.e. something like
port PKC0 alloclass 3 configured on ES40_1 PKC but not seen on local PKC SCSI bus
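To make the paraphrase concrete, here is a rough Python model of the check. It is not the VMS code. The class names, the node names other than ES40_1, and the SCSSYSTEMID values are all made up, the lock-manager lookup is replaced by a plain list of nodes, and reading "status 94" as the hex value of SS$_DUPLNAM is my assumption.

```python
from dataclasses import dataclass

SS_NORMAL = 1      # generic success status
SS_DUPLNAM = 0x94  # assumption: "status 94" is SS$_DUPLNAM printed in hex


@dataclass(frozen=True)
class BusCpu:
    """One "CPU" inquiry response seen on a local SCSI bus."""
    pac: int
    scssystemid: int
    controller: str  # controller letter, e.g. "C" for PKC


@dataclass(frozen=True)
class ClusterNode:
    """A current cluster member with a PAC configured (stand-in for the
    information the real code gets from the lock manager)."""
    name: str
    scssystemid: int
    pac: int
    port: str


def check_pac_consistency(bus_cpus, cluster_nodes, local_port):
    """Return (status, messages).

    status is SS_DUPLNAM if some node has a PAC configured but its
    SCSSYSTEMID was not seen on the local bus, otherwise SS_NORMAL.
    """
    local_ctrl = local_port.rstrip("0123456789")  # "PKC0" -> "PKC"
    messages = []
    for pac in sorted({cpu.pac for cpu in bus_cpus}):
        pac_id_list = {c.scssystemid for c in bus_cpus if c.pac == pac}
        for node in cluster_nodes:
            if node.pac == pac and node.scssystemid not in pac_id_list:
                messages.append(
                    f"port {local_port} alloclass {pac} configured on "
                    f"{node.name} {node.port} but not seen on local "
                    f"{local_ctrl} SCSI bus"
                )
    return (SS_DUPLNAM if messages else SS_NORMAL), messages


# Three nodes are cabled to the shared bus; ES40_1 still has PAC 3 set.
bus = [BusCpu(3, 1001, "C"), BusCpu(3, 1002, "C"), BusCpu(3, 1003, "C")]
nodes = [
    ClusterNode("NODE_A", 1001, 3, "PKC"),
    ClusterNode("NODE_B", 1002, 3, "PKC"),
    ClusterNode("NODE_C", 1003, 3, "PKC"),
    ClusterNode("ES40_1", 1004, 3, "PKC"),
]
status, msgs = check_pac_consistency(bus, nodes, "PKC0")
print(hex(status))
for line in msgs:
    print(line)
```

With the made-up data above, the only node reported is ES40_1, which has the PAC configured but is not on the bus. Dropping it from `nodes` gives a success status and no messages.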
The takeaway is the following:
If you have PAC-configured SCSI adapters and you plan to remove the connection to the shared bus permanently, plan to reconfigure the PAC and reboot the system at the time you remove the cable.
I see that Ian just posted with similar info, including the resource name of the lock used to find the SCSSYSTEMIDs of the current cluster members that have the PAC configured.
Jon