
PCM3+ problem

Mohammed Faiz
Honored Contributor

Hi,

I've noticed that on a few of our switches PCM3 is starting a large number of SSH sessions, one after the other.
I believe this is contributing to the switch failures (loss of communication with modules, switch reboots, etc.) that we have started to see since installing PCM3, but I can't be sure at the moment.
Here's an example from a log (there are actually about four times as many entries as shown below):

I 02/25/10 23:43:47 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:43:24 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:43:05 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:42:25 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:27:15 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:26:51 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:26:12 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:25:31 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:14:01 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:13:38 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
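
For anyone who wants to quantify this pattern on their own switches, here is a minimal Python sketch that groups the SSH login entries into bursts. It is not part of PCM. The line format and the MM/DD/YY date order are assumptions taken from the excerpt above, and the names `parse_ssh_logins`, `group_bursts` and the five-minute gap are hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape, taken from the excerpt above:
# "I 02/25/10 23:43:47 mgr: SME SSH from 10.200.1.10 - MANAGER Mode"
SSH_LOGIN = re.compile(
    r"^\S+\s+(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+mgr: SME SSH from (\S+)"
)


def parse_ssh_logins(lines, source_ip=None):
    """Return timestamps of SSH manager logins, optionally for one source IP."""
    stamps = []
    for line in lines:
        m = SSH_LOGIN.match(line.strip())
        if m and (source_ip is None or m.group(2) == source_ip):
            # Date order MM/DD/YY is an assumption based on "02/25/10".
            stamps.append(datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S"))
    return stamps


def group_bursts(stamps, max_gap=timedelta(minutes=5)):
    """Group timestamps into bursts; a gap above max_gap starts a new burst."""
    bursts = []
    for ts in sorted(stamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            bursts[-1].append(ts)
        else:
            bursts.append([ts])
    return bursts


SAMPLE = """\
I 02/25/10 23:43:47 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:43:24 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:43:05 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:42:25 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:27:15 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:26:51 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:26:12 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:25:31 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:14:01 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
I 02/25/10 23:13:38 mgr: SME SSH from 10.200.1.10 - MANAGER Mode
"""

if __name__ == "__main__":
    logins = parse_ssh_logins(SAMPLE.splitlines(), source_ip="10.200.1.10")
    for burst in group_bursts(logins):
        print(burst[0].strftime("%H:%M:%S"), "->", len(burst), "logins")
```

Comparing the number of logins per burst between switches might help show whether the count tracks something switch-specific, such as the number of IP addresses configured on the switch.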

We're running PCM3+ on version C.03.10.201 (no updates available that I can see). I don't have any custom scans or poll times set; everything in PCM is at its defaults in this regard.
Can anyone explain the SSH behaviour?
8 REPLIES
Hector Manzo
Frequent Advisor

Re: PCM3+ problem

Hello Mohammed,
I've installed PCM 3.1 but have not run into this problem. Are your global CLI Configuration options set to SSH? Are you seeing the same problem with multiple switches? Does running "Test Communication Parameters" on the affected switches return success under all columns?
Raveek
Occasional Visitor

Re: PCM3+ problem

I have a few questions:

1) Are any other plug-ins such as NIM (Network Immunity Manager) also installed in PCM+?

2) Are any other network management applications running on 10.200.1.10, apart from PCM+?

3) You say you are seeing this on a few switches. Is there anything in common across these switches? Do they belong to the same switch family, or are they all gateway switches, etc.?

4) Do you see the following in the PCM agent's cs-out.log file?

If PCM has opened too many CLI sessions, the log shows entries like these:

DEBUG: AbstractCliAlphinity: loginAuthenticate() sending loginCmd= for the device: 10.250.100.10 {enable}
DEBUG: AbstractCliAlphinity: loginAuthenticate() sending loginCmd= for the device: 10.250.100.10 {no page}
DEBUG: AbstractCliAlphinity: loginAuthenticate() sending loginCmd= for the device: 10.250.100.10 {config}
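
To check for this quickly, a small Python sketch like the one below could count these entries per device. It is only an illustration: the pattern is inferred from the three sample lines above, the real cs-out.log may contain other variations, and `count_login_cmds` is a hypothetical helper name, not anything shipped with PCM.

```python
import re
from collections import Counter

# Pattern inferred from the sample DEBUG lines above (an assumption):
# "... loginAuthenticate() sending loginCmd= for the device: <ip> {<command>}"
LOGIN_CMD = re.compile(
    r"loginAuthenticate\(\) sending loginCmd= for the device: (\S+) \{([^}]*)\}"
)


def count_login_cmds(lines):
    """Count loginAuthenticate entries per device address."""
    per_device = Counter()
    for line in lines:
        m = LOGIN_CMD.search(line)
        if m:
            per_device[m.group(1)] += 1
    return per_device


CS_OUT_SAMPLE = [
    "DEBUG: AbstractCliAlphinity: loginAuthenticate() sending loginCmd= for the device: 10.250.100.10 {enable}",
    "DEBUG: AbstractCliAlphinity: loginAuthenticate() sending loginCmd= for the device: 10.250.100.10 {no page}",
    "DEBUG: AbstractCliAlphinity: loginAuthenticate() sending loginCmd= for the device: 10.250.100.10 {config}",
]

if __name__ == "__main__":
    for device, count in count_login_cmds(CS_OUT_SAMPLE).most_common():
        print(device, count)
```

To run it against a real log, pass an open file object in place of `CS_OUT_SAMPLE`; devices with unusually high counts would be the ones to look at first.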
Mohammed Faiz
Honored Contributor

Re: PCM3+ problem

Hi,

Thanks for the responses.

Hector - Yes, we have set the communication options to SSH and the number of connections initiated seems to vary with each switch.
One switch in particular crashes each time PCM tries to talk to it so we've had to exclude it (but that's another issue!)
The communications test passes fine, PCM doesn't seem to think there's a problem with any of the switches.

Raveek - No, we're not using any other PCM modules like NIM.
That server is solely running PCM at the moment.
They are all gateway switches (see other comment below) and I couldn't see anything like that in the cs-out.log file.

After some deeper investigation, I think the SSH sessions might be caused by the discovery agent connecting to each of the gateway addresses on the switch. To test this theory, I'm trying to remove those subnets from the "managed subnets" groups by re-addressing the switches in those subnets.
Does that make sense, or is there a better way of doing this?
I'm surprised that PCM doesn't detect switches that have management addresses on other subnets via LLDP, rather than my having to specify each managed subnet manually.
Hector Manzo
Frequent Advisor

Re: PCM3+ problem

Hello Mohammed,

I would be very interested in getting a copy of the crash from your switch. What kind of switch was it that crashed? We need to quickly identify what condition caused the switch to crash.

Rather than moving the device to an unmanaged subnet, you could also disable discovery through the agent manager.

How aggressively do you have Discovery set? Have you made any changes to the discovery timers for any of the discovery mechanisms? By default they should each be set to trigger daily at 10pm.
- ARP Discovery
- Device Attributes Discovery
- Neighbor Discovery
- Ping Sweep

-Hector
Mohammed Faiz
Honored Contributor

Re: PCM3+ problem

Hi Hector,

It was a 5308xl running E.11.10.
It crashes with the following in the logs:

M 10/02/09 22:03:19 sys: 'NMI event SW:IP=0x00368754 MSR:0x0000b032 LR:0x00368794 Task='mSnmpCtrl' Task ID=0x1352790'
I 01/01/90 00:00:00 system: --------------------------------------------------
I 01/01/90 00:00:00 system: System went down: 10/02/09 22:03:19
I 01/01/90 00:00:00 system: NMI event SW:IP=0x00368754 MSR:0x0000b032 LR:0x003

I'd raised a call with HP support and they wanted us to update to pre-release E.11.15 code, but I couldn't afford the downtime for the reboot (seeing as it had crashed so many times before then!).

> Rather than moving the device to an unmanaged subnet you could also disable discovery through the agent manager?

Sorry, I should have been clearer. I haven't moved the other switches to an unmanaged subnet, just to the same management subnet as the majority of our other switches (to reduce the total number of managed subnets).

The discovery schedules and timings are all at the defaults and all 4 discovery types are enabled.
Hector Manzo
Frequent Advisor

Re: PCM3+ problem

Hello Mohammed,

The crash experienced by your switch has been fixed in newer code. You could open a case with HP support to get the code release. The code you'll need is not available on the web.

Providing the crash information from your 5300 should be enough to identify the fix. It should be fixed in E.11.15 or later.
Hector Manzo
Frequent Advisor

Re: PCM3+ problem

Hello Mohammed,

Were you able to get in touch with ProCurve support and get a copy of the newer 5300 release code?

Thanks,
-Hector
Mohammed Faiz
Honored Contributor

Re: PCM3+ problem

Hi Hector,

I'm not too worried about that switch as it doesn't need to be managed in PCM for the moment. Also, I don't really like running pre-release code on any production hardware unless it's absolutely necessary.
That particular 5308 is scheduled to be replaced with a 5406 in the next 2 months anyway. Thanks.
Back to the original issue: it looks like the problem is that the switch held a lot of gateway addresses within managed subnets. I'll just have to continue trimming down the number of managed subnets.