Switches, Hubs, and Modems

PCM+ V2.0 Questions/Thoughts

Terry Kirk
Advisor

I've just upgraded my PCM+ from V1.6 to V2.0 and have noticed some odd things:
1) The traffic monitor seems to take about 27 minutes to start showing any data after a configuration change. This seems to be true whether I have one switch port or 56. The manual says it should take about five. I'm running PCM+ on a DL380, so I don't think power is the problem.
2) It's very confused about the latest firmware versions; I have 9.22 on my 5308 switches and 10.02 is available on HP's web site but PCM+ is telling me that the latest is 8.50. I found the file and that is actually what is downloaded from HP. PCM+ 1.6 was working - I used it to upgrade my 5300s to 9.22.
3) PCM+ still can't understand my 420 Access Points, no matter what version of firmware they are running; no SNMP traps or syslog messages get through. (There is a known problem with PCM+ and V2.1 of the AP firmware in getting the config.)
4) The Reports menu only has one report in it, and no way to create new ones.

Any answers/questions/thoughts would be appreciated.

Terry
18 REPLIES
Les Ligetfalvy
Esteemed Contributor

Re: PCM+ V2.0 Questions/Thoughts

1. My Traffic Monitor went from bad to worse following an in-place upgrade from 1.6a to 2.0, so I ended up scrubbing all traces of PCM from the drive and registry and doing a clean install. That seemed to clear up most of the Traffic Monitor issues.

2. Did you download the latest PRP file using the "Download Now" button? I did, and it now lists the latest version in the selection box, but the selection does not stick. My 2524 and 2848 are at the latest version so they report fine, but my 5308 switches are one rev back at 9.22 and PCM will not let me set that as preferred.

I cannot help with #3, and I have not checked out #4 yet.
Steve Britt
Respected Contributor

Re: PCM+ V2.0 Questions/Thoughts

Terry,

I'm sorry that you're having problems, and I'll try to address issue #1 that you list. Assuming that the GUI screens used to configure Traffic Monitor have responded relatively quickly, the issue is likely one of three things: the traffic data collector is not being restarted in a timely manner; when it is restarted, it takes a long time to find a device it can communicate with; or Traffic Monitor isn't handling the size of your network well. The first and last possibilities would affect data no matter what means of data collection you've specified (polled statistics or sampling), while the second case should only manifest itself with ports that statistics are being polled from.

Let's start with the second case first, as it seems the most likely. Assuming that the collector is restarted in a timely manner following configuration changes (this *should* happen with no intervention from you) then a delay in procuring data usually indicates that Traffic Monitor is trying to communicate with one or more unreachable devices, and that unfortunately the device(s) are near the beginning of the list that it's working through. If you look at the interconnect devices table (by clicking on the Interconnect Devices node in the PCM tree and checking the Devices List tab in the right pane that's displayed) you can see the communication status of each device that PCM knows about. Make sure that any that are red (indicating that they're unreachable or have other serious issues) are *not* in the list of those to be monitored by Traffic Monitor. Traffic Monitor could be smarter about how it communicates with a device - today it basically treats every port individually, timing out on each unreachable port of a device even if it has already concluded that the device is unreachable from previous ports - and thus unresponsive devices can really prolong the amount of time before it works its way to a responding device. This behavior should only apply to ports that you requested polled statistics from by the way - if there are ports that you requested sampled data on you should see data arrive relatively quickly (2 minutes or so) from those ports irrespective of whether the polled statistics are subject to the problem I described.
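To illustrate why one dead device near the front of the list hurts so much, here is a toy model in Python. It is a hypothetical sketch, not PCM code: the Port type, the 10-second timeout and the list order are all assumptions. It compares per-port timeouts (today's behaviour as described above) with a device-aware walk that skips a device's remaining ports once one of them has timed out.

```python
from dataclasses import dataclass

@dataclass
class Port:
    device: str      # hypothetical device name
    reachable: bool  # whether polling this port gets a reply

def delay_per_port(ports, timeout_s):
    """Seconds spent before the first responsive port is reached when
    every port of an unreachable device is timed out individually."""
    waited = 0
    for port in ports:
        if port.reachable:
            return waited
        waited += timeout_s
    return waited

def delay_per_device(ports, timeout_s):
    """Same walk, but once one port of a device has timed out, the rest
    of that device's ports are skipped without waiting."""
    waited = 0
    dead_devices = set()
    for port in ports:
        if port.reachable:
            return waited
        if port.device in dead_devices:
            continue
        dead_devices.add(port.device)
        waited += timeout_s
    return waited
```

With two unreachable 24-port switches ahead of one live port and a 10-second timeout, the per-port walk waits 480 seconds before any data arrives, while the device-aware walk waits 20.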

The first (and, based on testing anyway, less likely) case is that following a configuration change the traffic data collector is not getting restarted in a timely manner. You can force such a restart if you wish by restarting the HP ProCurve Traffic Launch Service in your services control panel. Failure to restart in a timely manner would mean that you're getting no data at all - neither sampled nor polled statistics - for your 27 minutes.

The last possible explanation is that you're stretching the scalability limits of Traffic Monitor in PCM 2.0. I don't know how many ports you have in all, but Traffic Monitor seems to work best (assuming that your devices are responding anyway) in environments with 3500 or fewer ports. And that's assuming that you're fairly selective about which ones you request sampled data on, with polled statistics being gathered from most - it takes about 30x as many system resources for each segment that sampled data is gathered from compared to the same segment with polled statistics only. We improved the robustness and added SNMPv3 capability to the traffic data collector in PCM 2.0, but I'm afraid that it cost us some scalability to do so; we definitely intend to improve the scalability in subsequent releases.
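The ~30x rule of thumb lends itself to a quick back-of-the-envelope check. This is a minimal sketch that assumes the ratio applies uniformly to every segment and treats the 3500-port figure as a budget in polled-segment equivalents; both are simplifications for illustration, not PCM's actual sizing rules, and the function names are hypothetical.

```python
SAMPLED_COST = 30    # one sampled segment costs ~30x a polled one (rule of thumb above)
ROUGH_BUDGET = 3500  # assumption: the "3500 ports" figure read as polled-equivalents

def relative_load(polled: int, sampled: int) -> int:
    """Load expressed in units of one polled segment."""
    return polled + SAMPLED_COST * sampled

def within_rough_budget(polled: int, sampled: int) -> bool:
    return relative_load(polled, sampled) <= ROUGH_BUDGET
```

For example, 61 polled segments amount to 61 units, while 100 sampled segments alone would already cost 3000, which is why being selective about sampling matters far more than the raw port count.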

The behaviors that *I* described are not new to PCM 2.0 by the way. I'm not sure why you're seeing the monitoring delay crop up with your upgrade - perhaps it's coincidence (regarding an unreachable device) or perhaps your problem is simply not caused by any of the issues I described. But as a PCM developer I can say that we are acutely aware of the issues I described and a number of other quirks that Traffic Monitor has relating to topological and configuration changes, and that they are absolutely on our list of things to consider for upcoming releases of PCM.
Drew_38
Frequent Advisor

Re: PCM+ V2.0 Questions/Thoughts

If we're throwing in PCM+ 2.0 problems and questions I might as well add mine.

Commenting on Terry's issues first, though: 2.0 seems a lot more zippy than 1.6, and I did an install straight over my 1.6 installation. So I'm happy with the responsiveness.

However, while 2.0 is working OK as far as monitoring HP and Cisco switches, I find that the 3Com switches we have (only 6, though) are being marked as 'unreachable'. I can re-discover them and they go green, but then after a few hours they go red and 'unreachable'. 1.6 worked fine with them. I realise that 2.0 lets you generate add-in files to manage non-HP equipment, but I don't want to do that; I just want it to work with 3Com like 1.6 did.

On a different note, who here is annoyed at having to pay for the upgrade? We've only had PCM+ 1.6 for three months, and having to shell out for the upgrade license seems most unfortunate. We'll be moaning to HP ProCurve UK about this soon.

Drew
Les Ligetfalvy
Esteemed Contributor

Re: PCM+ V2.0 Questions/Thoughts

In response to Drew, my old Nortel 350 switches that were green in 1.6 now also report as unreachable. I too can rediscover them and they go green for a short time but revert to unreachable.

As for paying for an upgrade, it is customary for it not to be free when the major version increments, as it did here from 1 to 2. That said, with a version increase from 1 to 2 one would expect significant new features, and it may be debated for some time whether the number and quality of these new features warrant the 2.0 version and whether it is worth the price.

Bug fixes are expected at no cost, but there is a growing trend toward software maintenance that includes support and upgrades. I am accustomed to paying an annual fee for support and upgrades on many of my products, but I also expect good service in return.

There is generally a cutoff window in the industry: customers who buy the current Gold version when a new version is nearing release are entitled to a free upgrade. Although 3 months seems a bit of a stretch, I would ask your HP sales rep whether HP has such a policy.
Drew_38
Frequent Advisor

Re: PCM+ V2.0 Questions/Thoughts

In response to Les: all good points you make.

Tying my two issues together: although I think a couple of the version 2 features are going to be very useful and are worth paying for, the fact that at least one thing appears to have been broken in the upgrade (the lack of proper monitoring for my 3Com and Les's Nortel switches) makes me a bit cross at having to pay for it!

Yes we will be taking this up with our HP sales rep to see if there is some formula for those who have only recently bought PCM+.
Terry Kirk
Advisor

Re: PCM+ V2.0 Questions/Thoughts

Steve;

One question: Is there any way to get a list of all of the devices/ports that Traffic Monitor is watching? I have not been able to find such a list - possibly a Senior Analyst moment :{)

Response in 2.0 seems quite snappy.

I tried restarting TLS service manually, but this does not seem to have changed anything. On the Traffic Monitor screen, it says "Segments responding 0 of 61" in the lower left and "Processing update..." in the lower right.
While this is happening, Trafficd.exe and mysqld-max-nt.exe are using most of the cpu time.
The drop-down arrow on the Selected segment field keeps blinking in and out.
Update: it took 29 minutes for some of the nodes to respond.

In terms of network size, I have 67 Network Devices, 386 End Nodes, three subnets and 10 VLANs. The three subnets are at three different sites linked by a WAN. I don't know how many ports I have, but it shouldn't be anywhere close to 3500.

I don't think it's a device down problem, as none of my switches currently show as down. I did try this with only one port: I removed all of the automatic traffic monitoring and went through and manually removed each switch to be sure, so that the Traffic Monitor page said "Segments responding 0 of 0". The port I configured was on the 2524 in my office. It was the uplink port (port 24), and I know it was up because my laptop is connected to it.

Drew & Les;

I have a couple of Cisco routers and a switch card in our BL10e blade server chassis with two switches (not made by HP Procurve) that show up fine with green status. Not sure why you would be having problems. Possibly a MIB issue?
Terry
Hector Manzo
Frequent Advisor

Re: PCM+ V2.0 Questions/Thoughts

Thanks Steve for the great information on traffic.

I'll try to answer the other three questions from the initial posting.

2) Firmware upgrades.

The file used to match the switch with the latest firmware release available is incorrect. HP is working on making the needed changes, and the corrected file will be posted soon. This file is tied to posting software on the web, so it should be kept updated.
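The thread doesn't say how PCM compares firmware versions, but anyone checking a switch against the latest release by hand or by script should compare numerically. A minimal Python sketch with hypothetical function names; dropping an optional letter prefix such as "E." assumes the usual ProCurve naming (e.g. E.09.22).

```python
def parse_version(v: str) -> tuple:
    """'9.22' -> (9, 22); a leading non-numeric part such as 'E.' is dropped (assumed format)."""
    parts = v.split(".")
    if parts and not parts[0].isdigit():
        parts = parts[1:]
    return tuple(int(p) for p in parts)

def latest(versions):
    """Newest version by numeric comparison, not string comparison."""
    return max(versions, key=parse_version)

def needs_upgrade(installed: str, available) -> bool:
    return parse_version(installed) < parse_version(latest(available))
```

A plain string comparison would rank "9.22" above "10.02", and a catalog that only lists 8.50 will never offer an upgrade from 9.22, which matches the symptom in the original post.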

3) 420AP
PCM should find and map 420 w/o a problem.

I'll have to test the receipt of traps and syslog messages.

4) The Reports menu only has one report in it, and no way to create new ones.

You're right, PCM does not allow you to create new reports. The number of reports is limited at the moment. Could you provide some input on the types of reports that would be useful to you?
Drew_38
Frequent Advisor

Re: PCM+ V2.0 Questions/Thoughts

In response to Terry's comment a couple of messages up regarding various non-HP kit going red in PCM+ v2. We have a few Cisco APs and switches and Foundry switches as well. v2 can deal with those no problem. It's just the 3com and Les' Nortel which it's got problems with. I agree there might be a solution involving fiddling around with MIBs. However PCM+ v1.6 didn't have this problem so I'd really rather HP come up with a solution rather than me have to work out what they have broken in an upgrade I have to pay for!!!
Terry Kirk
Advisor

Re: PCM+ V2.0 Questions/Thoughts

Hector;

I'll keep an eye on the firmware listing. I was hoping to upgrade this weekend as we are down, but it can wait until Labour Day.

On the 420, PCM can find and map it with no problem; I just can't seem to do much with it. The Device Manager now has a 'Trap Receiver' tab, which was missing in V1.6, but neither traps nor syslog messages seem to get through. I did find a log file with messages indicating that PCM was receiving the traps from the APs but did not understand them. I took a quick look but could not find that log file again. The traps are being sent, because I'm able to receive them using Kiwi syslogd.

For reports, I was looking for a way to list the MAC address in the list of Access Points. I was able to print the list, but I had to go through one-by-one and write the MAC address on the report. In short, some way to add/remove columns from the displayed list.

Given that PCM now stores info in a mysql database, could I write report scripts in Perl?

Thanks for the help!
Terry
Steve Britt
Respected Contributor
Solution

Re: PCM+ V2.0 Questions/Thoughts

Terry,

There isn't a nicely formatted report or table that indicates the segments that Traffic Monitor is *trying* to monitor. The best representation at the moment is really the Traffic Devices tab that is displayed (in the right pane) when you have clicked on the "Interconnect Devices" node in PCM's tree (on the left). This shows a list of devices and, when each device is expanded, the ports on the devices that data collection has been requested for. It also shows the last time (if any) that data was obtained from each port. Do you find this insufficient, and if so what would be a better way to convey the information?

So do I understand that you're not getting traffic from your one single port that you tried to collect from as a test? Or did it just take an inordinately long time? As far as the size of your network, BTW, it doesn't sound like Traffic Monitor should have an issue in terms of scalability if we can get this more fundamental issue that's blocking you resolved.

Which brings me to this ... I conclude from what you've said that you have the Traffic Wizard (automatic traffic configuration) enabled. Its job is essentially to identify the interswitch links in your topology and set up monitoring on those. Unfortunately we have seen some issues with this wizard as it can continually keep tweaking the set of ports that the data collector is to monitor, causing the data collector to restart quite often, consequently disrupting the continuity of data collection and causing high CPU consumption of trafficd and mySQL (as you report). I suspect that this may be the root of your problem, and I suggest that you disable the Traffic Wizard by disabling automatic traffic configuration. You can always re-enable it for a short period after a topology change if you wish, then disable it again.

You can confirm that the collector, trafficd, is being restarted too often to get any traction by looking at a log file it keeps as it runs. The file is in the server\logs subdirectory of your PCM server installation directory and is called "Traffic.log"; new data is appended at the end of it. When the "Traffic.log" file reaches its size limit, it is renamed to "Traffic.old" and a new file is started. Note that this log file is normally not something that a user will (or should need to) examine, but it should help us determine what's happening in your case. Each time trafficd is restarted, it logs information to this file about the number of segments (segment = device port) it's been asked to monitor with sampling and the number it was asked to poll statistics from, and then it logs information about its contact with each port from that point forward. So when you first open the file and scroll to the end, you should see some messages that indicate that trafficd is periodically trying to contact your device ports that it can't reach, and hopefully some information about the problem that is preventing it from doing so. If possible, could you check the log file, starting from the end, to find the last time the data collector restarted? You should see a message of the form "Traffic data collector 2.0 started" for each restart, followed by messages about how many sampling segments and polled segments there are, and then finally by the "steady state" messages regarding the state of data acquisition on each requested segment.
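Finding the last restart by hand is easy to automate. A minimal Python sketch: the marker text is the message quoted above and the path follows the description (server\logs\Traffic.log under the install directory), but the function names are hypothetical, and if the log has just rotated, the last restart may be in Traffic.old instead.

```python
from pathlib import Path

# Restart message as quoted in the post above.
START_MARKER = "Traffic data collector 2.0 started"

def read_log(install_dir: str) -> list:
    """Read server/logs/Traffic.log under the given PCM install directory."""
    path = Path(install_dir) / "server" / "logs" / "Traffic.log"
    return path.read_text(errors="replace").splitlines()

def lines_since_last_restart(lines: list) -> list:
    """Return the lines from the most recent restart marker to the end,
    or an empty list if no restart is logged."""
    last = None
    for i, line in enumerate(lines):
        if START_MARKER in line:
            last = i
    return [] if last is None else lines[last:]
```

The returned slice starts with the restart line and continues with the segment counts and the per-segment "steady state" messages that followed it.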

If the Traffic Wizard is causing your woes then you should see frequent restarts (each log entry is timestamped). This can be remedied as I suggest above by disabling automatic traffic configuration - note that you can retain the configuration the wizard has already set up for you even after you disable it. If this doesn't solve your problem, I have talked to Hector Manzo and at this point it would probably be best to open a support call with ProCurve so we can get into gorier details without making everyone else read my inordinately long messages ...
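Since each entry is timestamped, the gaps between restarts can be computed directly. A minimal sketch, assuming each line begins with a timestamp in "YYYY-MM-DD HH:MM:SS" form; the thread only says entries are timestamped, so TS_FORMAT and the parsing step are hypothetical and must be adjusted to whatever Traffic.log actually uses.

```python
from datetime import datetime

START_MARKER = "Traffic data collector 2.0 started"
TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format at the start of each line

def restart_times(lines):
    """Timestamps of every logged restart, in file order."""
    times = []
    for line in lines:
        if START_MARKER in line:
            # Assumption: the first two whitespace-separated fields are date and time.
            stamp = " ".join(line.split()[:2])
            times.append(datetime.strptime(stamp, TS_FORMAT))
    return times

def restart_gaps_minutes(lines):
    """Minutes between consecutive restarts; small values mean frequent restarts."""
    times = restart_times(lines)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

Gaps of only a few minutes would point to the Traffic Wizard restarting the collector before it can get any traction.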

Regards,

SVB
Steve Britt
Respected Contributor

Re: PCM+ V2.0 Questions/Thoughts

Drew and Les,

Thanks for the information on the third-party devices that are green for a while and then go inexplicably red. We will be checking into it ...

Regards,

SVB
Terry Kirk
Advisor

Re: PCM+ V2.0 Questions/Thoughts

Steve;

Sorry for the delay, couple of other projects got in the way.

The Traffic Devices tab was what I was looking for. I don't usually click on Interconnect Devices so I missed it.

The problem is that, when a change is made, it takes about 27 minutes for the Traffic Monitor to show anything.

I tried disabling Automatic Traffic Configuration, and that seems to have gotten rid of the intermittent stops in reporting. These would occur when the traffic configuration was changed or when PCM was restarted, and would also happen periodically through the day. The long pause at the beginning is still there, though. While this pause is occurring, I don't see any messages in the log file.

Disabling the Automatic Traffic Configuration seems to have fixed the most annoying part of the problem; one of the pauses in collection always seemed to happen just when I was trying to monitor something.

While poking around, I noticed two other problems:
1) In the remote client, when creating a policy, you can't select the groups it is to apply to - the window is too small (only a couple of pixels high). This is not a problem when running the client directly on the PCM server.
2) In the Misconfiguration report (from the Network Consistency module), everything is reported by IP address. DNS names would be much nicer :{)

Thanks for your help!
Terry
Ted Nguyen
Advisor

Re: PCM+ V2.0 Questions/Thoughts

Drew and Les,

These third-party devices might only support SNMPv1. Could you confirm this? Could you delete these devices from PCM and use the Manual Device Discovery Wizard to rediscover them, to see if this makes a difference?
Drew_38
Frequent Advisor

Re: PCM+ V2.0 Questions/Thoughts

Ted;

I'm just trying deleting a device and re-discovering it for you. I'll report back.

Are you saying, though, that devices which don't support SNMP v2 can't be discovered or have their status read by PCM+ v2? I ask because PCM+ v1.6 discovered and got the status of these devices with no problem at all.

Drew
Drew_38
Frequent Advisor

Re: PCM+ V2.0 Questions/Thoughts

For Ted and others; just commenting on my own message here.

As I mentioned I deleted the 3Com switch which was stuck on red (unavailable) from PCM+ v2. I was having some trouble re-adding it and I ended up having to deal with another problem and leaving this.

In the meantime, PCM+ v2 re-discovered the device itself, and since then it's been OK, that is, green not red. So my guess is that some non-HP items discovered by PCM+ v1.6 don't get properly transferred over to v2. Anyhow, I'll keep monitoring the situation and report if it changes.

Drew
Ted Nguyen
Advisor

Re: PCM+ V2.0 Questions/Thoughts

Drew,

Thanks a lot for your feedback. As you suspected, any devices with only SNMPv1 supported which were discovered with PCM 1.6 will have this "red" status indication issue when upgraded to PCM 2.0. This is because of the changes made to PCM 2.0 to support SNMPv3. In PCM 2.0, each device discovered is flagged to indicate which SNMP version is used to communicate with the device. The devices in PCM 1.6 do not have this version associated. The upgrade process from 1.6 to 2.0 defaulted the devices to SNMPv2, thus causing the issue you indicated.

So, the work-around is to delete any devices with "red" status indication and either manually discover them or wait for Discovery to rediscover the devices.
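The version-flag problem can be pictured with a toy model. This is a hypothetical Python sketch, not PCM code: the Device class, the probe order and the version strings are assumptions. Only the behaviour comes from the explanation above: 1.6 recorded no version, the upgrade defaulted every device to SNMPv2, and a fresh discovery of a deleted device records a version it actually answers.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumed probe order for a fresh discovery (most capable version first).
PROBE_ORDER = ("v3", "v2", "v1")

@dataclass
class Device:
    name: str
    supported: FrozenSet[str]           # SNMP versions the device's agent answers
    snmp_version: Optional[str] = None  # version the manager has recorded for it

def migrate_from_1_6(device: Device) -> None:
    # 1.6 stored no version; the upgrade defaulted every device to SNMPv2.
    device.snmp_version = "v2"

def status(device: Device) -> str:
    # Polling only succeeds if the recorded version is one the agent speaks.
    return "green" if device.snmp_version in device.supported else "red"

def discover(name: str, supported: FrozenSet[str]) -> Device:
    # Fresh discovery: record the first probed version that gets an answer.
    version = next((v for v in PROBE_ORDER if v in supported), None)
    return Device(name, supported, version)
```

In this model an SNMPv1-only switch, such as the 3Com and Nortel 350 units mentioned earlier, goes red after migration and green again once it is deleted and discovered afresh, while a device that also speaks v2 stays green throughout.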

Hope this helps.
Regards,
Ted
Drew_38
Frequent Advisor

Re: PCM+ V2.0 Questions/Thoughts

Ted;

Thanks for confirmation of this. Well then it's not too much of a problem for me as I only had two devices in this state.

Regards,

Drew
Les Ligetfalvy
Esteemed Contributor

Re: PCM+ V2.0 Questions/Thoughts

Likewise for me, the few old Nortel 350 switches I have will be going away soon. I simply added them to the list of excluded devices along with the other SNMP devices that PCM 2.0 does not play nice with.