Vparmon

 
Lai Nee Shyang_1
Frequent Advisor

Vparmon

Hi there,

My server (an rp7420) running 3 vPars crashed this week out of the blue. The HP support folks found no HW errors and no OS/SW errors that could have brought down the entire complex.

Apart from sharing the same HW, am I right to say that the 3 vPars run on the same vparmon? If so, is there any way I can check whether the vparmon detected any anomalies?

Does anyone have any ideas? Thanks in advance for sharing.


Lai
If it doesn't work, We'll make it work. If it works, We'll make it work better.
12 REPLIES
Ted Buis
Honored Contributor

Re: Vparmon

Any tombstones?
Mom 6
Lai Nee Shyang_1
Frequent Advisor

Re: Vparmon

Yup, there are tombstones on all three vpars.
If it doesn't work, We'll make it work. If it works, We'll make it work better.
Ted Buis
Honored Contributor

Re: Vparmon

Did HP support analyze the tombstones?
Mom 6
Lai Nee Shyang_1
Frequent Advisor

Re: Vparmon

Yes, they did, but they can't find anything interesting.
If it doesn't work, We'll make it work. If it works, We'll make it work better.
Ted Buis
Honored Contributor

Re: Vparmon

I would ask HP support to help you set up the system so that, if this ever happens again, they can isolate the problem from crash dumps, in case they can't with your present configuration. If the environment got too hot, all partitions would shut down. Likewise, if the sensor thought it got too hot, the system would behave the same way. I suppose a power spike could have a similar effect. There could be a hardware problem that the diagnostics didn't detect, and there could be a software bug. Did you request an escalation?
Mom 6
Lai Nee Shyang_1
Frequent Advisor

Re: Vparmon

Yes, it has been escalated to their next level.
According to their initial findings, they don't think it is a HW problem or overheating, but they can't find any fault in the OS either.

That's what prompts me to think about vparmon. Any idea whether there are any logs or any other form of indication if vparmon fails?

Thanks man.

Lai
If it doesn't work, We'll make it work. If it works, We'll make it work better.
Ted Buis
Honored Contributor

Re: Vparmon

Do you have any hard partitions configured? There is some hardware isolation for nPars. You don't get as much flexibility, but you might put 1 instance in one hard partition and 2 instances in the other hard partition. That would help prevent all three from going down together.
Mom 6
Ted Buis
Honored Contributor

Re: Vparmon

I don't know about vparmon logs/dumps, but WTEC should, if the case went that high, and I would expect it did. They likely have debug versions that they could provide to help capture the issue. Those slow performance, so it may not be worth it right now, but if the crash happens again it might be worthwhile. There is also the chance of an intermittent hardware problem. These are the worst to isolate, but they typically increase in frequency until there is a constant failure condition. That is another good reason to cut the system into two hard partitions, assuming you have two cell boards.
Mom 6
Lai Nee Shyang_1
Frequent Advisor

Re: Vparmon

Hi Ted,

I'm not sure if it has reached WTEC yet, but if they can't provide any findings, I'm quite sure I'll be hearing from WTEC soon enough.

You have a point about splitting the 3 vPars across 2 nPars. I'll have to check whether it is viable to reconfigure the system.

Thanks for your analysis and recommendations, Ted. Have a nice day.

Lai
If it doesn't work, We'll make it work. If it works, We'll make it work better.
Devesh Pant_1
Esteemed Contributor

Re: Vparmon

Do you have iCOD 6.0 or 6.1 running along with patch PHCO_29832 installed? If yes, this could be a major problem. To resolve it, patch PHCO_29832 needs to be removed. It can be installed again after iCOD 6.02 is installed.

thanks
Devesh
David Child_1
Honored Contributor

Re: Vparmon

Lai,

Did HP or you check the server logs? Maybe something there could help HP close in on the root cause.

In case you are not familiar with it, here is how you can access the logs:

1. Log into the GSP
2. GSP> SL
3. Examine the (E)rror logs

They are basically impossible for us to read, but if you can find some entries for the date/time when the crashes occurred, you should send them to HP for analysis.

Also, check the GSP date. It may not match the OS date/time, in which case you will need to take the difference into account when looking for entries.

GSP> date
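
To make the clock comparison concrete, here is a minimal Python sketch of the arithmetic. It assumes you read the GSP date and the OS date at the same moment and typed both in by hand. The timestamps, the function names and the 10 minute margin are all made up for illustration; nothing here talks to the GSP itself.

```python
from datetime import datetime, timedelta

def clock_offset(gsp_now, os_now):
    """Return how far the GSP clock is ahead of (+) or behind (-) the OS clock."""
    return gsp_now - os_now

def to_gsp_window(os_crash_time, offset, margin=timedelta(minutes=10)):
    """Translate an OS crash time into a GSP-clock window to search in the logs."""
    center = os_crash_time + offset
    return center - margin, center + margin

# Hypothetical readings taken at the same moment (not from the real system).
gsp_now = datetime(2005, 3, 14, 9, 52, 0)   # what "GSP> date" showed
os_now = datetime(2005, 3, 14, 10, 0, 0)    # what the OS date showed

offset = clock_offset(gsp_now, os_now)

# Hypothetical crash time as recorded by the OS.
crash_os = datetime(2005, 3, 12, 2, 30, 0)
start, end = to_gsp_window(crash_os, offset)

print("GSP clock offset:", offset)
print("Look for GSP log entries between", start, "and", end)
```

With these example values the GSP clock runs 8 minutes behind the OS, so a crash the OS recorded at 02:30 would show up around 02:22 on the GSP clock.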

David
Lai Nee Shyang_1
Frequent Advisor

Re: Vparmon

Hi Devesh,
Thanks for your reply. No, my system doesn't have iCOD installed.

Hi David Child,
Yes, the HP folks did check the GSP->SL logs. I was there when they did the check. According to them, there are no entries logged before or during the server crash.

Thanks.

Lai
If it doesn't work, We'll make it work. If it works, We'll make it work better.