superdome partitions down
01-25-2007 01:22 AM
I have an HP 9000 Superdome with three partitions.
Suddenly two of the partitions went down, and one of the partitions' console logs showed the following error:
**********************Auto-Port Aggregation/9000 Networking*****************@#%
Thu Jan 25 GMT 2007 12:38:43.547231 DISASTER Subsys:HP_APA Loc:00000
<1006> HP Auto-Port Aggregation product found that ports in failover
group lan901 are no longer connected to each other. Port 2 did
not receive any poll packets.
Strangely, I am also unable to take control of the partitions through the GSP.
I reset one of the partitions, but it stops at this point:
"Start host agent".
Any help is appreciated.
Thanks,
Siva
01-25-2007 01:42 AM
Re: superdome partitions down
I think you want to boot the system in single-user mode and comment out or rename the APA startup script (/sbin/init.d/hpapa). Once the server is up, access it through the console, try to start the APA script manually, and find out why it is failing. You might also want to look at the two APA configuration files in /etc/rc.config.d:
hp_apaconf
hp_apaportconf
Rgds
HGN
01-25-2007 11:17 PM
Re: superdome partitions down
The partitions are up, and in fact there was a network (switch) problem, but that doesn't solve all my woes.
When I was not able to connect to the partitions, I tried to log into them using the GSP.
1. Strangely, I was able to log into only one partition; I didn't get the console login for the other two.
2. The VFP showed heartbeats for all three partitions.
3. parstatus showed all three partitions as active.
4. Thinking the partitions had hung, I reset one of them (I used RS, not TOC, so no dumps are available). It unusually got stuck at one point saying "Starting host agent", but the machine came up after almost 40 minutes, by which time the network problem was resolved. Interestingly, the host agent service was the last one started in the boot process, which I was able to see from rc.log.
5. My worry is: if there is a network problem, why would the GSP not allow me to log into two of the partitions?
6. What are all the possibilities that could prevent logging into the partitions from the GSP?
7. Is there any relation between this GSP problem and the logged APA errors?
Lanconfig file:
===============
--> /etc/lanmon/lanconfig.ascii:
# ********************************************************
# *********** LAN MONITOR CONFIGURATION FILE *************
# *** For complete details about the parameters and how **
# *** to set them, consult the lanqueryconf(1m) manpage **
# *** or your manual. **
# ********************************************************
NODE_NAME taapup01
POLLING_INTERVAL 10000000
DEAD_COUNT 3
FAILOVER_GROUP lan900
STATIONARY_IP 10.179.3.102
PRIMARY lan0 5
STANDBY lan1 3
FAILOVER_GROUP lan901
STATIONARY_IP 10.179.1.102
PRIMARY lan2 5
STANDBY lan3 3
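The failover groups above can be read programmatically to see which ports belong together and how long a port can go silent before LAN Monitor declares it failed. A minimal Python sketch, assuming the one-keyword-per-line layout shown in this file, that POLLING_INTERVAL is given in microseconds (so 10000000 is 10 s), and that DEAD_COUNT is the number of consecutive missed poll packets before failure; the names parse_lanconfig, detection_seconds and FailoverGroup are hypothetical helpers, not part of any HP tool.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverGroup:
    name: str
    stationary_ip: str = ""
    # (interface, priority) pairs, e.g. ("lan2", 5)
    primary: list = field(default_factory=list)
    standby: list = field(default_factory=list)


def parse_lanconfig(text):
    """Parse lanconfig.ascii-style text into (global settings, groups).

    Hypothetical helper: assumes 'KEYWORD value [value]' per line and
    '#' comments, as in the file shown above.
    """
    settings = {}
    groups = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, *args = line.split()
        if key == "FAILOVER_GROUP":
            groups.append(FailoverGroup(name=args[0]))
        elif key == "STATIONARY_IP" and groups:
            groups[-1].stationary_ip = args[0]
        elif key in ("PRIMARY", "STANDBY") and groups:
            entry = (args[0], int(args[1]))
            target = groups[-1].primary if key == "PRIMARY" else groups[-1].standby
            target.append(entry)
        else:
            settings[key] = args[0] if args else ""
    return settings, groups


def detection_seconds(settings):
    # Assumption: POLLING_INTERVAL is in microseconds and DEAD_COUNT is the
    # number of consecutive missed polls before a port is declared failed.
    interval_us = int(settings.get("POLLING_INTERVAL", "10000000"))
    dead_count = int(settings.get("DEAD_COUNT", "3"))
    return interval_us * dead_count / 1_000_000


SAMPLE = """\
NODE_NAME taapup01
POLLING_INTERVAL 10000000
DEAD_COUNT 3
FAILOVER_GROUP lan900
STATIONARY_IP 10.179.3.102
PRIMARY lan0 5
STANDBY lan1 3
FAILOVER_GROUP lan901
STATIONARY_IP 10.179.1.102
PRIMARY lan2 5
STANDBY lan3 3
"""

if __name__ == "__main__":
    settings, groups = parse_lanconfig(SAMPLE)
    for g in groups:
        print(g.name, g.stationary_ip, g.primary, g.standby)
    print("failure detected after ~%.0f s" % detection_seconds(settings))
```

Under those assumptions, the sketch shows lan901 pairing primary lan2 (priority 5) with standby lan3 (priority 3), and a port that stops receiving poll packets would be flagged after about 30 s.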
APA Statistics:
===============
LAN INTERFACE STATUS DISPLAY
Fri, Jan 26,2007 07:58:58
PPA Number = 900
Description = lan900 Hewlett-Packard LinkAggregate Interface
Type (value) = ethernet-csmacd(6)
MTU Size = 1500
Speed = 100000000
Station Address = 0x306e4a54cc
Administration Status (value) = up(1)
Operation Status (value) = up(1)
Last Change = 7714
Inbound Octets = 2147877465
Inbound Unicast Packets = 3904252297
Inbound Non-Unicast Packets = 124049569
Inbound Discards = 22366
Inbound Errors = 0
Inbound Unknown Protocols = 740649
Outbound Octets = 1529491229
Outbound Unicast Packets = 4183298218
Outbound Non-Unicast Packets = 16527
Outbound Discards = 0
Outbound Errors = 0
Outbound Unknown Protocols = 0
Specific = 0
LAN INTERFACE STATUS DISPLAY
Fri, Jan 26,2007 07:58:58
PPA Number = 901
Description = lan901 Hewlett-Packard LinkAggregate Interface
Type (value) = ethernet-csmacd(6)
MTU Size = 1500
Speed = 100000000
Station Address = 0x306e2d2a8f
Administration Status (value) = up(1)
Operation Status (value) = up(1)
Last Change = 7733
Inbound Octets = 1202288746
Inbound Unicast Packets = 4088047809
Inbound Non-Unicast Packets = 1236824993
Inbound Discards = 720235378
Inbound Errors = 64
Inbound Unknown Protocols = 1095399
Outbound Octets = 2734665484
Outbound Unicast Packets = 4260250021
Outbound Non-Unicast Packets = 1020856
Outbound Discards = 0
Outbound Errors = 0
Outbound Unknown Protocols = 0
Specific = 0
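The two status displays can be compared with a short ratio calculation, which puts the inbound error count in proportion to the other inbound counters. A minimal Python sketch, assuming these counters follow MIB-II interface semantics (delivered unicast and non-unicast packets, plus discards, errors and unknown-protocol packets, make up everything received) and that they are 32-bit counters that may already have wrapped; the helper names are hypothetical and the values are copied from the output above.

```python
# Inbound counters copied from the APA statistics above (lan900 and lan901).
COUNTERS = {
    "lan900": {"ucast": 3904252297, "nucast": 124049569, "discards": 22366,
               "errors": 0, "unknown": 740649},
    "lan901": {"ucast": 4088047809, "nucast": 1236824993, "discards": 720235378,
               "errors": 64, "unknown": 1095399},
}

COUNTER32_MAX = 2**32 - 1


def inbound_ratios(c):
    """Return (discard_ratio, error_ratio) over all inbound packets seen.

    Assumes MIB-II semantics: delivered unicast + non-unicast packets plus
    discards, errors and unknown-protocol packets are everything received.
    """
    total = c["ucast"] + c["nucast"] + c["discards"] + c["errors"] + c["unknown"]
    return c["discards"] / total, c["errors"] / total


def near_wrap(c, margin=0.1):
    # 32-bit counters close to 2**32 may already have wrapped, so any
    # ratio built from them is only a rough indication.
    return [k for k, v in c.items() if v > COUNTER32_MAX * (1 - margin)]


if __name__ == "__main__":
    for lan, c in COUNTERS.items():
        discard, error = inbound_ratios(c)
        print(f"{lan}: discards {discard:.4%}, errors {error:.8%}, "
              f"near wrap: {near_wrap(c)}")
```

Taken at face value, lan901 discards roughly 12% of what it receives while lan900 discards almost nothing, so the discards stand out far more than the 64 errors. lan901 also shows about ten times as many non-unicast (broadcast/multicast) packets as lan900, which is at least consistent with the switch problem reported in this thread, although the counters alone do not prove a cause.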
Please note that the second PPA (lan901) is showing 64 inbound errors.
Any suggestions on what might have caused this GSP issue?
Thanks,
Siva
01-25-2007 11:26 PM
Re: superdome partitions down
The GSP may need to be reset, or it could have been impacted by the switch issue.
Most GSPs will show up in cstm and can be tested for hardware problems.
I think a total reset and a hardware test of the GSP should be sufficient. If the GSP is flaky or fails the hardware test, have it replaced.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
01-25-2007 11:40 PM
Re: superdome partitions down
The GSP was not frozen or hung. In fact, I was able to log into the GSP and do everything: collect the logs, view the VFP, and so on. It even gave me the console login for one of the partitions; it just didn't give me the console login prompt for the other two.
If it was a switch problem or whatever, it should not have allowed console login for any of the partitions, right?
Strangely, it does give me the console login prompt now that the switch issue has been resolved.
There definitely seems to be a relation with the switch issue, but how do I relate it? Especially with the GSP, which I believe never uses the external network interfaces connected to the switch to collect information about the partitions.
Aah! Now I have a doubt:
1. I believe the GSP doesn't use the network interfaces to collect information about the partitions, but
2. How does the GSP allow login to the partitions? Is it through the network interfaces normally used to connect to the partitions, or is there some other mechanism involved?
Thanks,
Siva
01-26-2007 12:41 AM
I have come across a similar issue before. When the switch had a problem, something on those two partitions may have caused the OS to become unresponsive. If the OS is unresponsive, the console will appear unresponsive too. This would explain why the VFP still indicated an OS heartbeat.
The trick is to determine the cause. First, you would want to determine whether the OS was starved for memory or other resources.
Since you don't have a crash dump to work with, try checking any historical performance data using 'extract' (if MeasureWare is installed and was running) or 'sar' (if sar is constantly collecting data).
David
01-26-2007 01:00 AM
Re: superdome partitions down
At that time, dmesg on one of the partitions was reporting /tmp as full.
I will look into the memory and other resource issues.
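Since dmesg reported /tmp as full, a periodic filesystem-usage check is one way to catch this before it starves the OS (on HP-UX itself, bdf reports the same information). A minimal Python sketch using the standard library's shutil.disk_usage, assuming a Python 3 interpreter is available on the host; the helper names and the 95% threshold are hypothetical examples, not HP-UX defaults.

```python
import os
import shutil


def usage_fraction(path):
    """Fraction of the filesystem holding `path` that is in use (0.0 to 1.0).

    Computed as used / (used + free), the way df-style tools report it,
    so blocks reserved for root are left out of the denominator.
    """
    total, used, free = shutil.disk_usage(path)
    denom = used + free
    return used / denom if denom else 0.0


def full_filesystems(paths, threshold=0.95):
    # Report each existing path whose filesystem is at or above the threshold.
    # 0.95 is an arbitrary example threshold, not an HP-UX default.
    result = {}
    for p in paths:
        if not os.path.exists(p):
            continue
        frac = usage_fraction(p)
        if frac >= threshold:
            result[p] = frac
    return result


if __name__ == "__main__":
    for path in ["/", "/tmp"]:
        if os.path.exists(path):
            print(f"{path}: {usage_fraction(path):.1%} used")
    print("over threshold:", full_filesystems(["/", "/tmp"]))
```

Run from cron, a check like this would have flagged /tmp before it filled; it does not, by itself, explain why the partitions stopped responding.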
Thanks for guiding me.
Thanks,
Siva