03-22-2004 05:33 AM
Rough weekend: Looking for diagnostic help
9:56 a.m. Saturday morning, D380/2 drops off the network according to SNMP. It's the only box being monitored, but it came back 4 seconds later according to SNMP. That's the time it was pingable until 10 a.m. this morning.
This box's console locked up and had to be reset with the old power off, hold down the d at power up trick.
There was nothing useful in the syslog.log or OLDsyslog.log on any of my 4 systems.
At 2:00 a.m. Monday all Veritas backups failed on a network timeout. The L class boxes remained on the network and were user accessible on Monday morning.
At 9:30 a.m. Saturday morning, the building next door cut its own power due to a construction accident.
Our switches show no record of an event or power cycle.
I have run the mstm exercise on all relevant hardware and it checks out perfectly fine. Maybe I have a flaky fiber card on one box.
So.
Given the same set of circumstances, what would you do next?
It seems obvious there was a power problem but no Windows servers were affected, only my serial console. Because of the nature of the problem next door, nobody believes there was a voltage drop or surge in this building.
I'm stumped.
What would you do next?
Points awarded for all suggestions, even if I already tried it and forgot to post it.
Bunnies for the step or steps that unravel this mystery.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
03-22-2004 05:48 AM
Re: Rough weekend: Looking for diagnostic help
1) I suppose all your equipment is on UPS.
Is it all on the same UPS?
2) "Monday all Veritas backups failed on a network timeout"
What is the defined timeout?
Are you sure the 2 events are related, or is it speculation?
Regards,
Jean-Luc
03-22-2004 05:50 AM
Re: Rough weekend: Looking for diagnostic help
Also, I would check on the Fibre Card types and see if there is a newer FC patch. I would do some analysis on the NetBackup logs to see if there are any hangs on media or timeouts.
03-22-2004 05:58 AM
Re: Rough weekend: Looking for diagnostic help
Can't do TOC, because it's a production server. It is possible someone was fiddling around and hit that button, but it would not have affected other servers.
Jean:
1) The equipment is on different UPSes; the D380's UPS may have a little more hardware on it than the UPS is rated to handle.
2) Network timeout is defined as Veritas trying to back up for 5 minutes. That is user defined. We believe the problem is related. The console is on building power. The one next to it was rendered useless by me switching the L2000 boxes to web consoles.
The L2000 boxes were off the network as far as Veritas was concerned but were still accessible by users.
Seems like we're headed to an unsolvable head scratcher. I have the Veritas man and the Network man trolling for more data.
SEP
03-22-2004 06:49 AM
Re: Rough weekend: Looking for diagnostic help
Ctrl-B (I think) on the console and check for errors in there? On each of your servers.
Been a while since I have been on a D server, but I think you can still do that.
There could be something in those logs.
Also check the master and client logs for Veritas; sometimes there is some useful info hidden in with the junk.
In my experience with NetBackup, if you got error 54, I think that's one network timeout error. Seems to just be a catchall.
I had problems with that error even when the network had nothing to do with it.
Not sure how you're set up there or which server is your backup master. Maybe recycle the NetBackup daemons on all your servers.
03-22-2004 06:53 AM
Re: Rough weekend: Looking for diagnostic help
I am checking the others.
Bunny potential.
SEP
03-22-2004 06:56 AM
Re: Rough weekend: Looking for diagnostic help
UPS worked at least on the L boxes.
SEP
03-22-2004 09:43 AM
Re: Rough weekend: Looking for diagnostic help
We had a similar unexplainable incident a few years ago on a V2600. We eventually traced it to (or at least blamed it on) the security people. They did a full port scan against the production network (without telling anyone) and completely hosed it. The V2600 dumped core and crashed. Of course they blamed us for having downrev/unpatched systems, but that is just finger-pointing. Check the syslog again for broken pipes and buffer overflows. Even one or two might indicate some sort of attack--even if unintentional.
Chris
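A rough sketch of the syslog check suggested above, for anyone who wants to automate it across several boxes. The two search strings are assumptions: the exact wording of "broken pipe" and "buffer overflow" messages depends on the daemon that logs them, so adjust the patterns to what your logs actually contain. The function names are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed patterns: real syslog wording varies by daemon and OS release.
SUSPECT_PATTERNS = {
    "broken pipe": re.compile(r"broken pipe", re.IGNORECASE),
    "buffer overflow": re.compile(r"buffer overflow", re.IGNORECASE),
}


def count_suspect_lines(lines):
    """Return a Counter mapping pattern name -> number of matching lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_file(path):
    """Count suspect lines in one log file, tolerating odd bytes."""
    with open(path, errors="replace") as handle:
        return count_suspect_lines(handle)


if __name__ == "__main__":
    # Pass one or more log files on the command line.
    for path in sys.argv[1:]:
        print(path, dict(scan_file(path)))
```

Run it against copies of syslog.log and OLDsyslog.log from each system. Even a count of one or two is worth a closer look at the surrounding lines and their timestamps.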
03-22-2004 09:53 AM
Re: Rough weekend: Looking for diagnostic help
On all your HP boxes that were hit with this network timeout, did the nettl log show anything? The nettl log is located in /var/adm and is called nettl.LOG000. To read it, simply run netfmt -t 10 -v -f nettl.LOG000 and see if there is any useful info there about the disconnects. I would see if there are time matches between the servers to see if you can pinpoint an exact time that this occurred. Just grasping for straws ;)
Geno
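The "time matches between the servers" idea can be sketched roughly as below. This assumes you have already pulled the event timestamps out of each server's formatted log by hand or with your own parser (parsing is left out because the output layout is not shown in this thread). The `correlate` function, the host names, and the 60 second window are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta


def correlate(events_by_host, window_seconds=60):
    """Group events from different hosts that occur close together in time.

    events_by_host maps a host name to a list of datetime objects.
    Events are sorted by time and chained into a cluster while the gap
    between consecutive events stays within window_seconds. Only clusters
    that involve at least two distinct hosts are returned, each as a list
    of (timestamp, host) tuples.
    """
    window = timedelta(seconds=window_seconds)
    merged = sorted(
        (ts, host) for host, stamps in events_by_host.items() for ts in stamps
    )
    clusters, current = [], []
    for item in merged:
        # A gap larger than the window closes the current cluster.
        if current and item[0] - current[-1][0] > window:
            if len({host for _, host in current}) >= 2:
                clusters.append(current)
            current = []
        current.append(item)
    if len({host for _, host in current}) >= 2:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    start = datetime(2004, 3, 22, 2, 0, 0)
    events = {
        "hostA": [start],                          # hypothetical host names
        "hostB": [start + timedelta(seconds=20)],
    }
    for cluster in correlate(events):
        print([(ts.isoformat(), host) for ts, host in cluster])
```

A cluster that spans several hosts points at something shared (network, power, backup master) rather than a single flaky card. This only works if the system clocks on the servers agree with each other.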
03-22-2004 11:04 AM
Re: Rough weekend: Looking for diagnostic help
The box in question had a 12 month freeze on software upgrades in hopes that it would be pulled from production. I'm stuck with it for another year and patched it to December 2003 a week ago and will be upgrading applications and setting up EMS traps to get more data.
I think it's unlikely anyone ran a portscan; we're pretty much a Sabbath observant Jewish business. My manager was on premises just after the initial event but insists nothing was touched and what he and his crew were doing could not have caused a problem.
I have no evidence this time, but will set up for evidence next time. Points will be awarded shortly.
SEP
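For "evidence next time", one low-tech complement to EMS traps is to keep timestamped reachability samples for each box and reduce them to outage intervals afterwards. The sketch below covers only the reduction step. How the samples are collected (ping, SNMP poll, something else) is left open, and the `outage_intervals` name and sample layout are assumptions made for illustration.

```python
from datetime import datetime


def outage_intervals(samples):
    """Turn reachability samples into a list of outage intervals.

    samples is an iterable of (timestamp, reachable) pairs in chronological
    order, where reachable is True or False. Returns a list of
    (down_at, up_at) tuples; up_at is None if the host was still down at
    the last sample.
    """
    intervals = []
    down_at = None
    for ts, reachable in samples:
        if not reachable and down_at is None:
            down_at = ts                      # first failed sample of an outage
        elif reachable and down_at is not None:
            intervals.append((down_at, ts))   # host answered again
            down_at = None
    if down_at is not None:
        intervals.append((down_at, None))
    return intervals


if __name__ == "__main__":
    samples = [
        (datetime(2004, 3, 20, 9, 55, 56), True),
        (datetime(2004, 3, 20, 9, 56, 0), False),
        (datetime(2004, 3, 20, 9, 56, 4), True),
    ]
    for down_at, up_at in outage_intervals(samples):
        print("down", down_at, "up", up_at)
```

The resolution is limited by how often you sample, so a blip shorter than the polling interval can be missed entirely.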