
Re: new Superdome has crashed

 
Musa123
Advisor

new Superdome has crashed

Hi,

I received a ticket with the following description:
"That RDBMS has now crashed due to some strange reason and we need you to investigate why … This may ultimately affect the production"

Here I have pasted some output from the logs. Please help me solve this issue.

o CKM00101800493 is dead.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Killing bus 2 to Clariion CKM00101800493 port SP B11.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath202 to CKM00101800493 is dead.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath210 to CKM00101800493 is dead.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath208 to CKM00101800493 is dead.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath203 to CKM00101800493 is dead.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath214 to CKM00101800493 is dead.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath199 to CKM00101800493 is dead.
Jul 15 11:55:54 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath212 to CKM00101800493 is dead.


Target path (class=tgtpath, instance=60) has gone online. The target path h/w path is 0/0/5/0/0/0/0.0x500601623b2449a4
Jul 15 11:56:22 maahpxpe2p001 vmunix:
Jul 15 11:56:22 maahpxpe2p001 vmunix: class : tgtpath, instance 63
Jul 15 11:56:22 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=63) has gone online. The target path h/w path is 0/0/5/0/0/0/0.0x5006016b3b2449a4
Jul 15 11:56:49 maahpxpe2p001 vmunix: 0/0/5/0/0/0/1: Fibre Channel Driver received Link Dead Notification.
Jul 15 11:56:49 maahpxpe2p001 vmunix: class : tgtpath, instance 71
Jul 15 11:56:49 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=71) has gone offline. The target path h/w path is 0/0/5/0/0/0/1.0x500601643b2449a4
Jul 15 11:56:49 maahpxpe2p001 vmunix: class : tgtpath, instance 74
Jul 15 11:56:49 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=74) has gone offline. The target path h/w path is 0/0/5/0/0/0/1.0x5006016d3b2449a4
Jul 15 11:56:49 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath216 to CKM00101800493 is dead.
Jul 15 11:56:49 maahpxpe2p001 vmunix: emcp:Mpx:Error: Killing bus 3 to Clariion CKM00101800493 port SP A12.
Jul 15 11:56:49 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath232 to CKM00101800493 is dead.
class : tgtpath, instance 71
Jul 15 11:57:17 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=71) has gone online. The target path h/w path is 0/0/5/0/0/0/1.0x500601643b2449a4
Jul 15 11:57:17 maahpxpe2p001 vmunix:
Jul 15 11:57:17 maahpxpe2p001 vmunix: class : tgtpath, instance 74
Jul 15 11:57:17 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=74) has gone online. The target path h/w path is 0/0/5/0/0/0/1.0x5006016d3b2449a4
Jul 15 12:01:11 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath192 to CKM00101800493 is alive.
Jul 15 12:01:11 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath182 to CKM00101800493 is alive.
Jul 15 12:01:11 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath188 to CKM00101800493 is alive.
Jul 15 11:57:17 maahpxpe2p001 vmunix:
Jul 15 12:01:11 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath183 to CKM00101800493 is alive.
Jul 15 12:01:11 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath186 to CKM00101800493 is alive.
Jul 15 12:01:11 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath184 to CKM00101800493 is alive.
: SSH: Server;Ltype: Version;Remote: 10.15.170.159-24472;Protocol: 2.0;Client: OpenSSH_3.9p1
Jul 15 12:27:14 maahpxpe2p001 sshd[11748]: error: PAM: Authentication failed for root from 10.15.170.159
Jul 15 12:27:32 maahpxpe2p001 sshd[11748]: Failed keyboard-interactive/pam for root from 10.15.170.159 port 24472 ssh2
Jul 15 12:27:39 maahpxpe2p001 sshd[11748]: Failed password for root from 10.15.170.159 port 24472 ssh2
Jul 15 12:27:32 maahpxpe2p001 sshd[11748]: error: PAM: Authentication failed for root from 10.15.170.159
Jul 15 12:30:27 maahpxpe2p001 above message repeats 2 times
Jul 15 14:36:18 maahpxpe2p001 sshd[16614]: SSH: Server;Ltype: Version;Remote: 10.239.101.126-2999;Protocol: 2.0;Client: PuTTY_Release_0.60
Jul 15 14:36:24 maahpxpe2p001 sshd[16614]: Accepted keyboard-interactive/pam for oracle from 10.239.101.126 port 2999 ssh2
Jul 15 15:36:19 maahpxpe2p001 sshd[16616]: SSH: Server;Ltype: Kex;Remote: 10.239.101.126-2999;Enc: aes256-ctr;MAC: hmac-sha1;Comp: none
Jul 15 16:22:26 maahpxpe2p001 vmunix: class : tgtpath, instance 60
Jul 15 16:22:26 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=60) has gone offline. The target path h/w path is 0/0/5/0/0/0/0.0x500601623b2449a4
Jul 15 16:22:26 maahpxpe2p001 vmunix:
Jul 15 16:22:32 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath176 to CKM00101800493 is dead.
Jul 15 16:22:32 maahpxpe2p001 vmunix: emcp:Mpx:Error: Killing bus 2 to Clariion CKM00101800493 port SP A10.
Jul 15 16:22:32 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath192 to CKM00101800493 is dead.
class : tgtpath, instance 63
Jul 15 16:30:09 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=63) has gone online. The target path h/w path is 0/0/5/0/0/0/0.0x5006016b3b2449a4
Jul 15 16:30:09 maahpxpe2p001 vmunix:
Jul 15 16:30:25 maahpxpe2p001 vmunix: class : tgtpath, instance 74
Jul 15 16:30:25 maahpxpe2p001 vmunix: Target path (class=tgtpath, instance=74) has gone offline. The target path h/w path is 0/0/5/0/0/0/1.0x5006016d3b2449a4
Jul 15 16:30:26 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath238 to CKM00101800493 is dead.
Jul 15 16:30:26 maahpxpe2p001 vmunix: emcp:Mpx:Error: Killing bus 3 to Clariion CKM00101800493 port SP B13.
Jul 15 16:30:26 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath240 to CKM00101800493 is dead.
Jul 15 16:30:25 maahpxpe2p001 vmunix:
Jul 15 16:30:26 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath255 to CKM00101800493 is dead.
Jul 15 16:30:26 maahpxpe2p001 vmunix: emcp:Mpx:Error: Path lunpath249 to CKM00101800493 is dead.
emcp:Mpx:Info: Restored volume 60060160C9102900F81B1F69277EDF11 to default: SPB
Jul 20 16:33:24 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath254 to CKM00101800493 is alive.
Jul 20 16:33:24 maahpxpe2p001 vmunix: emcp:Mpx:Info: Restored volume 60060160C91029005A5B07FE267EDF11 to default: SPB
Jul 20 16:33:24 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath250 to CKM00101800493 is alive.
Jul 20 16:33:25 maahpxpe2p001 vmunix: emcp:Mpx:Info: Restored volume 60060160C9102900B625D47E267EDF11 to default: SPB
Jul 20 16:33:25 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath252 to CKM00101800493 is alive.
Jul 20 16:33:25 maahpxpe2p001 vmunix: emcp:Mpx:Info: Restored volume 60060160C9102900FE434DBD267EDF11 to default: SPB
Jul 20 16:33:25 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath248 to CKM00101800493 is alive.
Jul 20 16:33:25 maahpxpe2p001 vmunix: emcp:Mpx:Info: Restored volume 60060160C910290096DD8E36267EDF11 to default: SPB
Jul 20 16:33:25 maahpxpe2p001 vmunix: emcp:Mpx:Info: Path lunpath255 to CKM00101800493 is alive.


Can anyone please help?

Thanks in advance!
6 REPLIES
Steven E. Protter
Exalted Contributor

Re: new Superdome has crashed

Shalom,

Was the OS pre-loaded, or were you building it? If you were building it, were you at the vPar build or the nPar stage?

It looks to me like a storage access issue: a Fibre Channel cable was unplugged or an I/O card went bad.
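
One rough way to check the cable or card theory is to count the "has gone offline" target path messages per HBA port. The sketch below assumes the last field of each message is the HBA hardware path followed by a dot and what appears to be the target port WWN, as in the log lines posted above. The function name offline_by_hba is made up for illustration.

```sh
#!/bin/sh
# Sketch: count tgtpath "has gone offline" messages per HBA hardware path.
# Usage: offline_by_hba /var/adm/syslog/syslog.log
offline_by_hba() {
    awk '
        /Target path .* has gone offline/ {
            hw = $NF                 # last field: <HBA h/w path>.<0x... suffix>
            sub(/\.0x.*$/, "", hw)   # strip the suffix, keep the HBA part
            count[hw]++
        }
        END { for (h in count) printf "%s %d\n", h, count[h] }
    ' "$1" | sort
}
```

If nearly all the offline events land on one HBA path, that port, its cable or its switch port is the first place to look. If they are spread across both, the fabric or the array is the more likely suspect.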

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
melvyn burnard
Honored Contributor

Re: new Superdome has crashed

Well, you could supply a little more info here, but it appears as though something serious happened out on the SAN.

If the system actually crashed, get the panic dump analysed by HP.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Bill Hassell
Honored Contributor

Re: new Superdome has crashed

It appears that someone may have made changes to your SAN environment, either the SAN switches or the underlying disks. A single change on a SAN switch or disk array can rip multiple disks out of the computer in a fraction of a second. Since a lot of the paths seem to have returned, the problem could also be an intermittent failure in the SAN or disk arrays.

It appears that the problem occurred on Jul 15 and was fixed on Jul 20. Check your change control documents to see what work may have been done during those periods.
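
To line the log up against change control records, a per-day count of the PowerPath "dead" and "alive" messages can help. This is only a sketch: it assumes the standard HP-UX syslog location and the emcp:Mpx message wording shown in the original post, and path_events_by_day is a name invented here.

```sh
#!/bin/sh
# Sketch: count PowerPath "is dead" / "is alive" path messages per day.
# Usage: path_events_by_day /var/adm/syslog/syslog.log
path_events_by_day() {
    awk '
        /emcp:Mpx:/ && / is dead\./  { dead[$1 " " $2]++;  days[$1 " " $2] = 1 }
        /emcp:Mpx:/ && / is alive\./ { alive[$1 " " $2]++; days[$1 " " $2] = 1 }
        END {
            for (d in days)
                printf "%s dead=%d alive=%d\n", d, dead[d], alive[d]
        }
    ' "$1" | sort
}
```

The sort is plain text order, so it is only chronological within a single month. Days with many "dead" lines and no matching "alive" lines are the ones worth checking against the change records.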


Bill Hassell, sysadmin
Musa123
Advisor

Re: new Superdome has crashed

Actually, on the morning of 20/07/2010 the server logged the following:

Jul 20 09:32:36 maahpxpe2p001 vmunix: DIAGNOSTIC SYSTEM WARNING:
Jul 20 09:32:36 maahpxpe2p001 vmunix: The diagnostic logging facility has started receiving excessive
Jul 20 09:32:36 maahpxpe2p001 vmunix: errors from the I/O subsystem. I/O error entries will be lost
Jul 20 09:32:36 maahpxpe2p001 vmunix: until the cause of the excessive I/O logging is corrected.
Jul 20 09:32:36 maahpxpe2p001 vmunix: If the diaglogd daemon is not active, use the Daemon Startup command
Jul 20 09:32:36 maahpxpe2p001 vmunix: in stm to start it.
Jul 20 09:32:36 maahpxpe2p001 vmunix: If the diaglogd daemon is active, use the logtool utility in stm
Jul 20 09:32:36 maahpxpe2p001 vmunix: to determine which I/O subsystem is logging excessive errors.
Jul 20 09:32:36 maahpxpe2p001 vmunix: DIAGNOSTIC SYSTEM WARNING:
Jul 20 09:32:36 maahpxpe2p001 vmunix: The diagnostic logging facility is no longer receiving excessive
Jul 20 09:32:36 maahpxpe2p001 vmunix: errors from the I/O subsystem. 10 I/O error entries were lost.
Jul 20 09:38:34 maahpxpe2p001 sshd[633]: SSH: Server;Ltype: Kex;Remote: 10.47.12.171-3394;Enc: aes256-ctr;MAC: hmac-sha1;Comp: none
RAJD1
Valued Contributor

Re: new Superdome has crashed

> Jul 20 09:32:36 maahpxpe2p001 vmunix: DIAGNOSTIC SYSTEM WARNING:
> Jul 20 09:32:36 maahpxpe2p001 vmunix: The diagnostic logging facility has started receiving excessive
> Jul 20 09:32:36 maahpxpe2p001 vmunix: errors from the I/O subsystem. I/O error entries will be lost

To debug the above, you need to go through a few checks:
- How many partitions are there in the Superdome? Post the # parstatus output.
- Since when have these errors been occurring? What was the # uptime output before the crash?
- Is there any dump data in /var/adm/crash/?
- What does /etc/shutdownlog show?
- What are the OS version and the system model? These can give more information.
- What do the MP logs show? You can filter with error level 7, then 5 (Fatal, then Critical).
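
The OS-side checks above could be gathered in one pass with something like the sketch below. The function names are made up for illustration, and parstatus may need root. The MP logs are not covered, since those are read from the MP console rather than from the OS.

```sh
#!/bin/sh
# Sketch: collect the basic facts from the checklist into one report.
run_if_present() {
    # Print a header, then run the command only if it exists on this host.
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "(command not found: $1)"
    fi
}

collect_crash_info() {
    run_if_present parstatus     # nPartition layout
    run_if_present uptime        # time since last boot
    run_if_present uname -r      # HP-UX release
    run_if_present model         # system model
    echo "== /var/adm/crash =="
    ls -l /var/adm/crash 2>/dev/null || echo "(no /var/adm/crash directory)"
    echo "== /etc/shutdownlog (last 20 lines) =="
    tail -n 20 /etc/shutdownlog 2>/dev/null || echo "(no /etc/shutdownlog)"
}
```

Running collect_crash_info > /tmp/crash_info.txt gives a single file to attach to the thread or to an HP support case.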
Hth,
Raj D.
mvpel
Trusted Contributor

Re: new Superdome has crashed

You should also make sure you have the latest FibrChanl driver bundles for the system. If you have an old enough version of the drivers, there are some known issues that can cause LUNs to go offline for no apparent reason.

 

What HP-UX release is this, and what are your FibrChanl driver versions?

 

"swlist | grep FibrChanl"
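
A slightly more forgiving variant of that command, as a sketch: it matches case-insensitively and prints only the first two columns, which are assumed here to be the bundle name and revision. The helper name fc_bundles is made up.

```sh
#!/bin/sh
# Sketch: filter swlist output (read from stdin) for FibrChanl bundles
# and print the first two columns (assumed: bundle name and revision).
fc_bundles() {
    grep -i 'fibrchanl' | awk '{ print $1, $2 }'
}

# On the HP-UX host:
#   swlist | fc_bundles
```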