
Re: Server Crashing, need help

 
SOLVED
Kevin Farrell_4
Frequent Advisor

Server Crashing, need help

For the past 2 weeks, we have come in in the morning to find the server unresponsive. We can ping it but can't log in via telnet; we never get the login prompt. We power cycle the machine, start up the web and Oracle form server, and all is well. Here is the log from OLDsyslog.log:

Any help is appreciated.
Sep 1 04:00:00 falcon su: + tty?? root-piltmgr
Sep 1 04:41:06 falcon vmunix: DIAGNOSTIC SYSTEM WARNING:
Sep 1 04:41:06 falcon vmunix: The diagnostic logging facility has started receiving excessive
Sep 1 04:41:06 falcon vmunix: errors from the I/O subsystem. I/O error entries will be lost
Sep 1 04:41:06 falcon vmunix: until the cause of the excessive I/O logging is corrected.
Sep 1 04:41:06 falcon vmunix: If the diaglogd daemon is not active, use the Daemon Startup command
Sep 1 04:41:06 falcon vmunix: in stm to start it.
Sep 1 04:41:06 falcon vmunix: If the diaglogd daemon is active, use the logtool utility in stm
Sep 1 04:41:06 falcon vmunix: to determine which I/O subsystem is logging excessive errors.
Sep 1 04:41:15 falcon vmunix: SCSI: Request Timeout -- lbolt: 30598580, dev: 1f020000
Sep 1 04:41:15 falcon vmunix: lbp->state: 4060
Sep 1 04:41:15 falcon vmunix: lbp->offset: ffffffff
Sep 1 04:41:15 falcon vmunix: lbp->uPhysScript: f87ba000
Sep 1 04:41:15 falcon vmunix: From most recent interrupt:
Sep 1 04:41:15 falcon vmunix: ISTAT: 22, SIST0: 00, SIST1: 04, DSTAT: 00, DSPS: f87ba580
Sep 1 04:41:15 falcon vmunix: lsp: 0000000043d95d00
Sep 1 04:41:15 falcon vmunix: bp->b_dev: 1f020000
Sep 1 04:41:15 falcon vmunix: scb->io_id: 26a0a88
Sep 1 04:41:15 falcon vmunix: scb->cdb: 2a 00 04 e0 47 50 00 00 10 00
Sep 1 04:41:15 falcon vmunix: lbolt_at_timeout: 30595449, lbolt_at_start: 30595449
Sep 1 04:41:15 falcon vmunix: lsp->state: 10d
Sep 1 04:41:15 falcon vmunix: lbp->owner: 0000000043d95d00
Sep 1 04:41:15 falcon vmunix: scratch_lsp: 0000000000000000
Sep 1 04:41:15 falcon vmunix: Pre-DSP script dump [fffffffff87ba020]:
Sep 1 04:41:15 falcon vmunix: 00000000 00000000 41000000 f87ba290
Sep 1 04:41:15 falcon vmunix: 78347100 0000000a 78350800 00000000
Sep 1 04:41:15 falcon vmunix: Script dump [fffffffff87ba040]:
Sep 1 04:41:15 falcon vmunix: 0e000004 f87ba580 e0100004 f87ba7c4
Sep 1 04:41:15 falcon vmunix: 870b0000 f87ba2d8 0a000000 f87ba588
Sep 1 04:41:15 falcon vmunix: SCSI: Abort abandoned -- lbolt: 30598580, dev: 1f020000, io_id: 26a0a88, status: 200
Sep 1 04:41:16 falcon vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f020000) is POWERFAILED
root(falcon)/var/adm/syslog:
#

12 REPLIES
Steven E. Protter
Exalted Contributor
Solution

Re: Server Crashing, need help

You have a dead disk.

Based on how your system is acting, I'd say it's a boot disk or the volume group has lost quorum.

The fact that LVM reports the physical volume as POWERFAILED might help the diagnosis.

See if there is a disk whose light should be on but is not. Check its cables and, if it's a hot-swap disk, make sure it is firmly seated in its slot.

There are hardware diagnostics you can use, such as sea, by interrupting the boot at the console during the 10-second prompt. They let you compare what is found against the documentation and nail down which disk is bad.


SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Rajesh SB
Esteemed Contributor

Re: Server Crashing, need help

Hi Kevin,

It sounds like one of the server's hard disks is becoming faulty. These are the symptoms of a failing disk.

First, back up the server!

Run the OnlineDiag tool "stm" to identify the faulty disk.

There is a hint in the log about the faulty disk,
i.e. falcon vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f020000)

Good luck.

Regards,
Rajesh
TwoProc
Honored Contributor

Re: Server Crashing, need help

I think your problem is an lbolt error from either a single disk or the whole SCSI controller.

Take the address 1f020000:

"1f" is 31, the major device number, which is sdisk.

"02" is the card (instance) number.

The next two digits, "0" and "0", are the target and the LUN.

So your problem is near/at device:
/dev/dsk/c2t0d0

This disk is probably going bad, or the SCSI controller is, or maybe it's a termination issue on the bus (I had one of these just last week).

My bet is on the disk at /dev/dsk/c2t0d0 (mainly because of the complaints from LVM at the bottom of the log).

Do an "lssf /dev/dsk/c2t0d0" to get the hardware path to the disk.
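
The decoding above can be sketched in a few lines of Python. This is only an illustration: it assumes the legacy HP-UX sdisk minor-number layout 0xIITL00 (two hex digits of card instance, one of target, one of LUN), which is consistent with the c2t0d0 result in this thread, and the function name decode_sdisk_dev is made up for the example.

```python
def decode_sdisk_dev(dev_hex):
    """Split an HP-UX dev_t given as a hex string into (major, instance, target, lun).

    ASSUMPTION: legacy sdisk minor-number layout 0xIITL00.
    """
    dev = int(dev_hex, 16)
    major = (dev >> 24) & 0xFF     # 0x1f = 31, the sdisk driver
    instance = (dev >> 16) & 0xFF  # card instance, the "c" number
    target = (dev >> 12) & 0xF     # SCSI target, the "t" number
    lun = (dev >> 8) & 0xF         # LUN, the "d" number
    return major, instance, target, lun


# dev value taken from the syslog excerpt in the original post
major, c, t, d = decode_sdisk_dev("1f020000")
print(major, "/dev/dsk/c%dt%dd%d" % (c, t, d))  # 31 /dev/dsk/c2t0d0
```

The name it prints is only a guess at the device file; lssf or ioscan on the server itself remains the authoritative check.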
We are the people our parents warned us about --Jimmy Buffett
Doug O'Leary
Honored Contributor

Re: Server Crashing, need help

>>Sep 1 04:41:15 falcon vmunix: SCSI: Request Timeout -- lbolt: 30598580, dev: 1f020000

Hey; Based on the dev value in that lbolt message, your c2t0d0 disk went south. That's probably a boot drive. Someone will end up posting a link to the document "When Good Disks Go Bad" or something like that, which will help you get your vg back up to date when you replace the disk.
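
The timeout dump in the original post carries a little more than the device number. The sketch below is a rough reading of two of its fields, not a definitive interpretation: it assumes lbolt is a clock-tick counter running at 100 ticks per second (the HZ value here is an assumption), and it reads scb->cdb as a standard 10-byte SCSI command block. The helper names are hypothetical.

```python
HZ = 100  # ASSUMPTION: lbolt counts clock ticks at 100 per second


def ticks_to_seconds(start_ticks, end_ticks, hz=HZ):
    """Elapsed seconds between two lbolt readings."""
    return (end_ticks - start_ticks) / hz


def decode_cdb10(cdb_hex):
    """Split a 10-byte SCSI CDB into (opcode, logical block address, block count)."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected a 10-byte CDB")
    opcode = cdb[0]
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5, big-endian
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8, big-endian
    return opcode, lba, blocks


# Values copied from the syslog excerpt in the original post.
waited = ticks_to_seconds(30595449, 30598580)  # lbolt_at_start -> lbolt when logged
opcode, lba, blocks = decode_cdb10("2a 00 04 e0 47 50 00 00 10 00")
print("request outstanding for about %.2f s" % waited)
print("opcode 0x%02x, LBA 0x%08x, %d blocks" % (opcode, lba, blocks))
```

Opcode 0x2a is WRITE(10) in the SCSI command set, so under the tick-rate assumption the request that timed out was a 16-block write that had been outstanding for roughly 31 seconds.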

HTH;

Doug

------
Senior UNIX Admin
O'Leary Computers Inc
linkedin: http://www.linkedin.com/dkoleary
Resume: http://www.olearycomputers.com/resume.html
Kevin Farrell_4
Frequent Advisor

Re: Server Crashing, need help

Yes, HP is coming to replace it. Thanks for the help.

Kevin
Doug O'Leary
Honored Contributor

Re: Server Crashing, need help

Here's a link to a thread that discusses the document that I mentioned...

Good luck.

Doug

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=926691

Pedro Cirne
Esteemed Contributor

Re: Server Crashing, need help

Hi Kevin,

Has this been happening every day for the last few weeks, around 04:00 AM?

I agree that the problem is probably one of the disks, but if it really happens every day at 04:00, that is strange. Do you have external disks on this server?
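
One way to check the timing from the logs rather than from memory is to tally the SCSI timeout messages by day and hour. The following is a minimal Python sketch, assuming the syslog line format shown in the original post; the function name and the regular expression are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 1 04:41:15 falcon vmunix: SCSI: Request Timeout -- lbolt: ...
TIMEOUT_LINE = re.compile(
    r"^(\w{3})\s+(\d+)\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+vmunix: SCSI: Request Timeout"
)


def timeouts_by_day_hour(lines):
    """Count SCSI request timeouts per (month, day, hour)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_LINE.match(line)
        if match:
            month, day, hour = match.groups()
            counts[(month, int(day), int(hour))] += 1
    return counts


# Sample lines copied from the original post; on the server you would instead
# pass the open file, e.g. open("/var/adm/syslog/OLDsyslog.log").
sample = [
    "Sep 1 04:00:00 falcon su: + tty?? root-piltmgr",
    "Sep 1 04:41:15 falcon vmunix: SCSI: Request Timeout -- lbolt: 30598580, dev: 1f",
    "Sep 1 04:41:16 falcon vmunix: LVM: vg[1]: pvnum=0 (dev_t=0x1f020000) is POWERFA",
]
for (month, day, hour), n in sorted(timeouts_by_day_hour(sample).items()):
    print("%s %2d %02d:00  %d timeout(s)" % (month, day, hour, n))
```

If the same hour shows up on many different days, a scheduled job or a power event becomes a more likely trigger than a randomly failing disk.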

Please post:

#vgdisplay -v vg01

Enjoy :)

Pedro
RAC_1
Honored Contributor

Re: Server Crashing, need help

You have disk problems. The disk in question is (dev_t=0x1f020000)

ll /dev/dsk | grep -i 020000
The disk should be c2t0d0. The disk has gone bad and is causing the problem.

When you have system problems like this, instead of power cycling the system you should do a TC (transfer of control) from the GSP, so that you can do analysis later on.
There is no substitute to HARDWORK
Kevin Farrell_4
Frequent Advisor

Re: Server Crashing, need help

Yes, we do have an external array. I forget the name of it.



# vgdisplay -v vg01
--- Volume groups ---
VG Name                     /dev/vg01
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      2
Open LV                     2
Max PV                      16
Cur PV                      1
Act PV                      1
Max PE per PV               17501
VGDA                        2
PE Size (Mbytes)            4
Total PE                    17499
Alloc PE                    17499
Free PE                     0
Total PVG                   0
Total Spare PVs             0
Total Spare PVs in use      0

   --- Logical volumes ---
   LV Name                     /dev/vg01/lvol
   LV Status                   available/sync
   LV Size (Mbytes)            65900
   Current LE                  16475
   Allocated PE                16475
   Used PV                     1

   LV Name                     /dev/vg01/lvsw
   LV Status                   available/sync
   LV Size (Mbytes)            4096
   Current LE                  1024
   Allocated PE                1024
   Used PV                     1


   --- Physical volumes ---
   PV Name                     /dev/dsk/c2t0d
   PV Status                   available
   Total PE                    17499
   Free PE                     0
   Autoswitch                  On


root(falcon)/var/adm:
#
Pedro Cirne
Esteemed Contributor

Re: Server Crashing, need help

Hi,

If this is happening every day around 04:00, and if c2t0d0 is in your external storage, I think that something is cutting power to the storage every night. I had this same mystery here :-))

Enjoy :)

Pedro
Kevin Farrell_4
Frequent Advisor

Re: Server Crashing, need help

I believe that disk is internal; that's what HP said on the phone. How do I tell? Sorry, I'm an Oracle DBA, not a Unix guy, but unfortunately I'm the Unix guy too, LOL.

# ioscan -fnC disk
Class     I  H/W Path     Driver  S/W State  H/W Type  Description
=====================================================================
disk      0  0/0/1/1.2.0  sdisk   CLAIMED    DEVICE    SEAGATE ST173404LC
                          /dev/dsk/c1t2d0   /dev/rdsk/c1t2d0
disk      3  0/0/2/0.0.0  sdisk   CLAIMED    DEVICE    SEAGATE ST173404LC
                          /dev/dsk/c2t0d0   /dev/rdsk/c2t0d0
disk      1  0/0/2/0.2.0  sdisk   CLAIMED    DEVICE    SEAGATE ST173404LC
                          /dev/dsk/c2t2d0   /dev/rdsk/c2t2d0
disk      2  0/0/2/1.2.0  sdisk   CLAIMED    DEVICE    HP DVD-ROM 305
                          /dev/dsk/c3t2d0   /dev/rdsk/c3t2d0
root(falcon)/var/adm:
Borislav Perkov
Respected Contributor

Re: Server Crashing, need help

Hi Kevin,

Here is a link to a thread that explains how to decode the device file from the device identification in an lbolt error. Then you can find which disk it is.
http://forums1.itrc.hp.com/service/forums/questionanswer.do?admit=716493758+1093012932131+28353475&threadId=219110
Regards,
Borislav