SOLVED
KCS_1
Respected Contributor

specific disk fail again and again

Hi all,

Please help me.
I am running HP-UX 11.0 (32-bit) on a D380 server.
A month ago I replaced one of the server's internal disks (not the boot disk) because the "ioscan" command reported "NO_HW" for it and there were failure messages about this disk in /var/opt/resmon/log/event.log and in syslog. I also adjusted the timeout parameter with pvchange (pvchange -t 180).
But the problem still occurs again and again, and the timestamps show that it recurs at intervals.

Here are the symptoms in more detail:

When the system reports the disk as failed, I pull the disk out and put it straight back in. After that the disk behaves normally, with no logs or messages on the box.
(OS: 11.0, STM: A.30.00, EMS: A.03.20, FW: 38.40, Disk FW: HP05)

What can I do?

I am waiting for a solution from the specialists.

Happy Christmas!!
Easy going at all.
11 REPLIES
Michael Tully
Honored Contributor

Re: specific disk fail again and again

Hi Patrick,

Please post a copy of

# ioscan -fnC disk

Regards
Michael
Anyone for a Mutiny ?
KCS_1
Respected Contributor

Re: specific disk fail again and again

In addition,

I have moved the disk to another bay in the box, but the problem described above still occurs.

Easy going at all.
Michael Tully
Honored Contributor

Re: specific disk fail again and again

Please post it anyway, stating which is the old path (if it is reported) and the new path.
Anyone for a Mutiny ?
KCS_1
Respected Contributor

Re: specific disk fail again and again

Sorry, Tully!

I don't have any logs, including the ioscan output.

Easy going at all.
Michael Tully
Honored Contributor

Re: specific disk fail again and again

Post the current output of 'ioscan -fnC disk'
Anyone for a Mutiny ?
Animesh Chakraborty
Honored Contributor

Re: specific disk fail again and again

Hi,
It looks like a loose contact problem.
Try changing SCSI cable and card too.

Did you take a backup?
Gerhard Roets
Esteemed Contributor

Re: specific disk fail again and again

Patrick,

Looking at your problem.

1. The disk stopped working.
2. You downed the machine, reseated the disk and brought the machine back up.
3. The disk keeps failing.

Well, if you never changed the disk, I would start looking there: that disk seems to be on its "last legs".

hth
Gerhard Roets
Eugeny Brychkov
Honored Contributor
Solution

Re: specific disk fail again and again

Patrick,
If you would like us to help you, then please post the following outputs here (as an attachment, merged into one file):
- 'ioscan -fn';
- 'diskinfo -v' for the 'bad' disk;
- recent EMS messages (from syslog.log);
- whether 'dd if=/dev/rdsk/cXtYdZ of=/dev/null bs=4096k' completed successfully for the 'bad' disk, without an I/O error reported;
- STM information for the 'bad' disk: run 'cstm', type 'map', note the disk's device number # (first column), type 'sel dev #', type 'info', type 'il', page to the end of the output and write it to a file by hitting the 's' key.
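
One possible way to gather most of these outputs into a single file is sketched below. It is only a sketch: the section and collect helpers, the /tmp/disk_report.txt path and the 200-line tail of syslog.log are assumptions made for illustration, and cXtYdZ has to be replaced with the real device. The interactive cstm steps still have to be done by hand.

```shell
#!/bin/sh
# Sketch: gather the requested diagnostics into one file for attaching.
# DISK and OUT are placeholders (hypothetical values); set them first.
DISK=${DISK:-/dev/rdsk/cXtYdZ}
OUT=${OUT:-/tmp/disk_report.txt}

# Run one command and print a header, its output (stdout and stderr)
# and its exit status, so failures such as I/O errors stay visible.
section() {
    echo "===== $* ====="
    "$@" 2>&1
    echo "(exit status: $?)"
    echo
}

collect() {
    section ioscan -fn
    section diskinfo -v "$DISK"
    # 200 lines is an arbitrary choice for "recent" messages.
    section tail -n 200 /var/adm/syslog/syslog.log
    section dd if="$DISK" of=/dev/null bs=4096k
}

# Uncomment on the affected host:
# collect > "$OUT"
```

Recording the exit status of each command means a dd run that stops on an I/O error is obvious in the attached file.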
Eugeny
Steven E. Protter
Exalted Contributor

Re: specific disk fail again and again

I am a veteran of this kind of issue, on a D320 box that went through hell. I still run a D380, but it's been reliable.

Possible problems:

1) Another, unused disk in the drive cage, or further up or down the SCSI chain, that's totally dead. This can cause lbolts on the disk you're working with, which might not itself be bad.

2) Bad drive cage.
There is a part, like a metal box, that holds the drive enclosure for the D380's internal drives. It provides power and a connection to the SCSI cable on D-class servers. It can go bad. Sometimes it takes a little pressure to get HP to replace it.

3) SCSI cable is going, or bent pins.

4) SCSI controller card or pin settings are wrong.

5) Internal problem with the drive itself.

This list is not in priority order. Since the internal drives are hot swappable, have hardware support send you a new disk. If you don't have hardware support, get some money and buy another one. If the problem continues, investigate the causes I outlined above.

Good luck.
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
T. M. Louah
Esteemed Contributor

Re: specific disk fail again and again

As mentioned above, a disk dump with 'dd' is a good test; you should get the same number of records in and out. I would also check syslog.log and OLDsyslog.log for SCSI resets or power-fail messages:
# grep -Ei "scsi|power|lbolt" /var/adm/syslog/syslog.log
The output will look similar to: ... dev_T 0x01F021000
This can identify a specific disk, e.g. c2t1d0.
For example, if pvchange was run on this disk and syslog still reports problems with it, you had better replace it.
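
To show how a dev_t value maps to a device name, here is a small sketch. It assumes the usual sdisk layout (8-bit major number, then a minor number made of an 8-bit card instance, 4-bit target, 4-bit LUN and 8 option bits); decode_dev_t is a made-up helper name, and the script has not been checked on 11.0 itself.

```shell
#!/bin/sh
# Sketch: translate a disk dev_t from syslog into a cXtYdZ name.
# Assumed layout (see above): major(8) | card(8) | target(4) | lun(4) | options(8).
decode_dev_t() {
    dev=$(printf '%d' "$1")            # accept hex such as 0x1F021000
    card=$(( dev / 65536 % 256 ))      # bits 16-23: card instance
    target=$(( dev / 4096 % 16 ))      # bits 12-15: SCSI target
    lun=$(( dev / 256 % 16 ))          # bits 8-11: LUN
    echo "c${card}t${target}d${lun}"
}

decode_dev_t 0x1F021000    # prints c2t1d0
```

Division and modulo are used instead of shifts and masks so the arithmetic does not depend on which bitwise operators the local shell supports.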
One other command to run is:

# diskinfo /dev/rdsk/c#t#d#

It might return the right info (vendor ID, etc.) but zero for the size (not the bytes per sector); this definitely indicates a bad disk.
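
A rough way to script that check is sketched below. It assumes the diskinfo output contains a line of the form "size: <n> Kbytes"; size_is_zero is a hypothetical helper name, and the device path is a placeholder.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when diskinfo output on stdin reports a size of 0.
# Assumes a "size: <n> Kbytes" line; the "bytes per sector" line is ignored
# because its first field is "bytes", not "size:".
size_is_zero() {
    awk '$1 == "size:" { found = 1; zero = ($2 == 0) }
         END { if (found && zero) exit 0; exit 1 }'
}

# Example on the affected host (replace the placeholder device name):
# diskinfo /dev/rdsk/c#t#d# | size_is_zero && echo "size is 0: suspect disk"
```

If no size line is found at all, the helper reports failure rather than guessing, so a changed output format does not produce a false alarm.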

Lastly, use cstm --> ru --> logtool option --> rs, then check the number of I/O errors and which device they occurred on. You can reset logN.raw.cur to start logging new errors by typing SL (for SwitchLog) at the log utility prompt.

Cheers,
T??
Little learning is dangerous!
KCS_1
Respected Contributor

Re: specific disk fail again and again

I couldn't post any logs. I had collected the system logs (ioscan, EMS, syslog.log, dmesg, STM, etc.) from the bad system at the remote site onto a DAT tape, but the DAT is broken. I can't connect to the remote machine again because my customer may not allow access, for security reasons.

Anyway, thanks. I will try the solutions posted above.

Really, really sorry for asking for help without any logs, especially to Tully.

Happy Christmas!!
Easy going at all.