
Re: problem with scsi resets- real problem or not?

 
Mark Vollmers
Esteemed Contributor

problem with scsi resets- real problem or not?

Hi, all. Got a question for you. Maybe I've asked this before; I don't remember, and I apologize if I have. I have two things that I see with STM every so often (once or twice a week), but I haven't seen any actual effect. First, I get a message during backup that there was a SCSI reset on the RAID drive. The backup gives no errors (there is a file that doesn't get backed up now and then, but it isn't anything really important). I also get major warnings from STM concerning the two hard drives, which are mirrored (oddly enough, I only got one before I mirrored; now I get two, one per drive). I will attach the syslog that has the STM error messages as well as the normal stuff. I wonder if these are just issues between the RAID (old driver, untested on 11.0, but it shows no problems) and the OS during backups, or if there are actual issues. I'd hate to have to go beg for 20K to replace the drive if it wasn't necessary (actually, I'd love to replace it with an HP model, but back to the money issue...). As always, any thoughts appreciated! Thanks

Mark
"We apologize for the inconvenience" -God's last message to all creation, from Douglas Adams "So Long and Thanks for all the Fish"
James R. Ferguson
Acclaimed Contributor

Re: problem with scsi resets- real problem or not?

Hi Mark:

Of concern to me is that you have some power problems -- your UPS going on and off battery and some power spikes seen. I'd have your UPS inspected.

Regards!

...JRF...
S.K. Chan
Honored Contributor

Re: problem with scsi resets- real problem or not?

I've seen this before and ended up replacing the disk. However, a few things can cause this: cables, SCSI controller, firmware, disk failure, patches, or a timeout that is too short.
I was lucky; for me it was a disk problem.

A. Clay Stephenson
Acclaimed Contributor

Re: problem with scsi resets- real problem or not?

Hi Mark,

I would use pvchange -t to increase the timeout on your array. This is a common requirement on arrays. Typically, the timeout values are simply set to the default. You can use pvdisplay -v to show the current IO Timeout value. A setting of about 60 seconds might be a good place to start. However, I do not like to see the power spikes. I would have an electrician come out and monitor your lines.
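
The timeout change described above might look like the sketch below. The device file /dev/dsk/c1t0d0 is a placeholder, not taken from this thread, and `run` is a hypothetical dry-run helper that only prints each command unless DRY_RUN=0 is set. On a real system, pvchange needs root and the device file of the physical volume behind the array.

```shell
#!/bin/sh
# Sketch only: PV is a placeholder device file, not one from this thread.
PV=${PV:-/dev/dsk/c1t0d0}
# 60 seconds is the suggested starting point; adjust as needed.
TIMEOUT=${TIMEOUT:-60}

# Hypothetical helper: print the command in dry-run mode (the default),
# execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Show the current settings, including the IO Timeout value.
run pvdisplay -v "$PV"
# Set the IO timeout (in seconds) on the physical volume.
run pvchange -t "$TIMEOUT" "$PV"
# Display again to confirm the new value took effect.
run pvdisplay -v "$PV"
```

Running it as-is only echoes the three commands, which gives a chance to check the device file before setting DRY_RUN=0.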
If it ain't broke, I can fix that.
Sanjay_6
Honored Contributor

Re: problem with scsi resets- real problem or not?

Mark Vollmers
Esteemed Contributor

Re: problem with scsi resets- real problem or not?

James-

I guess that I should have prefaced some of this. The UPS spikes can be attributed to when the AC kicks on in the morning for the room where the servers are located. Never was an issue as far as losing power, it's just that the UPS picks up the momentary spike. Our building's wired backasswards.

Clay-

I have already set the timeout to 180 with pvchange due to other reset problems. I'm not sure what it's hitting when it resets, since I can't find a pattern that causes it (time, day, etc.), only that it happens during backup.
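
One rough way to look for a time pattern is to bucket the reset messages in syslog by hour of day. This is a sketch under assumptions: it assumes the usual syslog line layout (month, day, HH:MM:SS, host, message), that the reset messages contain the word "reset", and that the log lives at /var/adm/syslog/syslog.log. The function name `resets_by_hour` is made up for illustration.

```shell
#!/bin/sh
# Assumed log location; override with LOG=... if yours differs.
LOG=${LOG:-/var/adm/syslog/syslog.log}

# Read syslog lines on stdin and print "count hour" for every hour of day
# in which a line mentioning a reset was logged.
resets_by_hour() {
    grep -i 'reset' |
        awk '{ split($3, t, ":"); print t[1] }' |  # field 3 is HH:MM:SS
        sort |
        uniq -c
}

# Only read the log if it exists and is readable on this machine.
if [ -r "$LOG" ]; then
    resets_by_hour < "$LOG"
fi
```

If the counts pile up in the hours when the backup runs and nowhere else, that would at least confirm the backup window is the only trigger.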

S.K.-
I did the custom patch bundle a few months ago, so I'm pretty good there, but I could look again. The SCSI cable on the drive was changed three or four months ago. The controller has always been an issue, but this falls under the category of "old hardware that is not verified for new OS". There have been no performance issues (crashes, data loss, etc).

Mark
Mark Vollmers
Esteemed Contributor

Re: problem with scsi resets- real problem or not?

Sanjay-

I've run fsck both during reboot and manually in single-user mode (not recently, but this problem has been around for a while), and it either finds no errors or finds a few inode issues and fixes them. Whether I have a recurring issue that I fix and that then happens again, I don't know.

Mark
S.K. Chan
Honored Contributor

Re: problem with scsi resets- real problem or not?

Hi Mark,
I looked at the log file one more time and realized this might not be due to a disk failure, because the whole bus was reset. Two things:
1) Ignore it, because bus resets can happen on a normal system due to timeouts or delays. This assumes there is no other indication of disk failure.
2) It may be a problem with your Logtool (Online Diag.); I would suggest updating it to the latest version.

rgds
Mark Vollmers
Esteemed Contributor

Re: problem with scsi resets- real problem or not?

Thanks for the thoughts so far. I'll go ahead and grab the latest Online Diag. and install it, and I'll probably take a look at updating the patches as well. I'm thinking that it's just an imaginary problem right now, unless anyone else thinks it's something else. Thanks, all.