03-13-2012 07:52 AM - edited 03-13-2012 08:43 AM
Smart Array 641 Issues (Proliant ML350 G4)
Hello-
I'm running an old ML350 G4 server with Windows SBS 2003 and a Smart Array 641 controller on firmware 2.34 B (I know that there is newer firmware but until just recently I've had no problems). I've experienced the following sequence of events twice now in the past month:
Event Log
--------------
19:26 - The device, \Device\Scsi\cpqcissm1, did not respond within the timeout period.
19:27 - Drive Array Physical Drive Status Change. The physical drive in Slot 3, SCSI Port 1 Drive 5 with serial number "XXX", has a new status of 3.
19:27 - Physical Drive on DEVICE ID 5 on Port 1 of Array Controller in slot 3 has failed. Failure Code: 0x07
19:45 - Environment Abnormality Auto Shutdown (EAAS) initiated due to thermal reasons, either resulting from the system overheating, or from the loss of cooling.
Smart Array Log
----------------------
19:27:13 - SCSI bus fault occurred on Storage Box box 0, , Port 0 of Array Controller in slot 3. This may result in a "downshift" in transfer rate for one or more hard drives on the bus.
19:27:13 - Physical Drive on DEVICE ID 5 on Port 1 of Array Controller in slot 3 has failed. Failure Code: 0x07.
19:27:13 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 0 to status code 3.
20:07:59 - The Event Notification driver Cpqcisse.sys of Array Controller in slot 3 has started.
20:08:29 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 3 to status code 4.
20:08:29 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 4 to status code 5.
21:25:00 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 5 to status code 0.
As indicated by the 19:45 event, the server overheats and then restarts. It's worth noting that this only happens once in a while, but it always accompanies this drive failure. After it automatically restarts, it restores the failed drive and everything is fine.
It's also worth noting that I get the cpqcissm1 event ("The device, \Device\Scsi\cpqcissm1, did not respond within the timeout period.") maybe once or twice a week.
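To make the sequence in the Smart Array log easier to read, the status-code changes for the logical drive can be pulled out into a timeline. The following is only a minimal sketch: the function names `status_transitions` and `time_away_from_status` are made up for illustration, the log text is copied from the entries above, and status code 0 is treated as the baseline purely because the log starts and ends there (the thread does not say what each code means).

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Smart Array log entries copied from the post above.
LOG = """\
19:27:13 - SCSI bus fault occurred on Storage Box box 0, , Port 0 of Array Controller in slot 3. This may result in a "downshift" in transfer rate for one or more hard drives on the bus.
19:27:13 - Physical Drive on DEVICE ID 5 on Port 1 of Array Controller in slot 3 has failed. Failure Code: 0x07.
19:27:13 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 0 to status code 3.
20:07:59 - The Event Notification driver Cpqcisse.sys of Array Controller in slot 3 has started.
20:08:29 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 3 to status code 4.
20:08:29 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 4 to status code 5.
21:25:00 - Logical Drive 2 of Array Controller in slot 3 has changed from status code 5 to status code 0.
"""

# Matches only the "Logical Drive N ... status code A to status code B" lines.
STATUS_CHANGE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) - Logical Drive (?P<drive>\d+) of Array "
    r"Controller in slot \d+ has changed from status code (?P<old>\d+) "
    r"to status code (?P<new>\d+)\.$"
)


def status_transitions(log_text):
    """Return (time, drive, old_code, new_code) for every status-change line."""
    transitions = []
    for line in log_text.splitlines():
        match = STATUS_CHANGE.match(line.strip())
        if match:
            when = datetime.strptime(match["time"], "%H:%M:%S")
            transitions.append(
                (when, int(match["drive"]), int(match["old"]), int(match["new"]))
            )
    return transitions


def time_away_from_status(transitions, drive, baseline=0) -> Optional[timedelta]:
    """Time between a drive leaving `baseline` and first returning to it.

    Assumes all entries are from the same day and are in chronological order.
    """
    left_at = None
    for when, logged_drive, old, new in transitions:
        if logged_drive != drive:
            continue
        if left_at is None and old == baseline and new != baseline:
            left_at = when
        elif left_at is not None and new == baseline:
            return when - left_at
    return None


transitions = status_transitions(LOG)
print(time_away_from_status(transitions, drive=2))  # prints 1:57:47
```

For the log above, Logical Drive 2 leaves status code 0 at 19:27:13 and returns to it at 21:25:00, just under two hours later.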
Now, for a little history:
At the beginning of this year one of my hard drives truly did fail and I replaced it with a "new" one (I say that because it was used but new to the server). That fixed the failed drive, obviously, but ever since that point the cpqcissm1 events began appearing. This may be coincidence and not consequence, but it could also be indicative of a bigger problem--a problem that these recent events are revealing. (Also, the original failed drive is not the same one as the one from the recent events).
I'm more of a server administrator by necessity than by education, so I've come here for your advice: what do you think is going on? I'm pretty sure that there's a hardware issue, but judging from these events I don't know whether it's a hard drive, the SCSI controller or the Smart Array controller (or perhaps something else entirely).
If it helps, I'll give a quick rundown of the RAID setup:
Logical Drive 1: 3x 72.8 GB, all on firmware HPB3 (it was one of these drives that originally failed)
Logical Drive 2: 3x 146 GB, all on firmware HPB4 (it is one of these drives that fails in the recent events)
Thanks for any help you can provide.
EDIT: I'm looking into buying a replacement controller just in case but I'm coming across two different part numbers: 291966-B21 and 305414-001. Which one should I be getting?
03-14-2012 12:09 AM
Re: Smart Array 641 Issues (Proliant ML350 G4)
hi,
Can you please attach an ACU report from the server, but please update the ACU version first.
regards,
03-15-2012 06:18 AM
Re: Smart Array 641 Issues (Proliant ML350 G4)
Will do. In the meantime I'd like to get a spare controller anyway, so do you know which is the correct part number (291966-B21 or 305414-001, or does either work)?
03-15-2012 06:23 AM
Re: Smart Array 641 Issues (Proliant ML350 G4)
hi,
The correct spare should be:
305414-001 Smart Array 641 Ultra320 Controller with 64MB cache - 64-bit, 133MHz, PCI-X PC board - Does not include a cache module or a battery
regards,
03-15-2012 01:08 PM
Re: Smart Array 641 Issues (Proliant ML350 G4)
Thanks again. This might be a dumb question, but I can just use the cache module from my existing card, correct? There aren't any issues with transferring the module between controllers?
03-15-2012 11:41 PM
Re: Smart Array 641 Issues (Proliant ML350 G4)
hi,
Sure, you can use the cache from the old controller, if it is not defective.
regards,
03-19-2012 06:36 AM
Re: Smart Array 641 Issues (Proliant ML350 G4)
I have that ACU diagnostic report ready, but before I publish it I need to ask: is there anything sensitive in it that I should remove first? I did a quick lookover and nothing appears like it should be hidden but I want to make sure first.
Also, in other news, I looked at those cpqcissm event log entries again, and the period in which the controller goes "unresponsive" is also the time that we run a backup of our largest database (>12 GB). I've confirmed this by moving the backup time by about an hour and a half; the cpqcissm events followed to that new time as well. I don't think that this is a coincidence. Would an extended period (about 15 minutes) of I/O-heavy operations make the controller slow enough to respond that it triggers this event?
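The check described above (do the timeout events fall inside the backup window, and do they move when the backup moves?) can be sketched in a few lines. Everything concrete here is an assumption for illustration: the event timestamps and backup start times are hypothetical sample values, not taken from the actual event log, and `in_window` and `fraction_in_window` are made-up helper names. Only the roughly 15-minute duration and the hour-and-a-half shift come from the description.

```python
from datetime import datetime, time, timedelta


def in_window(event, start, duration):
    """True if `event` falls in [start, start + duration) on the event's own date.

    The window is assumed not to cross midnight.
    """
    window_start = datetime.combine(event.date(), start)
    return window_start <= event < window_start + duration


def fraction_in_window(events, start, duration):
    """Share of events that fall inside the daily window (0.0 for no events)."""
    if not events:
        return 0.0
    hits = sum(in_window(event, start, duration) for event in events)
    return hits / len(events)


# Hypothetical sample data: timeout events before and after moving the backup.
BACKUP_DURATION = timedelta(minutes=15)   # "about 15 minutes" of heavy I/O
OLD_BACKUP_START = time(19, 0)            # hypothetical original start time
NEW_BACKUP_START = time(20, 30)           # moved by an hour and a half

EVENTS_BEFORE_MOVE = [datetime(2012, 3, 5, 19, 6), datetime(2012, 3, 8, 19, 11)]
EVENTS_AFTER_MOVE = [datetime(2012, 3, 15, 20, 34), datetime(2012, 3, 17, 20, 41)]

print(fraction_in_window(EVENTS_BEFORE_MOVE, OLD_BACKUP_START, BACKUP_DURATION))
print(fraction_in_window(EVENTS_AFTER_MOVE, NEW_BACKUP_START, BACKUP_DURATION))
print(fraction_in_window(EVENTS_AFTER_MOVE, OLD_BACKUP_START, BACKUP_DURATION))
```

If the events really follow the backup, the first two fractions should be close to 1 and the third close to 0. With real data, the timestamps would come from the System event log entries for cpqcissm1.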
03-27-2012 04:45 AM
Re: Smart Array 641 Issues (Proliant ML350 G4)
There's nothing sensitive in the ADU report.
Do I understand correctly that the problem occurred after the disk replacement?
If so, it could be a bad spare drive.
The System Management Homepage (SMH) is your best friend.
On the SMH you can read the drive statistics in an easy-to-read format.
Check all the drives, but pay particular attention to drive ID 5 and the previously failing drive.
I recommend upgrading the BIOS, firmware, and drivers.
The array controller firmware includes fixes for bus downshift problems, among others.
Also, consider reseating the previously failing drive.
BR
/jag
- Tags:
- SMH