06-01-2011 12:51 AM
Disk write errors
I have an issue with my HP ProLiant server.
The server is showing disk write errors.
I'm attaching the logs.
The OS is RHEL 5.
What is the cause, and how can I fix it?
06-01-2011 02:06 AM
Re: Disk write errors
cciss/c1d1p1, cciss/c1d2p1 and cciss/c1d3p1).
Is this related to your earlier post about hpacucli?
Because filesystem errors have been detected, you will probably have to run a full filesystem check on the filesystems on those disks after the root cause is fixed. A reboot might do that automatically.
Is there a common element that would affect all the physical disks corresponding to those logical disks?
For example, if all the corresponding physical disks are in an external enclosure, you should check the health of the enclosure: are all the cables connected, power supplies OK/not OK, etc.
Try to reboot the system and pay attention to the BIOS boot messages: the SmartArray controller might print out informative error messages.
Also check the firmware versions. If you're not running the latest firmware, read the version history of the firmware package for your SmartArray controller model, going backwards from the latest version until you reach the version you're running now. If the version history lists important fixes that seem relevant to your current issue, consider updating the SmartArray firmware.
I once had a DL380 G5 with a SmartArray P800 and an external MSA50 enclosure. At boot time, the SmartArray displayed these messages:
-----
1777-Slot 4 Drive Array - Storage Enclosure Problem Detected
Port 1E: Box 1: Enclosure Processor Not Detected or Responding
Turn system and storage enclosure power OFF and turn them back ON to retry. If this error persists, upgrade the enclosure firmware or replace the I/O module.
1784-Slot 4 Drive Array - Drive Failure
The following disk drive(s) are failed and should be replaced:
Missing Port/Box 1: Bays 1,2,3,4,5,6,7,8,9,10
On-Line Spare Drive Failed
-----
In this case, all the disks of the external MSA50 enclosure were "failed" because of a problem in the enclosure itself. A visual inspection of the I/O module revealed the cause. See the attached picture of the I/O module and note the burned component in the foreground.
After the I/O module was replaced, it turned out the disks were OK. A full filesystem check was still required, because the I/O module had failed in mid-operation.
MK
06-01-2011 02:32 AM
Re: Disk write errors
This query is not about the hpacucli issue. It concerns a different server.
This server is directly connected to an MSA20 storage enclosure through a SCSI cable, but no problem has been detected on any of the hard drives in the enclosure. Could the storage controller be at fault?
The server is also showing the errors below:
May 3 02:31:33 localhost kernel: Aborting journal on device cciss/c1d2p1.
May 3 02:31:33 localhost kernel: ext3_abort called.
May 3 02:31:33 localhost kernel: EXT3-fs error (device cciss/c1d2p1): ext3_journal_start_sb: Detected aborted journal
May 3 02:31:33 localhost kernel: Remounting filesystem read-only
May 3 02:31:33 localhost kernel: cciss: cmd ffff810037e87290 is reported invalid
May 3 02:31:33 localhost kernel: cciss: cmd ffff810037e87500 is reported invalid
Also
May 2 12:33:09 localhost kernel: EXT3-fs error (device cciss/c1d2p1) in ext3_ordered_commit_write: IO failure
May 2 12:33:09 localhost kernel: cciss: cmd ffff810037e80000 has CHECK CONDITION byte 2 = 0x5
May 2 12:33:09 localhost last message repeated 2 times
May 2 12:33:09 localhost kernel: ext3_abort called.
May 2 12:33:09 localhost kernel: EXT3-fs error (device cciss/c1d1p1): ext3_journal_start_sb: Detected aborted journal
May 2 12:33:09 localhost kernel: Remounting filesystem read-only
May 2 12:33:10 localhost kernel: cciss: cmd ffff810037e80270 is reported invalid
May 2 12:33:10 localhost kernel: cciss: cmd ffff810037e804e0 is reported invalid
May 2 12:33:11 localhost kernel: cciss: cmd ffff810037e80750 is reported invalid
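To see whether the errors share a common element, it can help to tally them per device. The following is only a rough sketch: the function names `summarize` and `controllers` are made up for illustration, the sample lines are copied from the excerpts above, and the reading of `cciss/cXdYpZ` as controller X, logical drive Y, partition Z is the usual cciss naming convention rather than something confirmed for this particular server.

```python
import re
from collections import Counter

# Assumed naming convention: cciss/c1d2p1 = controller 1, logical drive 2, partition 1.
DEVICE_RE = re.compile(r"cciss/c(\d+)d(\d+)(?:p(\d+))?")

def summarize(lines):
    """Count log lines mentioning each cciss device and count read-only remounts."""
    per_device = Counter()
    remounts = 0
    for line in lines:
        if "Remounting filesystem read-only" in line:
            remounts += 1
        match = DEVICE_RE.search(line)
        if match:
            per_device[match.group(0)] += 1
    return per_device, remounts

def controllers(per_device):
    """Return the set of controller numbers found in the device names."""
    return {int(DEVICE_RE.match(name).group(1)) for name in per_device}

# Sample lines taken from the log excerpts in this thread.
sample = [
    "May 3 02:31:33 localhost kernel: Aborting journal on device cciss/c1d2p1.",
    "May 3 02:31:33 localhost kernel: EXT3-fs error (device cciss/c1d2p1): "
    "ext3_journal_start_sb: Detected aborted journal",
    "May 3 02:31:33 localhost kernel: Remounting filesystem read-only",
    "May 2 12:33:09 localhost kernel: EXT3-fs error (device cciss/c1d1p1): "
    "ext3_journal_start_sb: Detected aborted journal",
]

per_device, remounts = summarize(sample)
for name, count in per_device.most_common():
    print(name, count)
print("read-only remounts:", remounts)
print("controllers involved:", sorted(controllers(per_device)))
```

If every affected device turns out to sit behind the same controller number, that would point toward something shared (controller, cable, enclosure) rather than several individual disks, though a tally like this cannot prove it.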
06-01-2011 03:16 AM
Re: Disk write errors
I'm also getting the errors below:
May 2 11:27:14 localhost auditd[3157]: Audit daemon rotating log files
May 2 12:33:08 localhost kernel: cciss: cmd ffff810037e926f0 is reported invalid
May 2 12:33:08 localhost kernel: Buffer I/O error on device cciss/c1d1p1, logical block 5557037
And
May 19 11:28:45 localhost kernel: audit: audit_backlog=321 > audit_backlog_limit=320
May 19 11:28:45 localhost kernel: audit: audit_lost=2 audit_rate_limit=0 audit_backlog_limit=320
May 19 11:28:45 localhost kernel: audit: backlog limit exceeded
06-02-2011 02:15 AM
Re: Disk write errors
How can I check the Smart Array firmware version?
06-03-2011 02:30 PM
Re: Disk write errors
# hpacucli
> controller all show config detail
[...all the information about the SmartArray you can hope for...]
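The report is long, so it can be handy to pull out just the firmware lines. Here is a minimal sketch, assuming that hpacucli accepts the same command as arguments on the command line and that the report labels the value with the words "Firmware Version"; both are assumptions to verify on your system. The helper names are hypothetical, and the sample text is a made-up placeholder, not real hpacucli output.

```python
import subprocess

def run_report():
    """Run hpacucli non-interactively and return its text output.

    Assumes hpacucli is installed, on PATH, and run with sufficient privileges.
    """
    result = subprocess.run(
        ["hpacucli", "controller", "all", "show", "config", "detail"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def firmware_lines(report_text):
    """Return the stripped lines of a report that mention a firmware version."""
    return [
        line.strip()
        for line in report_text.splitlines()
        if "firmware version" in line.lower()
    ]

# Made-up placeholder text for illustration only, NOT real hpacucli output.
placeholder = """Hypothetical Controller in Slot 0
   Firmware Version: 0.00
   Cache Status: OK
"""

print(firmware_lines(placeholder))
```

On the server itself, `firmware_lines(run_report())` would be the intended use. Filtering on a substring rather than a fixed column keeps the sketch tolerant of indentation differences between hpacucli versions.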
MK
06-03-2011 02:42 PM
Re: Disk write errors
The audit error messages are probably because the system cannot write the audit logs to the disk. This is because your filesystems have switched to read-only mode (as indicated by the previous errors) because of SmartArray errors.
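One quick sanity check for this explanation is to confirm that the read-only remount actually shows up in the logs before the audit backlog messages. The sketch below does only that ordering check. The helper names are hypothetical, the sample lines are copied from this thread, and the year 2011 is an assumption taken from the post dates, since syslog timestamps in this format carry no year.

```python
from datetime import datetime

def stamp(line, year=2011):
    """Parse the leading 'Mon D HH:MM:SS' syslog timestamp; the year is assumed."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")

def first_time(lines, needle):
    """Return the earliest timestamp among lines containing needle, or None."""
    times = [stamp(line) for line in lines if needle in line]
    return min(times) if times else None

# Sample lines taken from the log excerpts in this thread.
sample = [
    "May 19 11:28:45 localhost kernel: audit: backlog limit exceeded",
    "May 3 02:31:33 localhost kernel: Remounting filesystem read-only",
    "May 2 12:33:09 localhost kernel: Remounting filesystem read-only",
]

remount = first_time(sample, "Remounting filesystem read-only")
backlog = first_time(sample, "audit: backlog limit exceeded")
print("first read-only remount:", remount)
print("first audit backlog overflow:", backlog)
print("remount came first:", remount is not None and backlog is not None and remount < backlog)
```

An earlier remount is consistent with the explanation above but does not prove it; the audit backlog could also overflow for unrelated reasons.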
Yes, the failure of the controller is a possibility. Without knowing more about the controller's current state (model, firmware level, configuration, disks OK/NotOK) it's hard to say for sure.
MK