05-06-2010 09:58 AM
BL685c file system corruption of in-memory data detected
We have a sticky problem that keeps coming up on some of our BL685c blades. We have six of these blades, all configured for the same purpose and running SLES 10 SP2. At seemingly random times, the following messages appear in /var/log/messages:
May 4 12:38:21 hpbp04 kernel: Filesystem "cciss/c0d0p4": XFS internal error xfs_trans_cancel at line 1175 of file fs/xfs/xfs_trans.c. Caller 0xffffffff880c0a0c
May 4 12:38:21 hpbp04 kernel:
May 4 12:38:21 hpbp04 kernel: Call Trace:{:xfs:xfs_trans_cancel+91}
May 4 12:38:21 hpbp04 kernel:{:xfs:xfs_create+1395} {:xfs:xfs_vn_mknod+429}
May 4 12:38:21 hpbp04 kernel:{vfs_create+390} {open_namei+421}
May 4 12:38:21 hpbp04 kernel:{do_filp_open+28} {do_sys_open+69}
May 4 12:38:21 hpbp04 kernel:{cstar_do_call+27}
May 4 12:38:21 hpbp04 kernel: xfs_force_shutdown(cciss/c0d0p4,0x8) called from line 1176 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff880b9475
May 4 12:38:21 hpbp04 kernel: Filesystem "cciss/c0d0p4": Corruption of in-memory data detected. Shutting down filesystem: cciss/c0d0p4
May 4 12:38:21 hpbp04 kernel: Please umount the filesystem, and rectify the problem(s)
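Since the same message shows up on several blades, it may help to pull every shutdown event out of each host's log and compare the timestamps. The following is only a rough sketch: the regular expression is written against the single log excerpt above, and the function names (`find_xfs_shutdowns`, `count_by_host`) are my own, not part of any HP or XFS tool.

```python
import re
from collections import Counter

# Pattern modelled on the one shutdown line quoted above; other kernel
# versions may word the message differently (assumption).
SHUTDOWN_RE = re.compile(
    r'^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'(?P<host>\S+)\s+kernel:\s+Filesystem "(?P<fs>[^"]+)":\s+'
    r'Corruption of in-memory data detected'
)


def find_xfs_shutdowns(lines):
    """Return one dict (month, day, time, host, fs) per matching log line."""
    events = []
    for line in lines:
        match = SHUTDOWN_RE.match(line)
        if match:
            events.append(match.groupdict())
    return events


def count_by_host(events):
    """Count shutdown events per (host, filesystem) pair."""
    return Counter((event["host"], event["fs"]) for event in events)
```

To use it, open a copy of /var/log/messages from each blade, pass the file object to `find_xfs_shutdowns`, and feed the combined list to `count_by_host` to see whether the events cluster on particular hosts or times.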
The partition is unusable until it is fixed. To remedy the problem, I have to unmount the partition and mount it again. Running xfs_check and xfs_repair does not show any problems. This is what appears in the log after I do the umount/mount:
May 4 16:02:07 hpbp04 kernel: XFS mounting filesystem cciss/c0d0p4
May 4 16:02:07 hpbp04 kernel: Starting XFS recovery on filesystem: cciss/c0d0p4 (logdev: internal)
May 4 16:02:09 hpbp04 kernel: Ending XFS recovery on filesystem: cciss/c0d0p4 (logdev: internal)
The red flag is the memory corruption error. That error points me to either the RAID controller cache, the hard drive caches, or the RAM used for the file system cache.
There was a hot spot detected in the data center that showed intake temps of ~90 degrees, which is still within HP's tolerance level, but makes me suspicious of the hardware.
All firmware has been verified against the latest Firmware DVD 9.0 and is up to date. All diagnostics passed.
Any ideas? Could this be an issue with a bad memory chip? What hardware part(s) should we replace?
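One way to look for evidence of a bad memory chip from the OS side might be to read the kernel's EDAC error counters, if they are available. This is a sketch under assumptions: it presumes an EDAC driver is loaded and exposes `ce_count` and `ue_count` files under /sys/devices/system/edac/mc, which I have not confirmed for the SLES 10 SP2 kernel on these blades. The function name `read_edac_counts` is hypothetical.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}}.

    "ce" is the corrected-error count and "ue" the uncorrected-error
    count. An empty dict means no EDAC directory was found (assumption:
    the driver is not loaded or not supported on this kernel).
    """
    counts = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return counts
    for controller in sorted(base_path.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            try:
                entry[key] = int((controller / filename).read_text().strip())
            except (OSError, ValueError):
                entry[key] = None  # file missing or unreadable
        counts[controller.name] = entry
    return counts
```

A non-zero and growing corrected-error count on one controller would at least hint at which memory to look at first. Zero counts, or an empty result, would not rule out the RAM.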