11-06-2007 07:49 AM
SLES 10 SP1, DRBD, and rsync
I have two Heartbeat/DRBD clusters configured identically (other than names, of course). I was trying to sync the data between the current non-clustered nodes and the new clusters. On one cluster, this appears to work perfectly using rsync. On the other, the file system becomes read-only at seemingly random times. I've been running a tar piped via ssh to do a blind copy of the data to this cluster, and that's been running smoothly for a while now, but any attempt to use rsync results in the file system going read-only. Not ideal for the final cutover.
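For reference, the two transfer methods look roughly like this (hostnames and paths here are placeholders, not the real ones):

# blind copy, tar piped through ssh from the old node to the new cluster - runs cleanly
tar -cf - -C /data . | ssh newcluster 'tar -xf - -C /data'

# the rsync equivalent - this is what ends with the filesystem going read-only
rsync -a /data/ newcluster:/data/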
I didn't find anything useful with a Google search. Any ideas?
Jeff Traigle
11-06-2007 08:22 AM
Re: SLES 10 SP1, DRBD, and rsync
Looks to me like a network inconsistency. A filesystem in Linux goes read-only when it hits an underlying I/O problem.
Here is what I'd check:
1) dmesg: look for errors related to the disk that underlies the filesystem.
2) fsck the filesystem after unmounting it. Same as HP-UX: no fsck with the filesystem hot.
3) Consider building a new initial ram disk (mkinitrd) on the affected system.
If there is evidence, I'd like to see it to help guide the diagnosis.
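A rough sketch of those three checks, assuming the filesystem lives on /dev/drbd0 and is mounted at /data (substitute your own device and mount point):

# 1) scan the kernel log for disk or DRBD errors
dmesg | grep -i -e error -e drbd

# 2) unmount first, then check - never fsck a mounted filesystem
umount /data
fsck -f /dev/drbd0

# 3) rebuild the initial ram disk for the installed kernel
mkinitrd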
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
11-06-2007 10:28 AM
Re: SLES 10 SP1, DRBD, and rsync
It is about halfway down the page, under the title:
Why does the ext3 filesystems on my Storage Area Network (SAN) repeatedly become read-only?
11-07-2007 02:29 AM
Re: SLES 10 SP1, DRBD, and rsync
host1:~ # fsck /dev/drbd0
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/drbd0: recovering journal
The filesystem size (according to the superblock) is 53739520 blocks
The physical size of the device is 53287936 blocks
Either the superblock or the partition table is likely to be corrupt!
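Assuming the usual 4 KiB ext3 block size, that is 53739520 - 53287936 = 451584 blocks that the superblock claims but the device can't back, or about 1.7 GiB of filesystem with nothing underneath it.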
This seems to correlate with some syslog messages I found this morning:
Nov 6 17:09:46 fshare1 kernel: attempt to access beyond end of device
Nov 6 17:09:46 fshare1 kernel: drbd0: rw=0, want=426770448, limit=426303488
Nov 6 17:09:46 fshare1 kernel: EXT3-fs error (device drbd0): read_inode_bitmap: Cannot read inode bitmap - block_group = 1628, inode_bitmap = 53346305
Nov 6 17:09:46 fshare1 kernel: Aborting journal on device drbd0.
Nov 6 17:09:46 fshare1 kernel: EXT3-fs error (device drbd0) in ext3_ordered_writepage: IO failure
Nov 6 17:09:47 fshare1 kernel: ext3_abort called.
Nov 6 17:09:47 fshare1 kernel: EXT3-fs error (device drbd0): ext3_journal_start_sb: Detected aborted journal
Nov 6 17:09:47 fshare1 kernel: Remounting filesystem read-only
Nov 6 17:09:50 fshare1 kernel: EXT3-fs error (device drbd0) in ext3_new_inode: IO failure
Nov 6 17:09:50 fshare1 kernel: EXT3-fs error (device drbd0) in ext3_create: IO failure
Nov 6 17:10:09 fshare1 kernel: __journal_remove_journal_head: freeing b_committed_data
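The limit in those messages is in 512-byte sectors, and 426303488 / 8 = 53287936 4 KiB blocks, exactly the physical device size fsck reported; the rejected read at sector 426770448 lands in the range only the superblock thinks exists.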
All of which led me to the discovery this morning that the available space for the RAID-5 LUNs on the internal arrays doesn't match (203.4GB on one and 205.0GB on the other). The first rule of clusters is to make everything identical. :) So I'm reformatting the LVs to the lower capacity so they match and will test that. I'm pretty sure that should fix the problem, though.
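Before recreating the filesystem I'll double-check that both nodes really report the same size for the backing LV, something along these lines (the VG/LV name here is just an example; adjust to the actual one):

# run on both nodes - the byte counts have to match exactly
blockdev --getsize64 /dev/vg01/data
lvdisplay -c /dev/vg01/data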
Jeff Traigle
11-09-2007 12:55 AM
Re: SLES 10 SP1, DRBD, and rsync
To recap the setup:
1. The mismatched physical sizes of the LUNs (203.4GB on one and 205.0GB on the other).
2. The matching LVs defined on top of these LUNs (both 203.4GB).
3. DRBD configured to use these matching LVs.
I got the same "attempt to access beyond end of device" error registered in syslog a few hours ago. This time, however, fsck found no problems:
hostname1:~ # fsck /dev/mapper/vg01-data
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/vg01-data: recovering journal
/dev/mapper/vg01-data contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/mapper/vg01-data: 175608/26673152 files (1.7% non-contiguous), 36824434/53320704 blocks
So I'm confused. Based on the message, it seems like DRBD is trying to write to space on the primary system's LUN that isn't part of the underlying LV?
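If it helps narrow this down, the three sizes I'd compare on the primary are roughly these (device names as used above):

blockdev --getsz /dev/mapper/vg01-data        # sectors the LV actually provides
blockdev --getsz /dev/drbd0                   # sectors DRBD presents to ext3
dumpe2fs -h /dev/drbd0 | grep 'Block count'   # blocks the ext3 superblock claims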
Jeff Traigle