11-07-2008 12:57 PM
GFS hangs after a couple of days
Hello all,
I have set up a new 7-node GFS cluster (details below). After a couple of days, one or more of the three GFS file systems hangs. On some nodes 'df' hangs and on others it works, but an 'ls' of the affected file system hangs on every node. When the hang occurs, 'cman_tool services' shows nothing unusual:
cman_tool services
type   level  name     id        state
fence  0      default  0001000c  none
[10 11 12 13 14 18 19]
dlm    1      clvmd    0001000b  none
[10 11 12 13 14 18 19]
dlm    1      u01      0002000a  none
[10 11 12 13 14 18 19]
dlm    1      u02      0004000a  none
[10 11 12 13 14 18 19]
dlm    1      u03      0006000a  none
[10 11 12 13 14 18 19]
gfs    2      u01      0001000a  none
[10 11 12 13 14 18 19]
gfs    2      u02      0003000a  none
[10 11 12 13 14 18 19]
gfs    2      u03      0005000a  none
[10 11 12 13 14 18 19]
(Yes, the node IDs are not sequential and do not start at 0. It is a long story, so unless it has something to do with the problem I will not go into it.)
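The "everything looks okay" check above can be made repeatable. Below is a minimal sketch, assuming the output format shown in this post (one group line followed by a bracketed member list) and assuming that a state of "none" means no membership change is in progress. The helper names `parse_services` and `find_problems` are made up for illustration and are not part of any cluster tool.

```python
# Sketch: sanity-check 'cman_tool services' output captured as text.
# Assumes the layout seen in this thread; other releases may differ.

SAMPLE = """\
type level name id state
fence 0 default 0001000c none
[10 11 12 13 14 18 19]
dlm 1 clvmd 0001000b none
[10 11 12 13 14 18 19]
dlm 1 u01 0002000a none
[10 11 12 13 14 18 19]
dlm 1 u02 0004000a none
[10 11 12 13 14 18 19]
dlm 1 u03 0006000a none
[10 11 12 13 14 18 19]
gfs 2 u01 0001000a none
[10 11 12 13 14 18 19]
gfs 2 u02 0003000a none
[10 11 12 13 14 18 19]
gfs 2 u03 0005000a none
[10 11 12 13 14 18 19]
"""

HEADER = ["type", "level", "name", "id", "state"]


def parse_services(text):
    """Turn the captured output into a list of group dictionaries."""
    groups = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A bracketed line lists the members of the preceding group.
            if groups:
                groups[-1]["members"] = [int(n) for n in line.strip("[]").split()]
            continue
        fields = line.split()
        if len(fields) != 5 or fields == HEADER:
            continue  # skip the column header and anything unrecognised
        gtype, level, name, gid, state = fields
        groups.append({"type": gtype, "level": int(level), "name": name,
                       "id": gid, "state": state, "members": []})
    return groups


def find_problems(groups):
    """Report groups that are not idle or whose members differ from the first group."""
    problems = []
    if not groups:
        return problems
    expected = groups[0]["members"]
    for g in groups:
        label = g["type"] + ":" + g["name"]
        if g["state"] != "none":
            problems.append(label + " is in state " + g["state"])
        if g["members"] != expected:
            problems.append(label + " has a different member list")
    return problems


if __name__ == "__main__":
    print(find_problems(parse_services(SAMPLE)))
```

For the output posted above this reports no problems, which matches the observation that the group layer looks healthy while the file systems themselves are stuck.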
The only unusual entries in the messages file are:
Nov 5 23:48:47 server10 openais[5292]: [TOTEM] Retransmit List: ad0
Nov 5 23:48:47 server10 openais[5292]: [TOTEM] Retransmit List: ae3
Nov 5 23:48:47 server10 openais[5292]: [TOTEM] Retransmit List: ae3
Nov 5 23:48:48 server10 openais[5292]: [TOTEM] Retransmit List: aff
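To see whether these entries cluster around the times of the hangs, they can be counted per host and per minute. This is only a sketch, assuming the syslog line format shown above; the function name `retransmits_per_minute` is hypothetical, and the sample lines are the four from this post.

```python
# Sketch: count openais "[TOTEM] Retransmit List" entries per host and minute.
import re
from collections import Counter

RETRANSMIT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"openais\[\d+\]:\s+\[TOTEM\]\s+Retransmit List:"
)

SAMPLE_LINES = [
    "Nov 5 23:48:47 server10 openais[5292]: [TOTEM] Retransmit List: ad0",
    "Nov 5 23:48:47 server10 openais[5292]: [TOTEM] Retransmit List: ae3",
    "Nov 5 23:48:47 server10 openais[5292]: [TOTEM] Retransmit List: ae3",
    "Nov 5 23:48:48 server10 openais[5292]: [TOTEM] Retransmit List: aff",
]


def retransmits_per_minute(lines):
    """Return a Counter keyed by (host, 'Mon D HH:MM')."""
    counts = Counter()
    for line in lines:
        match = RETRANSMIT.match(line.strip())
        if match:
            minute = match.group("stamp")[:-3]  # drop the ":SS" seconds part
            counts[(match.group("host"), minute)] += 1
    return counts


if __name__ == "__main__":
    for (host, minute), n in sorted(retransmits_per_minute(SAMPLE_LINES).items()):
        print(host, minute, n)
```

Feeding it the full /var/log/messages from each node, rather than the four sample lines, would show whether the retransmits are constant background noise or spike shortly before a file system stops responding.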
I tried running 'gfs_tool' with various options against the affected file system(s), but those commands hang as well. I then ran 'df' under strace, and it hangs in a stat call:
15925 stat("/u01",
When it is working correctly, the same call completes:
15877 stat("/u01", {st_mode=S_IFDIR|0775, st_size=3864, ...}) = 0
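Because a stat on a hung mount never returns, any monitoring check has to run it where a timeout can be enforced. A minimal sketch, assuming Python is available on the nodes: the stat runs in a child process and the parent gives up after a deadline. The function name `probe_mount` and the return values are made up for this example.

```python
# Sketch: detect a hung mount point by running stat() in a child process.
import subprocess
import sys


def probe_mount(path, timeout=5.0):
    """Return 'ok', 'error', or 'hung' for a stat() of path."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Deliberately no second wait(): a child blocked inside the kernel
        # may not exit until the file system recovers, even after kill().
        child.kill()
        return "hung"
    return "ok" if returncode == 0 else "error"


if __name__ == "__main__":
    # The mount points are the ones named in this post.
    for mount in ("/u01", "/u02", "/u03"):
        print(mount, probe_mount(mount))
```

Running this from cron on every node and logging the first "hung" result would give a timestamp for each lockup, which might help in finding a pattern.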
This is a brand new cluster that has been up for less than a week, so it has only happened twice. Unfortunately, to keep the project moving I have not been able to leave it in the hung state for any length of time: I have had to reboot the entire cluster and remount everything so the application team could continue their work. So far I can see no pattern to when it locks up.
Environment:
--------------
Hardware:
Two HP c-class chassis
HP Virtual Connect for network (10g uplink using VLAN tagging in a shared uplink set)
HP Virtual Connect for SAN connectivity (connected to EMC Symmetrix)
Seven HP BL480c blade servers (5 servers in chassis0 and 2 in chassis1)
Software:
RHEL 5.2
Native multipathing
NIC bonding on the cluster interconnect
Any ideas?
Thanks,
David
11-09-2008 02:37 AM
Re: GFS hangs after a couple of days
Shalom,
Use RHN/yum to update to the latest versions.
Make sure there is only one version of the gfs-kernel package installed.
SEP
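The "only one version" check can be scripted against the text printed by `rpm -q <package>`, which prints one line per installed instance, or "package <name> is not installed" if there is none. This is a sketch only: the package name is taken from the reply as written, the helper names are hypothetical, and the sample output uses placeholder strings, not real version numbers.

```python
# Sketch: decide from 'rpm -q NAME' output whether several builds are installed.

def installed_builds(rpm_query_output):
    """Return the installed package lines, ignoring 'is not installed' notices."""
    builds = []
    for raw in rpm_query_output.splitlines():
        line = raw.strip()
        if not line or line.endswith("is not installed"):
            continue
        builds.append(line)
    return builds


def has_multiple_builds(rpm_query_output):
    """True when more than one instance of the package is reported."""
    return len(installed_builds(rpm_query_output)) > 1


if __name__ == "__main__":
    # Placeholder text standing in for real 'rpm -q gfs-kernel' output.
    sample = "gfs-kernel-<version-A>\ngfs-kernel-<version-B>\n"
    print(has_multiple_builds(sample))
```

The output would be captured on each node separately, since the nodes may not all be at the same patch level.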
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com