12-14-2010 12:13 AM
server hang
Our HP ProLiant DL580 G5 server keeps hanging.
OS: Linux 2.6.18-92.1.18
Please check the logs and help me find the root cause.
12-14-2010 03:54 AM
Re: server hang
Because you've saved the log file in SSH packet dump mode, it's rather difficult to read the text.
But the cause of the hang is pretty obvious: the system is running out of memory (RAM).
Text from the beginning of the log dump:
>Eaudispd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
[the rest of the log seems to contain only oom-killer debugging information]
When a Linux system has used up all its RAM and swap and has already shrunk all OS caches to the minimum possible size, the system is truly out of memory. At that point, the kernel starts the "oom-killer" procedure, which is intended to find a process to kill and reclaim some memory that way. Ideally, the oom-killer should find the process that has caused the system to run out of memory and kill it, but this is actually very hard to implement reliably.
In practice, the oom-killer picks its victim by a heuristic, so the choice can look somewhat random. It might kill the SSH daemon, or other processes essential for logging on to the system through the network. In that case, it may be very difficult to do anything other than reboot the system.
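To see in advance which processes the oom-killer would be most likely to pick, you can list the per-process score that the kernel exposes in /proc/<pid>/oom_score (a higher score means a more likely victim). The following is only a minimal Python 3 sketch and was not part of the log analysis above; the function name read_oom_scores and the top-10 cutoff are illustrative assumptions.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (score, pid, name) tuples, highest oom_score first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            # The command name is the parenthesised second field of "stat".
            name = stat[stat.index("(") + 1:stat.rindex(")")]
        except (OSError, ValueError):
            continue  # the process exited or the file could not be parsed
        results.append((score, int(entry), name))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    # Print the ten most likely oom-killer candidates.
    for score, pid, name in read_oom_scores()[:10]:
        print(f"{score:>8} {pid:>7} {name}")
```

If an essential process such as sshd shows up near the top of this list, that is a hint that a future out-of-memory event could lock you out of the system again.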
After the reboot, you can only examine the logs and/or improve the monitoring on the system, so that next time you will catch the memory shortage *before* it gets so bad that oom-killer starts.
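As one possible form of such monitoring, a small script run periodically (for example from cron) could read /proc/meminfo and warn when little RAM and swap are left. This is a rough Python 3 sketch under stated assumptions: the 10% threshold (WARN_BELOW) and the function names are hypothetical, and counting buffers and page cache as reclaimable is only an approximation.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integer values (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Approximate fraction of RAM + swap that is still usable.

    Free RAM, buffers, page cache, and free swap are counted as usable.
    This is an estimate: not all cached memory can really be reclaimed.
    """
    usable = (info["MemFree"] + info.get("Buffers", 0)
              + info.get("Cached", 0) + info.get("SwapFree", 0))
    total = info["MemTotal"] + info.get("SwapTotal", 0)
    return usable / total


WARN_BELOW = 0.10  # hypothetical threshold; tune it for your workload

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        fraction = available_fraction(parse_meminfo(f.read()))
    if fraction < WARN_BELOW:
        print(f"WARNING: only {fraction:.1%} of RAM + swap is left")
```

The point is to get a warning while you can still log in and investigate, well before the oom-killer is invoked.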
When a system runs for a long time (days/weeks/months), the memory usage of all its applications should eventually stabilize to some value if the workload of the system remains constant. But if an application has a bug, it might keep allocating more and more memory without any limit at all, because it cannot re-use the memory it already has, or "forgets" that it has the memory. This is a "memory leak".
If you draw a graph of the application's memory usage, a memory leak presents a characteristic "sawtooth" pattern: when an application is started, its memory usage rapidly climbs to some initial value, then keeps steadily growing after that. Once the application is stopped and restarted, the memory usage again returns to the initial value (even if the workload is exactly the same as before the restart), then resumes the slow growth.
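The steady growth after the initial climb can also be checked numerically instead of by eye. Here is a minimal Python 3 sketch that fits a straight line to memory samples taken after a warm-up period; the function names, the warm-up length, and the rate threshold are all assumptions you would choose yourself, and a real workload may need a longer observation window before the result means anything.

```python
def growth_rate(samples):
    """Least-squares slope of (seconds, rss_kb) samples, in kB per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def looks_like_leak(samples, warmup_seconds, min_rate_kb_per_s):
    """True if memory keeps growing after the initial start-up climb."""
    # Ignore the rapid climb right after the application starts.
    steady = [(t, m) for t, m in samples if t >= warmup_seconds]
    return growth_rate(steady) >= min_rate_kb_per_s


if __name__ == "__main__":
    # Made-up numbers: a fast initial climb, then slow steady growth.
    samples = [(0, 1000), (60, 50000), (120, 50600),
               (180, 51200), (240, 51800)]
    print(growth_rate(samples[1:]), "kB/s after warm-up")
    print(looks_like_leak(samples, warmup_seconds=60, min_rate_kb_per_s=1.0))
```

A process whose memory usage has genuinely stabilized should show a slope close to zero once the warm-up samples are excluded.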
"atop" and its non-interactive component "atopsar" are good tools for catching memory leaks. There is also an optional "atopscripts" package, which contains a "findleak" script that can interpret the data collected by atopsar and list the processes whose size has been growing, ordered by the speed of growth.
http://www.atoptool.nl/
http://www.atoptool.nl/download/atopscripts-1.1.tgz
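If you cannot install atop right away, the same idea (list the processes whose size has been growing, ordered by the speed of growth) can be roughly approximated by comparing two snapshots of VmRSS from /proc/<pid>/status. To be clear, this Python 3 sketch is not the "findleak" script and does not read atop's data; the function names and the 60-second interval are assumptions for illustration only.

```python
import os
import time


def read_rss_kb(proc_root="/proc"):
    """Map pid -> (name, resident set size in kB) for all readable processes."""
    snapshot = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
            snapshot[int(entry)] = (fields["Name"].strip(),
                                    int(fields["VmRSS"].split()[0]))
        except (OSError, KeyError, ValueError, IndexError):
            continue  # process exited, or a kernel thread with no VmRSS line
    return snapshot


def rank_growth(before, after, interval_seconds):
    """Return (kb_per_second, pid, name) for processes that grew, fastest first."""
    ranked = []
    for pid, (name, rss_after) in after.items():
        # Require the same pid and name, to guard against pid reuse.
        if pid in before and before[pid][0] == name:
            delta = rss_after - before[pid][1]
            if delta > 0:
                ranked.append((delta / interval_seconds, pid, name))
    ranked.sort(reverse=True)
    return ranked


if __name__ == "__main__":
    interval = 60  # seconds between snapshots; an arbitrary choice
    first = read_rss_kb()
    time.sleep(interval)
    for rate, pid, name in rank_growth(first, read_rss_kb(), interval)[:10]:
        print(f"{rate:10.2f} kB/s {pid:>7} {name}")
```

Two snapshots a minute apart only show short-term growth; a real leak hunt needs samples collected over hours or days, which is what atop's logging gives you.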
Because stopping and restarting the application "resets" the memory leak, restarting the application periodically each night or weekend can be used as a work-around. But such a leak is always a bug in the application and should be fixed.
(It was often thought that older versions of Windows required a "maintenance reboot" every now and then. This practice covered up a lot of memory leaks and allowed a culture of poor programming to develop.)
MK
12-14-2010 04:26 AM
Re: server hang
Hello,
Thank you so much for your input. This will help us proceed further.