Re: rx4640 RHEL 2.1AS hangs frequently

 
SOLVED
Yeo Eng Hee_1
New Member

rx4640 RHEL 2.1AS hangs frequently

Hi,

Did not find anything related to the above in the search engine, so I'm putting this up for comments from ITRC members.

My place just got a bunch of rx4640s (Itanium2 1.5GHz and 1.3GHz) and we finally settled on running RedHat Enterprise Linux 2.1 AS, after finding out that Enterprise 3.0 does not support most of the software that we wanted to run.

There seem to be a few mysterious problems with the setup. If someone else has had similar problems, I'd be glad to hear of any solutions.

The major problem we have is that the systems hang after a while. We mainly run compute-intensive batch jobs on these servers: programs written in C or FORTRAN, or commercial packages like Fluent and Abaqus.

The trouble with the new systems is that they do not provide any facility to dump core so that the support engineers can analyse what is causing the hang. This is unlike Compaq Alpha systems, where we can halt the system and issue a 'crash' command at the P0>>> prompt.

The second annoyance is that LSF fails when the date is out of sync with our time (NTP) server. I managed to work around this by stopping the ntpd daemons and using ntpdate in a cron job, and I also restart LSF daemons every night.
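
The ntpdate workaround described above might look roughly like the sketch below. The server name ntp.example.com, the function name, the script path in the crontab comment, and the hourly schedule are all assumptions for illustration; none of them come from the post.

```shell
#!/bin/sh
# Hypothetical helper for a cron-driven clock sync (sketch only).
# NTP_SERVER is a placeholder; the post does not name the real server.
NTP_SERVER="${NTP_SERVER:-ntp.example.com}"
# Path to ntpdate; overridable because its location varies between systems.
NTPDATE="${NTPDATE:-/usr/sbin/ntpdate}"

sync_clock() {
    # -s diverts ntpdate's output to syslog instead of stdout,
    # which suits unattended cron runs.
    "$NTPDATE" -s "$NTP_SERVER"
}

# To install: save this as a script (hypothetical path below), add a
# final line that calls sync_clock, and schedule it from root's crontab.
# Assumed schedule: once an hour, on the hour.
#   0 * * * * /usr/local/sbin/sync-clock.sh
```

Stepping the clock with ntpdate while ntpd is stopped avoids the two fighting over the clock, which matches the workaround of stopping the ntpd daemons first.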

Regards,

EH Yeo
National University of Singapore
5 REPLIES
dirk dierickx
Honored Contributor
Solution

Re: rx4640 RHEL 2.1AS hangs frequently

i would give 3.0 a try nevertheless. i would be surprised if the apps you need did not work on that version, as there are several ways to run applications in 'compatibility' mode (couldn't find a better word for it).

anyway, how is the machine hanging? does it just stop processing with the terminal window frozen, or do you get a panic?

if you get a panic, look first in the syslog for any information that may point to an explanation. you can also use the ksymoops command to get more information about the panic.
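
A minimal sketch of the syslog check suggested above. The default log path (/var/log/messages, the usual Red Hat location) and the search patterns are assumptions; kernel messages on a given system may be worded differently.

```shell
#!/bin/sh
# Sketch: list likely kernel oops/panic lines in a syslog file.
# The default path and the patterns are assumptions, not from the thread.
find_kernel_errors() {
    logfile="${1:-/var/log/messages}"
    # -i: ignore case, -n: prefix each match with its line number,
    # -E: extended regular expression.
    grep -inE 'oops|panic|unable to handle kernel|nmi watchdog' "$logfile"
}
```

If an oops trace turns up, the trace text can be saved to a file and fed to ksymoops on standard input (for example `ksymoops < oops.txt`, where oops.txt is a hypothetical file name). Which extra options are needed depends on the installed kernel and map files.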
Yeo Eng Hee_1
New Member

Re: rx4640 RHEL 2.1AS hangs frequently

The system just freezes. No other information was seen in the syslog, at least not to a non-expert like me :(
Celso Medina Kern
Trusted Contributor

Re: rx4640 RHEL 2.1AS hangs frequently

Hi Yeo,

"Hang" is a general term that covers a broad range of causes.

I would try to keep capturing information on this system as it runs, right up to the hang. I know some hangs occur because the CPU load suddenly shoots up, so this information may hold a clue.

Try running top in batch mode, redirected to a file:
top -b > /tmp/top.log
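
A possible variation on the command above, as a sketch: it timestamps each snapshot and flushes to disk after every write, so the last entry before a freeze is more likely to survive. The function name, log path, interval, and count are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: append timestamped snapshots of a command's output to a log,
# flushing to disk after each one so the last entry can survive a hang.
# Usage: log_snapshots LOGFILE COUNT INTERVAL_SECONDS COMMAND [ARGS...]
log_snapshots() {
    logfile="$1"; count="$2"; interval="$3"
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            echo "=== $(date) ==="
            "$@"
        } >> "$logfile" 2>&1
        sync    # push buffered writes to disk before a possible freeze
        i=$((i + 1))
        # Sleep only between snapshots, not after the last one.
        [ "$i" -lt "$count" ] && sleep "$interval"
    done
    return 0
}

# Example (not run here): one top snapshot every 30 seconds for 24 hours.
# The log path is hypothetical.
#   log_snapshots /var/log/top-hang.log 2880 30 top -b -n 1
```

After a hang and reboot, the tail of the log shows what was running shortly before the freeze. The same loop can wrap other commands, such as ps or vmstat, if top alone does not give enough detail.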

Other times a network hang looks like a system hang. It is important to have a physical console so you can try to access the system directly.

Knowing exactly what was running on the system at the time of the hang is crucial to understanding and solving its cause. Perhaps some kind of auditing could help, but unfortunately I do not know of such tools in Linux.

Good luck!

Celso
God bless pessimists, they did the backup!
Celso Medina Kern
Trusted Contributor

Re: rx4640 RHEL 2.1AS hangs frequently

Yeo, I found some good stuff for your problem.

There is a magic key that you can enable in your kernel, so that when the hang occurs you can try something other than simply rebooting:

* What are the 'command' keys?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
'r' - Turns off keyboard raw mode and sets it to XLATE.

'k' - Secure Access Key (SAK) Kills all programs on the current virtual
console. NOTE: See important comments below in SAK section.

'b' - Will immediately reboot the system without syncing or unmounting
your disks.

'o' - Will shut your system off (if configured and supported).

's' - Will attempt to sync all mounted filesystems.

'u' - Will attempt to remount all mounted filesystems read-only.

'p' - Will dump the current registers and flags to your console.

't' - Will dump a list of current tasks and their information to your
console.

'm' - Will dump current memory info to your console.

'0'-'9' - Sets the console log level, controlling which kernel messages
will be printed to your console. ('0', for example would make
it so that only emergency messages like PANICs or OOPSes would
make it to your console.)

'e' - Send a SIGTERM to all processes, except for init.

'i' - Send a SIGKILL to all processes, except for init.

'l' - Send a SIGKILL to all processes, INCLUDING init. (Your system
will be non-functional after this.)

'h' - Will display help ( actually any other key than those listed above will display help. but 'h' is easy to remember :-)

To configure your magic key, look at
http://www.linux.com/howtos/Remote-Serial-Console-HOWTO/security-sysrq.shtml

For more information, see the sysrq documentation shipped with RedHat at
/usr/src/linux/Documentation/sysrq.txt in your system.
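
As a rough sketch of the runtime side of this: on kernels built with magic SysRq support, the switch is normally exposed as /proc/sys/kernel/sysrq. The function names and the overridable SYSRQ_CTL variable below are assumptions made for illustration, so the functions can be tried against an ordinary file.

```shell
#!/bin/sh
# Sketch: check and enable the magic SysRq key at runtime.
# SYSRQ_CTL defaults to the standard procfs switch; it is overridable
# only so the functions can be exercised against a plain file.
SYSRQ_CTL="${SYSRQ_CTL:-/proc/sys/kernel/sysrq}"

sysrq_status() {
    if [ ! -r "$SYSRQ_CTL" ]; then
        # Kernel built without magic SysRq support, or /proc not mounted.
        echo "unavailable"
    elif [ "$(cat "$SYSRQ_CTL")" = "0" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

enable_sysrq() {
    # Needs root on a real system. The setting lasts until reboot;
    # "kernel.sysrq = 1" in /etc/sysctl.conf makes it persistent.
    echo 1 > "$SYSRQ_CTL"
}
```

Once enabled, the command keys listed above are pressed together with Alt+SysRq on a PC keyboard. During a hang, 's', then 'u', then 'b' is a gentler alternative to a hard reset, and 't' or 'p' may show where the system is stuck. On a serial console the usual convention is to send a BREAK followed by the command key, though whether that works depends on the console hardware.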

Best regards,

Celso
God bless pessimists, they did the backup!
Yeo Eng Hee_1
New Member

Re: rx4640 RHEL 2.1AS hangs frequently

Thanks for the tips. I noticed that my KVM uses the PrintScreen/SysRq button to switch between system consoles. I wonder if this will affect the magic key operations ...

BTW, any experts on KVMs? My KVM seems to have problems when switching to console 1 using the preset buttons at the top row.