kernel: ping(9044): unaligned access errors

 
iinfi1
Super Advisor

hi all,

Wish you all a very happy new year... hope 2011 brings peace to the world more than anything else :)

Could you please help me here?
Do these errors mean there is some issue with the running kernel?
We have an Oracle RAC running on HP Itanium servers.

SUSE Linux Enterprise Server 10 SP2 (ia64)
Linux oracledb2 2.6.16.60-0.21-default #1 SMP Tue May 6 12:41:02 UTC 2008 ia64 ia64 ia64 GNU/Linux

kernel: ping(5628): unaligned access

This is what I am worried about. Google searches take me on a wild goose chase. Could someone please shed more light on this?


[code]
Dec 21 23:59:46 oracledb2 kernel: ping(5628): unaligned access to 0x607fffffff5af8d5, ip=0xa00000010048a720
Dec 22 00:00:03 oracledb2 kernel: oracle(5860): floating-point assist fault at ip 40000000079c0262, isr 0000020000001001
Dec 22 00:00:03 oracledb2 kernel: oracle(5860): floating-point assist fault at ip 40000000079c0262, isr 0000020000001001
Dec 22 00:00:03 oracledb2 kernel: oracle(5860): floating-point assist fault at ip 40000000079c0262, isr 0000020000001001
Dec 22 00:00:03 oracledb2 kernel: oracle(5860): floating-point assist fault at ip 40000000079c0262, isr 0000020000001001
Dec 22 00:00:03 oracledb2 kernel: oracle(5860): floating-point assist fault at ip 40000000079c0262, isr 0000020000001001
Dec 22 00:00:16 oracledb2 kernel: ping(5997): unaligned access to 0x607fffffffd1b8d5, ip=0xa00000010048a720
Dec 22 00:00:47 oracledb2 kernel: oracle(6366): floating-point assist fault at ip 40000000080d3922, isr 0000020000001001
Dec 22 00:00:47 oracledb2 kernel: oracle(6366): floating-point assist fault at ip 40000000080d3922, isr 0000020000001001
Dec 22 00:00:47 oracledb2 kernel: oracle(6366): floating-point assist fault at ip 40000000080d3922, isr 0000020000001001
Dec 22 00:00:47 oracledb2 kernel: oracle(6366): floating-point assist fault at ip 40000000080d3922, isr 0000020000001001
Dec 22 00:00:47 oracledb2 kernel: oracle(6366): floating-point assist fault at ip 40000000080d3922, isr 0000020000001001
Dec 22 00:00:47 oracledb2 kernel: ping(6417): unaligned access to 0x607fffffff8cf8d5, ip=0xa00000010048a720
Dec 22 00:01:17 oracledb2 kernel: ping(6761): unaligned access to 0x607fffffffad38d5, ip=0xa00000010048a720
Dec 22 00:01:47 oracledb2 kernel: ping(7167): unaligned access to 0x607ffffffeed38d5, ip=0xa00000010048a720
Dec 22 00:02:17 oracledb2 kernel: ping(7508): unaligned access to 0x607ffffffe5178d5, ip=0xa00000010048a720
Dec 22 00:02:48 oracledb2 kernel: ping(7980): unaligned access to 0x607fffffff5838d5, ip=0xa00000010048a720
Dec 22 00:03:18 oracledb2 kernel: ping(8304): unaligned access to 0x607ffffffe9478d5, ip=0xa00000010048a720
Dec 22 00:03:48 oracledb2 kernel: ping(8709): unaligned access to 0x607ffffffef578d5, ip=0xa00000010048a720
Dec 22 00:04:19 oracledb2 kernel: ping(9044): unaligned access to 0x607fffffff18f8d5, ip=0xa00000010048a720
Dec 22 00:04:49 oracledb2 kernel: ping(9509): unaligned access to 0x607fffffff74f8d5, ip=0xa00000010048a720

[/code]
12 REPLIES
Matti_Kurkela
Honored Contributor

Re: kernel: ping(9044): unaligned access errors

Both the "unaligned access" and "floating-point assist fault" messages mean that the kernel has caught a user-space process doing something in an inefficient way. So it is not a kernel problem: it is an application problem.

In both cases, the kernel intervenes, figures out what the process was trying to do, does it the correct way and hands the result back to the process. This works, but is significantly slower than doing it the right way in the first place.
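To illustrate the unaligned case, here is a minimal sketch of the kind of fix an application developer would make. The helper name load_u32 is made up for this example; it is not taken from ping or Oracle. The problematic pattern is casting a byte pointer at an odd offset to a wider type and dereferencing it, e.g. *(const uint32_t *)(buf + 1). Copying the bytes with memcpy avoids that.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any address, aligned or not.
 * memcpy leaves it to the compiler to emit a load sequence that is safe
 * for the target, so the program never dereferences a misaligned pointer. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling load_u32(buf + 1) on a byte buffer returns the same value that was stored there, without relying on the kernel's fix-up path.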

The log message is intended to tell the sysadmin that a program has an Itanium-specific bug and a bug report should be sent to the appropriate software vendor/developer.

Please see:
http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801?ciid=62080055abe021100055abe02110275d6e10RCRD

Is this a problem? I'd say it depends: if the problematic code is run relatively rarely (say, a few times per second) it might be unimportant to the overall performance of the program. But if the problem is in a critical piece of code that runs e.g. 500 000 times per second, it will have a significant performance impact and should be fixed ASAP.

I'd look for patches to the "ping" command and Oracle: there's a good chance that these problems have already been reported and fixed.

MK
iinfi1
Super Advisor

Re: kernel: ping(9044): unaligned access errors

Thank you, Matti, thanks a lot for the detailed reply.

I always wonder how you guys have answers to almost all questions posted here :)
Have a wonderful 2011!
iinfi1
Super Advisor

Re: kernel: ping(9044): unaligned access errors

Hi, I'm back :)

We have four database instances running on the two Oracle RAC nodes.
The four databases run under four different OS users.
I was investigating this issue and found a few things.
The /etc/security/limits.conf file doesn't have entries for three of the four OS users on node1, while it is empty on node2.
Only the default oracle user that comes with the SUSE installation (when the Oracle packages are selected) is present.

Could this be the cause of the issue? I am planning to raise an SR with Oracle on this.

thanks
Matti_Kurkela
Honored Contributor

Re: kernel: ping(9044): unaligned access errors

> could this be the cause of the issue.

The kernel messages indicate the programs are doing something wrong, that's certain. But perhaps that piece of code is run only if the limits are not set correctly? Only someone with access to Oracle source code (i.e. the programmers and maybe high-level support people at Oracle) can know for sure.

The messages include PID numbers: you could use them to see which Oracle instance(s) is/are having the problem. For example, if you see a message like "...kernel: oracle(6366): floating-point assist fault...", then the PID of the process was 6366 and you can use "ps -fp 6366" to check which user owns that process.
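If there are many such lines, pulling the program name and PID out of each one can be automated. A small sketch, assuming the "kernel: name(pid): ..." layout seen in the log excerpt above; parse_kernel_line is a hypothetical helper written for this example:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract "name" and "pid" from a syslog line of the
 * form "... kernel: name(pid): message".
 * Returns 1 on success, 0 if the line does not match that layout. */
int parse_kernel_line(const char *line, char name[32], int *pid)
{
    const char *p = strstr(line, "kernel: ");
    if (p == NULL)
        return 0;
    p += strlen("kernel: ");
    /* %31[^(] reads up to 31 characters that are not '(' into name,
     * then a literal '(' and the decimal PID follow. */
    return sscanf(p, "%31[^(](%d)", name, pid) == 2;
}
```

The PID obtained this way can then be passed to "ps -fp" as described above, provided the process is still running.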

If that user doesn't have the limits set in the Oracle-recommended way, I think it should be easy enough to try setting the limits correctly and restarting that Oracle instance. If the messages no longer appear after that, you have a solution for your issue.

MK
emha_1
Valued Contributor

Re: kernel: ping(9044): unaligned access errors

Hi,

It can be considered a warning message rather than an application error.

E.g., Oracle note 279456.1 says it can be ignored or even suppressed through syslog configuration.
The note is based on the following HP document: http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801?ciid=62080055abe021100055abe02110275d6e10RCRD

However, some bugs have also been reported for certain DB releases. You may want to check them against your actual version.

emha.
Dennis Handly
Acclaimed Contributor

Re: kernel: ping(9044): unaligned access errors

>do these errors mean there is some issue with the kernel which is running.

These indicate sloppy programming and lack of adequate FP hardware support.

> ping(5628): unaligned access to 0x607fffffff5af8d5, ip=0xa00000010048a720

The address ...8d5 isn't 2, 4 or 8 byte aligned. If it occurs once per program run, it isn't worth worrying about.
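The check is simple arithmetic: an address is N-byte aligned when it is a multiple of N. A minimal sketch, with is_aligned being a made-up helper name for this example:

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is a multiple of n.
 * n must be a power of two (2, 4, 8, ...). */
int is_aligned(uint64_t addr, uint64_t n)
{
    return (addr & (n - 1)) == 0;
}
```

For the logged address 0x607fffffff5af8d5 the last hex digit is 5, which is odd, so is_aligned returns 0 for n = 2, 4 and 8.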

>floating-point assist fault at ip 40000000079c0262, isr 0000020000001001

This indicates lack of full support for IEEE FP operations. (The Standard allows faults into the kernel for exceptional cases.) Perhaps the data is so small you have denorms? There isn't much you can do to the application, since it is data dependent.

Both of these cases can be expensive if they occur a lot. For denorms, if you enable flush to zero, you'll get better performance.
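For readers unfamiliar with the term: a denorm (subnormal) is a nonzero value whose magnitude is below the smallest normalized number, which is DBL_MIN for a double. A small sketch of how a program could test for one; is_denormal is an illustrative name, and the check assumes the default IEEE behaviour (gradual underflow) rather than flush to zero:

```c
#include <float.h>

/* Hypothetical helper: nonzero if x is a denormal (subnormal) double,
 * i.e. nonzero but smaller in magnitude than DBL_MIN. */
int is_denormal(double x)
{
    double mag = x < 0.0 ? -x : x;
    return mag != 0.0 && mag < DBL_MIN;
}
```

Under default IEEE arithmetic, DBL_MIN / 2.0 is such a value. With flush to zero enabled, the same division would be expected to yield 0 instead.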

>MK: ... "floating-point assist fault" messages mean that the kernel has caught a user-space process doing something in an inefficient way.

This isn't the case for a FP assist fault. It's due to lemon hardware design. :-)

>But if the problem is in a critical piece of code that runs e.g. 500000 times per second, it will have a significant performance impact and should be fixed ASAP.

Yes, I've seen that with denorms.
Matti_Kurkela
Honored Contributor

Re: kernel: ping(9044): unaligned access errors

>>MK: ... "floating-point assist fault" messages mean that the kernel has caught a user-space process doing something in an inefficient way.

>Dennis: This isn't the case for a FP assist fault. It's due to lemon hardware design. :-)


:-) I appreciate your view, and sort of agree.

Assuming that a program is produced for some known hardware (instead of a theoretical abstraction), the known misfeatures of the hardware should be taken into account.

In the case of Itanium, any floating-point code should decide in advance what to do with denorms: either explicitly ignore them if it's safe for that particular code (using "fesetenv (FE_NONIEEE_ENV);" or whatever), or explicitly catch the appropriate FP fault if flushing the denorms is not safe.
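A hedged sketch of the first option. FE_NONIEEE_ENV is, as far as I know, a glibc extension that is only defined on some architectures (ia64 among them), so the call is guarded; use_nonieee_env is a made-up helper name. On platforms where the macro exists, linking with -lm may be needed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* assumption: glibc exposes FE_NONIEEE_ENV as a GNU extension */
#endif
#include <fenv.h>

/* Hypothetical helper: switch to the platform's non-IEEE FP environment.
 * Returns 0 if fesetenv accepted FE_NONIEEE_ENV, -1 if this platform's
 * <fenv.h> does not define it or the call failed. */
int use_nonieee_env(void)
{
#ifdef FE_NONIEEE_ENV
    return fesetenv(FE_NONIEEE_ENV) == 0 ? 0 : -1;
#else
    return -1;
#endif
}
```

Such a call would typically be made once at program start, before any floating-point work is done.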

In production-quality software, I think ignoring the hardware platform's lemon FP implementation and relying on the OS's "safety net" to fix it qualifies as stupid.

OK, I understand porting IEEE-compliant FP code to a non-IEEE-compliant hardware platform can be painful.

Multi-platform code already needs to deal with FP endianness questions and quite likely other platform-specific FP oddities anyway, so I see no excuse for ignoring Itanium's FP misfeatures. For the programmers of mature multi-platform software like Oracle, having to deal with something like this should have been nothing new.

MK
Dennis Handly
Acclaimed Contributor

Re: kernel: ping(9044): unaligned access errors

>MK: Assuming that a program is produced for some known hardware, the known misfeatures of the hardware should be taken into account.

It is hard to do that with denorms. Especially if they are created by FP instructions.

>In the case of Itanium, any floating-point code should decide in advance what to do with denorms: either explicitly ignore them if it's safe for that particular code (using fesetenv(FE_NONIEEE_ENV);),

Yes but that is the only thing you can do.

>or explicitly catch the appropriate FP fault if flushing the denorms is not safe.

These faults only go to the kernel and that is the choice Oracle made.

>I think ignoring the hardware platform's lemon FP implementation and relying on the OS's "safety net" to fix it qualifies as stupid.

Except you don't have any good choices except for flush to zero.
And I wouldn't call this a "safety net" like the misaligned case.

>I understand porting IEEE-compliant FP code to a non-IEEE-compliant hardware platform can be painful.

We don't have that case here.

>I see no excuse for ignoring Itanium's FP misfeatures.

There may be lots of platforms with this. PA-RISC has this problem too.

>For the programmers of a mature multi-platform software like Oracle, having to deal with something like this should have been nothing new.

Other than flush to zero, there are no good choices, except perhaps making it configurable.
iinfi1
Super Advisor

Re: kernel: ping(9044): unaligned access errors

Thanks for that detailed discussion. :)

Do you think it's better to ask Oracle about this, or is it ignorable?
I've read the notes you have given. Thanks for those.
Dennis Handly
Acclaimed Contributor

Re: kernel: ping(9044): unaligned access errors

>do you think its better to ask Oracle about this? or is it ignorable?

I doubt you'll get anywhere on the denorm issue, especially if there are as few faults as it seems.
If there are a whole bunch, you could ask for a configuration value to enable flush to zero.

Look at MK's URL for how to turn off the logging.
Matti_Kurkela
Honored Contributor

Re: kernel: ping(9044): unaligned access errors

It would be absurd to log kernel error messages for non-fatal application behavior if the behavior is unavoidable.

I'm not an ia64 programmer, but with a bit of Googling, I easily found out that there _is_ a way for a program to decide how the FP handling should work for it on Linux-IA64. It's just a matter of telling the kernel what you wish done.

The relevant system call is prctl(2).

http://www.kernel.org/doc/man-pages/online/pages/man2/prctl.2.html

Within your application code, you can make the FP assist faults happen silently with:

prctl(PR_SET_FPEMU, PR_FPEMU_NOPRINT, 0L, 0L, 0L);

Or if you want, you can make the kernel send a SIGFPE instead of using the emulation, with:

prctl(PR_SET_FPEMU, PR_FPEMU_SIGFPE, 0L, 0L, 0L);

According to the man page, these features have existed since Linux 2.4.18 and 2.5.9, and are specific to ia64.

There's also a way to do the same to "unaligned access" errors. That's even older: it's supported on ia64 since Linux 2.3.48.

To make the system silently fix unaligned access errors:
prctl(PR_SET_UNALIGN, PR_UNALIGN_NOPRINT, 0L, 0L, 0L);

To make the system generate a SIGBUS on unaligned access:
prctl(PR_SET_UNALIGN, PR_UNALIGN_SIGBUS, 0L, 0L, 0L);
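Put together, the two "silent" calls could be wrapped like this. This is only a sketch: quiet_fixups is a made-up name, and on architectures where the kernel does not support these options prctl() is expected to fail (typically with EINVAL), which the helper simply reports and counts.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper: ask the kernel to handle unaligned accesses and
 * FP assist faults for this process without logging them.
 * Returns the number of prctl() calls that failed (0, 1 or 2). */
int quiet_fixups(void)
{
    int failures = 0;

    if (prctl(PR_SET_UNALIGN, PR_UNALIGN_NOPRINT, 0L, 0L, 0L) != 0) {
        fprintf(stderr, "PR_SET_UNALIGN: %s\n", strerror(errno));
        failures++;
    }
    if (prctl(PR_SET_FPEMU, PR_FPEMU_NOPRINT, 0L, 0L, 0L) != 0) {
        fprintf(stderr, "PR_SET_FPEMU: %s\n", strerror(errno));
        failures++;
    }
    return failures;
}
```

The helper would be called once early in main(). Note that this only silences the messages; the fix-up cost in the kernel remains.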

MK
Dennis Handly
Acclaimed Contributor

Re: kernel: ping(9044): unaligned access errors

>MK: It would be absurd to log kernel error messages for non-fatal application behavior if the behavior is unavoidable.

Just to let you know about poor performance? Or in the unaligned case, where to fix your code.

>Within your application code, you can make the FP assist faults happen silently with:

Ok, that's something that could be suggested to Oracle.

>you can make the kernel send a SIGFPE instead of using the emulation

Not much you can do with this except print a message and abort. Otherwise it will be even slower.