
 
Dirk Fieremans
Advisor

"false" Oracle corruption messages

Hi,
platform HP-UX 11.00 on N-class and K-class + Oracle 8.1.5

I'm getting a lot of messages like the one in the attached .jpg on my Vantive application. Normally this message indicates a hardware error on a disk. However, I'm getting the error on 4 different servers!
HP checked the disks and found nothing wrong with them. I applied all the latest patch bundles plus a special patch set supplied by HP as a possible fix for this problem.
Also, the error seems to fix itself after a few minutes. It is nevertheless very annoying for the application users, who:
- cannot save data for a while
- think that the system is corrupted

Anyone experienced this before? Oracle is no help either on this one.

thanks,
Dirk
A. Clay Stephenson
Acclaimed Contributor

Re: "false" Oracle corruption messages

Hi Dirk:

This is not a hardware problem but a software problem. Errno 9 indicates that an I/O operation (read, write, seek) was attempted on a file descriptor that is not open. My best guess is that you need to increase nfile. Do a sar -v and look for overflows.
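
To illustrate what errno 9 (EBADF) means in practice, here is a minimal Python sketch that reads from a descriptor after closing it. The function name `read_after_close` is made up for this example and has nothing to do with Oracle or Vantive; it only shows the class of error being described.

```python
import errno
import os


def read_after_close():
    """Close a descriptor, then try to read from it; return the errno seen."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)
    try:
        os.read(read_fd, 16)  # the descriptor is no longer open
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    code = read_after_close()
    print(code, errno.errorcode.get(code))
```

An application that reports this errno as "corruption" is most likely translating a failed I/O call into a generic disk error message, which would fit the symptom of the error clearing up by itself.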

If it ain't broke, I can fix that.
Dirk Fieremans
Advisor

Re: "false" Oracle corruption messages

It gives me the following output:
HP-UX xxxxxx B.11.00 U 9000/800 02/01/02

17:19:13  text-sz  ov   proc-sz   ov  inod-sz    ov  file-sz     ov
17:19:14  N/A      N/A  452/5000  0   7048/7048  0   3908/52058  0
17:19:15  N/A      N/A  452/5000  0   7048/7048  0   3908/52058  0
17:19:16  N/A      N/A  452/5000  0   7048/7048  0   3908/52058  0
17:19:17  N/A      N/A  452/5000  0   7048/7048  0   3908/52058  0
17:19:18  N/A      N/A  452/5000  0   7048/7048  0   3909/52058  0
17:19:19  N/A      N/A  452/5000  0   7048/7048  0   3908/52058  0
17:19:20  N/A      N/A  452/5000  0   7048/7048  0   3909/52058  0
17:19:21  N/A      N/A  452/5000  0   7048/7048  0   3909/52058  0
17:19:22  N/A      N/A  452/5000  0   7048/7048  0   3908/52058  0
17:19:23  N/A      N/A  452/5000  0   7048/7048  0   3912/52058  0
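
For anyone who wants to watch these figures over a longer period, the following is a rough Python sketch that turns one data line of this output into used/limit/overflow numbers. The column layout is assumed from the sample above (time, then a size and an overflow column for each of text, proc, inod and file), and the helper name `parse_sar_v` is hypothetical.

```python
def parse_sar_v(line):
    """Parse one data line of `sar -v` output into per-table usage figures."""
    fields = line.split()
    result = {"time": fields[0]}
    # Assumed column order: text-sz ov proc-sz ov inod-sz ov file-sz ov
    for i, name in enumerate(["text", "proc", "inod", "file"]):
        size, overflow = fields[1 + 2 * i], fields[2 + 2 * i]
        if size == "N/A":
            continue  # table not reported on this system
        used, limit = (int(part) for part in size.split("/"))
        result[name] = {
            "used": used,
            "limit": limit,
            "overflows": int(overflow),
            "pct": 100.0 * used / limit,
        }
    return result


if __name__ == "__main__":
    sample = "17:19:23 N/A N/A 452/5000 0 7048/7048 0 3912/52058 0"
    for table, stats in parse_sar_v(sample).items():
        print(table, stats)
```

A non-zero `overflows` value for the file table is what the earlier reply suggested looking for; in the sample above every overflow counter is 0.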

The system is very quiet at the moment (Friday evening in Belgium). I'll run it again next week to see whether the numbers increase...
I must admit I'm very bad at kernel tuning. I once ordered a kernel audit from HP, but I'm still not confident about the setup.
Basically, the server runs the Vantive 'application server' and its database...

Dirk
Dirk Fieremans
Advisor

Re: "false" Oracle corruption messages

I ran "tusc" on a process that gave this false Oracle message.
It generates the following output (ERR#246 EWOULDBLOCK). Does anyone recognize this?

[17982] select(250, 0x6fff0778, NULL, NULL, NULL) ........ [sleeping]
[17982] select(250, 0x6fff0778, NULL, NULL, NULL) ........ = 1
[17982] getnumfds() ...................................... = 13
[17982] ioctl(0, I_XTI_RCV, 0x6fff1110) .................. = 0
[17982] ioctl(0, I_XTI_SND, 0x6fff11c8) .................. = 0
[17982] select(250, 0x6fff0778, NULL, NULL, NULL) ........ = 1
[17982] getnumfds() ...................................... = 13
[17982] ioctl(0, I_XTI_RCV, 0x6fff1110) .................. = 0
[17982] sigvec(SIGALRM, 0x6fff1308, 0x6fff1318) .......... = 0
[17982] alarm(5) ......................................... = 0
[17982] ioctl(11, FIONBIO, 0x6fff25c8) ................... = 0
[17982] select(0, NULL, NULL, NULL, 0x6fff1378) .......... = 0
[17982] write(11, "\0e8\0\006\0\0\0\0\011x d501\0\0".., 232) = 232
[17982] read(11, 0x400f6256, 2064) ....................... ERR#246 EWOULDBLOCK
[17982] open("/opt/oracle/product/8.1.5/rdbms/mesg/oraus.msb", O_RDONLY, 0) = 13
[17982] fcntl(13, F_SETFD, 1) ............................ = 0
[17982] lseek(13, 0, SEEK_SET) ........................... = 0
[17982] read(13, "1513" 011303\t\t\0\0\0\0\0\0\0\0".., 256) = 256
[17982] lseek(13, 512, SEEK_SET) ......................... = 512
[17982] read(13, "1d1 [ z x 0e\0\0\0\0\0\0\0\0\0\0".., 512) = 512
[17982] lseek(13, 1024, SEEK_SET) ........................ = 1024
[17982] read(13, "\018\0$ \07 \0@ \0J \0V \0a \0j ".., 512) = 512
[17982] lseek(13, 98304, SEEK_SET) ....................... = 98304
[17982] read(13, "\0\n\f+ \0\0\0D \f, \0\0\0r \f- ".., 512) = 512
[17982] close(13) ........................................ = 0
[17982] select(0, NULL, NULL, NULL, 0x6fff1378) .......... = 0
[17982] read(11, "\0be\0\006\0\0\0\0\00602\0\b\0\0".., 2064) = 190
[17982] ioctl(11, FIONBIO, 0x6fff2548) ................... = 0
[17982] alarm(0) ......................................... = 5
[17982] sigvec(SIGALRM, 0x6fff1308, 0x6fff1318) .......... = 0
[17982] sigvec(SIGALRM, 0x6fff1308, 0x6fff1318) .......... = 0
[17982] alarm(5) ......................................... = 0
[17982] ioctl(11, FIONBIO, 0x6fff25c8) ................... = 0
[17982] select(0, NULL, NULL, NULL, 0x6fff1378) .......... = 0
[17982] write(11, "\0[ \0\006\0\0\0\0\003^ d7\0\0\0".., 91) = 91
[17982] read(11, 0x400f7e1e, 2064) ....................... ERR#246 EWOULDBLOCK
[17982] open("/opt/oracle/product/8.1.5/rdbms/mesg/oraus.msb", O_RDONLY, 0) = 13
[17982] fcntl(13, F_SETFD, 1) ............................ = 0
[17982] lseek(13, 0, SEEK_SET) ........................... = 0
[17982] read(13, "1513" 011303\t\t\0\0\0\0\0\0\0\0".., 256) = 256
[17982] lseek(13, 512, SEEK_SET) ......................... = 512
[17982] read(13, "1d1 [ z x 0e\0\0\0\0\0\0\0\0\0\0".., 512) = 512
[17982] lseek(13, 1024, SEEK_SET) ........................ = 1024
[17982] read(13, "\018\0$ \07 \0@ \0J \0V \0a \0j ".., 512) = 512
[17982] lseek(13, 98304, SEEK_SET) ....................... = 98304
[17982] read(13, "\0\n\f+ \0\0\0D \f, \0\0\0r \f- ".., 512) = 512
[17982] close(13) ........................................ = 0

thanks
Dirk
Rita C Workman
Honored Contributor

Re: "false" Oracle corruption messages

Thank you, Thank you, Thank you...

I got the same error while running an upgrade on something ... I knew the hardware was solid, but the DBA was intent that a "bad block had occurred..data could (is) lost...yada yada yada..".
Of course they checked everything when we were done..and couldn't find anything missing...but...

I'm going to enjoy 'sharing' this with him.

You have made a cold Tuesday warmer !

I love this Forum,
Rita

..no points here...the joy of this is enough !
A. Clay Stephenson
Acclaimed Contributor

Re: "false" Oracle corruption messages

Hi Dirk:

Strangely, man'ing the read system call does not indicate that a 246 errno is ever set but obviously it is. I would look at nflocks (you could be running out of system-wide file lock structures). I would also look at the semaphore settings. The one other thing I would look at is a timeslice value of 1 rather than 10. Some of the tuned parameter sets for database environments have very stupidly set the timeslice to 1 and this can cause all sorts of very strange semaphore problems and I suspect it could also cause file lock problems as well. I would also look through all the system call man pages to see if there are any that set errno to EWOULDBLOCK. In the meantime, I'll look into the ioctl on fdes 11 that precedes this read.
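
As background on EWOULDBLOCK, here is a minimal Python sketch of a read on a descriptor in non-blocking mode. It assumes, without confirmation from the trace, that descriptor 11 is a network connection and that the FIONBIO ioctl switches it to non-blocking mode. The socket pair and the name `nonblocking_exchange` are stand-ins invented for the example.

```python
import select
import socket


def nonblocking_exchange():
    """Read from a non-blocking socket before and after data arrives."""
    client, server = socket.socketpair()
    client.setblocking(False)  # comparable in effect to ioctl(fd, FIONBIO)
    events = []
    try:
        client.recv(2064)  # nothing has been sent yet
    except BlockingIOError:  # raised for EAGAIN / EWOULDBLOCK
        events.append("would-block")
    server.sendall(b"reply")
    # Wait up to 5 seconds for the descriptor to become readable.
    readable, _, _ = select.select([client], [], [], 5)
    if readable:
        events.append(client.recv(2064))
    client.close()
    server.close()
    return events


if __name__ == "__main__":
    print(nonblocking_exchange())
```

In the first sequence of the trace, the EWOULDBLOCK on descriptor 11 is followed by a select with a timeout and then a read that returns 190 bytes, which resembles this pattern. Whether that is what the application is really doing is a guess.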

Regards, Clay


If it ain't broke, I can fix that.
Dirk Fieremans
Advisor

Re: "false" Oracle corruption messages

If it can help, I attached my current kernel config.

Dirk
A. Clay Stephenson
Acclaimed Contributor

Re: "false" Oracle corruption messages

Hi Dirk:

Your timeslice is set to 1; also nflocks is rather low for your nfile setting. I would set timeslice to 10 and nflocks to 10000. You can do all this within SAM and build a new kernel.
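
The two suggested values can be written down as a small check, sketched here in Python. The target values (timeslice 10, nflocks 10000) come from the reply above. The helper name `tunables_to_change` and the current nflocks figure in the usage example are hypothetical, since the real configuration is only in the attachment.

```python
# Target values taken from the reply above.
RECOMMENDED = {"timeslice": 10, "nflocks": 10000}


def tunables_to_change(current, recommended=RECOMMENDED):
    """Return {name: (current, target)} for settings below their target."""
    changes = {}
    for name, target in recommended.items():
        value = current.get(name)
        if value is None or value < target:
            changes[name] = (value, target)
    return changes


if __name__ == "__main__":
    # timeslice=1 is from the thread; the nflocks value is a made-up placeholder.
    current = {"timeslice": 1, "nflocks": 400}
    for name, (value, target) in tunables_to_change(current).items():
        print(f"{name}: {value} -> {target}")
```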
If it ain't broke, I can fix that.
Dirk Fieremans
Advisor

Re: "false" Oracle corruption messages

We're getting there.
The user connection is twofold: one application process and one Oracle connection.
The tusc output I sent before came from the application process. Now I have managed to trap the Oracle connection with tusc (see attachment).
I cannot apply your kernel changes immediately. The DB cannot be brought down easily...

regards,
Dirk
Dirk Fieremans
Advisor

Re: "false" Oracle corruption messages

It doesn't seem to be related to file locks. I ran this script provided by HP:
# sh /tmp/lock.sh
# cat outputfile
Tue Feb 5 17:08:21 GMT 2002
Number of used file lock table entries : 393
Tue Feb 5 17:19:47 GMT 2002
Number of used file lock table entries : 393

I'll try to get some downtime to increase the timeslice to 10...

Dirk