02-01-2002 09:04 AM
"false" Oracle corruption messages
platform HP-UX 11.00 on N-class and K-class + Oracle 8.1.5
I'm getting a lot of messages like the attached .jpg on my Vantive application. This normally shows hardware errors on a disk. I'm getting the error however on 4 different servers!
HP checked the disks and there is nothing wrong with them. I applied all the latest patch bundles plus a special patch set supplied by HP to possibly fix this problem.
Also, the error seems to fix itself after a few minutes. This is however very annoying for the application users who:
- cannot save data for a while
- think that the system is corrupted
Anyone experienced this before? Oracle is no help either on this one.
thanks,
Dirk
02-01-2002 09:14 AM
Re: "false" Oracle corruption messages
This is not a hardware problem but a software problem. Errno 9 indicates that an I/O operation (read, write, seek) was attempted on a file descriptor that is not open. My best guess is that you need to increase NFILES. Do a sar -v and look for overflows.
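To illustrate what errno 9 (EBADF) means in practice, here is a minimal sketch in Python rather than anything taken from Oracle or Vantive. The helper name lseek_after_close is made up for this example. It closes a descriptor and then seeks on it, which is the situation described above.

```python
import errno
import os
import tempfile


def lseek_after_close():
    """Return the errno raised by lseek() on an already-closed descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.close(fd)  # the descriptor number is no longer valid after this
        try:
            os.lseek(fd, 0, os.SEEK_SET)
        except OSError as exc:
            return exc.errno
        return None  # only reached if the seek unexpectedly succeeded
    finally:
        os.unlink(path)


if __name__ == "__main__":
    code = lseek_after_close()
    print(code, errno.errorcode.get(code))
```

Nothing here touches a disk block, which is why this error can look like corruption to the application while the hardware is fine.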
02-01-2002 09:21 AM
Re: "false" Oracle corruption messages
HP-UX xxxxxx B.11.00 U 9000/800    02/01/02
17:19:13  text-sz   ov  proc-sz   ov  inod-sz    ov  file-sz     ov
17:19:14  N/A      N/A  452/5000   0  7048/7048   0  3908/52058   0
17:19:15  N/A      N/A  452/5000   0  7048/7048   0  3908/52058   0
17:19:16  N/A      N/A  452/5000   0  7048/7048   0  3908/52058   0
17:19:17  N/A      N/A  452/5000   0  7048/7048   0  3908/52058   0
17:19:18  N/A      N/A  452/5000   0  7048/7048   0  3909/52058   0
17:19:19  N/A      N/A  452/5000   0  7048/7048   0  3908/52058   0
17:19:20  N/A      N/A  452/5000   0  7048/7048   0  3909/52058   0
17:19:21  N/A      N/A  452/5000   0  7048/7048   0  3909/52058   0
17:19:22  N/A      N/A  452/5000   0  7048/7048   0  3908/52058   0
17:19:23  N/A      N/A  452/5000   0  7048/7048   0  3912/52058   0
the system is momentarily very quiet (Friday evening in Belgium). I'll run it again next week to see whether it increases...
I must admit I'm very bad at kernel tuning. I once ordered a kernel audit from HP, but I'm still not confident about the setup.
Basically the server runs Vantive 'application server' and its database...
Dirk
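When sar -v is rerun under load, the used/size pairs can be checked mechanically. The following is a rough sketch in Python, assuming the column order shown above (text, proc, inod, file, each followed by an overflow count). The function name table_utilisation is hypothetical. It only reports utilisation and does not decide whether a full table is actually a problem.

```python
import re

TIMESTAMP = re.compile(r"^\d\d:\d\d:\d\d$")
USED_SIZE = re.compile(r"^(\d+)/(\d+)$")
TABLES = ("text", "proc", "inod", "file")  # column order assumed from the output above


def table_utilisation(sar_lines):
    """Return {table: highest used/size ratio seen} for sar -v data lines."""
    peak = {}
    for line in sar_lines:
        fields = line.split()
        # Data lines start with HH:MM:SS; this skips the banner line.
        if len(fields) < 9 or not TIMESTAMP.match(fields[0]):
            continue
        # Fields 1, 3, 5, 7 hold used/size; fields 2, 4, 6, 8 hold overflow counts.
        for name, value in zip(TABLES, fields[1:9:2]):
            match = USED_SIZE.match(value)
            if not match:  # "N/A" or the column header text
                continue
            used, size = int(match.group(1)), int(match.group(2))
            if size:
                peak[name] = max(peak.get(name, 0.0), used / size)
    return peak


if __name__ == "__main__":
    sample = [
        "HP-UX xxxxxx B.11.00 U 9000/800 02/01/02",
        "17:19:13 text-sz ov proc-sz ov inod-sz ov file-sz ov",
        "17:19:14 N/A N/A 452/5000 0 7048/7048 0 3908/52058 0",
    ]
    for name, ratio in table_utilisation(sample).items():
        print(f"{name}: {ratio:.0%}")
```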
02-05-2002 05:35 AM
Re: "false" Oracle corruption messages
A tusc trace of the application process is generating the following output (ERR#246 EWOULDBLOCK). Does anyone know this error?
[17982] select(250, 0x6fff0778, NULL, NULL, NULL) ........ [sleeping]
[17982] select(250, 0x6fff0778, NULL, NULL, NULL) ........ = 1
[17982] getnumfds() ...................................... = 13
[17982] ioctl(0, I_XTI_RCV, 0x6fff1110) .................. = 0
[17982] ioctl(0, I_XTI_SND, 0x6fff11c8) .................. = 0
[17982] select(250, 0x6fff0778, NULL, NULL, NULL) ........ = 1
[17982] getnumfds() ...................................... = 13
[17982] ioctl(0, I_XTI_RCV, 0x6fff1110) .................. = 0
[17982] sigvec(SIGALRM, 0x6fff1308, 0x6fff1318) .......... = 0
[17982] alarm(5) ......................................... = 0
[17982] ioctl(11, FIONBIO, 0x6fff25c8) ................... = 0
[17982] select(0, NULL, NULL, NULL, 0x6fff1378) .......... = 0
[17982] write(11, "\0e8\0\006\0\0\0\0\011x d501\0\0".., 232) = 232
[17982] read(11, 0x400f6256, 2064) ....................... ERR#246 EWOULDBLOCK
[17982] open("/opt/oracle/product/8.1.5/rdbms/mesg/oraus.msb", O_RDONLY, 0) = 13
[17982] fcntl(13, F_SETFD, 1) ............................ = 0
[17982] lseek(13, 0, SEEK_SET) ........................... = 0
[17982] read(13, "1513" 011303\t\t\0\0\0\0\0\0\0\0".., 256) = 256
[17982] lseek(13, 512, SEEK_SET) ......................... = 512
[17982] read(13, "1d1 [ z x 0e\0\0\0\0\0\0\0\0\0\0".., 512) = 512
[17982] lseek(13, 1024, SEEK_SET) ........................ = 1024
[17982] read(13, "\018\0$ \07 \0@ \0J \0V \0a \0j ".., 512) = 512
[17982] lseek(13, 98304, SEEK_SET) ....................... = 98304
[17982] read(13, "\0\n\f+ \0\0\0D \f, \0\0\0r \f- ".., 512) = 512
[17982] close(13) ........................................ = 0
[17982] select(0, NULL, NULL, NULL, 0x6fff1378) .......... = 0
[17982] read(11, "\0be\0\006\0\0\0\0\00602\0\b\0\0".., 2064) = 190
[17982] ioctl(11, FIONBIO, 0x6fff2548) ................... = 0
[17982] alarm(0) ......................................... = 5
[17982] sigvec(SIGALRM, 0x6fff1308, 0x6fff1318) .......... = 0
[17982] sigvec(SIGALRM, 0x6fff1308, 0x6fff1318) .......... = 0
[17982] alarm(5) ......................................... = 0
[17982] ioctl(11, FIONBIO, 0x6fff25c8) ................... = 0
[17982] select(0, NULL, NULL, NULL, 0x6fff1378) .......... = 0
[17982] write(11, "\0[ \0\006\0\0\0\0\003^ d7\0\0\0".., 91) = 91
[17982] read(11, 0x400f7e1e, 2064) ....................... ERR#246 EWOULDBLOCK
[17982] open("/opt/oracle/product/8.1.5/rdbms/mesg/oraus.msb", O_RDONLY, 0) = 13
[17982] fcntl(13, F_SETFD, 1) ............................ = 0
[17982] lseek(13, 0, SEEK_SET) ........................... = 0
[17982] read(13, "1513" 011303\t\t\0\0\0\0\0\0\0\0".., 256) = 256
[17982] lseek(13, 512, SEEK_SET) ......................... = 512
[17982] read(13, "1d1 [ z x 0e\0\0\0\0\0\0\0\0\0\0".., 512) = 512
[17982] lseek(13, 1024, SEEK_SET) ........................ = 1024
[17982] read(13, "\018\0$ \07 \0@ \0J \0V \0a \0j ".., 512) = 512
[17982] lseek(13, 98304, SEEK_SET) ....................... = 98304
[17982] read(13, "\0\n\f+ \0\0\0D \f, \0\0\0r \f- ".., 512) = 512
[17982] close(13) ........................................ = 0
thanks
Dirk
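One thing worth noting in the trace above: descriptor 11 is switched with ioctl(11, FIONBIO, ...) before the write, and the read that returns EWOULDBLOCK is followed by a select() and a second read that succeeds (190 bytes). That pattern may simply be ordinary non-blocking I/O. The sketch below reproduces it in Python with a local socket pair. It is an illustration under that assumption, not Vantive or Oracle code, and the function name is made up.

```python
import errno
import select
import socket


def nonblocking_read_demo():
    """Read from an empty non-blocking socket, then retry once data arrives."""
    client, server = socket.socketpair()
    try:
        client.setblocking(False)  # comparable to the FIONBIO ioctl in the trace
        try:
            client.recv(2064)
            first_errno = None
        except BlockingIOError as exc:
            # No data yet: the kernel reports EAGAIN/EWOULDBLOCK instead of blocking.
            first_errno = exc.errno
        server.sendall(b"reply")
        # Wait (with a timeout) until the descriptor is readable, then retry.
        readable, _, _ = select.select([client], [], [], 5.0)
        data = client.recv(2064) if readable else b""
        return first_errno, data
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(nonblocking_read_demo())
```

If that reading is right, the EWOULDBLOCK lines are a side effect of how the client polls the connection, and the interesting part of the trace is the error text being looked up in oraus.msb afterwards.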
02-05-2002 06:06 AM
Re: "false" Oracle corruption messages
I got the same error while running an upgrade on something ... I knew the hardware was solid, but the DBA was insistent that a "bad block had occurred..data could (is) lost...yada yada yada..".
Of course they checked everything when we were done..and couldn't find anything missing...but...
I'm going to enjoy 'sharing' this with him.
You have made a cold Tuesday warmer !
I love this Forum,
Rita
..no points here...the joy of this is enough !
02-05-2002 06:42 AM
Re: "false" Oracle corruption messages
Strangely, man'ing the read system call does not indicate that a 246 errno is ever set but obviously it is. I would look at nflocks (you could be running out of system-wide file lock structures). I would also look at the semaphore settings.
The one other thing I would look at is a timeslice value of 1 rather than 10. Some of the tuned parameter sets for database environments have very stupidly set the timeslice to 1 and this can cause all sorts of very strange semaphore problems and I suspect it could also cause file lock problems as well.
I would also look through all the system call man pages to see if there are any that set errno to EWOULDBLOCK. In the meantime, I'll look into the ioctl on fdes 11 that precedes this read.
Regards, Clay
02-05-2002 07:54 AM
Re: "false" Oracle corruption messages
Dirk
02-05-2002 07:58 AM
Re: "false" Oracle corruption messages
Your timeslice is set to 1; also nflocks is rather low for your nfiles setting. I would set timeslice to 10 and nflocks to 10000. You can do all this within SAM and build a new kernel.
02-05-2002 08:10 AM
Re: "false" Oracle corruption messages
The user connection is twofold: one application process and one Oracle connection.
The tusc trace I sent before covered the application process. I have now managed to trap the Oracle connection with tusc (see attachment).
I cannot apply your kernel changes immediately. The DB cannot be brought down easily...
regards,
Dirk
02-05-2002 09:18 AM
Re: "false" Oracle corruption messages
# sh /tmp/lock.sh
# cat outputfile
Tue Feb 5 17:08:21 GMT 2002
Number of used file lock table entries : 393
Tue Feb 5 17:19:47 GMT 2002
Number of used file lock table entries : 393
I'll try to get some downtime to increase the timeslice to 10...
Dirk
02-05-2002 09:39 AM
Re: "false" Oracle corruption messages
Hi:
This does look very strange:
The lseek is failing on fdes 0; EBADF indicates that fdes is not open
lseek(0, 33832960, SEEK_SET) ..... ERR#9 EBADF
...
...
open("/usr/lib/nls/msg/C/strerror.cat", O_LARGEFILE, 0177777) .......... = 0
This open returns 0 as a file descriptor; open returns the lowest available file descriptor.
You need to do some more digging to find where close(0) is called before the lseek that fails.
This is looking more and more like a software bug but I would definitely set timeslice to 10 because your current setting can cause all sorts of very strange behavior. If timeslice doesn't fix this, it's probably time to call Oracle support.
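The "lowest available file descriptor" rule is what makes the open() returning 0 significant. A small Python sketch of the rule follows; the function name is invented for the example, and it uses a pipe and the null device rather than descriptor 0 so that it is safe to run anywhere.

```python
import os


def reopen_reuses_lowest_fd():
    """Close the lower of two fresh descriptors and show the next open() reuses it."""
    first, second = os.pipe()  # both ends take the lowest free numbers at this moment
    low, high = min(first, second), max(first, second)
    os.close(low)  # 'low' is now the lowest free descriptor number
    new_fd = os.open(os.devnull, os.O_RDONLY)
    try:
        return low, new_fd
    finally:
        os.close(new_fd)
        os.close(high)


if __name__ == "__main__":
    closed, reopened = reopen_reuses_lowest_fd()
    print(f"closed fd {closed}, next open() returned fd {reopened}")
```

Applied to the trace: an open() that returns 0 can only happen if descriptor 0 was free at that point, which is consistent with the lseek(0, ...) just before it failing with EBADF.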
02-05-2002 11:13 AM
Re: "false" Oracle corruption messages
I recall a similar message on 8.1.6 (more or less). There is a patch from Oracle. See Oracle's alert.log on the database server.
sar -v reports that the inode table is at its maximum (7048/7048). If you are using HFS filesystems you need to raise the ninode kernel parameter.
02-06-2002 12:03 AM
Re: "false" Oracle corruption messages
I checked with Oracle and they recommend to "upgrade to 8.1.7.2 and then apply patch for BUG 1247796."
This patch relates to ASYNCH_IO, which we're not using, however. It will be very difficult to upgrade since the product is only supported on Oracle 8.1.5 (which is no longer supported by Oracle, so as usual I'm stuck between a rock and a hard place!)
I'll try to convince our Change Control Board to give me some downtime to change the timeslice value.
many thanks,
Dirk
02-06-2002 12:29 AM
Re: "false" Oracle corruption messages
I'm interested in your lock.sh script. Would you please post this script as an attachment?
Thanks a lot
Ruediger
02-06-2002 12:49 AM
Re: "false" Oracle corruption messages
#cp /stand/vmunix /stand/vmunix.orig
#q4pxdb /stand/vmunix
In the script, replace the line "/usr/contrib/bin/q4 /stand/vmunix /dev/mem" with "/usr/contrib/Q4/bin/q4 /stand/vmunix /dev/kmem"
#sh /tmp/lock.sh
#cat /tmp/outputfile
regards,
Dirk
02-06-2002 02:45 AM
Re: "false" Oracle corruption messages
The lseek() tries to seek on file descriptor 0 of the process. The lseek() fails because the file descriptor is no longer open (in other words, it was closed earlier with close()). We know this because file descriptor 0 is handed out to the next file that is opened, and open() always returns the lowest free descriptor.
The result of this EBADF is that Oracle sends a message to the iwserver process saying that it got the EBADF.
The message is passed via the file "/opt/oracle/product/8.1.5/rdbms/mesg/oraus.msb".
The iwserver then passes this on to the iwclient process on the PC.
That leaves the following open questions:
1/ When is file descriptor 0 of the "oracleVANPROD" process closed?
2/ Why is the file descriptor closed?
3/ Why doesn't the Oracle process notice that the file descriptor is closed, and why does it persist in doing an lseek()?
Running a continuous tusc on the Oracle process may reveal when it's closed and by whom.
Dirk
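A long tusc capture is tedious to read by eye, so the search for the close() that precedes the failing call could be scripted. The sketch below is in Python and assumes the line layout seen in the traces in this thread ("[pid] call(fd, ...) .... = result" or "... ERR#9 EBADF"). The function name find_stale_fd_use is hypothetical. It ignores process IDs and treats the first numeric argument of every call as a descriptor, which is wrong for calls like select(), so take the output as a hint only.

```python
import re

# "[17982] lseek(0, ..." or "lseek(0, ..." -> call name and first numeric argument
CALL = re.compile(r"^(?:\[\d+\]\s+)?(\w+)\((\d+)[,)]")
# A successful open(): the line ends in "= <fd>"
OPEN_RESULT = re.compile(r"^(?:\[\d+\]\s+)?open\(.*=\s*(\d+)\s*$")


def find_stale_fd_use(trace_lines):
    """Return (line, call, fd, line_of_last_close) for each call failing with EBADF."""
    closed_at = {}  # fd -> line number of the most recent successful close()
    findings = []
    for lineno, raw in enumerate(trace_lines, start=1):
        line = raw.strip()
        opened = OPEN_RESULT.match(line)
        if opened:
            # The descriptor number is valid again from here on.
            closed_at.pop(int(opened.group(1)), None)
            continue
        call = CALL.match(line)
        if not call:
            continue
        name, fd = call.group(1), int(call.group(2))
        if "EBADF" in line:
            findings.append((lineno, name, fd, closed_at.get(fd)))
        elif name == "close" and line.endswith("= 0"):
            closed_at[fd] = lineno
    return findings


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as trace:  # path to a saved tusc output file
        for hit in find_stale_fd_use(trace):
            print(hit)
```

A result whose last field is None means no close() for that descriptor was seen in the captured window, so the capture would need to start earlier.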
03-11-2002 07:29 AM
Re: "false" Oracle corruption messages
We implemented a workaround of this package and this solved the "false" corruptions.
Dirk