11-08-2005 03:30 AM
SAN - HPUX - Oracle Issue
I have a somewhat complex issue.
We have a couple of N-class servers with SAN-attached disks. All of our Oracle database and other info is stored on these disks.
During the night, there seems to have been a problem on the SAN which caused the systems to switch to their alternate disks. This kept going back and forth. I also got syslog error messages like the ones below:
msgcnt 649 vxfs: mesg 016: vx_ilisterr - /database/DEVXXXX file system error reading inode 31
msgcnt 650 vxfs: mesg 037: vx_metaioerr - /dev/vgXXX/lvXXX file system meta data read error
msgcnt 651 vxfs: mesg 017: vx_ilisterr - /apps file system inode 5716 marked bad
msgcnt 652 vxfs: mesg 016: vx_ilisterr - /apps file system error reading inode 5716
msgcnt 653 vxfs: mesg 016: vx_ilisterr - /apps file system error reading inode 5716
I have looked at these errors and most of them refer to a disk failure which is what the SAN failure would produce. The system kept switching over and caused more of the inode problems.
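To see which file systems are affected, the VxFS messages can be counted per file system. The following is only a minimal sketch: the function name `vxfs_summary` is made up for illustration, and it assumes each message sits on a single line (messages wrapped by the terminal need to be joined first) in the `vx_name - <file system> file system ...` form shown in the log lines quoted in this post.

```shell
#!/bin/sh
# Count VxFS error messages per file system from syslog-style text on stdin.
# Assumption: one message per line, with " - " separating the routine name
# (vx_ilisterr, vx_metaioerr, ...) from the mount point or device.
vxfs_summary() {
    awk '
        /vxfs: mesg/ {
            # The field right after the lone "-" is the file system.
            for (i = 1; i < NF; i++) {
                if ($i == "-") { count[$(i + 1)]++; break }
            }
        }
        END { for (fs in count) print count[fs], fs }
    ' | sort -k1,1nr -k2
}
```

On HP-UX this could be run as `vxfs_summary < /var/adm/syslog/syslog.log`. A file system that shows up repeatedly, such as /apps here, is the first candidate for a full fsck.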
When we tried to run Oracle in the morning, it would not run. We found that there was a problem with the Shared Libraries that Oracle uses like this:
/usr/lib/dld.sl: Invalid shared library file: /apps/oracle81/lib/libobk.sl
/usr/lib/dld.sl: No such device or address
Even some of my other apps, which are not Oracle-based, had problems and created core files. I narrowed it down to one filesystem where the apps and shared libraries sit.
Right now, I have all disks failed over to the SAN port that is working. We also found that one particular SAN port is having problems staying up and is getting bit errors.
Does anyone know what may have caused this problem on the HPUX and Oracle side? It seems that a problem on the SAN caused a problem in HPUX which had a knock-on effect on Oracle.
How can I fix it?
I think the problem may be related to inodes on the filesystem going bad, but why does this happen and what can I do to prevent / fix it?
Any help would be appreciated.
Thanks in advance,
Sanjay.
11-08-2005 03:37 AM
Re: SAN - HPUX - Oracle Issue
Realistically, you can install the LVM and SAN patches in the hope of solving the problem with a shotgun approach.
The bottom line is the SAN should not be triggering disk fails and it should be checked. The fact that it happens on multiple machines leads me to suspect the problem is not with any one machine but the SAN or Fabric network.
SEP
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
11-08-2005 03:42 AM
Re: SAN - HPUX - Oracle Issue
For some reason you had a problem in your EVA.
Maybe a disk failed some time ago, and now another one has failed too.
When an I/O error occurs on a disk, the problem shows up in LVM and in VxFS.
Some files cannot be read. From what I read, any file located on /apps might be at risk.
Action:
1) Repair the disk.
2) Fix the filesystem using fsck.
3) Restore any missing files from your backup.
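Step 2 could look roughly like the sketch below. It only prints the commands instead of running them, because the file system must be unmounted first and nothing should be repaired while the SAN path is still unstable. The function name `print_fsck_plan` is made up for illustration, and the mount point and raw device are placeholders to be replaced with your own.

```shell
#!/bin/sh
# Print (do not run) the commands for a full check of one VxFS file system.
# $1 = mount point, $2 = raw logical volume device; both are placeholders.
print_fsck_plan() {
    mnt=$1
    rawdev=$2
    # Show which processes still hold files open on the mount point.
    echo "fuser -cu $mnt"
    echo "umount $mnt"
    # "-o full" asks for a full structural check rather than only an
    # intent log replay.
    echo "fsck -F vxfs -o full $rawdev"
    # Assumes the mount point has an entry in /etc/fstab.
    echo "mount $mnt"
}
```

For example, `print_fsck_plan /apps /dev/vgXXX/rlvXXX` prints the four commands for /apps. Review them and run them by hand once the disk problem is repaired.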
Jean-Yves
11-08-2005 09:28 AM
Re: SAN - HPUX - Oracle Issue
Do you have patch PHKL_32920 installed (HP-UX 11.0)? Here is an excerpt from the warning issued by HP.
One Liner:
s700_800 11.00 LVM Cumulative Patch
Reason:
Warn: 05/10/10 - This Critical Warning has been issued by HP.
- PHKL_32920 introduced behavior that can result on EIO errors
incorrectly being returned to the filesystem or the calling
application even when the logical timeout has been set to
infinity.
- This behavior can be observed when I/O requests fail because
a physical volume becomes unavailable for any reason (e.g. a
controller failure, etc.) and it depends on the number of
paths configured:
- For physical volumes with multiple paths, the incorrect
behavior can only be observed in the window in which
one I/O fails and LVM switches to a good link. Any
I/Os issued within that window will result on an EIO
error being returned.
- For physical volumes with single paths, the incorrect
behavior will be observed by all I/Os issued after the
first I/O which caused the link to become unavailable.
This behavior will continue to occur until the path to
the physical volume is restored.
- Additional information on this behavior may be found in
Service Request 8606413726 (JAGaf73586).
- To avoid this behavior, HP recommends removing PHKL_32920
from systems if the calling application or file system cannot
handle the error returned from the underlying logical volume.
- The previous patch, PHKL_30553, does not exhibit this same
behavior. While patch warnings have been issued against
PHKL_30553, they are of a less serious nature. If you
choose to remove PHKL_32920, HP recommends that PHKL_30553
be installed to ensure as many known issues as possible are
addressed. If PHKL_30553 was installed prior to PHKL_32920,
it will automatically be restored when PHKL_32920 is removed
and will not need to be reinstalled.
HW-OS:s700: 11.00
s800: 11.00
Fixed:Unknown
Fset:LVM.LVM-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP
OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_32,v=HP
LVM.LVM-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP
OS-Core.CORE2-KRN,fr=B.11.00,fa=HP-UX_B.11.00_64,v=HP
Prod:N/A
Reboot:Yes
CView:Yes
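To check whether a system has this patch, the software list can be searched for the patch name. This is a minimal sketch: `has_patch` is a made-up helper, and it assumes the patch name appears as a separate whitespace-delimited field in the `swlist` output, as it normally does in a product-level listing on 11.00.

```shell
#!/bin/sh
# Return 0 if the patch named in $1 appears as a whole field in the
# swlist-style output read from stdin, 1 otherwise.
has_patch() {
    awk -v p="$1" '
        { for (i = 1; i <= NF; i++) if ($i == p) found = 1 }
        END { exit !found }
    '
}

# Example use on the HP-UX host (not run here):
#   swlist -l product | has_patch PHKL_32920 && echo "PHKL_32920 is installed"
# Removal, if you decide on it; the warning lists Reboot: Yes, so plan downtime:
#   swremove -x autoreboot=true PHKL_32920
```

Matching on a whole field keeps the check from reporting a different patch whose name merely contains the same digits.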
11-08-2005 08:08 PM
Re: SAN - HPUX - Oracle Issue
Thank you for all the replies. I do have PHKL_32920 installed. The problems described in the patch description are very much like what I am getting.
I am going to remove the patch and see what happens. Unfortunately, this is something that is not easily tested.
I will update with more points later,
Thank you so far.
Sanjay.
12-30-2005 02:33 AM
Re: SAN - HPUX - Oracle Issue
PHKL 32920 on an L-Class 11.0 box
There are no disk issues on the Clariion Side.
Can someone confirm that removal of this patch resolves the issue?
Thanks a lot
Eric
12-30-2005 02:43 AM
Re: SAN - HPUX - Oracle Issue
pvchange -t 180 /dev/dsk/c?t?d?
thx,
bl.
12-30-2005 02:45 AM
Re: SAN - HPUX - Oracle Issue
The PV timeout is set to 180 seconds for all associated PVs, and the LV timeout is at the default for all LVs.
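The current timeouts can be read back from `pvdisplay` and `lvdisplay`. The sketch below is only illustrative: `timeout_report` is a made-up helper, and it assumes the usual output layout with a `PV Name` or `LV Name` line followed later by an `IO Timeout (Seconds)` line.

```shell
#!/bin/sh
# Print "<name> <timeout>" for each PV or LV found in pvdisplay/lvdisplay
# output on stdin. Alternate-link lines are skipped so that the primary
# path name is the one reported.
timeout_report() {
    awk '
        /^(PV|LV) Name/ && !/Alternate Link/ { name = $3 }
        /^IO Timeout \(Seconds\)/            { print name, $4 }
    '
}

# Example use on the HP-UX host (not run here; device names are placeholders):
#   pvdisplay /dev/dsk/c?t?d? | timeout_report
#   lvdisplay /dev/vgXXX/lvXXX | timeout_report
```

A value of `default` for an LV means no LV timeout is set. That is the setting the patch warning refers to as a logical timeout of infinity.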
Thanks Eric
12-30-2005 02:48 AM
Re: SAN - HPUX - Oracle Issue
It also seems that the people in this thread who have encountered this problem are simply using alternate pathing or PV links (hence I assume they use LVM). Depending on the array and SAN you are using, make sure all recommended settings for the PV, such as bad block relocation and timeouts, match what the array vendor specifies. That way the OS should be able to handle the most common fault of most arrays: a controller failure, which leads to a failover.
Aside from the VxFS (or JFS) errors mentioned, you or an experienced admin should be able to tell from syslog, console messages, or even STM why VxFS complained of filesystem errors. In the absence of any SAN or PV configuration faults, check that your JFS/VxFS is patched, or at least at version 3.3, on 11.0 environments.
12-30-2005 02:59 AM
Re: SAN - HPUX - Oracle Issue
The critical warning described for PHKL_32920 seems awfully similar to the behavior we saw. However, since this is a production server, I can't just start installing and removing patches without some sort of confirmation or level of certainty.
12-30-2005 03:02 AM