
Oracle DBlink connection problems on HP VM

 
Darren Keenan
Occasional Contributor

We have two Oracle instances (10g) on different virtual machines (hpux 11.23).
After creating a dblink, most normal queries work, such as "describe", or "select count", or "select * where rownum < 100". However, on queries where there is a significant amount of data transferred (such as a whole table), the connection sometimes hangs.
After running a network sniff, the (virtual) interfaces are exhibiting odd behavior. For example, there are cases of an acknowledgment (ack) sent 2 hours after the last data transmission.
This is very difficult to diagnose since the problem is very inconsistent. It works about 80% of the time, but then it fails, despite an underutilized CPU and network.
Has anyone seen problems like this, and does anyone have any suggestions?
Also, this problem has persisted across upgrades of the application (from Oracle 9i), the o/s (from Tru64), and the hardware (from Alpha servers).
Thanks.
6 REPLIES
Yogeeraj_1
Honored Contributor

Re: Oracle DBlink connection problems on HP VM

Hi Darren,

This is quite weird. At the database server level, Oracle really does not distinguish between types of network interface.

Can you reproduce the problem easily?

Can you test the same with another Oracle database which is not on a VM?

kind regards
yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
Darren Keenan
Occasional Contributor

Re: Oracle DBlink connection problems on HP VM

Unfortunately, we have moved to an all VM environment - including production, so it is not possible to test on a non-VM instance. However, we have duplicated this problem on other instances (in a different physical machine), so it is more general than a specific device.
I was hoping someone had seen similar behavior. Part of the problem is that we are seeing a cascade of symptoms - where one error is causing another.
MORE INFO: After looking at the sniffer trace in greater detail, it appears the root problem is a session that hangs after data transmission has begun. That is, we have a successful transmission, and the next session (5 hours later) starts just like the previous one and looks normal, but then suddenly the guest server starts acknowledging the same packet/frame (a total of 10 times). Shortly after this, the guest and host servers both time out, but they don't really drop the session, and it becomes "plugged" for the next session, 5 hours later.
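For anyone who wants to scan a trace for the same repeated-ack pattern, here is a minimal Python sketch. It assumes the trace has already been exported into simple (timestamp, sender, ack number) records; that record layout, the function name, and the threshold are all made up for illustration and are not tied to any particular sniffer.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (timestamp in seconds, sender name, TCP ack number),
# e.g. exported by hand from the sniffer trace.
AckRecord = Tuple[float, str, int]


def find_repeated_acks(
    records: Iterable[AckRecord], threshold: int = 3
) -> List[Tuple[str, int, int, float]]:
    """Return (sender, ack_no, count, first_seen) for every run of
    consecutive identical acks from one sender that is at least
    `threshold` long."""
    runs = []
    last = {}  # sender -> [ack_no, count, first_seen]
    for ts, sender, ack_no in records:
        cur = last.get(sender)
        if cur is not None and cur[0] == ack_no:
            cur[1] += 1  # same ack repeated by the same sender
        else:
            # The previous run for this sender has ended; keep it if long enough.
            if cur is not None and cur[1] >= threshold:
                runs.append((sender, cur[0], cur[1], cur[2]))
            last[sender] = [ack_no, 1, ts]
    # Flush runs that were still open at the end of the trace.
    for sender, cur in last.items():
        if cur[1] >= threshold:
            runs.append((sender, cur[0], cur[1], cur[2]))
    return runs
```

A run of 10 identical acks from the guest, as described above, would show up as one entry with a count of 10 and the timestamp of the first repeat.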
TwoProc
Honored Contributor

Re: Oracle DBlink connection problems on HP VM

FWIW,

I've seen it too. It was a problem in getting data from a legacy machine using an 8i client to a 9i database. We NEVER resolved it, except by getting rid of the legacy machine. When we traced it, we saw exactly what you're seeing now. We ran code in debug mode with debuggers, etc., and could never figure out what it was waiting on; everything looked good to us. Like you said, it was intermittent, but happened often enough to cause problems. However, it seemed to be independent of query return sizes.
We are the people our parents warned us about --Jimmy Buffett
Darren Keenan
Occasional Contributor

Re: Oracle DBlink connection problems on HP VM

The data transfer size is key for us. We noticed that the hangs happened much more frequently when the size of the transfer exceeded about 30-50K. So, we implemented a workaround that transfers the data in segments (10,000 rows at a time). Until recently, this workaround worked over 99% of the time. Recently, however, it has begun failing about 2-5% of the time. This is not horrible, but it really isn't acceptable in a production environment (where other jobs are dependent on this one).
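As a rough illustration of the segmented approach (not our actual job), the idea can be sketched in Python as below. The copy_chunk callback is hypothetical: it stands in for whatever statement pulls one range of rows across the dblink, and the function names are invented for this example.

```python
from typing import Callable, Iterator, Tuple


def chunk_ranges(total_rows: int, chunk_size: int = 10_000) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first_row, last_row) ranges covering total_rows."""
    for first in range(1, total_rows + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_rows)


def transfer_in_chunks(
    total_rows: int,
    copy_chunk: Callable[[int, int], None],  # hypothetical: copies rows first..last
    chunk_size: int = 10_000,
) -> int:
    """Call copy_chunk once per segment and return the number of rows requested."""
    copied = 0
    for first, last in chunk_ranges(total_rows, chunk_size):
        copy_chunk(first, last)
        copied += last - first + 1
    return copied
```

Note that this only limits how much data moves per call. A segment that hangs will still hang, so it reduces the odds rather than removing the cause.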
Thanks again.
Rajesh K Rajan
Advisor

Re: Oracle DBlink connection problems on HP VM

Darren,

Assuming you've ruled out (????) delays at the OS / network level:

What do the Oracle session waits show on both sides? Do you have TOAD or a similar interface to browse sessions? If so, check what the waits are, as that might give you a clue as to whether something on the database is causing excessive waits (log file sync / enqueue / ...).
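If you don't have a session browser handy, something along these lines may help. It is only a sketch: the query reads v$session_wait for the SQL*Net dblink wait events, and the small Python helper (its name and the 60-second cutoff are my own assumptions) just filters fetched rows for sessions that have been waiting a long time.

```python
from typing import Iterable, List, Tuple

# Query text only; run it from SQL*Plus, TOAD, or any client on each database.
SESSION_WAITS_SQL = """
SELECT sid, event, state, seconds_in_wait
  FROM v$session_wait
 WHERE event LIKE 'SQL*Net%dblink%'
 ORDER BY seconds_in_wait DESC
"""

# Row layout matches the SELECT list above.
WaitRow = Tuple[int, str, str, int]


def stuck_dblink_waits(rows: Iterable[WaitRow], min_seconds: int = 60) -> List[WaitRow]:
    """Keep rows whose event mentions a dblink and whose wait has lasted
    at least min_seconds (an arbitrary cutoff chosen for illustration)."""
    return [
        row for row in rows
        if "dblink" in row[1] and row[3] >= min_seconds
    ]
```

Comparing the output from both databases at the moment of a hang should show which side thinks it is still waiting on the other.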
TwoProc
Honored Contributor

Re: Oracle DBlink connection problems on HP VM

Darren, that's about what we saw too. At first, we believed it to be the size, as you've said. The programmers scaled back the number of rows transmitted at a time (to 1000), and, while that reduced the number of events, it could still happen even when sending a single row!

What they did in the end was build a custom ftp file transfer between the boxes and change the process to be batch oriented. This was to avoid using sqlnet. It was a good enough fix until we got rid of the legacy server just a few months later, and it provided a more cumbersome, but more robust, data exchange. We also considered ftp, using our own socket exchanges, etc. Since the programmers were already familiar with using nfs solutions for other old legacy database interchanges, they put in just one more to get us over the hump.
We are the people our parents warned us about --Jimmy Buffett