
sleep, run, sssllllleepppp

lastgreatone
Regular Advisor

It's a real mystery. For the past 6 days the Oracle processes have slowed right down and no one knows why. The DMZ gateway people say they've done nothing. The Oracle programmer did not change his code, the DBA did not change anything, and the system manager, well, we won't talk about him. So I ran sar and saw 100% busy on one disk, although the overall average is OK. I ran iostat, and that same disk and a second disk show 2869/3075 bps and 100.6/99.6 sps.

It almost looks like the TCP handshaking is intermittently interrupted. The processes run smoothly for a period of time and then, bang, an Oracle process just hangs there for up to almost 1 minute before running again. Why?

NFS and Samba are both running on this server. Could they be interfering?

What logs would show me events on sockets, port 1521?

Has anyone had to deal with something similar?

Appreciate any feedback.
TIA
12 REPLIES
Patrick Wallek
Honored Contributor

Re: sleep, run, sssllllleepppp

OK...You confused me. You go from talking about disks to TCP handshaking. I don't see the correlation.

Anyway, you say you've got 1 disk that is 100% busy. That's not a good thing. The first thing I would do is see what is on that disk. Is it an Oracle data file, an index, or something else entirely? Then I'd see what I could do to remedy that 100% busy disk, if it has something to do with Oracle.

If that disk truly isn't a problem, then you can start looking at other things.
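
A minimal sketch for confirming which device sar is flagging before working out what lives on it. The function name `busy_disks` and the default threshold of 90 are made up for illustration, and the parsing assumes HP-UX style `sar -d` output, where a device name such as c0t6d0 is followed directly by its %busy figure.

```shell
# Print devices whose %busy in `sar -d` output (read on stdin) meets a threshold.
# Assumes HP-UX style device names (c0t6d0) followed directly by %busy.
# The leading timestamp appears only on some lines, so the device field is
# located by pattern rather than by position.
busy_disks() {
    limit="${1:-90}"
    awk -v limit="$limit" '
        {
            for (i = 1; i < NF; i++) {
                if ($i ~ /^c[0-9]+t[0-9]+d[0-9]+$/) {
                    if ($(i + 1) + 0 >= limit) print $i, $(i + 1)
                    break
                }
            }
        }
    '
}
```

Something like `sar -d 5 12 | busy_disks 90` would then list only the saturated devices, one line per sample.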
Anil C. Sedha
Trusted Contributor

Re: sleep, run, sssllllleepppp

Hi TIA,

Some suggestions and ideas.

Use "top" to see what processes are taking high CPU and high memory.

Your database should span multiple disks so as to balance the I/O.

You should try to extend your volume group and distribute your database across it. Another option is to move your database to a faster disk (if you have one) using pvmove.

A disk showing 100% busy has nothing to do with TCP handshaking. Don't just look at I/O, look at swap space as well. That will give you an idea of why processes are not running well. /var/adm/syslog/syslog.log is the best place to look for disconnected sessions. If you want to see what sockets are using port 1521 you may use

netstat -an | grep 1521

or use the "lsof" utility

lsof -i @hostname:1521
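
To go one step beyond the grep, the netstat output can be summarised by connection state. This is only a sketch: `count_states` is a hypothetical helper name, and it assumes the usual `netstat -an` layout where the local address is the fourth field and the state is the last one.

```shell
# Summarise TCP states for one local port from `netstat -an` output on stdin.
# Assumes the local address is field 4 and the state is the last field.
# The separator before the port is "." on HP-UX and ":" on Linux, so both
# are accepted; the "$" anchor keeps port 15210 from matching 1521.
count_states() {
    port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $4 ~ ("[.:]" port "$") { n[$NF]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Usage: netstat -an | count_states 1521
```

Running it every few seconds while a process is hung may show whether connections are piling up in one state, for example SYN_RCVD or CLOSE_WAIT.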

Let me know if this helps.

Regards,
Anil
If you need to learn, now is the best opportunity
lastgreatone
Regular Advisor

Re: sleep, run, sssllllleepppp

Sorry for rambling on. But that's just it, I've run out of places to look for the problem. The problem Oracle server, an L1000/11/64, is inside the firewall. It gets SYN, ACK from a K420 across the firewall via port 1521. Up until 6 days ago the flow was satisfactory, with 1 to 5 seconds response time. The Oracle mount point that was at 100% has been reduced, but the problem still exists.

It's as if something periodically connects to the Oracle server and interferes with the processes, but what?
Vincente Fernandes
Valued Contributor

Re: sleep, run, sssllllleepppp

How long has the server been up? Maybe a reboot could fix the problem, though I know that would be the last option.
Michael Tully
Honored Contributor

Re: sleep, run, sssllllleepppp

What changes, if any, have been made to your server and/or the connection through your firewall?
Anyone for a Mutiny ?
lastgreatone
Regular Advisor

Re: sleep, run, sssllllleepppp

That's what I suspect. Changes at the firewall, but it's difficult to get the firewall people to admit to any changes. Is there any way I can check the queue time/limits of packets at the firewall with some tool running from the Oracle server?
harry d brown jr
Honored Contributor

Re: sleep, run, sssllllleepppp


Frankie,

Do you have glance/measureware so that you can drill down and look at the issue?

live free or die
harry
Live Free or Die
Paula J Frazer-Campbell
Honored Contributor

Re: sleep, run, sssllllleepppp

Hi
Install the 60 day trial version of glance from the apps cd - fire it up and watch the activity.

Have you also checked your logs (/var/adm/syslog/syslog.log, dmesg)?

How full are your disks?

Lots of things can slow a system down, and I would imagine the TCP behaviour is an effect and not a cause.

Paula
If you can spell SysAdmin then you is one - anon
lastgreatone
Regular Advisor

Re: sleep, run, sssllllleepppp

The strange thing is we were able to replicate the problem on the Oracle dev. server running a browser locally. So I guess that blows the firewall bottleneck theory.

I notice in the logs that tftpd times out every 10 minutes. This service runs on both servers; could that have an impact?

The only other thing I can think of is that, because of increased traffic on the Oracle instances, the kernel parameters may have to be looked at, along with the SGA.

Bill Thorsteinson
Honored Contributor

Re: sleep, run, sssllllleepppp

Look at the SGA to find queries which are run often,
or require lots of resources.

Attached script will show most frequent queries.

A poorly tuned security cache on a Web server saturated
our ethernet interface. One query was run over a million
times during the day.
Ian Dennison_1
Honored Contributor

Re: sleep, run, sssllllleepppp

Suggestion, drop the server, change the patch panel for the network to another port in another switch, with the same Speed and Duplex settings. This way you can at least eliminate the physical network.

Before that,...
What does 'netstat -i' show in error counts? Also 'lanadmin' for the network connections concerned.
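
A rough sketch for pulling only the interfaces with non-zero error counters out of that output. `iface_errors` is a made-up name, and the parsing assumes the first line is a header containing the Ierrs and Oerrs column names, as HP-UX `netstat -i` prints. Rows with a missing Network or Address field would shift the columns and need checking by eye.

```shell
# Print interfaces whose input or output error counters are non-zero.
# Reads `netstat -i` output on stdin and finds the Ierrs/Oerrs columns
# from the header line instead of hard-coding their positions.
iface_errors() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Ierrs") ie = i
                if ($i == "Oerrs") oe = i
            }
            next
        }
        ie && oe && ($ie + 0 > 0 || $oe + 0 > 0) {
            print $1, "Ierrs=" $ie, "Oerrs=" $oe
        }
    '
}
```

Used as `netstat -i | iface_errors`. Counters that keep climbing between two runs matter more than a small static number left over since boot.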

Also, what's the memory like? Are you getting deactivations?

Share and Enjoy! Ian
Building a dumber user
lastgreatone
Regular Advisor

Re: sleep, run, sssllllleepppp

Thanks for your replies. The last two replies especially. I'll try those suggestions in the next few days. Some priorities bump other priorities....