
ip fragments dropped

 
michaelob
Advisor


Hi,

I have been trying to resolve the issue of fragments being dropped on the nodes in an Oracle RAC cluster. We have implemented jumbo frames on the interconnect network, which uses UDP (Oracle is sending 8K packets). We have also increased ip_reass_mem_limit to 8000000 and socket_udp_rcvbuf_default to 1048576. Should we continue to increase ip_reass_mem_limit until the fragment drops stop? Any input on how to stop the fragments being dropped would be much appreciated. Please see the output from netstat -p ip and netstat -p tcp. The uptime on the node is 8 days. All interfaces are running 1000 Full-Duplex. I've also noticed in MeasureWare that I'm getting non-zero values for NET_OUTQUEUE, and when I drill down to the interfaces, most of the queues are on the data network interfaces (1000 Full-Duplex without jumbo frames), which use TCP, and not on the interconnect interface.
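As a rough illustration of why the MTU matters for 8K UDP traffic, the sketch below counts how many IPv4 packets one datagram turns into at a given MTU. It assumes "8K" means an 8192-byte UDP payload, a 20-byte IPv4 header without options, and a jumbo MTU of 9000; none of these values are stated in the post, and the function name is made up for this example.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragment_count(udp_payload: int, mtu: int) -> int:
    """Return how many IPv4 packets carry one UDP datagram at this MTU."""
    ip_payload = udp_payload + UDP_HEADER   # UDP header travels in the first fragment
    max_payload = mtu - IPV4_HEADER         # room left after the IP header
    if ip_payload <= max_payload:
        return 1                            # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (max_payload // 8) * 8
    return math.ceil(ip_payload / per_fragment)


if __name__ == "__main__":
    for mtu in (1500, 9000):  # standard Ethernet vs. an assumed jumbo MTU
        print(f"MTU {mtu}: {ipv4_fragment_count(8192, mtu)} packet(s)")
```

Under these assumptions an 8192-byte payload needs six fragments at MTU 1500 and fits in a single packet at MTU 9000, so fragments arriving on a jumbo-frame interconnect would have to come from larger messages or from a path where the jumbo MTU is not in effect end to end.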

 

ip:

488680685 total packets received

12 bad IP headers

10645546 fragments received

210 fragments dropped (dup or out of space)

7634 fragments dropped after timeout

0 packets forwarded

0 packets not forwardable

udp:

24 incomplete headers

21 bad checksums

3 socket overflows
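To put the counters above in proportion, here is a small sketch that parses the relevant lines and works out the drop rate. The counter values are copied from the output above; the helper names are made up for this example, and the parsing assumes each line has the form "count description".

```python
import re

# Counter lines copied from the netstat -p ip output above.
NETSTAT_IP = """\
488680685 total packets received
10645546 fragments received
210 fragments dropped (dup or out of space)
7634 fragments dropped after timeout
"""


def parse_counters(text: str) -> dict:
    """Map each counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.*\S)", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def fragment_drop_summary(counters: dict) -> dict:
    """Summarise dropped fragments relative to fragments received."""
    received = counters["fragments received"]
    no_space = counters["fragments dropped (dup or out of space)"]
    timed_out = counters["fragments dropped after timeout"]
    dropped = no_space + timed_out
    return {
        "dropped": dropped,
        "drop_pct": 100.0 * dropped / received,
        "timeout_share_pct": 100.0 * timed_out / dropped,
    }


if __name__ == "__main__":
    summary = fragment_drop_summary(parse_counters(NETSTAT_IP))
    print(f"dropped fragments: {summary['dropped']}")
    print(f"drop rate: {summary['drop_pct']:.4f}% of fragments received")
    print(f"dropped after timeout: {summary['timeout_share_pct']:.1f}% of drops")
```

With these figures, 7844 fragments were dropped over 8 days, roughly 0.07% of the fragments received, and about 97% of those drops are timeouts rather than "dup or out of space". That split may be worth keeping in mind when deciding whether more reassembly memory is the right lever.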

 

Thanks

Mike

7 REPLIES
Laurent Menase
Honored Contributor

Re: ip fragments dropped

Hi, two things:

1) Do you have Oracle RAC error messages?

2) Why don't you contact support?

If you have packet loss, you can very carefully try increasing one of these kernel parameters with kctune:

11.31 str_syncq_limit

11.23 streams_sqmax

But you should only change them under HP support direction.

It is 1000 by default; try to use the smallest value that stops the drops (try 10000, then 100000 if that is not enough).

 

 

 

michaelob
Advisor

Re: ip fragments dropped

Hi Laurent

Thank you for your response. We are seeing gc block lost in the Oracle AWR/Statspack report.

My colleague has raised a call with support; they have suggested that we change ip_pmtu_strategy to 3.

I'll keep your suggestion regarding str_syncq_limit in mind as the call progresses.

Thanks

Mike

Laurent Menase
Honored Contributor

Re: ip fragments dropped

gc block lost -> increase str_syncq_limit

In the current context, changing ip_pmtu_strategy would be like painting your car red: it is a nice colour, but it is only cosmetic.

Be very careful changing str_syncq_limit; limit it to the strictly needed value

(try 10000, 100000, 500000), since in some conditions the system may consume a lot of memory or have delayed messages because of that limit.
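The "smallest value that stops the drops" approach can be sketched as a simple ladder search. This is only an illustration: `drops_with_value` is a hypothetical callback standing in for "apply the value under HP support direction, observe the system for a while, and report new gc block lost events", and the numbers in the demo are simulated, not measurements from this cluster.

```python
from typing import Callable, Iterable, Optional


def smallest_sufficient_value(
    candidates: Iterable[int],
    drops_with_value: Callable[[int], int],
) -> Optional[int]:
    """Return the smallest candidate for which no drops are observed.

    Candidates are tried in ascending order so the search stops at the
    first value that is enough, rather than jumping to the largest one.
    """
    for value in sorted(candidates):
        if drops_with_value(value) == 0:
            return value
    return None  # none of the candidates was enough


if __name__ == "__main__":
    # Simulated observations for demonstration only (not real data).
    simulated_drops = {10000: 42, 100000: 0, 500000: 0}
    chosen = smallest_sufficient_value(
        [10000, 100000, 500000],
        lambda value: simulated_drops[value],
    )
    print(f"smallest sufficient value in this simulation: {chosen}")
```

Stopping at the first sufficient step reflects the warning above: a larger limit than necessary may cost memory or delay messages.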

 

donna hofmeister
Trusted Contributor

Re: ip fragments dropped

You're seeing numbers stack up for NET_OUTQUEUE? That implies that traffic isn't able to make it off the box onto your network, which in turn implies the issue may not be on the box at all but rather on the switch. Have you raised this concern with your network folks?

Also, is this box up to date on network patches?

michaelob
Advisor

Re: ip fragments dropped

Hi,

We are seeing NET_OUTQUEUE numbers in MeasureWare when the systems get busy. I expected NET_OUTQUEUE to be high on the interconnect interface, but surprisingly the queue was on the application data interface.

I've asked the network team, and they say there are no errors on the switches. The nodes in the cluster are BL870s in C7000 enclosures, and they are using Cisco 3020 Blade Ethernet Switches.

I've run some netperf tests and we are getting 92mbs-111mbs over Gigabit Ethernet.

As we are not seeing any errors on the interface cards and the network team is not seeing any errors on the switches, we have been trying to tune the network stack, but we are still seeing fragments dropped.

Any suggestions on how we can reduce or stop the dropped fragments would be much appreciated.

Thanks
Mike
Laurent Menase
Honored Contributor

Re: ip fragments dropped

Again, gc block lost -> increase str_syncq_limit

Be very careful changing str_syncq_limit; limit it to the strictly needed value

(try 10000, 50000, 100000, 500000), since in some conditions the system may consume a lot of memory or have delayed messages because of that limit.

There is a document in the database which confirms it (I co-wrote the original for the ITRC database), and I found what looks like a Chinese translation of some parts:

http://translate.google.com/translate?hl=fr&sl=auto&tl=en&u=http%3A%2F%2Ftiger-wang.appspot.com%2Fcategory%2FDatabase

Also, str_syncq_limit is hidden and can only be seen if you run kctune str_syncq_limit.

(You can ask your HP support contact to just type 'gc block lost' in the internal or public database; he will find that information too.)

 

michaelob
Advisor

Re: ip fragments dropped

Hi Laurent,

I mentioned str_syncq_limit to HP support and he was very interested in this option. At HP support's request, we are asking Oracle what value we should set str_syncq_limit to.

I saw the Chinese extract of your paper; it was the only result a Google search returned for str_syncq_limit. I'll pass on the note to HP support to search the internal database for this doc.

Whilst we are waiting on a response from Oracle, we are going to patch the network cards as suggested by Donna and HP support.

I'll keep this call updated with our progress.

Thank you for your help