tftp performance issue

Stephen Keane · 09-04-2005

I am trying to network boot a B1000 from a J5000. Both machines are running HP-UX 11.11 and have 10/100 LAN cards. They are both connected to a Gigabit switch (I've also tried a 100 Mbit switch). I've got tcpdump running on the bootp server (the J5000) to watch the packet exchange. The bootp part works fine, but when the client (the B1000) starts to tftp the bootfile across, the problems start. The client and the server are connected to each other only via the switch; nothing else is connected to the switch.

Looking at the packet exchange, the standard tftp sequence (data_pkt from server to client, ack_pkt from client to server) proceeds as expected. At random intervals, however, the client sends an error_pkt to the server, the socket pair is dropped, and the client starts again from the beginning of the boot file. This has occurred as early as boot packet 4.

e.g.

server -> client data_pkt1
client -> server ack_pkt1
server -> client data_pkt2
client -> server ack_pkt2
server -> client data_pkt3
client -> server ack_pkt3
server -> client data_pkt4
client -> server error_pkt
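
For reference, the data_pkt / ack_pkt / error_pkt labels in the trace above correspond to TFTP opcodes 3, 4 and 5 as defined in RFC 1350. Below is a minimal Python sketch of those packet layouts, assuming the default 512-byte block size; the helper names `make_data`, `make_ack` and `describe` are hypothetical, chosen only for illustration.

```python
import struct

# Opcodes from RFC 1350 (TFTP revision 2); all fields are big-endian.
RRQ, WRQ, DATA, ACK, ERROR = 1, 2, 3, 4, 5
BLOCK_SIZE = 512  # default payload size of a TFTP DATA packet


def make_data(block: int, payload: bytes) -> bytes:
    """DATA: 2-byte opcode, 2-byte block number, then 0-512 bytes of file data."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("DATA payload larger than 512 bytes")
    return struct.pack("!HH", DATA, block) + payload


def make_ack(block: int) -> bytes:
    """ACK: 2-byte opcode, 2-byte number of the block being acknowledged."""
    return struct.pack("!HH", ACK, block)


def describe(packet: bytes) -> str:
    """Label a packet the way the trace does (data_pktN, ack_pktN, error_pkt)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode in (DATA, ACK):
        (block,) = struct.unpack("!H", packet[2:4])
        return f"{'data' if opcode == DATA else 'ack'}_pkt{block}"
    if opcode == ERROR:
        return "error_pkt"
    return f"opcode_{opcode}"


# Replay the first exchange from the trace: prints data_pkt1, then ack_pkt1.
for pkt in (make_data(1, b"\x00" * BLOCK_SIZE), make_ack(1)):
    print(describe(pkt))
```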

The thing is, if the server and client are both booted up and I tftp the bootfile myself, it takes approximately 30 seconds with no errors. The problem only occurs when booting from the LAN, where it can take up to 20 minutes to get the file across.

Any ideas where to look for the problem? I've tried booting from a different server (it was even slower, and that server had a Gigabit LAN card) and I've tried booting a different client (same result).

Both server and client are patched to the recommended level.

Florian Heigl (new acc) · 09-05-2005

There is a small possibility that negotiation settings are the cause of this, as they revert to their defaults during bootup.

Look at lanadmin's error counters (late collisions, etc.).

If distance allows, use a crossover cable for testing and try autoneg on.

yesterday I stood at the edge. Today I'm one step ahead.

Stephen Keane · 09-05-2005

I've attached the output of lanadmin (display) and I can't see anything obviously wrong!

What I have noticed is that even after the client has failed to boot and is sitting back in the "Main Menu" of the IPL, there are still several tftpd processes running on the server. I've tried stopping all of these and restarting the bootpd process to make sure. What appears to happen is that the server sends packet #0 to the client and doesn't get an ack, so it sends it again, still gets no ack, and so on; eventually the client sends an error and the server drops the connection. (That's one mode of failure, anyway.)
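
The failure mode described above, where the server keeps resending a block that is never acknowledged until the session is abandoned, follows from TFTP's lock-step design: the sender may not move on until the current block is acked. Below is a minimal sketch of such a sender, assuming a hypothetical `send_and_wait` hook that stands in for the UDP send plus the retransmission timer; `max_tries` is an illustrative limit, not the value HP-UX tftpd actually uses.

```python
from typing import Callable, Optional

BLOCK_SIZE = 512

# Hypothetical transport hook: send one DATA packet for (block, payload) and
# return the block number that came back in an ACK, or None on timeout.
SendAndWait = Callable[[int, bytes], Optional[int]]


def send_file(data: bytes, send_and_wait: SendAndWait, max_tries: int = 5) -> bool:
    """Stop-and-wait transfer: one outstanding block at a time, retransmit on timeout."""
    # A transfer ends with a block shorter than 512 bytes, so a file that is an
    # exact multiple of 512 bytes needs one extra, empty block at the end.
    nblocks = len(data) // BLOCK_SIZE + 1
    for i in range(nblocks):
        block = i + 1  # TFTP data blocks are numbered from 1
        payload = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        for _ in range(max_tries):
            if send_and_wait(block, payload) == block:
                break  # acknowledged: move on to the next block
        else:
            return False  # every attempt timed out: abandon the transfer
    return True
```

On a real network, every `None` returned here would cost a full retransmission timeout, which is one way a handful of lost frames could stretch a 30-second transfer into many minutes.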

Should I (or can I) upgrade the firmware on the server or the client?

Florian Heigl (new acc) · 09-05-2005

You can update the firmware (on the client), but I'm not sure it will help. Honestly, I absolutely *hate* debugging tftp/bootp/rarp issues :)

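Try fetching the file manually (substitute the bootp server's hostname for <server>; binary mode stops tftp from applying its default netascii conversion to the boot image):

cd /tmp
tftp <server>
binary
get /opt/ignite/bin/WINSTALL
quit

I hope that's the right path. Try it from some other clients: is it running fast for these?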

Sameer_Nirmal · 09-05-2005

I'd advise following these steps:

Check that all UTP Ethernet cables are patched properly and working.
Verify that the network card's UTP port is working, using a loopback test.
Keep port characteristics common across the network: check the speed and mode (half or full duplex) at the network card level, and turn off auto-negotiation on both the LAN card and the switch ports.

Check firmware compatibility at both ends, e.g. the server firmware should support the client LAN card for network booting.
Which firmware versions do you have at each end?

Steven E. Protter · 09-05-2005

If this is an Ignite boot (it is not clear to me), then it must be on the built-in NIC. The transfer of the actual data can be on the fast card, but if the box has a built-in NIC it must boot off that, not any add-in cards.

Cards of 1 Gb and above must be set to autonegotiate, both on the card and in the switch settings.

Slower cards should be set manually, both on the HP-UX side and in the switch settings.

No check of this issue is complete without checking the switch settings.

This boot could be done bypassing the switch with a crossover cable, so long as it's done on the built-in NIC.

If this is Ignite, what version is it? If not, pretty much this post is wasted bytes in HP's Oracle database.

:=)

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Stephen Keane · 09-05-2005

To (hopefully) answer a few of the questions ...

No, it is not Ignite (although it's still very slow if you do use Ignite).

There is only one LAN card in each box (server and client).

tftp works OK (i.e. fast) if both the boxes are booted, it's only slow if a box is attempting to boot off the LAN.

I've tried autoneg on both cards, as well as manually setting them to 100FD. I've also tried half-duplex. I've tried a Gigabit switch and a 100 Megabit switch.

I'm currently trying to find a cross-over cable to connect them back to back and eliminate the switch as a cause.

Florian Heigl (new acc) · 09-06-2005

Stephen,

I really think this is a negotiation issue, but I'm not completely sure what the solution would look like.

The lanadmin / hp*conf settings take effect during the boot process, when the card gets re-initialised, so at the PDC prompt the card is probably running autonegotiation.

With the OS running, we have all hit the usual issues between HP-UX and Cisco switches and therefore manually set everything to 100 Mbit/FD. This breaks the autonegotiation process for a system at the PDC prompt, so it will fall back to 100 Mbit/HD.

(The line speed is detected via the layer2 encoding, but the box can't find out about the duplex mode)

With a cross-over cable you should find that everything works with 100/auto.
With the switch in between, the server's port can be left unmodified and the client's port set either to autoneg or to 100/HD.
The bad thing is that different settings for the server and the client on the same switch can cause other problems, such as the 100/HD host being overrun (look for backpressure or flow-control settings in the switch).
As that means even more configuration changes, I'd say it's best to put in the effort and finally get 100/auto working properly :)

rick jones · 09-06-2005

Since tftp has only a single outstanding packet at a time (it is a synchronous request/reply protocol), a duplex mismatch on its own is unlikely to cause problems. Now, if anything else is trying to talk to the client at the same time, even "broadcast" traffic, there could be an issue. Couple that with the fairly long tftp timeouts and it could make things run rather slowly indeed.

By the time you can run lanadmin, any problems and statistics from boot-time tftp are _long_ gone: the driver has switched from the one in the PDC (firmware) to the one in the kernel, and the interface has been reset. You would have to look at statistics on the _switch_ to see if there are errors there; of course, it would not report errors that were only seen by the client.

If you start hardcoding things, be _CERTAIN_ to hardcode _BOTH_ the client and the switch:

How Autoneg is supposed to work:

When both sides of the link are set to autoneg, they will "negotiate" the duplex setting and select full duplex if both sides can do full-duplex.

If one side is hardcoded and not using autoneg, the autoneg process will "fail" and the side trying to autoneg is required by spec to use half-duplex mode.

If one side is using half-duplex, and the other is using full-duplex, sorrow and woe is the usual result.

So, the following table shows what will happen given various settings on each side:

              Auto        Half        Full
Auto          Happiness   Lucky       Sorrow
Half          Lucky       Happiness   Sorrow
Full          Sorrow      Sorrow      Happiness

Happiness means that there is a good shot of everything going well.
Lucky means that things will likely go well, but not because you did anything correctly :)
Sorrow means that there _will_ be a duplex mismatch.
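
The table can also be read as two rules: an autonegotiating port ends up at full duplex only if its peer also autonegotiates (assuming both ends are full-duplex capable), and otherwise falls back to half duplex; a hardcoded port uses whatever it was set to. Below is a small Python sketch that derives each cell from those two rules, meant as a check on the table rather than a model of any particular driver; the names `Mode`, `effective_duplex` and `outcome` are illustrative.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"  # autonegotiation enabled
    HALF = "half"  # hardcoded half duplex
    FULL = "full"  # hardcoded full duplex


def effective_duplex(me: Mode, peer: Mode) -> str:
    """Duplex this side actually runs, assuming both ends can do full duplex."""
    if me is Mode.AUTO:
        # Negotiation only succeeds against another autonegotiating port;
        # otherwise the autonegotiating side falls back to half duplex.
        return "full" if peer is Mode.AUTO else "half"
    return me.value


def outcome(a: Mode, b: Mode) -> str:
    """Reproduce one cell of the Happiness / Lucky / Sorrow table."""
    if effective_duplex(a, b) != effective_duplex(b, a):
        return "Sorrow"  # duplex mismatch
    # Same duplex: deliberate if both sides were configured alike, otherwise
    # it only works because auto fell back to half against a hardcoded half port.
    return "Happiness" if a is b else "Lucky"


for a in Mode:
    print(a.name, [outcome(a, b) for b in Mode])
```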

When there is a duplex mismatch, on the side running half-duplex you will see various errors and probably a number of late collisions. On the side running full-duplex you will see things like FCS errors.
Note that those errors are not necessarily conclusive, they are simply indicators.

Also, since tftp is a synchronous request/reply protocol, you will not see all _that_ much improvement going from 10 to 100 to 1000 million bits per second. There will/should be some, but the packet sizes are _fairly_ small, which means that host path length is the more dominant factor. So, no 10X increases in performance as you move up.
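
The point about small packets can be made with a back-of-the-envelope model: with one block in flight, each 512-byte block costs the wire time of one DATA frame plus one ACK frame plus whatever time the two hosts spend turning the packet around. The sketch below assumes standard IPv4/UDP/Ethernet header sizes and a purely illustrative 1 ms host turnaround; it is not a measurement of the B1000 or the J5000.

```python
def tftp_throughput(link_bps: float, host_turnaround_s: float, block: int = 512) -> float:
    """Approximate bytes/second for a lock-step transfer (one block per round trip)."""
    # DATA frame overhead: TFTP 4 + UDP 8 + IPv4 20 + Ethernet header 14 + FCS 4
    # = 50 bytes (preamble and inter-frame gap ignored).
    data_wire_s = (block + 50) * 8 / link_bps
    # ACK: 50 bytes of headers, padded to the 64-byte Ethernet minimum frame.
    ack_wire_s = 64 * 8 / link_bps
    return block / (data_wire_s + ack_wire_s + host_turnaround_s)


for mbps in (10, 100, 1000):
    rate = tftp_throughput(mbps * 1e6, host_turnaround_s=1e-3)
    print(f"{mbps:>4} Mbit/s: {rate / 1e3:.0f} kB/s")
```

Under these assumptions, going from 10 to 100 Mbit/s improves throughput by well under 2x, and going from 100 to 1000 Mbit/s by only a few percent, because the fixed per-block turnaround dominates.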

I don't recall much about the format of a tftp error packet - does it include any code you might examine to see just what sort of error the client is reporting?

there is no rest for the wicked yet the virtuous have no pillows

Stephen Keane · 09-06-2005

The tftp "error" packet contains a numeric error code. Specific numbers are allocated to specific errors, except for zero, which means "check the error string". Guess what? Yes, that's right: I get error zero and no error string.
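
For reference, RFC 1350 lays out the ERROR packet as opcode 5, a 2-byte error code, and a NUL-terminated message, and defines code 0 as "Not defined, see error message (if any)", so code 0 with an empty string really does carry no information. Below is a minimal parsing sketch for raw captured packet bytes; `parse_error` is a hypothetical helper, not part of tcpdump or HP-UX.

```python
import struct
from typing import Tuple

ERROR = 5  # TFTP ERROR opcode

# Error codes defined in RFC 1350.
ERROR_CODES = {
    0: "Not defined, see error message (if any)",
    1: "File not found",
    2: "Access violation",
    3: "Disk full or allocation exceeded",
    4: "Illegal TFTP operation",
    5: "Unknown transfer ID",
    6: "File already exists",
    7: "No such user",
}


def parse_error(packet: bytes) -> Tuple[int, str, str]:
    """Return (code, meaning, message) for a raw TFTP ERROR packet."""
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != ERROR:
        raise ValueError(f"not an ERROR packet (opcode {opcode})")
    # The message runs up to the first NUL byte; it may be empty.
    message = packet[4:].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return code, ERROR_CODES.get(code, "Unknown code"), message


# The case reported above: code 0 and an empty message.
print(parse_error(struct.pack("!HH", ERROR, 0) + b"\x00"))
```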

I've swapped out all the cables (can't find a cross-over cable at the mo), moved to different ports on the switch (just in case) and removed a cable from the switch that wasn't connected to anything. The speed has increased a bit, but I still get the error packets even when only the client and server are connected to the switch, so I don't think broadcasts are causing the problem.

rick jones · 09-07-2005

How long does the traffic pause, if at all, before the error packet is emitted by the client?

Can you sniff the full packets and verify the checksums?

If it works ok with a back-to-back connection from server to client, but not with a switch, that is another indication that duplex mismatch may indeed be involved. When connected back to back, there will never be more than one packet on the network. When connected to the switch, there is the possibility of other traffic from other systems appearing on the client and/or server port, which could then lead to corrupted or dropped frames.

Stephen Keane · 09-12-2005

The problem appears to be related to the size of the bootable file that tftp is trying to transfer. Bootp uses a 16-bit unsigned integer to hold the size of the bootfile (in 512-byte blocks), and each packet sent holds its block number in a similar field, so I checked the size of the bootable image: it was a shade over 32 MBytes. When I reduced the bootfile to less than 32 MBytes, it booted in under 5 minutes.
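
The arithmetic behind that observation is easy to check. Assuming the default 512-byte blocks, a 16-bit unsigned field tops out at 65535, and 65535 * 512 = 33,553,920 bytes, just under 32 MiB (33,554,432 bytes). The sketch below applies both limits the post mentions: the bootfile size counted in 512-byte blocks, and the TFTP block number, which starts at 1 and must also cover the final short (possibly empty) packet. The function names are illustrative, and the sketch says nothing about what the firmware actually does on overflow.

```python
BLOCK_SIZE = 512
U16_MAX = 0xFFFF  # 65535, the largest unsigned 16-bit value


def size_in_blocks(size: int) -> int:
    """Bootfile size rounded up to whole 512-byte blocks."""
    return -(-size // BLOCK_SIZE)


def last_block_number(size: int) -> int:
    """Number of the final DATA block: numbering starts at 1, and the transfer
    ends with a block shorter than 512 bytes (empty if size is a multiple of 512)."""
    return size // BLOCK_SIZE + 1


def fits_in_16_bits(size: int) -> bool:
    """True if neither the block count nor the last block number overflows."""
    return size_in_blocks(size) <= U16_MAX and last_block_number(size) <= U16_MAX


for size in (31 * 2**20, 32 * 2**20, 32 * 2**20 + 4096):
    print(size, size_in_blocks(size), last_block_number(size), fits_in_16_bits(size))
```

The largest size that passes both checks is 65535 * 512 - 1 = 33,553,919 bytes, so an image "a shade over 32 MBytes" cannot be described with 16-bit block numbers.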

Not sure how that relates to the error packets, unless it was the block number rolling over that was causing the problem.

Stephen Keane · 09-12-2005

See previous post. Not strictly a solution, but it fixed the problem I was experiencing.
