
Strange network performance issue

 
Richard Allen
Frequent Advisor

Strange network performance issue

I have an IA64 system running HP-UX 11.23 that I plan to use as an NFS server for a few Linux cluster nodes.

The server has two gigabit NICs channeled together into a 2 Gbit pipe against a Cisco switch.

When a few (5 or fewer) nodes are reading data from the server, the performance is almost acceptable, but the total bandwidth used is just over 1 Gbit.

When more nodes access the server at the same time, the performance drops. I see a linear drop in performance as the number of nodes increases.

The server's total output never gets close to the 2 Gbit range.

At first I thought this was some NFS tuning issue, so I read up on NFS tuning at both the kernel level and the filesystem level (including the number of NFS daemons running, the number of kernel threads for TCP NFSv3, and so on), but I was unable to change the pattern or get more performance out of the machine.
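For reference, the daemon count I was adjusting lives in /etc/rc.config.d/nfsconf on 11.23. A minimal sketch (the numbers shown are just example values, not a recommendation):

# /etc/rc.config.d/nfsconf (excerpt) - example values only
NFS_SERVER=1       # start the NFS server subsystem at boot
NUM_NFSD=64        # number of nfsd daemons to start

# pick up the change:
/sbin/init.d/nfs.server stop
/sbin/init.d/nfs.server start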

I then ran simple tests with ftp and saw the same problem there.
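The ftp tests were just timed pulls of a large file to /dev/null from each node, roughly like this (a sketch; the account name, password and file name are placeholders):

# on a Linux node: fetch one large file, discard it, and time the transfer
time ftp -n watson <<'EOF'
user testuser testpass
binary
get /export/data/bigfile /dev/null
bye
EOF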

The server's CPUs are mostly idle throughout the testing.

On the same switch I have a Linux server with the same network setup (a 2 Gbit channel) and it can utilize its bandwidth fully.

The HP-UX box and the switch agree on speed and duplex settings, and there are no errors, collisions or other issues on the server or the switch port.
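For what it's worth, those checks were along these lines (a sketch; the Cisco interface names are placeholders):

# HP-UX side: speed/duplex per physical port, plus error/collision counters
lanadmin -x 0      # first gigabit port (PPA 0)
lanadmin -x 1      # second gigabit port (PPA 1)
netstat -i         # Ierrs / Oerrs / Coll per interface

# Cisco side:
# show interface GigabitEthernet1/1
# show etherchannel summary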

If anyone has any ideas on how I should proceed to locate the problem, your help will be greatly appreciated.

Thanks in advance.
Richard.
4 REPLIES
Mel Burslan
Honored Contributor

Re: Strange network performance issue

How do you measure the performance of this server while you are running these tests? If you have it, run glance and watch the memory and disk utilization as well as the network load. Also check whether the system table utilizations are getting anywhere near their maximums, and let us know your findings so we can make better educated guesses.
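If glance is not installed, the stock tools cover most of the same ground (a sketch):

vmstat 5 5      # memory, paging and CPU
sar -d 5 5      # per-device disk utilization
sar -v 5 5      # proc/inode/file table usage against their maximums
netstat -i      # per-interface packet and error counts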
________________________________
UNIX because I majored in cryptology...
Steve Lewis
Honored Contributor

Re: Strange network performance issue

It sounds like you are not utilising both interfaces after all. You need to have the HP APA software installed and configured to group them together, but it is not easy to configure.

Please post your lanscan output and the status of the virtual interface, e.g. /usr/sbin/ifconfig lan900.

It will also help to post your settings in
/etc/rc.config.d/hp_apaconf
/etc/rc.config.d/hp_apaportconf
/etc/rc.config.d/hpgelanconf
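Something like this will pull the active (non-comment) settings out of all three files in one go, along with the state of the aggregate itself (a sketch; lanadmin output varies a little between APA versions):

for f in /etc/rc.config.d/hp_apaconf \
         /etc/rc.config.d/hp_apaportconf \
         /etc/rc.config.d/hpgelanconf
do
        echo "== $f =="
        grep -v '^#' $f | grep -v '^$'
done

lanscan -q            # lists each link aggregate and the physical PPAs in it
lanadmin -x -v 900    # aggregate status, ports and load-balancing mode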


Steve Lewis
Honored Contributor

Re: Strange network performance issue

I just re-read your posting and saw that you were getting just over 1 Gbit, so your APA must be OK.

What we found in the past is that while NFS is fine for transferring large files at high speed, it is not good for transferring lots of small files.

We found that SAMBA/CIFS is better for transferring many small files. You can use that as a replacement for NFS.

FTP is even quicker, but harder to set up.

rcp/scp is slowest.
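If you want to compare them yourself, timing the same tree of small files over each protocol gives a rough idea (a sketch; host names and paths are placeholders):

# many small files over an existing NFS mount
time cp -r /mnt/watson/smallfiles /tmp/via_nfs

# the same tree over scp, for comparison
time scp -r watson:/export/smallfiles /tmp/via_scp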
Richard Allen
Frequent Advisor

Re: Strange network performance issue

The files the Linux cluster nodes are reading are very large, at least a gigabyte each.
I start the nodes reading a file and time them. Then, based on the time it takes to transfer the file, I can calculate the bandwidth used.
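The per-node measurement is basically this (a sketch; the mount point and file name are placeholders):

# read one large file from the NFS mount and discard it
time dd if=/mnt/watson/bigfile of=/dev/null bs=1024k

# e.g. a 1 GiB file read in 10 seconds:
#   1073741824 bytes * 8 bits / 10 s  =  roughly 860 Mbit/s
# summing the per-node figures gives the total bandwidth out of the server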

There is almost no disk load on the server during these experiments, because I tend to make all the nodes transfer the same file and it looks like the buffer cache on the server is taking care of business.
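A quick way to confirm that (a sketch):

sar -b 5 5      # %rcache near 100 means reads are served from the buffer cache
sar -d 5 5      # per-disk %busy, which stays near zero during these tests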

No system tables are close to their maximum values.

watson# lanscan
Hardware Station Crd Hdw Net-Interface NM MAC HP-DLPI DLPI
Path Address In# State NamePPA ID Type Support Mjr#
LinkAgg0 0x0012799E2207 900 UP lan900 snap900 6 ETHER Yes 119
LinkAgg1 0x000000000000 901 DOWN lan901 snap901 7 ETHER Yes 119
LinkAgg2 0x000000000000 902 DOWN lan902 snap902 8 ETHER Yes 119
LinkAgg3 0x000000000000 903 DOWN lan903 snap903 9 ETHER Yes 119
LinkAgg4 0x000000000000 904 DOWN lan904 snap904 10 ETHER Yes 119

watson# ifconfig lan900
lan900: flags=1843<UP,BROADCAST,RUNNING,MULTICAST,CKO>
inet 172.17.150.14 netmask ffffff00 broadcast 172.17.150.255


watson# grep -v ^# /etc/rc.config.d/hp_apaportconf | grep -v ^$
HP_APAPORT_INTERFACE_NAME[0]=lan0
HP_APAPORT_GROUP_CAPABILITY[0]=3
HP_APAPORT_CONFIG_MODE[0]=FEC_AUTO
HP_APAPORT_INTERFACE_NAME[1]=lan1
HP_APAPORT_GROUP_CAPABILITY[1]=3
HP_APAPORT_CONFIG_MODE[1]=FEC_AUTO

watson# grep -v ^# /etc/rc.config.d/hp_apaconf | grep -v ^$
HP_APA_START_LA_PPA=900
HP_APA_DEFAULT_PORT_MODE=MANUAL
HP_APA_INTERFACE_NAME[0]=lan900
HP_APA_LOAD_BALANCE_MODE[0]=LB_MAC
HP_APA_GROUP_CAPABILITY[0]=3
HP_APA_HOT_STANDBY[0]=off


watson# grep -v ^# /etc/rc.config.d/hpgelanconf | grep -v ^$
HP_GELAN_INIT_ARGS="HP_GELAN_STATION_ADDRESS HP_GELAN_SPEED HP_GELAN_MTU HP_GELAN_FLOW_CONTROL HP_GELAN_AUTONEG HP_GELAN_SEND_COAL_TICKS HP_GELAN_RECV_COAL_TICKS HP_GELAN_SEND_MAX_BUFS HP_GELAN_RECV_MAX_BUFS"
HP_GELAN_INTERFACE_NAME[0]=
HP_GELAN_STATION_ADDRESS[0]=
HP_GELAN_SPEED[0]=
HP_GELAN_MTU[0]=
HP_GELAN_FLOW_CONTROL[0]=
HP_GELAN_AUTONEG[0]=
HP_GELAN_SEND_COAL_TICKS[0]=
HP_GELAN_RECV_COAL_TICKS[0]=
HP_GELAN_SEND_MAX_BUFS[0]=
HP_GELAN_RECV_MAX_BUFS[0]=


Kernel parameters that I have changed:
watson# kctune -S
Tunable Value Expression Changes
dbc_max_pct 70 70 Immed
dbc_min_pct 10 10 Immed
default_disk_ir 1 1
dnlc_hash_locks 4096 4096
dst 0 0
fs_async 1 1
ftable_hash_locks 4096 4096
max_thread_proc 1024 1024 Immed
maxvgs 32 32
ncsize 32768 32768
nflocks 8192 8192 Imm (auto disabled)
nstrpty 60 60
timezone 0 0
vnode_cd_hash_locks 4096 4096
vnode_hash_locks 4096 4096


I've tested running 32, 64 and 128 nfsds, but again, I doubt that this is an NFS problem, because I cannot get the server to pump out data the way I know it should be able to.
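A raw TCP test from several nodes at once would take NFS and ftp out of the picture entirely; a sketch, assuming netperf is installed on the server and the nodes:

# on the HP-UX server: start the netperf listener
netserver

# on each Linux node, started at the same time:
netperf -H 172.17.150.14 -l 60 -t TCP_STREAM

# add up the throughput reported by each node; if the total still tops out
# around 1 Gbit, the limit is somewhere in the network path rather than in NFS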