IA64 Linux TCP/IP Stack

This page details some performance measurements we have done, or are about to do, on the Linux TCP/IP stack on IA64.

What are we testing?

The network stack has multiple parts, each adding delay to traffic processing. At a minimum, we need to separate NIC driver problems from IP problems and from TCP problems.

One way to isolate NIC performance is to use the lo (loopback) device and compare against real drivers; a rough sketch of such a comparison is given below.
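As an illustration (this sketch is not from the original measurements; the port, chunk size and transfer size are arbitrary choices), the script below pushes a fixed amount of data over a TCP connection and reports the rate. Running both ends on the IA64 box against 127.0.0.1 exercises the lo device, while running the client from a second machine against the server's real address goes through the NIC driver, giving a crude basis for comparison.

    #!/usr/bin/env python
    # Bulk-transfer sketch for a loopback-versus-NIC comparison.
    # Usage:  python tcpbench.py server             (on the machine under test)
    #         python tcpbench.py <server-address>   (client; 127.0.0.1 or real IP)
    import socket, sys, time

    PORT = 5001                  # arbitrary choice
    CHUNK = 64 * 1024
    TOTAL = 256 * 1024 * 1024    # bytes to send per run

    def server():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", PORT))
        s.listen(1)
        while True:
            conn, _ = s.accept()
            while conn.recv(CHUNK):      # sink everything the client sends
                pass
            conn.close()

    def client(host):
        s = socket.create_connection((host, PORT))
        buf = b"x" * CHUNK
        sent, start = 0, time.time()
        while sent < TOTAL:
            s.sendall(buf)
            sent += CHUNK
        s.close()
        print("%s: %.1f MB/s" % (host, sent / (time.time() - start) / 1e6))

    if __name__ == "__main__":
        if sys.argv[1] == "server":
            server()
        else:
            client(sys.argv[1])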

The idea is to see:

  1. Is there any performance issue with the network under IA64 Linux?
  2. Where is the kernel spending time when under network load?
  3. What is the kernel's behaviour under various kinds of overload?
    1. Network saturation (throughput problems)
    2. Lots of half-closed connections (time spent closing a connection when there are many outstanding closes; a performance issue for web servers; see the sketch after this list)
    3. DoS attacks such as the SYN flood
  4. Etc.
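For item 3.2, one way to approximate a pile of outstanding closes (a rough sketch, not from the original page; the host, port and connection count are placeholders, and the exact socket states you end up with depend on how the server handles the half-close) is to open many connections, send our FIN with shutdown(), and then never complete the close:

    #!/usr/bin/env python
    # Open many connections to a web server and half-close each one, leaving
    # connection teardown outstanding for the duration of the test.
    import socket, time

    HOST, PORT = "10.0.0.1", 80     # placeholder server address
    NUM_CONNS = 1000                # placeholder load level

    socks = []
    for _ in range(NUM_CONNS):
        s = socket.create_connection((HOST, PORT))
        s.send(b"GET / HTTP/1.0\r\n\r\n")
        s.shutdown(socket.SHUT_WR)  # send our FIN, but never close()
        socks.append(s)

    time.sleep(60)                  # observe the server (netstat, top, ...) here
    for s in socks:
        s.close()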

Testing Ideas

Connection round trip performance

TCP performance

Response time

Stack Performance

Parameters

The four major parameters for TCP/IP are:

Does varying these result in large performance differences?

Linux parameters
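As a starting point (the page does not list its four parameters, so the tunables below are my guesses at the usual suspects: socket buffer limits, window scaling, SACK and timestamps), a small script can record the Linux settings in force during a run:

    #!/usr/bin/env python
    # Dump a set of candidate TCP tunables from /proc/sys so each test run can
    # be tied to the parameter values it was measured with.
    PARAMS = [
        "net/core/rmem_max",
        "net/core/wmem_max",
        "net/ipv4/tcp_rmem",
        "net/ipv4/tcp_wmem",
        "net/ipv4/tcp_window_scaling",
        "net/ipv4/tcp_sack",
        "net/ipv4/tcp_timestamps",
    ]

    for p in PARAMS:
        try:
            with open("/proc/sys/" + p) as f:
                print("%-32s %s" % (p, f.read().strip()))
        except IOError:
            print("%-32s (not available on this kernel)" % p)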

Notes

Think about throughput: two links are not necessarily the same, so measurements on one do not automatically apply to the other.

References

"Testing the Performance of Multiple TCP/IP Stacks", Proceedings of CMG97, December 7-12, 199, volume 1, pages 626-638. John L. Wood, Chrisopher D. Selvaggi and John Q. Walker.

W. Richard Stevens, TCP/IP Illustrated, Volume 1, Addison-Wesley, 1994.

Tim Brecht and Michal Ostrowski, "Exploring the Performance of Select-based Internet Servers", HP Labs Technical Report HPL-2001-314, 2001.

Preliminary Results

httperf/userver

httperf combined with userver makes for a good real-world style network test: httperf generates large amounts of HTTP traffic aimed at 'userver', a high-performance web server designed for testing.

As a first test, we have an IA64 server running 2.5.6-test5 with userver 0.3.3 listening for client requests. For the single-client case, we ran tests at rates from 100 to 2000 messages/sec, in increments of 100 messages/sec, each lasting one minute (e.g. at 2000 messages/sec, 2000 * 60 = 120,000 messages in total). For the two-client case, the rate was divided in half between the clients to give the same total number of messages. These are not particularly high loads; previous work shows that even a modest x86 server should scale linearly up to around 4000 requests/sec.
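For reference, a driver along these lines can reproduce the sweep (a sketch only: the server address and URI are placeholders, and the httperf option names should be checked against the installed version):

    #!/usr/bin/env python
    # Run one httperf pass per rate, 100 to 2000 requests/sec in steps of 100,
    # each lasting roughly one minute.
    import subprocess

    SERVER = "10.0.0.1"     # placeholder address of the userver box
    DURATION = 60           # seconds per rate point

    for rate in range(100, 2001, 100):
        num_conns = rate * DURATION    # e.g. 2000/sec * 60 sec = 120,000
        cmd = ["httperf", "--server", SERVER, "--port", "80",
               "--uri", "/index.html", "--rate", str(rate),
               "--num-conns", str(num_conns), "--timeout", "5"]
        print("running: " + " ".join(cmd))
        subprocess.call(cmd)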

Refer to the two graphs below: the 2.4 kernel scales as we expect (linearly), but the 2.5 kernel appears to hit a point beyond which no more progress is made. This is repeatable with our setup; we are currently looking into what is going on. The effect does not seem to appear on PowerPC, at least: running a 2.6.0-test7 kernel on a 700MHz G3 produced a linearly scaling graph as expected. Interestingly, in that test we were using the IA64 box as the traffic generator, so it has no problems outputting the required packets.

Update: I have got to the bottom of this particular problem. I was sure I had turned off Network Packet Filtering, but apparently had not. It should be easy to see problems with this code, since it printks warnings; that is, unless you have done something like 'dmesg -n 4' to silence unaligned access warnings, as I had. I also upgraded to a 2.6.0-test8 kernel, since this stops another annoying (bogus) warning.

However, this does raise the question of why the connection tracking table fills up so quickly on IA64, whereas even an underpowered PowerPC doesn't seem to have any problems. Additionally, after turning the packet filtering code off, I don't get any more unaligned access faults (or haven't yet).
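To keep an eye on this, something like the following (an assumption about the 2.6-era /proc layout, not something the page describes) reports how full the connection tracking table is:

    #!/usr/bin/env python
    # Report current versus maximum connection tracking entries.
    def read(path):
        try:
            with open(path) as f:
                return f.read()
        except IOError:
            return None

    max_entries = read("/proc/sys/net/ipv4/netfilter/ip_conntrack_max")
    table = read("/proc/net/ip_conntrack")

    if max_entries is None or table is None:
        print("connection tracking not enabled (or a different /proc layout)")
    else:
        print("conntrack entries: %d of %s"
              % (len(table.splitlines()), max_entries.strip()))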

http://www.gelato.unsw.edu.au/~ianw/tcp_perf/userver/single-2.4v2.5.png

http://www.gelato.unsw.edu.au/~ianw/tcp_perf/userver/two-2.4v2.5.png

I am now testing again, and noticing some strange effects. Testing between two IA64 boxes, the load-generating machine seems to be pumping out packets at an incredible rate: around 10,000 per second, seemingly without problems. I'm not sure whether this is correct, but certainly there don't seem to be any errors. Eventually the server gets to a point where it starts logging 'drop open request from 10.0.0.3' thousands of times (syslog suppresses them). I'm yet to discover what is causing this; userver doesn't report any problems, presumably because the packets are being dropped before userver has a chance to log the connection. Once the kernel gets into this state of dropping requests, it doesn't recover until I (seemingly) stop and restart userver. All this is still to be confirmed, however.
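One thing worth checking while chasing this (my suggestion, not a conclusion from these tests) is the SYN backlog size and whether syncookies are enabled, since both affect how the kernel copes with a flood of new connection requests:

    #!/usr/bin/env python
    # Print the sysctls most relevant to dropped connection requests.
    for name in ("tcp_max_syn_backlog", "tcp_syncookies"):
        with open("/proc/sys/net/ipv4/" + name) as f:
            print("%-24s %s" % (name, f.read().strip()))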
