A small analysis of hashing speed on Itanium
Introduction
After playing with ccache on our Itanium systems, I began to wonder just what was making it run so slowly. ccache takes a hash of the gcc command line and the preprocessed source to decide whether it can simply fetch the compiled output from its cache instead of recompiling. To rule out the hashing function as the bottleneck, I performed some light benchmarking of several different hashing functions on Itanium.
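The cache lookup described above can be pictured with the rough sketch below. It is a simplified illustration, not ccache's code. `cache_key` and `key_to_name` are hypothetical. Only the two inputs mentioned above are hashed, and the real tool may mix in more. 64-bit FNV-1a stands in for MD4 purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FNV_OFFSET 0xcbf29ce484222325ULL /* FNV 64-bit offset basis */
#define FNV_PRIME  0x100000001b3ULL      /* FNV 64-bit prime */

/* 64-bit FNV-1a, used here only as a compact stand-in for MD4. */
static uint64_t fnv1a_update(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical cache key: hash the compiler arguments, then the
 * preprocessed source. Each argument is hashed together with its
 * terminating NUL, so {"-O", "3"} and {"-O3"} feed different bytes. */
uint64_t cache_key(int argc, char *const argv[],
                   const char *cpp_out, size_t cpp_len)
{
    assert(cpp_out != NULL || cpp_len == 0);
    uint64_t h = FNV_OFFSET;
    for (int i = 0; i < argc; i++)
        h = fnv1a_update(h, argv[i], strlen(argv[i]) + 1);
    return fnv1a_update(h, cpp_out, cpp_len);
}

/* Format a key as 16 hex digits, e.g. for use as a cache file name. */
void key_to_name(uint64_t key, char name[17])
{
    snprintf(name, 17, "%016llx", (unsigned long long)key);
}
```

Two compilations that produce the same key can share one cached result. Any change to the arguments or to the preprocessed source gives a different key, barring hash collisions.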
Methodology
The benchmark measures how fast several different hashing functions process a buffer of pseudo-random data. Buffers of several sizes, ranging from 100 bytes to 1MB, are each hashed 100 times. The CPU time consumed is summed and averaged. The buffer size is then divided by this average to give a throughput figure in KB hashed per second.
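A minimal sketch of that measurement loop follows. It is not the code from the tarball, and it rests on these assumptions:

- `hash_fn`, `fill_random`, `kb_per_sec` and `bench_kb_per_sec` are hypothetical names.
- CPU time is read with the standard C `clock()` function.
- KB is taken to mean 1024 bytes.

Dividing the total bytes by the total CPU time gives the same figure as dividing the buffer size by the average time per run. The sketch also shows one way an `inf` entry can arise: if the clock does not advance over all the runs, no rate can be computed.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

/* Same shape as the benchmark's test functions: hash buf[0..size). */
typedef void (*hash_fn)(char *buf, int size);

/* Fill buf with pseudo-random bytes; rand() is adequate for timing. */
void fill_random(char *buf, int size)
{
    for (int i = 0; i < size; i++)
        buf[i] = (char)(rand() & 0xff);
}

/* Convert total bytes and total CPU seconds to KB per second
 * (KB assumed to be 1024 bytes). If no CPU time was measured,
 * report infinity, like the "inf" entries in the results. */
double kb_per_sec(double bytes, double seconds)
{
    if (seconds <= 0.0)
        return INFINITY;
    return (bytes / 1024.0) / seconds;
}

/* Hash one pseudo-random buffer `runs` times and return KB/s.
 * total bytes / total time == size / (average time per run). */
double bench_kb_per_sec(hash_fn fn, int size, int runs)
{
    assert(size > 0 && runs > 0);
    char *buf = malloc((size_t)size);
    if (buf == NULL)
        return NAN;
    fill_random(buf, size);

    clock_t start = clock();
    for (int i = 0; i < runs; i++)
        fn(buf, size);
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    free(buf);
    return kb_per_sec((double)size * runs, seconds);
}
```

Because `clock()` typically has coarse resolution, 100 runs over a small buffer can easily register no elapsed time at all.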
The hashing functions tested are:
MD4, from the ccache implementation. I believe this code is also used in Samba. The official description of the algorithm is given in RFC1320.
Tiger, a hash optimised for 64-bit processors.
MD5, from the RSA reference implementation given in RFC1321.
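MD4 itself is small enough to sketch in full. The version below is written from the description in RFC1320 and favours clarity over speed. It is not the ccache or Samba code that was benchmarked, and `md4` and `md4_hex` are names chosen for this example. Rotating the four state variables after each step lets a single loop body serve all sixteen steps of a round.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Auxiliary functions and rotation from RFC1320 section 3.4. */
#define F(x, y, z) (((x) & (y)) | (~(x) & (z)))
#define G(x, y, z) (((x) & (y)) | ((x) & (z)) | ((y) & (z)))
#define H(x, y, z) ((x) ^ (y) ^ (z))
#define ROTL(x, s) (((x) << (s)) | ((x) >> (32 - (s))))

/* Apply the three MD4 rounds to one 64-byte block. */
static void md4_block(uint32_t st[4], const unsigned char *p)
{
    static const int s1[4] = {3, 7, 11, 19};
    static const int s2[4] = {3, 5, 9, 13};
    static const int s3[4] = {3, 9, 11, 15};
    static const int k3[16] = {0, 8, 4, 12, 2, 10, 6, 14,
                               1, 9, 5, 13, 3, 11, 7, 15};
    uint32_t x[16], a = st[0], b = st[1], c = st[2], d = st[3], t;

    for (int i = 0; i < 16; i++)   /* message words are little-endian */
        x[i] = (uint32_t)p[4 * i] | ((uint32_t)p[4 * i + 1] << 8) |
               ((uint32_t)p[4 * i + 2] << 16) | ((uint32_t)p[4 * i + 3] << 24);

    /* Each step updates the register in `a`; the rotation below puts
     * the next step's target register into `a`. */
    for (int i = 0; i < 16; i++) {   /* round 1 */
        t = ROTL(a + F(b, c, d) + x[i], s1[i % 4]);
        a = d; d = c; c = b; b = t;
    }
    for (int i = 0; i < 16; i++) {   /* round 2: words 0,4,8,12,1,5,... */
        t = ROTL(a + G(b, c, d) + x[(i % 4) * 4 + i / 4] + 0x5a827999u,
                 s2[i % 4]);
        a = d; d = c; c = b; b = t;
    }
    for (int i = 0; i < 16; i++) {   /* round 3 */
        t = ROTL(a + H(b, c, d) + x[k3[i]] + 0x6ed9eba1u, s3[i % 4]);
        a = d; d = c; c = b; b = t;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d;
}

/* Compute the 16-byte MD4 digest of msg[0..len). */
void md4(const unsigned char *msg, size_t len, unsigned char out[16])
{
    uint32_t st[4] = {0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u};
    unsigned char tail[128] = {0};
    size_t done = 0;

    for (; len - done >= 64; done += 64)
        md4_block(st, msg + done);

    /* Padding: a 0x80 byte, zeros up to 56 mod 64, then the message
     * length in bits as a 64-bit little-endian integer. */
    size_t rem = len - done, tlen = rem < 56 ? 64 : 128;
    memcpy(tail, msg + done, rem);
    tail[rem] = 0x80;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        tail[tlen - 8 + i] = (unsigned char)(bits >> (8 * i));
    for (size_t off = 0; off < tlen; off += 64)
        md4_block(st, tail + off);

    for (int i = 0; i < 16; i++)   /* digest is little-endian too */
        out[i] = (unsigned char)(st[i / 4] >> (8 * (i % 4)));
}

/* Hex digest of a NUL-terminated string. Returns a static buffer that
 * the next call overwrites, so it is not thread-safe. */
const char *md4_hex(const char *s)
{
    static char hex[33];
    unsigned char d[16];
    assert(s != NULL);
    md4((const unsigned char *)s, strlen(s), d);
    for (int i = 0; i < 16; i++)
        snprintf(hex + 2 * i, 3, "%02x", (unsigned)d[i]);
    return hex;
}
```

For example, `md4_hex("abc")` returns `a448017aaf21d8525fc10ae87aa6729d`, which matches the test suite in RFC1320.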
The test was run on an 800MHz Itanium, a 900MHz Itanium 2, a 2.2GHz Pentium IV Xeon (1) and a 700MHz PowerPC. gcc 3.2 was used on all but the Pentium, where gcc 2.96 was used. The -O3 flag was passed to gcc. Most of these algorithms are built around large loops, so the unrolling provided by optimisation gives considerable gains.
Results
Note: all figures are throughput in KB hashed per second (higher is faster). inf indicates that not enough CPU time was measured over the 100 runs to calculate a result.
| Machine | Hash | 100 bytes | 1KB | 10KB | 100KB | 1MB |
|---|---|---|---|---|---|---|
| Itanium | MD4 | inf | 341 | 379 | 357 | 342 |
| Itanium | Tiger | inf | 1024 | 488 | 509 | 493 |
| Itanium | MD5 | inf | 256 | 341 | 357 | 351 |
| Itanium 2 | MD4 | 100 | 512 | 732 | 732 | 724 |
| Itanium 2 | Tiger | inf | 1025 | 732 | 759 | 730 |
| Itanium 2 | MD5 | inf | 512 | 410 | 405 | 403 |
| Pentium IV | MD4 | inf | inf | 1000 | 2500 | 2381 |
| Pentium IV | Tiger | inf | inf | 500 | 556 | 548 |
| Pentium IV | MD5 | inf | inf | 1000 | 1429 | 1356 |
| PowerPC | MD4 | inf | inf | 500 | 833 | 648 |
| PowerPC | Tiger | 10 | 100 | 250 | 208 | 193 |
| PowerPC | MD5 | inf | inf | 500 | 556 | 425 |
Discussion
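First and foremost, the MD4 hash does not appear to be what is holding ccache up. MD4 is the fastest of the three hashes on the 32-bit machines. On the 64-bit machines Tiger is faster, although on the Itanium 2 the two are within a few percent of each other once buffers reach 10KB.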
Otherwise, the results are largely as expected. The 64-bit optimised Tiger hash performs well on the Itaniums but poorly on the 32-bit machines. MD5 was designed to be more conservative (read: slower) than MD4, and this shows too. The exception is the original Itanium, where the two are roughly level for buffers of 100KB and above.
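The comparisons above can be checked directly against the 1MB row of the results table. The helper below holds those figures, copied from the table, and returns each hash's throughput relative to MD4 on the same machine. A ratio above 1 means faster than MD4. The enum and function names are illustrative only.

```c
#include <assert.h>

enum { MD4, TIGER, MD5, N_HASHES };
enum { ITANIUM, ITANIUM2, PENTIUM4, POWERPC, N_MACHINES };

/* 1MB throughputs in KB/s from the results table. */
static const double kbps_1mb[N_MACHINES][N_HASHES] = {
    [ITANIUM]  = {342, 493, 351},
    [ITANIUM2] = {724, 730, 403},
    [PENTIUM4] = {2381, 548, 1356},
    [POWERPC]  = {648, 193, 425},
};

/* Throughput of `hash` relative to MD4 on `machine`;
 * a value above 1 means the hash beat MD4 there. */
double ratio_vs_md4(int machine, int hash)
{
    assert(machine >= 0 && machine < N_MACHINES);
    assert(hash >= 0 && hash < N_HASHES);
    return kbps_1mb[machine][hash] / kbps_1mb[machine][MD4];
}
```

For instance, Tiger comes out at about 1.44 times MD4 on the Itanium but about 0.23 times MD4 on the Pentium IV.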
Code
A tarball with the code for this benchmark is available from http://www.gelato.unsw.edu.au/~ianw/hashbench/hashbench.tar.gz
You could quite easily add your favourite hash to the test. Add a function void test_name(char *buf, int size) to tests.c, with its prototype in tests.h, that runs your hash over the buffer buf of length size. Then add a pointer to that function to the array of functions in main() in hashbench.c and increase the test count by 1. Your test will now run.
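As an illustration, a new entry might look like the sketch below. Only the `void test_name(char *buf, int size)` shape comes from the benchmark. The FNV-1a hash, `hash_sink`, `test_noop`, the `tests` array, `N_TESTS` and `run_all` are hypothetical stand-ins; the real array in hashbench.c's main() is not reproduced here. Deriving the count from the array size, as done here, is one way to avoid bumping a separate test number by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Written after each hash so the compiler cannot discard the work. */
volatile uint32_t hash_sink;

/* Hypothetical new test: 32-bit FNV-1a over buf[0..size). */
void test_fnv1a(char *buf, int size)
{
    uint32_t h = 2166136261u;            /* FNV 32-bit offset basis */
    for (int i = 0; i < size; i++) {
        h ^= (unsigned char)buf[i];
        h *= 16777619u;                  /* FNV 32-bit prime */
    }
    hash_sink = h;
}

/* Stand-in for an existing test, so the table has more than one entry. */
void test_noop(char *buf, int size)
{
    (void)buf;
    hash_sink = (uint32_t)size;
}

/* Hypothetical function table; the count is derived from the array
 * so it cannot fall out of step when a test is added. */
static void (*const tests[])(char *buf, int size) = { test_noop, test_fnv1a };
#define N_TESTS ((int)(sizeof tests / sizeof tests[0]))

/* Run every registered test once over the same buffer. */
int run_all(char *buf, int size)
{
    assert(buf != NULL || size == 0);
    for (int i = 0; i < N_TESTS; i++)
        tests[i](buf, size);
    return N_TESTS;
}
```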
(1) This was a university server, so it was probably doing many other things at the time.
