Benchmarks of the NPTL library on IA64

Update

As of September 2004, these results are mostly of historical interest; the libraries and kernel have since undergone significant changes that will alter performance.

Bull Open Source is currently running scalability and performance tests on a wide range of hardware. Their results are available in their forums.

Notes

Tests

Lifecycle

Tests the thread life cycle by creating a thread and joining it as many times as possible in the specified time.
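As a rough illustration of the loop described above, the sketch below creates and joins one thread at a time until a deadline passes, then returns the count and the elapsed time. It uses Python's `threading` module purely for brevity. The original benchmark source is not shown on this page and presumably called `pthread_create` and `pthread_join` directly, so the name `lifecycle_benchmark`, its parameter, and any numbers it produces are assumptions and are not comparable with the results below.

```python
import threading
import time


def _noop():
    """Thread body: return at once so only create/join cost is exercised."""


def lifecycle_benchmark(duration_s=5.0):
    """Create and join threads one at a time until duration_s has elapsed.

    Returns (threads_created, elapsed_seconds).
    """
    created = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        worker = threading.Thread(target=_noop)
        worker.start()   # thread creation
        worker.join()    # wait for it to finish before creating the next
        created += 1
    elapsed = time.perf_counter() - start
    return created, elapsed


if __name__ == "__main__":
    created, elapsed = lifecycle_benchmark(0.2)
    print(f"{created} threads created in {elapsed:.5f} sec = "
          f"{created / elapsed:.1f} per second")
```

Joining each thread before creating the next keeps at most one worker alive, so the count reflects the full create-run-exit-join cycle and not how many threads can exist at once.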

Results

| Library | Average/second | Compared to NPTL (IA-64) |
| --- | --- | --- |
| Linux Threads (Pentium) | 25573 | 71% fewer threads created |
| NPTL (Pentium) | 229619 | 158% more threads created |
| Linux Threads (IA-64) | 12330 | 86% fewer threads created |
| NPTL (IA-64) | 88899 | - |
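The "Compared to NPTL (IA-64)" column reads as the relative difference from the NPTL (IA-64) rate. A minimal sketch of that calculation follows; the function name `relative_change` is hypothetical, and the formula is inferred from the figures, not stated by the source.

```python
def relative_change(rate, baseline):
    """Percentage by which rate differs from baseline (negative means fewer)."""
    return (rate - baseline) / baseline * 100.0


NPTL_IA64 = 88899  # baseline: NPTL (IA-64) threads per second

# NPTL (Pentium) and Linux Threads (IA-64) rows from the Lifecycle results.
print(round(relative_change(229619, NPTL_IA64)))  # 158
print(round(relative_change(12330, NPTL_IA64)))   # -86
```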

Context

Two threads fight over a variable protected by mutexes, which forces them to context switch as often as they possibly can.
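One way to force two threads to alternate is a shared "turn" variable guarded by a lock: each thread sleeps until the variable names it, flips it, and wakes the other. The sketch below does this with Python's `threading.Condition`, which is a swapped-in technique chosen to make the hand-off explicit. The page does not show how the original test arranged its mutexes, so the structure and the name `context_switch_benchmark` are assumptions.

```python
import threading
import time


def context_switch_benchmark(duration_s=5.0):
    """Bounce a shared variable between two threads for duration_s seconds.

    Returns (hand_offs, elapsed_seconds).
    """
    cond = threading.Condition()  # wraps the lock guarding the shared state
    state = {"turn": 0, "switches": 0, "stop": False}

    def player(me):
        other = 1 - me
        with cond:
            while True:
                # Sleep until it is our turn or the run is over.
                while state["turn"] != me and not state["stop"]:
                    cond.wait()
                if state["stop"]:
                    return
                state["turn"] = other   # hand the variable to the other thread
                state["switches"] += 1
                cond.notify()

    threads = [threading.Thread(target=player, args=(i,)) for i in (0, 1)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    time.sleep(duration_s)
    with cond:
        state["stop"] = True
        cond.notify_all()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return state["switches"], elapsed


if __name__ == "__main__":
    switches, elapsed = context_switch_benchmark(0.2)
    print(f"{switches} context switches in {elapsed:.5f} sec = "
          f"{switches / elapsed:.0f} per second")
```

Because neither thread can make progress until the other has run, every counted hand-off implies a switch between the two threads.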

Results

| Library | Average/second | Compared to NPTL (IA-64) |
| --- | --- | --- |
| Linux Threads (Pentium) | 145010 | 52% fewer context switches |
| NPTL (Pentium) | 342365 | 14% more context switches |
| Linux Threads (IA-64) | 182826 | 39% fewer context switches |
| NPTL (IA-64) | 301656 | - |

| Library | Average Time / Switch - (Locking Overhead) | Converted to Machine Cycles |
| --- | --- | --- |
| Linux Threads (Pentium) | 6896ns - 132ns = 6764ns | 6764ns * (2.5 cycles per ns) = 16910 machine cycles |
| NPTL (Pentium) | 2920ns - 70.5ns = 2849ns | 2849ns * (2.5 cycles per ns) = 7122 machine cycles |
| Linux Threads (IA-64) | 5470ns - 158ns = 5312ns | 5312ns * (0.9 cycles per ns) = 4781 machine cycles |
| NPTL (IA-64) | 3315ns - 49ns = 3266ns | 3266ns * (0.9 cycles per ns) = 2939 machine cycles |
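The arithmetic in each row can be reproduced from the Average/second figures: invert the rate to get nanoseconds per switch, subtract the locking overhead, and multiply by the clock rate in cycles per nanosecond. A minimal sketch, where the function name `switch_cost` is hypothetical and the inputs are the two Linux Threads rows:

```python
def switch_cost(switches_per_second, lock_overhead_ns, cycles_per_ns):
    """Return (net nanoseconds per switch, machine cycles per switch)."""
    ns_per_switch = 1e9 / switches_per_second
    net_ns = ns_per_switch - lock_overhead_ns
    return net_ns, net_ns * cycles_per_ns


# Linux Threads (Pentium): 145010 switches/s, 132ns overhead, 2.5 cycles per ns
net_ns, cycles = switch_cost(145010, 132, 2.5)
print(round(net_ns), round(cycles))  # 6764 16910

# Linux Threads (IA-64): 182826 switches/s, 158ns overhead, 0.9 cycles per ns
net_ns, cycles = switch_cost(182826, 158, 0.9)
print(round(net_ns), round(cycles))  # 5312 4781
```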

Wakeup

Ten worker threads run alongside one master thread. The workers wait on a condition variable for a "queue" that the master thread fills. When the queue is full, the master signals the condition variable, and the woken worker processes the queue and returns. This shows how quickly condition variables respond.
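The master/worker arrangement above can be sketched with two condition variables sharing one lock: workers wait for a full queue, and the master waits for it to be drained before refilling. This is an illustration in Python's `threading` module, not the original test. The queue size, the second condition variable used to tell the master the queue is empty, and the name `wakeup_benchmark` are all assumptions.

```python
import threading
import time


def wakeup_benchmark(duration_s=5.0, workers=10, queue_size=16):
    """Count how often a signalled worker wakes and drains a full queue.

    Returns (wakeups, elapsed_seconds).
    """
    lock = threading.Lock()
    work_ready = threading.Condition(lock)  # workers wait here
    work_done = threading.Condition(lock)   # master waits here
    queue = []
    state = {"wakeups": 0, "stop": False}

    def worker():
        with lock:
            while True:
                while len(queue) < queue_size and not state["stop"]:
                    work_ready.wait()
                if state["stop"]:
                    return
                queue.clear()            # "process" the queue
                state["wakeups"] += 1
                work_done.notify()       # let the master refill it

    pool = [threading.Thread(target=worker) for _ in range(workers)]
    start = time.perf_counter()
    deadline = start + duration_s
    for t in pool:
        t.start()
    with lock:
        while time.perf_counter() < deadline:
            queue.extend(range(queue_size))  # fill the queue
            work_ready.notify()              # signal a single worker
            while queue:
                work_done.wait()
        state["stop"] = True
        work_ready.notify_all()
    for t in pool:
        t.join()
    elapsed = time.perf_counter() - start
    return state["wakeups"], elapsed


if __name__ == "__main__":
    wakeups, elapsed = wakeup_benchmark(0.2)
    print(f"{wakeups} wake ups in {elapsed:.5f} sec = "
          f"{wakeups / elapsed:.1f} per second")
```

Each wait sits inside a loop that rechecks its predicate, so a signal sent before a worker starts waiting is not lost.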

Results

| Library | Average/second | Compared to NPTL (IA-64) |
| --- | --- | --- |
| Linux Threads (Pentium) | 97771 | 94% more wake ups |
| NPTL (Pentium) | 122761 | 143% more wake ups |
| Linux Threads (IA-64) | 43483 | 14% fewer wake ups |
| NPTL (IA-64) | 50331 | - |

Uncontested

Measures how many times a single thread can acquire and release an uncontested lock.
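A sketch of such a loop is shown below, again in Python for brevity. The original test presumably timed `pthread_mutex_lock` and `pthread_mutex_unlock`; the name `uncontested_lock_benchmark` and the batching of clock reads are assumptions, and the interpreter overhead means the numbers are not comparable with the results below.

```python
import threading
import time


def uncontested_lock_benchmark(duration_s=5.0, batch=1000):
    """Acquire and release a lock that no other thread ever touches.

    Returns (locks_taken, elapsed_seconds).
    """
    lock = threading.Lock()
    taken = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        # Read the clock once per batch so timing cost does not dominate.
        for _ in range(batch):
            lock.acquire()
            lock.release()
        taken += batch
    elapsed = time.perf_counter() - start
    return taken, elapsed


if __name__ == "__main__":
    taken, elapsed = uncontested_lock_benchmark(0.2)
    print(f"{taken} uncontested locks taken in {elapsed:.5f} sec = "
          f"{taken / elapsed:.3f} per second")
```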

Results

| Library | Average/second | Compared to NPTL (IA-64) |
| --- | --- | --- |
| Linux Threads (Pentium) | 7571516 | 26% fewer uncontested locks taken |
| NPTL (Pentium) | 7060900 | 31% fewer uncontested locks taken |
| Linux Threads (IA-64) | 6328766 | 38% fewer uncontested locks taken |
| NPTL (IA-64) | 10212532 | - |

| Library | Average Time / Operation | Converted to Machine Cycles |
| --- | --- | --- |
| Linux Threads (Pentium) | 132ns / 2 = 66ns | 66ns * (2.5 cycles per ns) = 165 machine cycles |
| Linux Threads (IA-64) | 158ns / 2 = 79ns | 79ns * (0.9 cycles per ns) = 71 machine cycles |
| NPTL (Pentium) | 141ns / 2 = 70.5ns | 70.5ns * (2.5 cycles per ns) = 176 machine cycles |
| NPTL (IA-64) | 98ns / 2 = 49ns | 49ns * (0.9 cycles per ns) = 44 machine cycles |

Effect of page size

IA64 Linux allows page sizes of 4K, 8K, 16K or 64K. 2.5.67 kernels were built with configurations that differed only in page size, and the tests were run with libc CVS @ 2003-04-22 + NPTL 0.36 on the aforementioned Itanium 2 machine.

Context Switching

| Page Size (KB) | 4 | 8 | 16 | 64 |
| --- | --- | --- | --- | --- |
| Linux Threads | 179183 | 179863 | 177651 | 177508 |
| NPTL | 396513 | 401157 | 411063 | 376556 |
| %GAIN | 54.81% | 55.16% | 56.78% | 52.86% |

Life Cycle

| Page Size (KB) | 4 | 8 | 16 | 64 |
| --- | --- | --- | --- | --- |
| Linux Threads | 18502 | 16515 | 13727 | 6605 |
| NPTL | 101776 | 106560 | 106073 | 99582 |
| %GAIN | 81.82% | 84.50% | 87.06% | 93.37% |

Wake Up

| Page Size (KB) | 4 | 8 | 16 | 64 |
| --- | --- | --- | --- | --- |
| Linux Threads | 69118 | 68690 | 67065 | 68069 |
| NPTL | 118380 | 110928 | 111702 | 105223 |
| %GAIN | 41.61% | 38.08% | 39.96% | 35.31% |

Uncontested

| Page Size (KB) | 4 | 8 | 16 | 64 |
| --- | --- | --- | --- | --- |
| Linux Threads | 6323734 | 6325072 | 6324960 | 6325262 |
| NPTL | 10201658 | 10206947 | 10206628 | 10206287 |
| %GAIN | 38.01% | 38.03% | 38.03% | 38.03% |
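The %GAIN rows in the page-size tables are consistent with the difference between the NPTL and Linux Threads rates expressed as a fraction of the NPTL rate. A small sketch of that calculation follows; the function name `gain_percent` is an assumption, and the formula is inferred from the figures, not stated by the source.

```python
def gain_percent(linux_threads_rate, nptl_rate):
    """NPTL's advantage over Linux Threads as a percentage of the NPTL rate."""
    return (nptl_rate - linux_threads_rate) / nptl_rate * 100.0


# Context switching with 4 KB pages.
print(f"{gain_percent(179183, 396513):.2f}%")  # 54.81%

# Life cycle with 64 KB pages.
print(f"{gain_percent(6605, 99582):.2f}%")     # 93.37%
```

Because the divisor is the NPTL rate and not the Linux Threads rate, the figure can never exceed 100% however large the speed-up is.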

Test Result Data

Lifecycle

Linux Threads (Pentium)

128340 threads created in 4.99537 sec = 25691.8 per second
129766 threads created in 4.99872 sec = 25959.9 per second
121461 threads created in 4.99888 sec = 24297.7 per second
130323 threads created in 4.99892 sec = 26070.2 per second
129206 threads created in 4.99882 sec = 25847.3 per second

Linux Threads (IA-64)

61478 threads created in 4.99969 sec = 12296.4 per second
61646 threads created in 4.99982 sec = 12329.6 per second
62016 threads created in 4.99966 sec = 12404 per second
61528 threads created in 4.99976 sec = 12306.2 per second
61585 threads created in 4.99959 sec = 12318 per second

NPTL (Pentium)

1146950 threads created in 4.99332 sec = 229697 per second
1155268 threads created in 4.9982 sec = 231137 per second
1147274 threads created in 4.99876 sec = 229511 per second
1140484 threads created in 4.99872 sec = 228155 per second
1147698 threads created in 4.9987 sec = 229599 per second

NPTL (IA-64)

453504 threads created in 4.99994 sec = 90702 per second
442196 threads created in 4.99971 sec = 88444.4 per second
442380 threads created in 4.99951 sec = 88484.7 per second
442508 threads created in 4.99948 sec = 88510.7 per second
441764 threads created in 4.99974 sec = 88357.4 per second
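Each result line carries a count, an elapsed time, and the derived rate, so the summary figures can be cross-checked mechanically. The sketch below parses lines in this format and averages the recomputed per-second rates. The regular expression and the names `parse_result` and `average_rate` are illustrative assumptions, not part of the original test harness.

```python
import re

# Matches e.g. "453504 threads created in 4.99994 sec = 90702 per second".
RESULT_LINE = re.compile(
    r"^\s*(\d+)\s+.+?\s+in\s+([\d.]+)\s+sec\s+=\s+([\d.]+)\s+per second\s*$"
)


def parse_result(line):
    """Return (count, seconds, reported_rate) for one result line."""
    match = RESULT_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


def average_rate(lines):
    """Mean of count / seconds over all the given result lines."""
    rates = []
    for line in lines:
        count, seconds, _reported = parse_result(line)
        rates.append(count / seconds)
    return sum(rates) / len(rates)


NPTL_IA64_LIFECYCLE = [
    "453504 threads created in 4.99994 sec = 90702 per second",
    "442196 threads created in 4.99971 sec = 88444.4 per second",
    "442380 threads created in 4.99951 sec = 88484.7 per second",
    "442508 threads created in 4.99948 sec = 88510.7 per second",
    "441764 threads created in 4.99974 sec = 88357.4 per second",
]

print(f"{average_rate(NPTL_IA64_LIFECYCLE):.1f} per second")
```

For the five NPTL (IA-64) lifecycle runs above, the mean comes to roughly 88899.8 threads per second, in line with the 88899 reported in the Lifecycle results table.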

Context

Linux Threads (Pentium)

681148 context switches in 5.08959 sec = 133832 per second
973348 context switches in 4.9987 sec = 194720 per second
627962 context switches in 5.08881 sec = 123401 per second
738299 context switches in 5.04877 sec = 146233 per second
654499 context switches in 5.159 sec = 126866 per second

Linux Threads (IA-64)

1009192 context switches in 5.07046 sec = 199034 per second
1092900 context switches in 5.15004 sec = 212212 per second
881415 context switches in 4.99969 sec = 176294 per second
813726 context switches in 5.18222 sec = 157023 per second
914534 context switches in 5.39332 sec = 169568 per second

NPTL (Pentium)

1648943 context switches in 5.11928 sec = 322104 per second
1482676 context switches in 4.99866 sec = 296615 per second
2425918 context switches in 5.08865 sec = 476731 per second
1558364 context switches in 5.1187 sec = 304445 per second
1646595 context switches in 5.27868 sec = 311933 per second

NPTL (IA-64)

1904623 context switches in 5.06151 sec = 376296 per second
1523602 context switches in 4.99972 sec = 304738 per second
1527689 context switches in 5.00027 sec = 305521 per second
1559780 context switches in 4.99951 sec = 311986 per second
1630905 context switches in 5.26538 sec = 309741 per second

Wakeup

Linux Threads (Pentium)

428249 wakes ups in 4.99544 sec = 85727.9 per second
500555 wakes ups in 4.99804 sec = 100150 per second
505355 wakes ups in 4.99886 sec = 101094 per second
506447 wakes ups in 4.99872 sec = 101315 per second
502745 wakes ups in 4.99881 sec = 100573 per second

Linux Threads (IA-64)

250795 wakes ups in 4.99994 sec = 50159.6 per second
247568 wakes ups in 4.99947 sec = 49518.8 per second
246061 wakes ups in 5.00033 sec = 49208.9 per second
246656 wakes ups in 4.99965 sec = 49334.6 per second
245981 wakes ups in 4.99955 sec = 49200.6 per second

NPTL (Pentium)

594859 wakes ups in 4.99864 sec = 119004 per second
672069 wakes ups in 4.99914 sec = 134437 per second
601190 wakes ups in 4.99806 sec = 120285 per second
601265 wakes ups in 4.99857 sec = 120288 per second
598793 wakes ups in 4.99861 sec = 119792 per second

NPTL (IA-64)

260518 wakes ups in 4.99969 sec = 52106.8 per second
255846 wakes ups in 4.99955 sec = 51173.8 per second
240073 wakes ups in 4.99973 sec = 48017.2 per second
249059 wakes ups in 4.99959 sec = 49815.9 per second
252732 wakes ups in 5.00013 sec = 50545.1 per second

Uncontested

Linux Threads (Pentium)

37948774 uncontested locks taken in 4.9984 sec = 7592178.223 per second
37786013 uncontested locks taken in 4.99857 sec = 7559370.627 per second
37935855 uncontested locks taken in 4.99858 sec = 7589326.369 per second
37752282 uncontested locks taken in 4.99852 sec = 7552690.486 per second
37809587 uncontested locks taken in 4.99861 sec = 7564020.198 per second

Linux Threads (IA-64)

31645158 uncontested locks taken in 4.99992 sec = 6329131.6 per second
31640567 uncontested locks taken in 4.99929 sec = 6329017.184 per second
31640113 uncontested locks taken in 4.99933 sec = 6328866.871 per second
31637056 uncontested locks taken in 4.99935 sec = 6328236.402 per second
31639168 uncontested locks taken in 4.99941 sec = 6328584.17 per second

NPTL (Pentium)

35295501 uncontested locks taken in 4.99789 sec = 7062087.463 per second
35313675 uncontested locks taken in 4.99866 sec = 7064624.08 per second
35175797 uncontested locks taken in 4.99865 sec = 7037057.998 per second
35316222 uncontested locks taken in 4.99857 sec = 7065259.412 per second
35367709 uncontested locks taken in 4.99863 sec = 7075476.235 per second

NPTL (IA-64)

51058878 uncontested locks taken in 4.99938 sec = 10213037.93 per second
51055216 uncontested locks taken in 4.99937 sec = 10212338.12 per second
51050637 uncontested locks taken in 4.99938 sec = 10211397.7 per second
51064154 uncontested locks taken in 4.99992 sec = 10212992.17 per second
51057906 uncontested locks taken in 4.99935 sec = 10212898.66 per second

IA64wiki: NPTLbenchmarks (last edited 2009-12-10 03:14:05 by localhost)
