Barriers to High Performance Computing on Linux

Linux has come a long way since it was first released. It now runs on an extremely wide variety of hardware, from embedded devices to clusters of 2000+ processors. However, just running is not good enough. Running well in an HPC environment requires certain infrastructure changes, and new or improved open-source software.

Note that just because there's something here doesn't mean that the Gelato@UNSW team is going to do anything about it.

Also note that in the benchmarks I (PeterChubb) have run, kernel time itself is not significant. That might change as the number of processors increases (I have access to at most 4, which isn't very many). Scalability problems are more likely to be found in other areas (e.g., increased memory latency because of large NUMA memories, or slower than desirable operation because of TLB or cache contention) that kernel algorithms affect but that are not directly attributable to time spent in the kernel.

Kernel Changes

Memory Architecture

Scheduler changes

Performance Measurement and Control

Accounting

File systems

Topology/resource scheduling

Locking

Otherwise, it's very easy to multiply locks to no good purpose (it's often easier to create a new lock rather than understand what's there), which can lead to deadlocks.

Lock placement also needs to be carefully addressed. There seem to be two schools of thought:

  1. Put the lock into the same cacheline as the variables it protects. Then when you have the lock, you have the variables too. This is good for lightly contested locks.
  2. Put the lock on its own in a separate cacheline. Then when the lock is contended, the current holder of the lock can proceed at full speed until it does the write to release the lock, whereupon the cacheline containing the lock will be transferred to the waiting processor.
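
The two placements can be sketched in user-space C11 as follows. This is only an illustration, not kernel code: the 64-byte line size (CACHELINE), the struct names, and the helper functions are all assumptions made for the example, and a simple atomic_flag spinlock stands in for a real kernel lock.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

#define CACHELINE 64 /* assumed line size; real hardware varies */

/* Placement 1: the lock shares a cacheline with the data it protects. */
struct counter_colocated {
    alignas(CACHELINE) atomic_flag lock;
    long value;
};

/* Placement 2: the lock has a cacheline to itself; the data starts on
 * the next line, so spinning waiters do not disturb the holder's data. */
struct counter_separated {
    alignas(CACHELINE) atomic_flag lock;
    alignas(CACHELINE) long value;
};

/* The separated layout must span at least two cachelines. */
static_assert(sizeof(struct counter_separated) >= 2 * CACHELINE,
              "lock and data should occupy different cachelines");

/* True if two byte offsets within a line-aligned struct share a cacheline. */
int same_line(size_t a, size_t b)
{
    return a / CACHELINE == b / CACHELINE;
}

/* Minimal test-and-set spinlock, used only to make the sketch complete. */
void spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* busy-wait until the holder clears the flag */
}

void spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Update the protected variable under the separately placed lock. */
void separated_increment(struct counter_separated *c)
{
    spin_lock(&c->lock);
    c->value++;
    spin_unlock(&c->lock);
}
```

Which layout wins depends on how contended the lock is, so the choice would have to be measured per lock rather than fixed globally.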

As far as I know, little or no analysis has yet been done on the placement of locks in the Linux kernel.

Infrastructure Changes

Threading

Other

See Rik van Riel's page http://surriel.com/research_wanted/

Also Larry McVoy's talks on SMP Clusters (sounds a bit like Cellular Disco?)

And the ADEOS work http://www.opersys.com/adeos/practical-smp-clusters/

IA64wiki: HPCBarriers (last edited 2009-12-10 03:14:03 by localhost)
