Long Format VHPT and TLB Sharing Patches
Matthew Chapman (<matthewc AT cse DOT unsw DOT au>) has written a series of patches to the standard kernel to support both a Long format Virtual Hashed Page Table (VHPT) and TLB sharing of program text segments.
What do the patches do?
The translation lookaside buffer (TLB) is a cache of the most recently used virtual-to-physical address translations. The page tables are the backing store for the TLB. These patches share TLB entries between processes, meaning less need to look up the translations in the page tables. There are two components to the patches.
Long Format VHPT: The Itanium architecture supports a hardware page table walker that sits between the TLB and the system page tables and uses a structure called the Virtual Hashed Page Table (VHPT). When a TLB miss occurs, the VHPT is searched for the translation. The VHPT has two formats, short and long. Only the long format supports protection keys (see below), so the first part of the patch enables long format VHPT support.
TLB Sharing of Text Segments: In a virtual memory based system, every process has access to the full address space of the architecture, independent of other processes' use. This means that a virtual address used by one process will generally not map to the same physical address as the same virtual address in another process. As it is the job of the TLB to map virtual addresses to physical addresses, each time we switch processes we need to flush the TLB to avoid stale entries (translations then get reloaded from the canonical source, the page tables).
This is an expensive operation, and generally not necessary. Flushing the TLB can easily be avoided by adding some extra information to the TLB about each process, called an address space ID (ASID). Ideally, each process gets its own ASID, and thus all that is needed to ensure correct translation is the setting of a register value to tell the TLB the current ASID.
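The ASID idea can be sketched with a toy model. This is only an illustration under assumed names (`TaggedTLB`, `switch_to`, `insert` and `lookup` are hypothetical and do not come from the patches): each entry is tagged with the ASID that was current when it was inserted, so a context switch only has to change one register value.

```python
# Toy model of an ASID-tagged TLB (illustrative only, not kernel code).

class TaggedTLB:
    def __init__(self):
        self.entries = {}       # (asid, virtual page) -> physical page
        self.current_asid = 0   # stands in for the "current ASID" register

    def switch_to(self, asid):
        # A context switch only rewrites the register; nothing is flushed.
        self.current_asid = asid

    def insert(self, vpn, ppn):
        self.entries[(self.current_asid, vpn)] = ppn

    def lookup(self, vpn):
        # Returns None on a TLB miss (the page tables would be consulted).
        return self.entries.get((self.current_asid, vpn))


tlb = TaggedTLB()
tlb.switch_to(1)
tlb.insert(0x10, 0xA0)        # process 1 maps virtual page 0x10
tlb.switch_to(2)
tlb.insert(0x10, 0xB0)        # process 2 maps the same virtual page elsewhere
tlb.switch_to(1)
print(hex(tlb.lookup(0x10)))  # process 1's entry survived the switch: 0xa0
```

Both translations for virtual page 0x10 coexist in the model because the ASID is part of the lookup key.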
The Itanium architecture uses a Region ID to implement the ASID concept. It works as you would expect, with each process given a unique Region ID to identify its address space. But instead of just one register to indicate the current address space, the Itanium provides a further enhancement with region registers. There are 8 region registers, which means that instead of just having the TLB map one address space, we can have 8 different ones simultaneously! In Linux these 8 registers correspond to 8 regions, some of which are allocated for kernel use, some for program data and some for program code.
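As a rough illustration of how an address selects a region register: on Itanium the top 3 bits of a 64-bit virtual address pick one of the 8 region registers, and the Region ID held there tags the translation. The helper names and the Region ID values below are made up for the example.

```python
# Toy model of Itanium region registers (illustrative only).

VRN_SHIFT = 61                      # top 3 bits of a 64-bit virtual address
OFFSET_MASK = (1 << VRN_SHIFT) - 1  # remaining 61 bits


def split_address(vaddr):
    """Return (region register index, offset within the region)."""
    return vaddr >> VRN_SHIFT, vaddr & OFFSET_MASK


def region_id_for(vaddr, region_registers):
    """Look up the Region ID that tags TLB entries for this address."""
    index, _ = split_address(vaddr)
    return region_registers[index]


# Hypothetical Region IDs loaded into the 8 region registers.
region_registers = [100, 101, 102, 103, 104, 7, 7, 7]

vaddr = 0x2000000000001000
print(split_address(vaddr))                    # (1, 4096)
print(region_id_for(vaddr, region_registers))  # 101
```

Two processes that load the same Region ID into the same region register will match the same TLB entries for that region, which is what the sharing patches rely on.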
TLB sharing consists of two separate components:
Text sharing, by using the same Region ID across the text (program code) sections of processes using the same executable. Since Linux puts all program code into a known region, by checking if an executable already has an ASID we can simply re-use it. Thus we share the existing TLB entries with no extra effort. This by itself does not require long format VHPT.
mmap (including shared library code) sharing using protection keys. For program data (e.g. mmaped files), we require more fine grained control than simply sharing an entire region. For example, if a region is allocated for a program's data section, a process may mmap some memory in this region and share it with other processes. However, this does not imply that it is willing to share the entire region! We need a way to allow TLB sharing of only the specified mmaped region.
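The first component, text sharing, amounts to a lookup before allocation. A minimal sketch, assuming a hypothetical `TextRegionAllocator` that keys on the executable's path (the real patch would identify the executable inside the kernel, and the way it does so is not described here):

```python
# Toy model of Region ID reuse for text sharing (illustrative only).
import itertools


class TextRegionAllocator:
    def __init__(self):
        self._next_id = itertools.count(1)
        self._by_executable = {}   # executable identity -> Region ID

    def region_id_for(self, executable):
        # Reuse the Region ID if this executable already has one;
        # otherwise hand out a fresh one.
        if executable not in self._by_executable:
            self._by_executable[executable] = next(self._next_id)
        return self._by_executable[executable]


alloc = TextRegionAllocator()
first = alloc.region_id_for("/bin/bash")
second = alloc.region_id_for("/bin/bash")   # second process, same binary
other = alloc.region_id_for("/bin/ls")
print(first == second, first == other)      # True False
```

Because the whole text region is shared, this approach needs nothing finer than a Region ID, unlike the mmap case.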
Thus we have protection keys which allow fine-grained access to regions. It works like this:
- The kernel state for each process is expanded to contain 16 protection key registers. Each register consists of the protection key plus an additional set of rights (set by the kernel). More on this later.
- Every time a file is mmaped (unless it is explicitly mapped privately) it is put into the "shared" region and given a protection key (there are 2^18 possible keys, which is plenty). [side note: Protection is on a per-page basis (i.e. each page has a protection key). This means we can now have the concept of an mmaped "object" by assigning all pages of the mmaped file the same key. Thus, all pages with a particular key can be thought of as the same object.] This is particularly useful for the dynamic library loader used by Linux, as the libraries it loads can now be shared by putting them in this region. This requires some small changes to the loader to set this up correctly.
- Now, when a process attempts to access that mmaped object, obviously a TLB translation has to be done. The process has requested a virtual address (consisting of a region number, a virtual page number and an offset), which is looked up in the TLB to give a physical page number, as normal.
- However, now the TLB has a little extra information, namely the protection key. The protection key registers for the process are searched with the key from the TLB. As mentioned above, each protection key register has rights attached to it (RWX), and these are combined with the rights from the TLB to give the effective (or actual) rights for that process to that object.
- If the protection key is found in one of the protection key registers, the translation is returned. If the protection key is not found in one of the protection key registers, a protection key fault is raised. The kernel must check if the process has rights to the object with that key, and load one of the protection key registers with the key (if allowed) [another side note: once the registers are full (i.e. a process has access to 16 shared objects) the kernel simply starts refilling sequentially. Whether 16 registers is enough, and what the optimal replacement algorithm is, are matters for further research].
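The steps above can be sketched as a toy model. Everything here is an assumption made for illustration: the class and method names are hypothetical, rights are modelled as a simple RWX bitmask, and "combined" is modelled as a bitwise intersection, which may not match how the hardware actually encodes or combines them.

```python
# Toy model of protection key checking (illustrative only).

NUM_PKRS = 16
R, W, X = 4, 2, 1


class KeyFault(Exception):
    """Stands in for the hardware protection key fault."""


class Process:
    def __init__(self, allowed):
        self.allowed = allowed            # key -> rights granted by the kernel
        self.pkrs = [None] * NUM_PKRS     # each slot: (key, rights) or None
        self.next_slot = 0                # sequential refill position

    def check(self, key, tlb_rights):
        """Hardware side: search the registers for the TLB entry's key."""
        for slot in self.pkrs:
            if slot is not None and slot[0] == key:
                return tlb_rights & slot[1]   # effective rights
        raise KeyFault(key)

    def handle_fault(self, key):
        """Kernel side: load the key if the process may use the object."""
        if key not in self.allowed:
            raise PermissionError(f"no rights to object with key {key}")
        self.pkrs[self.next_slot] = (key, self.allowed[key])
        self.next_slot = (self.next_slot + 1) % NUM_PKRS

    def access(self, key, tlb_rights):
        try:
            return self.check(key, tlb_rights)
        except KeyFault:
            self.handle_fault(key)
            return self.check(key, tlb_rights)


proc = Process(allowed={42: R | X})
print(proc.access(42, R | W | X))   # 5, i.e. R | X
```

The first access to key 42 faults, the handler loads register 0, and the retry succeeds with the intersected rights. With more than 16 distinct keys the model overwrites register 0 again, mirroring the sequential refill described in the side note.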
Matthew Chapman, Ian Wienand, Gernot Heiser. Itanium Page Tables and TLB. Technical Report UNSW-CSE-TR-0307, May 2003.
How can I try it out?
There are two series of Long Format VHPT patches, for 2.5.67 and for 2.6.0. The later release has been enhanced to consider the real number of processors on the boot machine and to handle NUMA type machines. Due to unpredictable race conditions the Shared TLB patch has not been upgraded.
Release for kernel 2.5.67
Patches apply on top of the standard 2.5.67 kernel with the
Long VHPT patch: linux-2.5.67-ia64-longvhpt patch
Long VHPT + Shared TLB:
Currently the Shared TLB patch is being updated to the 2.6.x kernel and will NOT be available until testing is complete. You will also need a hacked loader to take advantage of the sharing features. We have a dpatch that you can apply against Debian Stable's libc6.1 package (2.2.5-11.2).
N.B. Because of a dependency between the glibc patches, you must apply the glibc22-hppa-rela patch before glibc22-ia64-sharing.dpatch! Gelato@UNSW local source also has a copy of the glibc patch here.
An easier way is to create a minimal Debian Stable chroot environment and copy binary versions of the hacked ld-2.2.5.so and libc-2.2.5.so to the /lib directory of your chroot environment. Then when you are in your chroot you will have the advantages of sharing.
WARNING: The sharing patch is not SMP safe. There are still some issues with the stability of the sharing patches -- we welcome bug reports. We are working on integrating this properly; please contact us if this interests you. If you are not familiar with patching and compiling kernels you can
Release for kernel 2.6.0
- With the release of the 2.6.x series kernel, work has been put in to bring the long format VHPT, originally written by Matthew Chapman, up to date with this version of the kernel. The advantage of applying the long format VHPT patch to the 2.6.0 series kernel is that ia-64 is now included in the mainstream kernel, so you only have to obtain the main kernel tree, with no additional patches except of course the long format VHPT patch. We have had several responses from others who are interested in the long format VHPT walker, and enhancements have been made to the patch to include NUMA and better SMP support. Testing is being carried out, and a full comparison between Matt's original work and the 2.6.0-test6 update will be given.
The latest patch can be found in the updates section.
If you require patches for earlier 2.6 kernel releases, these can be found here
Note: all patches have been tested to boot on Itanium2 and the HP SkiSimulator. Thanks to:
Arun Sharma (<arun DOT sharma AT intel DOT com>) and Kenneth Chen (<kenneth DOT w DOT chen AT intel DOT com>) for their input.
Matthew's initial results, in Itanium Page Tables and TLB and on the longvhpt page, were encouraging, and we will continue to report new data as the long format VHPT patch matures. The long format VHPT has been updated to the 2.6.0 series kernel; performance results for this patch can be found
The use of the long format VHPT with large (> 1GB) file transfers has produced some interesting results, with the long format VHPT reducing file transfer times: see the non modified kernel results and VHPT enabled kernel results. At the release of the 2.6.3 kernel, attention was drawn to TLB flushing times, and testing is underway to produce data.
Reporting your results
- We would like performance results from kernels with only Long VHPT enabled and from kernels with both Long VHPT and the shared TLB support enabled. We are looking for results such as running on a live mirror of one of your production systems, or perhaps results from a series of in-house benchmarks you use to evaluate new systems. However, we welcome all feedback.
Please forward any results, comments or questions either by editing this page (you'll need to register first), or by emailing <gelato AT gelato DOT unsw DOT edu DOT au>
Updates
- This section describes subtle updates to the documents and patches presented on this page, and other information that may assist those of you who have read this page previously.
03-12-2003 Recently the long format VHPT patch was updated with huge TLB support (thanks to Arun Sharma). General testing for compilation and simulator booting has been successfully carried out, and any feedback would be appreciated: send mail to <gelato AT gelato DOT unsw DOT edu DOT au>. The long format VHPT with Huge TLB support can be found here.
23-12-2003 An updated long-format-VHPT patch has been generated against linux-2.6.0 and David Mosberger's ia64 update patch. The Mosberger patch must be applied to the kernel source before the long-format-VHPT patch. The long-format-VHPT patch has been tested for various compilation configurations and booted on an Itanium1 machine (the 2.6.0 kernel is not stable on Itanium1, therefore extensive testing could not be carried out).
28-01-2004 A patch for kernel release 2.6.1-rc3 has been generated: long-format-VHPT patch. This patch includes updates for allocation of 512 CPUs; however, we do not have the available resources to test such a configuration. The tests performed so far are booting on Itanium 1 and 2 with configuration options of 64 CPUs and 512 CPUs, and booting on the SkiSimulator. Note that this patch will also apply to 2.6.2-rc1.
16-03-2004 This patch is against 2.6.4, with small changes to the way ia64 instructions are handled in the hugetlb code. The patch can be found here.
25-03-2004 2.6.5-rcX patch, with small source changes in Kconfig and arch/ia64/mm/init.c.
01-04-2004 Gelato@UNSW maintains a KernelAutoBuild system for the Linux IA64 kernel. We have updated the system with the ability to patch the vanilla tree. If required, it is also able to build and test the patched kernel on HP's SkiSimulator.
28-05-2004 This latest patch has taken some time to test because of a problem with undefined references in the vanilla tree itself. The latest long-format-VHPT patch is against vanilla 2.6.7-rc1.
29-06-2004 Changes to setup.c resulted in a patch failure; here is a patch against 2.6.7. Currently under i2 testing; boots on the simulator.
21-07-2004 PLEASE READ! After some comments from people who have been using the Long Format VHPT patches, a few bugs have been found that can break the kernel. They can be reproduced by enabling the SMP config option with the number of processors >= 64. A new set of patches has been produced and thoroughly tested. The affected patches are those between 2.6.1-rc3 and 2.6.7-rc1. All the above links have been redirected to the new patches, and are dated 2004-07-[20|21].