IA64 Superpages

If you would like to know more about this work, please contact IanWienand.

Introduction

LucyChubb did the foundation work for transparent superpages for Itanium (IA64) utilising prior work by Shimitzu and Rice University. IanWienand is now looking at the project.

Itanium

Itanium, unlike most other architectures, allows a wide range of page sizes to be in use at the same time within a running system. The page size is a field in each TLB entry, and the architecture defines a large set of allowable page sizes (unlike, say, x86, which in its basic 32-bit paging mode only allows 4K or 4MB pages).

This makes Itanium well suited to experimenting with superpages.

The full range of sizes supported is:

| Processor | Page sizes | Max memory bandwidth |
|---|---|---|
| Itanium 1 | 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M | 2.1G/s |
| Itanium 2 | 4K, 8K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M, 1G, 4G | 10.6G/s (333MHz system bus); 6.4G/s (200MHz system bus) |
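As a small illustration of how these sizes are expressed to the hardware: the architecture encodes a page size as its base-2 logarithm in a `ps` field, so 16K is 14 and 256M is 28. The sketch below just computes that encoding for the Itanium 2 sizes listed above; the helper name is made up for this page and is not a kernel function.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30

# Itanium 2 page sizes from the list above, in bytes.
ITANIUM2_PAGE_SIZES = [4 * KB, 8 * KB, 16 * KB, 64 * KB, 256 * KB,
                       1 * MB, 4 * MB, 16 * MB, 64 * MB, 256 * MB,
                       1 * GB, 4 * GB]


def page_size_to_ps(size):
    """Return log2(size), the value stored in a ps field.

    Hypothetical helper for illustration only.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("page size must be a power of two")
    return size.bit_length() - 1


PS_VALUES = [page_size_to_ps(s) for s in ITANIUM2_PAGE_SIZES]
```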

VHPT

Itanium has a virtual hash page table, or VHPT, which Linux uses more or less as a cache of the page tables to make refilling the TLB faster.

When a virtual address is accessed that is not mapped by the TLB, the processor hashes the address and looks up the result in the VHPT. If a matching entry is there, the processor refills the TLB from it and continues. If not, it raises a fault and the operating system falls back to its usual method of walking the page tables.

This is a nice trade-off between having an immense TLB (not practical at a hardware level) and having to walk the page tables all the time (expensive).
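The lookup order described above can be sketched as a toy model. This is not how the hardware or Linux implements it: the class name, the modulo "hash", the 16K page size and the unbounded TLB are all assumptions chosen to keep the example short.

```python
PAGE_SHIFT = 14  # assume 16K pages for this sketch


class ToyMMU:
    """Toy model: a TLB, a VHPT acting as a cache, and the full page tables."""

    def __init__(self, page_tables, vhpt_slots=8):
        self.page_tables = page_tables    # vpn -> pfn, the authoritative mapping
        self.vhpt = [None] * vhpt_slots   # each slot holds (vpn, pfn) or None
        self.tlb = {}                     # vpn -> pfn (unbounded, unlike real hardware)
        self.walks = 0                    # number of slow page-table walks taken

    def _slot(self, vpn):
        return vpn % len(self.vhpt)       # stand-in for the hardware hash function

    def translate(self, va):
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.tlb:
            entry = self.vhpt[self._slot(vpn)]
            if entry is None or entry[0] != vpn:
                # VHPT miss: fall back to walking the page tables,
                # then cache the result in the VHPT for next time.
                self.walks += 1
                entry = (vpn, self.page_tables[vpn])
                self.vhpt[self._slot(vpn)] = entry
            self.tlb[vpn] = entry[1]      # refill the TLB from the VHPT entry
        return (self.tlb[vpn] << PAGE_SHIFT) | offset
```

After a first access has walked the page tables, flushing the toy TLB and touching the same page again is served from the VHPT without another walk, which is the saving the VHPT is there to provide.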

VHPT formats

The VHPT has two formats: short and long.

By default, Itanium divides the 64 bit virtual address space up into 8 regions (each one eighth of the total VA space).

With the short format VHPT, the entries are smaller (hence "short") and do not contain enough information to fill a TLB entry. The rest, in particular the page size, is taken from the region the virtual address lies in. So with the short format VHPT you can specify a page size per region, but not at any finer granularity. The advantage is that entries are smaller, so more of them fit in the cache. This is how Linux uses the VHPT by default, since Linux currently uses only a single page size; the exception is huge TLB pages, which on IA64 are implemented in a separate region with a larger page size.
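To make the per-region granularity concrete, here is a minimal sketch. With 8 equal regions, the top three bits of the virtual address select the region, and the page size comes from that region's setting. The particular region numbers and sizes in the table are illustrative assumptions, not a statement of what Linux configures.

```python
REGION_SHIFT = 61  # top three bits of a 64-bit virtual address select the region

# Hypothetical per-region page sizes, stored as log2 of the size.
region_ps = {r: 14 for r in range(8)}  # 16K pages in every region...
region_ps[4] = 28                      # ...except one region using 256M pages


def region_of(va):
    """Region number (0-7) that a virtual address falls in."""
    return (va >> REGION_SHIFT) & 0x7


def page_size_for(va):
    """Page size in bytes, determined only by the region of the address."""
    return 1 << region_ps[region_of(va)]
```

Every address in a region gets the same answer from `page_size_for`, which is exactly why the short format cannot mix page sizes within a region.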

In the long format, each VHPT entry has extra 64-bit words in which it can store more information (an entry is 32 bytes in total, against 8 bytes for the short format). In this case, you can store the page size in the VHPT entry, and it will be loaded into the TLB. The disadvantage is that longer entries take up more cache.
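A rough feel for the cache cost, as plain arithmetic. The entry sizes below (8 bytes short, 32 bytes long) are assumptions taken from our reading of the architecture manuals, and the 64K table size is arbitrary.

```python
SHORT_ENTRY_BYTES = 8    # assumed short-format entry size
LONG_ENTRY_BYTES = 32    # assumed long-format entry size
VHPT_BYTES = 64 * 1024   # arbitrary table size, for comparison only


def vhpt_entries(vhpt_bytes, entry_bytes):
    """Number of entries that fit in a table of the given size."""
    return vhpt_bytes // entry_bytes


short_count = vhpt_entries(VHPT_BYTES, SHORT_ENTRY_BYTES)
long_count = vhpt_entries(VHPT_BYTES, LONG_ENTRY_BYTES)
```

Under these assumptions the same amount of memory holds a quarter as many long-format entries, which is the price paid for per-entry page sizes.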

So the first part of getting superpages to work is to have a long format VHPT. (link to LVHPT work?)

PTEs

The Linux VM is very much geared towards each PTE mapping one single page of memory. It would be nice if this were not the case, and if indeed the page table implementation were abstracted from the rest of the kernel (link to Paul's work).

We need somewhere to store information about how big a page is within the page table entry (PTE). There are two options: either stuff the page size into the existing PTE format (there are a few spare bits), or double the size of a PTE and put the page size in there.

We currently take the second approach. That's because there are lots of other nifty things you can do with the Itanium VM, such as sharing protection keys, that would require much more space than fits into the few spare bits of the existing PTE format. The disadvantage is that PTEs are bigger, and take up more cache.
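The two options can be sketched side by side. The bit position and width used for the spare bits, and the field names of the doubled PTE, are invented for illustration and do not describe the real IA64 PTE layout or our patches.

```python
from dataclasses import dataclass

# Option 1: squeeze a page-size code into spare bits of a 64-bit PTE.
PS_SHIFT = 53   # hypothetical position of the spare bits
PS_BITS = 6     # enough for a log2 page size up to 63
PS_MASK = ((1 << PS_BITS) - 1) << PS_SHIFT


def pte_set_ps(pte, ps):
    """Return pte with its page-size bits replaced by ps."""
    return (pte & ~PS_MASK) | ((ps << PS_SHIFT) & PS_MASK)


def pte_get_ps(pte):
    """Extract the page-size code from the spare bits."""
    return (pte >> PS_SHIFT) & ((1 << PS_BITS) - 1)


# Option 2: double the PTE and keep the extra information in a second word.
@dataclass
class WidePTE:
    word0: int = 0   # the existing PTE contents, unchanged
    ps: int = 14     # log2 of the page size
    key: int = 0     # room for more, e.g. a protection key
```

Option 1 leaves the PTE size alone but has nowhere to put anything beyond a few bits; option 2 costs memory and cache but leaves a whole word free.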

Superpages

If you have a long format VHPT and room to store the page size, you can transparently implement superpages fairly easily from that point.

When, say, an anonymous mmap comes in, you can check the size of it. If it is greater than a base page size (say, 64K on Itanium) then you decide to allocate a superpage.

Memory isn't allocated until the anonymous memory is actually written to, so for the first step you simply create all your PTEs as not present, with the larger page size recorded in each.
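A minimal sketch of this first step, assuming a 64K base page. The candidate sizes, the "largest size that fits" policy, the dictionary page table and the function names are all assumptions for illustration; the text above only says that a mapping larger than a base page gets a superpage.

```python
BASE_SHIFT = 16                      # 64K base page
SUPERPAGE_SHIFTS = [24, 22, 20, 18]  # 16M, 4M, 1M, 256K candidates (illustrative)


def choose_page_shift(length):
    """Largest candidate page size that fits in the mapping, else base page."""
    for shift in SUPERPAGE_SHIFTS:
        if length >= (1 << shift):
            return shift
    return BASE_SHIFT


def map_anonymous(page_table, start, length):
    """Create not-present PTEs for [start, start+length), tagged with a page size.

    Assumes start and length are multiples of the base page size. Only base
    pages inside a fully covered, aligned superpage get the larger size.
    """
    shift = choose_page_shift(length)
    size = 1 << shift
    end = start + length
    first = -(-start // size) * size  # round start up to a superpage boundary
    last = (end // size) * size       # round end down to a superpage boundary
    for va in range(start, end, 1 << BASE_SHIFT):
        ps = shift if first <= va < last else BASE_SHIFT
        page_table[va >> BASE_SHIFT] = {"present": False, "ps": ps, "pfn": None}
    return shift
```

No physical memory is touched here; the page size recorded in each PTE is only a note for the fault handler to act on later.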

Now, when you take a fault on a page, you check the page size entry. If it is not a base page, you back-track to the first page in the superpage and allocate a superpage's worth of physically contiguous memory. You then fill in all the PTEs in the superpage so that each points to the frame of memory it is actually backed by.

However, what goes into the TLB is just one entry that maps the whole superpage. So you have the advantage that you haven't broken Linux semantics: each PTE still points to a frame of physical memory. In the TLB, though, the entry has a larger page size value, so as you walk through that memory no further TLB faults are taken. This is sometimes also referred to as a "clustered page table" type arrangement.
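The fault path above can be sketched as follows, again with a 64K base page and a dictionary standing in for the page table. The allocator, the PTE layout and the function names are hypothetical; a real implementation also has to cope with allocation failure, locking and so on.

```python
BASE_SHIFT = 16  # 64K base page


class ContiguousAllocator:
    """Toy bump allocator handing out runs of physically contiguous frames."""

    def __init__(self):
        self.next_pfn = 0

    def alloc(self, nframes):
        # Align the run to its own size, as a single large mapping requires.
        first = -(-self.next_pfn // nframes) * nframes
        self.next_pfn = first + nframes
        return first


def handle_fault(page_table, allocator, va):
    """Back a faulting address; return the log2 page size for the TLB entry."""
    vpn = va >> BASE_SHIFT
    pte = page_table[vpn]
    if not pte["present"]:
        nframes = 1 << (pte["ps"] - BASE_SHIFT)  # base pages per superpage
        first_vpn = vpn & ~(nframes - 1)         # back-track to the first page
        first_pfn = allocator.alloc(nframes)     # one contiguous run of frames
        for i in range(nframes):
            # Every PTE still points at its own frame of physical memory.
            page_table[first_vpn + i].update(present=True, pfn=first_pfn + i)
    # A single TLB entry of this size then covers the whole superpage.
    return pte["ps"]
```

For a base-page PTE the same code degenerates to the ordinary case: one frame is allocated and one PTE is filled in.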

Current work

This work is currently under heavy development. We have long format VHPT patches, PTE doubling patches and a superpage patch that does anonymous memory. It is not at the stage where regular patches are published, but please contact IanWienand even if just to register that you are interested in the development of the work.

Things coming in the future are

Patches

The work is under development, but you are welcome to look at it. The easiest way is via

http://www.gelato.unsw.edu.au/~ianw/superpage/patches/latest

which should contain the latest patches at any time. See the README for more details on what is what.

IA64wiki: Ia64SuperPages (last edited 2009-12-10 03:14:01 by localhost)
