Using Large Block Devices

Once you've got yourself a large block device, what do you do with it?

You can:

  1. Use it as a raw device

  2. Partition it

  3. Put a filesystem on it

  4. Combine it with other huge block devices in a RAID (See also LBDRaid)

Raw Large Block Device

Not many things need raw large block devices, and most of the ones that do (database log files, filesystem log files, etc.) don't need huge ones.

Theoretically one can export a raw large block device using NBD, but I (PeterChubb) haven't tried it.

Software RAID in the 2.4 and 2.5 series kernels is currently limited to a maximum of 2TB-1k per member.

Partitioning Huge Devices

The standard PC (MSDOS) partitioning scheme doesn't work well for huge devices, as it internally uses 32-bit numbers for the start sector and length of each partition.
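To make the 32-bit limit concrete, here is a small arithmetic sketch. It assumes 512-byte sectors (an assumption, not something stated above) and a shell with 64-bit arithmetic, such as bash or dash on a 64-bit machine.

```shell
# Largest partition describable by a 32-bit sector count.
SECTOR_SIZE=512                              # assumption: classic 512-byte sectors
MAX_SECTORS=$(( 1 << 32 ))                   # range of a 32-bit start/length field
MAX_BYTES=$(( MAX_SECTORS * SECTOR_SIZE ))   # bytes addressable by one field

echo "max bytes: $MAX_BYTES"
echo "max TiB:   $(( MAX_BYTES >> 40 ))"
```

Under those assumptions, a start sector or length beyond 2 TiB cannot be written into an MSDOS partition table entry.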

There are two partitioning schemes available in Linux that do not suffer from the 32-bit limitation: the EFI GUID scheme (GPT) and the Windows LDM scheme.

If you want to partition your huge discs under Linux, you're pretty much limited to GPT, which is understood by the parted program.
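As a rough sketch of what that could look like with a reasonably recent GNU parted, the helper below only prints the commands so they can be reviewed before being run by hand. The function name gpt_commands and the device /dev/sdX are placeholders made up for this example; writing a new label destroys the existing partition table.

```shell
# Print (rather than run) parted commands that label a disc with GPT
# and create one partition spanning the whole device.
gpt_commands() {
    dev=$1
    echo "parted -s $dev mklabel gpt"             # write a new, empty GPT label
    echo "parted -s $dev mkpart primary 0% 100%"  # on GPT, "primary" becomes the partition name
    echo "parted -s $dev print"                   # show the resulting table
}

gpt_commands /dev/sdX    # /dev/sdX is a placeholder, not a real device
```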

Of course, you don't have to partition a disc to use it: just create a filesystem directly on the device.

Filesystems for Large Block Devices

Running fsck on a huge disc takes an incredibly long time, so it's advisable to use filesystems that don't need to be checked.

File System Limitations

FS (blocksize)   Max file size   Max filesystem size   Notes
ext[23] (1k)     16G             2T                    Needs regular fsck, slow to create/check, wastes space
ext[23] (4k)     2T              16T                   (ditto)
JFS (4k)         9E              4P                    Doesn't work if PAGESIZE != 4k; VM-friendly fsck
XFS              9E              9E                    Backports to 2.4 available; VM-friendly fsck
ReiserFS 3.5     2G              16T                   Uses large amounts of physical memory for in-kernel bitmaps
ReiserFS 3.6     1E              16T                   Not yet tested by PeterChubb

ext3

The standard utilities don't use the BLKGETSIZE64 ioctl; to use them, you must tell them to use a large block size, either directly (with -b 4096) or indirectly (with -T largefile or -T largefile4).
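To get a feel for the numbers involved, the sketch below computes how many filesystem blocks a device needs at a given block size. On a real system the byte count could come from blockdev --getsize64 (from util-linux); here a 3 TiB device is assumed so that the example runs without hardware. The helper name block_count is made up for this example.

```shell
# Number of filesystem blocks needed to cover a device of the given size.
block_count() {
    bytes=$1
    blocksize=$2
    echo $(( bytes / blocksize ))
}

# On a real system: DEV_BYTES=$(blockdev --getsize64 /dev/sdX)
# Assumed here: a 3 TiB device, so no hardware is needed.
DEV_BYTES=$(( 3 << 40 ))

echo "1k blocks: $(block_count "$DEV_BYTES" 1024)"
echo "4k blocks: $(block_count "$DEV_BYTES" 4096)"
```

Going from 1k to 4k blocks divides the block count by four, which is what keeps the count small enough for tools that work with narrower size fields.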

You'll probably also want to reduce the amount of space reserved for the superuser, and keep fewer backup copies of the superblock than the default.

I (PeterChubb) have tested with:

# mke2fs -Osparse_super -b 4096 -m1 -j -Tlargefile4  /dev/sdb

Even with these options, the amount of space on the disc that's unavailable is quite large.
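As a rough illustration of one part of that overhead, the sketch below compares the superuser reservation at the mke2fs default of 5% with the 1% requested by -m1. The 3 TiB size is an assumption for the example, and the helper name reserved_bytes is made up; inode tables, bitmaps and the journal come on top of this and are not modelled here.

```shell
# Bytes reserved for the superuser at a given percentage (integer arithmetic).
reserved_bytes() {
    fs_bytes=$1
    percent=$2
    echo $(( fs_bytes * percent / 100 ))
}

FS_BYTES=$(( 3 << 40 ))    # assumption: a 3 TiB filesystem

echo "reserved at 5%: $(( $(reserved_bytes "$FS_BYTES" 5) >> 30 )) GiB"
echo "reserved at 1%: $(( $(reserved_bytes "$FS_BYTES" 1) >> 30 )) GiB"
```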

You need a large amount of physical memory to allow mkfs to complete in reasonable time: I tried on a machine with 64M memory, and it took 6 hours (swap swap swap swap). On a (200MHz) machine with 2G memory, it took around 30 minutes to create a 3TB filesystem.

Unless you turn it off, ext3 will arrange for fsck to be run every now and then. The manual page for tune2fs(8) says:

If you don't want this behaviour, then try a different filesystem.
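For completeness, turning the periodic check off on an existing ext3 filesystem could look like the sketch below. It only prints the tune2fs command for review; the function name no_periodic_fsck and the device /dev/sdX are placeholders made up for this example.

```shell
# Print (rather than run) the tune2fs command that disables periodic checks.
no_periodic_fsck() {
    dev=$1
    # -c 0 disables the mount-count-based check,
    # -i 0 disables the time-interval-based check.
    echo "tune2fs -c 0 -i 0 $dev"
}

no_periodic_fsck /dev/sdX    # /dev/sdX is a placeholder, not a real device
```

Disabling the checks does not make ext3 immune to corruption; it only stops the scheduled fsck runs.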

JFS

Documentation and the latest utilities are available at http://oss.software.ibm.com/jfs/

JFS has been used for a long time on AIX, and it's now also in the 2.4 and 2.5 kernels.

Patches to use BLKGETSIZE64 are now in the current jfsutils package; the only gotcha I know of is that JFS doesn't (yet) work if PAGESIZE != 4k (and so won't work on IA64). You can track that bug at http://www-124.ibm.com/developerworks/bugs/?func=detailbug&bug_id=3350&group_id=35

You'll need to install the jfsutils package.
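Given the PAGESIZE restriction above, a quick check before trying JFS might look like this sketch. It uses getconf PAGESIZE to read the running system's page size; the helper name jfs_page_check is made up for this example.

```shell
# Report whether a given page size matches the 4k that JFS expects.
jfs_page_check() {
    if [ "$1" -eq 4096 ]; then
        echo ok
    else
        echo unsupported
    fi
}

PAGESIZE=$(getconf PAGESIZE)    # page size of the running system, in bytes
echo "page size $PAGESIZE: $(jfs_page_check "$PAGESIZE")"
```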

XFS

Documentation, sources, patches, etc., are available at http://oss.sgi.com/projects/xfs/

For Debian users, install the xfsprogs package.

For me, XFS just worked. It's standard in the 2.5 kernels; you can get a patched 2.4.20 from the XFS website.

ReiserFS 3.6

Documentation, patches, etc., are available at http://www.namesys.com/

Debian users, install reiserfsprogs to get tools, etc.

IA64wiki: LBDFileSystems (last edited 2009-12-10 03:13:51 by localhost)
