Using Large Block Devices
Once you've got yourself a large block device, what do you do with it?
You can:
Use it as a raw device
Partition it
Put a filesystem on it
Combine it with other huge block devices in a RAID (see also LBDRaid)
Raw Large Block Device
Not many things need raw large block devices, and most of those that do (database log files, filesystem log files, etc.) don't need huge ones.
Theoretically one can export a raw large block device using NBD, but I (PeterChubb) haven't tried it.
Software RAID in the current 2.4 and 2.5 series kernels is limited to a maximum of 2TB-1k per member.
Partitioning Huge Devices
The standard PC (MSDOS) partitioning scheme doesn't work well for huge devices, because internally it uses 32-bit numbers for the start sector and length of each partition.
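The size of that limit follows from simple arithmetic. The sketch below assumes the conventional 512-byte sector (the sector size is an assumption, not something stated above) and a shell with 64-bit arithmetic; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: the largest partition an MSDOS partition table entry can describe.
# Assumes 512-byte sectors and a shell with 64-bit integer arithmetic.

SECTOR_SIZE=512                          # assumed sector size in bytes
MAX_SECTORS=$(( (1 << 32) - 1 ))         # largest value a 32-bit field can hold

# Print the largest partition length, in bytes, that a 32-bit sector count allows.
msdos_limit_bytes() {
    echo $(( MAX_SECTORS * SECTOR_SIZE ))
}

# Succeed (exit status 0) if a device of $1 bytes is larger than that limit.
exceeds_msdos_limit() {
    [ $(( $1 > MAX_SECTORS * SECTOR_SIZE )) -eq 1 ]
}

msdos_limit_bytes    # prints 2199023255040, i.e. one sector short of 2 TiB
```

A 3TB device, for example, is well past this limit, so part of it could not be described by a single MSDOS partition entry.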
There are two partitioning schemes available in Linux that do not suffer from the 32-bit limitation: the EFI GUID scheme (GPT) and the Windows LDM scheme.
If you want to partition your huge discs under Linux, you're pretty much limited to GPT, which is understood by the parted program.
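As a rough sketch of what that looks like with parted: the script below writes a GPT label and creates one partition covering the whole disc. It assumes a reasonably recent parted that accepts percentage units and the -s (script) option; /dev/sdb is only a placeholder device, and the run and gpt_single_partition helpers are hypothetical names. By default it only prints the commands, since running them destroys whatever is on the disc.

```shell
#!/bin/sh
# Sketch: label a disc with GPT and create a single partition spanning it.
# Prints the commands by default; set RUN=1 to execute them (destroys data!).

# Echo a command, and execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# gpt_single_partition DEVICE
gpt_single_partition() {
    dev=$1
    run parted -s "$dev" mklabel gpt               # write a new, empty GPT label
    run parted -s "$dev" mkpart primary 0% 100%    # one partition, whole disc
    run parted -s "$dev" print                     # show the resulting table
}

gpt_single_partition "${1:-/dev/sdb}"
```

Check the printed commands against the right device before setting RUN=1.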
Of course, you don't have to partition a disc to use it: just create a filesystem directly on the device.
Filesystems for Large Block Devices
Running fsck on a huge disc takes an incredibly long time, so it's advisable to use filesystems that don't need to be checked.
File System Limitations
FS (blocksize) | Max file size | Max filesystem size | Notes |
ext[23] (1k) | 16G | 2T | Needs regular fsck, slow to create/check, wastes space |
ext[23] (4k) | 2T | 16T | (ditto) |
JFS (4k) | 9E | 4P | Doesn't work if PAGESIZE != 4k; VM-friendly fsck |
XFS | 9E | 9E | Backports to 2.4 available; VM-friendly fsck |
ReiserFS 3.5 | 2G | 16T | Uses large amounts of physical memory for in-kernel bitmaps |
ReiserFS 3.6 | 1E | 16T | Not yet tested by PeterChubb |
ext3
The standard utilities don't use the BLKGETSIZE64 ioctl; to use them, you must tell them to use a large block size (either directly with -b 4096 or indirectly with -T largefile or -T largefile4).
You'll probably also want to reduce the amount of space reserved for the superuser, and use fewer superblocks than the default.
I (PeterChubb) have tested with:
# mke2fs -Osparse_super -b 4096 -m1 -j -Tlargefile4 /dev/sdb
Even with these options, the amount of space on the disc that's unavailable is quite large.
You need a large amount of physical memory to allow mkfs to complete in a reasonable time: I tried on a machine with 64M of memory, and it took 6 hours (swap swap swap swap). On a (200MHz) machine with 2G of memory, it took around 30 minutes to create a 3TB filesystem.
Unless you turn it off, ext3 will arrange for fsck to be run every now and then. The manual page for tune2fs(8) says:
- It is strongly recommended that either -c (mount-count-dependent) or -i (time-dependent) checking be enabled to force periodic full e2fsck(8) checking of the filesystem. Failure to do so may lead to filesystem corruption due to bad disks, cables, memory, or kernel bugs to go unnoticed until they cause data loss or corruption.
If you don't want this behaviour, then try a different filesystem.
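If you would rather stay on ext3 and accept the risk the manual page describes, the periodic check can be turned off with tune2fs. This is a minimal sketch: /dev/sdb is a placeholder device, and the run and disable_periodic_fsck helpers are hypothetical names. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: switch off ext3's periodic forced fsck.
# Prints the commands by default; set RUN=1 to execute them.

# Echo a command, and execute it only when RUN=1.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# disable_periodic_fsck DEVICE
disable_periodic_fsck() {
    dev=$1
    # -c 0 disables the mount-count check; -i 0 disables the time-based check.
    run tune2fs -c 0 -i 0 "$dev"
    # List the superblock settings so the change can be confirmed.
    run tune2fs -l "$dev"
}

disable_periodic_fsck "${1:-/dev/sdb}"
```

With both checks disabled, a full e2fsck only happens when you run it by hand or when the filesystem is marked as having errors.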
JFS
Documentation and the latest utilities are available at http://oss.software.ibm.com/jfs/
JFS has been used for a long time on AIX, and it is now in the 2.4 and 2.5 kernels as well.
Patches to use BLKGETSIZE64 are now in the current jfsutils package; the only gotcha I know of is that JFS doesn't (yet) work if PAGESIZE != 4k (and so won't work on IA64). You can track that bug at http://www-124.ibm.com/developerworks/bugs/?func=detailbug&bug_id=3350&group_id=35
You'll need to install the jfsutils package.
XFS
Documentation, sources, patches, etc., are available at http://oss.sgi.com/projects/xfs/
Debian users should install the xfsutils package.
For me, XFS just worked. It's standard in the 2.5 kernels; you can get a patched 2.4.20 from the XFS website.
ReiserFS 3.6
Documentation, patches, etc., are available at http://www.namesys.com/
Debian users should install reiserfsprogs to get the tools.
