Large block support is now in Linux 2.5! If you want it in Linux 2.4, see the Gelato@UNSW download page.
Looks like we'll have tens of terabytes in a single package by 2012. You can buy a RAID controller that looks like a single 10TB disc now, or build one for under $5k/TB. Could Linux work with such a device?
The size of a partition is limited to 2^31 blocks for most
partitioning schemes, and to 2^32 for those schemes (e.g., ultrix)
that use unsigned 32-bit numbers. A block is 512 bytes; that size is
hard-coded into the interfaces to read_dev_sector() etc., not even
as a symbol! This makes using larger blocks to get around a
32-bit block number somewhat problematic.
We need to use a different type for block numbers/offsets.
We're going to need new partitioning schemes for large multi-terabyte discs. None of the existing schemes will work well for multi-terabyte discs with small physical block sizes, but the EFI partitioning scheme may be OK if it's adopted widely.
The blkpg interface expresses sizes in bytes and holds them in a long long, thus limiting the size of a disc to 2^63 bytes (9 EB), which should be adequate :-)
Internally almost everything seems to be measured in 512-byte
or 1k units. The maximum logical blocksize is the same as the page size.
Note this may have implications for Lucy's work on
multiple-page sizes.
Linux LVM copes with up to 1EB (with 16G physical extent size). However, the generic kernel limits will apply.
SCSI-3 allows 64-bit logical block addresses. These are not yet used by Linux.
struct scsi_disk: the capacity field, which holds the size in blocks, needs to be unsigned long or uint64_t (it is currently unsigned int). Raw SCSI uses a 32-bit unsigned number of blocks and a 32-bit logical sector size for SCSI-2.
We may wish to start using the READ(16)/WRITE(16) commands for big discs, which allow up to 9 EB.
The VFS seems to use unsigned long where appropriate, so there are no 32-bit limitations on 64-bit machines (or if there are, they're bugs).
Most of the interfaces are in terms of loff_t which is 64-bit on all platforms.
Because fsck would take so long, it's unlikely that a non-journalled filesystem would be used on a large partition/logical volume.
NFS version 2 uses a 32-bit field for file sizes and offsets; NFS version 3 can use 64-bit sizes and offsets, so use NFSv3 for large file system work.
If the block size is 8k, a partition can go up to 32T and a file up to 2T. The standard maximum block size is 4k; it can't be bigger than PAGE_CACHE_SIZE (currently 4k on most 32-bit platforms; 16k on Itanium by default, but it can go to 64k for the McKinley architecture).
Use sector_t wherever number-of-disc-{blocks,sectors}
or offset-in-disc-{sectors,blocks} are meant, and it's
possible that such numbers could be greater than the 32-bit
limit.
(Within a file system that uses 32-bit on-disc sector
offsets, there's not much point.)
bmap() takes and returns sector_t.
(bmap() maps from a block offset within a file to
a block offset within a file system. It has to be able to
cope with large filesystems.)
Every filesystem has to export a bmap() function.