Why does Debian Linux allow up to 128TiB virtual address space per process but just 64TiB physical memory?
I just read here:
- up to 128TiB virtual address space per process (instead of 2GiB)
- 64TiB physical memory support instead of 4GiB (or 64GiB with PAE)
Why is that? I mean, is the physical memory support limited by the kernel or by the current hardware?
Why would you need twice as much virtual address space as physical memory you can actually address?
I don’t know why, but I can think of seven reasons why it’d be useful to support twice as much address space as physical memory.
- The first is so that you can run applications that need the extra memory — even if it means swapping to disk.
- Cleaner memory layouts to partition memory usage. E.g., an OS might take higher-numbered addresses and leave lower-numbered addresses for applications to make separation cleaner.
- Address space layout randomization is a bit more effective.
- Marking pages as executable may mean leftover memory.
- Memory-mapped I/O.
- Memory allocation is easier: one can allocate bigger chunks at a time.
- Reduced memory fragmentation
Those are hardware limitations. Current x86_64/amd64 hardware allows 48-bit virtual addresses and physical addresses of varying size, depending on the implementation (e.g., my workstation here only supports 36 bits). The Linux kernel splits the virtual address space in half, using one half for the kernel and one half for userspace, similar to the kernel/user split it uses on 32-bit x86.
So you get:
2⁴⁸ bytes ÷ 2 = 2⁴⁷ bytes = 128 TiB
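The same arithmetic as a quick Python check. This is only a sketch; the helper name `tib` is made up for illustration, and the 48-bit width is the figure quoted above.

```python
TIB = 2**40  # bytes in one tebibyte

def tib(n_bytes):
    """Convert a byte count to TiB."""
    return n_bytes / TIB

virtual_bits = 48            # virtual address width quoted above
total = 2**virtual_bits      # whole virtual address space, in bytes
per_half = total // 2        # kernel half and userspace half are equal

print(tib(total))     # 256.0
print(tib(per_half))  # 128.0
```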
Physical address size is often smaller because it’s actually physical. It takes up pins/pads, transistors, connections, etc., on/in the CPU and trace lines on the board. Probably also the same in the chipsets. It makes no sense to support an amount of RAM that is inconceivable over the processor core’s or socket’s design lifespan, since all those things cost money. (Even with 32 DIMM slots and 64GiB DIMMs in each, you’re still only at 2TiB. Even if DIMM capacity doubles yearly, we’re 5 years away from 64TiB.)
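To check the DIMM estimate, here is a small sketch that counts yearly doublings. The function name `years_until` and its defaults are hypothetical, taken from the 32-slot, 64GiB figures in this answer; the yearly doubling is the answer's own optimistic assumption.

```python
TIB = 2**40
GIB = 2**30

def years_until(target_bytes, slots=32, dimm_bytes=64 * GIB):
    """Count yearly doublings of DIMM capacity until total RAM reaches target."""
    years = 0
    while slots * dimm_bytes < target_bytes:
        dimm_bytes *= 2   # assumed: capacity per DIMM doubles each year
        years += 1
    return years

print(32 * 64 * GIB // TIB)    # 2  (TiB with today's DIMMs)
print(years_until(64 * TIB))   # 5
```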
As Peter Cordes points out, people are now attaching non-volatile storage such as 3D XPoint to the memory bus, which makes running out of address space conceivable. Newer processors have extended the physical address space to 48 bits; it’s possible the Debian wiki just hasn’t been updated.
Those limits don’t come from Debian or from Linux; they come from the hardware. Different architectures (processor and memory bus) have different limitations.
On current x86-64 PC processors, the MMU allows 48 bits of virtual address space. That means that the address space is limited to 256TiB. With one bit to distinguish kernel addresses from userland addresses, that leaves 128TiB for a process’s address space.
On current x86-64 processors, physical addresses can use up to 48 bits, which means you can have up to 256TiB. The limit has progressively risen since the amd64 architecture was introduced (from 40 bits if I recall correctly). Each bit of address space costs some wiring and decoding logic (which makes the processor more expensive, slower and hotter), so hardware manufacturers have an incentive to keep the size down.
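If you want to see what your own CPU supports, Linux on x86 normally reports both widths on an `address sizes` line in `/proc/cpuinfo`. A minimal sketch that parses it; `parse_address_sizes` is a hypothetical helper, and the sample line is illustrative rather than copied from a real machine.

```python
import re

def parse_address_sizes(cpuinfo_text):
    """Return (physical_bits, virtual_bits) from /proc/cpuinfo text, or None."""
    m = re.search(
        r"address sizes\s*:\s*(\d+) bits physical, (\d+) bits virtual",
        cpuinfo_text,
    )
    return (int(m.group(1)), int(m.group(2))) if m else None

# Illustrative sample line in the usual /proc/cpuinfo layout.
sample = "address sizes\t: 36 bits physical, 48 bits virtual\n"
print(parse_address_sizes(sample))

# On a real Linux system, read the actual file if it exists.
try:
    with open("/proc/cpuinfo") as f:
        print(parse_address_sizes(f.read()))
except OSError:
    print("no /proc/cpuinfo on this system")
```

Other architectures may not have that line, in which case the function returns `None`.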
Linux only allows physical addresses to go up to 2^46 (so you can only have up to 64TiB) because that allows the physical memory to be entirely mapped in kernel space. Remember that there are 48 bits of address space; one bit for kernel/user leaves 47 bits for the kernel address space. Half of that at most addresses physical memory directly, and the other half allows the kernel to map whatever it needs. (Linux can cope with physical memory that can’t be mapped in full at the same time, but that introduces additional complexity, so it’s only done on platforms where it’s required, such as x86-32 with PAE and armv7 with LPAE.)
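Spelled out as arithmetic, the derivation looks like this. It is a sketch of the reasoning in this answer only; the variable names are made up and it is not taken from the kernel source.

```python
TIB = 2**40  # bytes in one tebibyte

virtual_bits = 48
kernel_bits = virtual_bits - 1      # one bit separates kernel from userland
direct_map_bits = kernel_bits - 1   # at most half of kernel space maps RAM directly

print(2**kernel_bits // TIB)        # 128  (TiB of kernel address space)
print(2**direct_map_bits // TIB)    # 64   (TiB of directly mapped physical memory)
```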
It’s useful for virtual memory to be larger than physical memory for several reasons:
- It lets the kernel map the whole physical memory, and have space left for additional virtual mappings.
- In addition to mappings of physical memory, there are mappings of swap, of files and of device drivers.
- It’s useful to have unmapped memory in places: guard pages to catch buffer overflows, large unmapped zones due to ASLR, etc.