What does the "segfault at X" kernel log message mean if X is very large?

I’ve got a device with bad RAM. Running memtest overnight shows all faulting addresses to be in the 0x7d0000000 - 0x7f0000000 range. I plan to replace the RAM, but until then, I’ve disabled a 2GB chunk around it with memmap=:

# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.5.0-25-generic root=UUID=5277c53f-b2cd-4301-8fdf-0b2119430870 ro memmap=2G$0x0000000790000000 quiet splash vt.handoff=7

Those cmdline options do seem to be acknowledged by the kernel:

[    0.000000] user-defined physical RAM map:
[    0.000000] user: [mem 0x0000000000000000-0x000000000009efff] usable
[    0.000000] user: [mem 0x000000000009f000-0x00000000000fffff] reserved
[    0.000000] user: [mem 0x0000000000100000-0x0000000019e6a017] usable
[    0.000000] user: [mem 0x0000000019e6a018-0x0000000019e7ae57] usable
[    0.000000] user: [mem 0x0000000019e7ae58-0x000000002cb82fff] usable
[    0.000000] user: [mem 0x000000002cb83000-0x000000002ed2ffff] reserved
[    0.000000] user: [mem 0x000000002ed30000-0x000000002edacfff] ACPI data
[    0.000000] user: [mem 0x000000002edad000-0x000000002f29bfff] ACPI NVS
[    0.000000] user: [mem 0x000000002f29c000-0x000000002fd0efff] reserved
[    0.000000] user: [mem 0x000000002fd0f000-0x000000002fd0ffff] usable
[    0.000000] user: [mem 0x000000002fd10000-0x000000003cffffff] reserved
[    0.000000] user: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[    0.000000] user: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
[    0.000000] user: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] user: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
[    0.000000] user: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] user: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
[    0.000000] user: [mem 0x0000000100000000-0x000000078fffffff] usable
[    0.000000] user: [mem 0x0000000790000000-0x000000080fffffff] reserved
[    0.000000] user: [mem 0x0000000810000000-0x00000008beffffff] usable
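
As a sanity check on the numbers above, the arithmetic behind the memmap= option can be reproduced in a few lines. This is only a sketch using the values quoted in this post; the variable names are made up for illustration.

```python
# Sketch: does memmap=2G$0x790000000 cover the range memtest flagged?
# All values are copied from the post; names are illustrative only.
GIB = 1 << 30

reserve_start = 0x0000000790000000
reserve_size = 2 * GIB
reserve_end = reserve_start + reserve_size - 1  # inclusive end address

bad_start, bad_end = 0x7D0000000, 0x7F0000000  # range reported by memtest

covers = reserve_start <= bad_start and bad_end <= reserve_end

print(hex(reserve_end))  # matches the end of the "reserved" line in the log
print(covers)
```

The computed end address, 0x80fffffff, is the same as the end of the reserved entry in the user-defined RAM map, so the reservation does span the whole range memtest reported.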

However, I still get segfaults, ostensibly in the reserved address range:

Mar 09 20:47:40 srv0 kernel: udisksd[656]: segfault at 7fe974786218 ip 00007fe974786218 sp 00007ffcd10d1848 error 7 in libbd_swap.so.3.0.0[7fe974785000+2000] likely on CPU 7 (core 3, socket 0)

According to this page, I should interpret that as udisksd trying to write to the reserved address 0x7fe974786218 (error 7). At first glance, the 0x7f prefix seems to match what memtest found to be bad RAM, but the value is off by orders of magnitude: it corresponds to an address around 140 TB, while my machine has 32 GB.

What, if not a memory address, does the segfault at X value represent?

Asked By: thariqfahry

||

You’re confusing address spaces here: 0x7fe974786218 is a virtual address in the address space of the udisksd process, whereas what you reserved with memmap= is a range of physical addresses.
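
To make the difference in scale concrete, here is a small sketch that puts the numbers from the question side by side. It only uses values from the log line and the RAM map; the variable names are illustrative.

```python
# Sketch: compare the faulting (virtual) address with the reserved
# (physical) range. Values are taken from the question's logs.
fault_addr = 0x7FE974786218   # "segfault at ..." value
map_base = 0x7FE974785000     # start of the libbd_swap.so.3.0.0 mapping
map_size = 0x2000             # size of that mapping ("+2000" in the log)

reserved_phys_start = 0x790000000  # start of the memmap= reservation

# The faulting address lies inside the library's virtual mapping.
offset = fault_addr - map_base
inside_mapping = 0 <= offset < map_size

fault_tib = fault_addr / 2**40              # virtual address, in TiB
reserved_gib = reserved_phys_start / 2**30  # physical address, in GiB

print(hex(offset), inside_mapping)
print(f"virtual address is about {fault_tib:.1f} TiB into the process address space")
print(f"reserved physical range starts at {reserved_gib} GiB")
```

The faulting address is simply offset 0x1218 into the virtual mapping of libbd_swap.so.3.0.0, nearly 128 TiB into the process's address space, while the reserved physical range starts at about 30 GiB. The shared "7f" digits are a coincidence.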

A segfault happens when a process tries to access a virtual memory address that is not mapped to any physical page, or that it’s not allowed to access.
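
The "error 7" in the log line says which of those two cases applies. On x86 it is the page-fault error code, a bit field; the sketch below decodes the low bits (present, write, user, reserved-bit, instruction fetch). The function name is made up for illustration, and higher bits are ignored.

```python
# Sketch: decode the low bits of an x86 page-fault error code as it
# appears in "segfault at ... error N" kernel messages.
def decode_pf_error(code):
    parts = [
        # bit 0: 0 = page not present, 1 = protection violation
        "protection violation (page present)" if code & 1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "write" if code & 2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "user mode" if code & 4 else "kernel mode",
    ]
    if code & 8:
        parts.append("reserved bit set in page table")
    if code & 16:
        parts.append("instruction fetch")
    return parts

print(decode_pf_error(7))
```

For error 7 this yields a user-mode write to a page that is present but does not permit the access, i.e. the "not allowed to access" case rather than the "not mapped" one.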

Physical and virtual addresses have nothing to do with each other numerically; translating virtual addresses to physical ones through a mapping table is why your processor has a memory management unit. So the problem here is software accessing the wrong memory address, which is a bug.

Of course, that bug might not be a software bug, but caused by damaged RAM that you didn’t reserve; nobody can know that! There’s no guarantee that last night’s memtest is still relevant today, especially if there are problems in more than one physical address range. Honestly, what you’re doing is quite hazardous: you know you have memory that might randomly corrupt data, and you’re hoping that you caught all the offending memory and blocked it from use. If the things you do with your computer matter, I wouldn’t do that. Since you say you’re planning to replace that memory anyway, remove the whole RAM module now and keep working without it if possible, or hurry up and get the replacement RAM.

Answered By: Marcus Müller