RTL810xE PCI Express Fast Ethernet on Arch Linux flooding the journal with PCIe errors

My journal is flooded with this:

    journalctl -r
    2024-01-10T20:07:01.947911-08:00 dell kernel: pci 0000:01:00.0:    [ 0] RxErr                  (First)
    2024-01-10T20:07:01.947686-08:00 dell kernel: pci 0000:01:00.0:   device [10ec:8136] error status/mask=00000001/00006000
    2024-01-10T20:07:01.947423-08:00 dell kernel: pci 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    2024-01-10T20:07:01.946988-08:00 dell kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
    2024-01-10T20:07:01.694824-08:00 dell kernel: pci 0000:01:00.0:    [ 0] RxErr                  (First)
    2024-01-10T20:07:01.694573-08:00 dell kernel: pci 0000:01:00.0:   device [10ec:8136] error status/mask=00000001/00006000
    2024-01-10T20:07:01.694279-08:00 dell kernel: pci 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
    2024-01-10T20:07:01.693781-08:00 dell kernel: pcieport 0000:00:1d.0: AER: Multiple Corrected error received: 0000:01:00.0
    2024-01-10T20:07:01.601284-08:00 dell kernel: pci 0000:01:00.0:    [ 0] RxErr                  (First)
    
    

The card is an RTL810xE PCI Express Fast Ethernet controller, which according to https://linux-hardware.org/index.php?id=pci:10ec-8136-1028-056a uses the Realtek r8169 driver.

The https://wiki.archlinux.org/title/Network_configuration/Ethernet page has a section, "Realtek no link / WOL problem", that describes a problem with this driver on machines that dual-boot Windows. This machine is not dual-boot. It runs Arch Linux with kernel 6.6.10-arch1-1.

I can bring the card to life with modprobe r8169, but that does not stop the journal's error messages for this device.

Asked By: Stephen Boston

||

Your kernel is built with the PCI Express Root Port Advanced Error Reporting (CONFIG_PCIEAER) feature. The chipset is detecting minor, correctable PCIe link errors while communicating with the RTL810xE NIC at bus address 0000:01:00.0, and is correcting them automatically.

This is not a network communication error: this is a PCIe link error within the computer.
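To read the "error status/mask=00000001/00006000" line in the log, it can help to decode the two hex words bit by bit. The sketch below is only an illustration: decode_cor_status is a hypothetical helper name, and the bit-to-label table is my assumption of the short labels the kernel prints for the AER correctable error status register (the log above confirms only bit 0, RxErr).

```shell
#!/bin/sh
# Hypothetical helper: list the bits set in an AER correctable-error
# status (or mask) word given as hex digits without the 0x prefix.
# The bit:label table is an assumption based on the AER register layout.
decode_cor_status() {
    status=$((0x$1))
    for entry in 0:RxErr 6:BadTLP 7:BadDLLP 8:Rollover \
                 12:Timeout 13:NonFatalErr 14:CorrIntErr 15:HeaderOF; do
        bit=${entry%%:*}    # part before the colon: bit position
        name=${entry#*:}    # part after the colon: short label
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            printf '[%2d] %s\n' "$bit" "$name"
        fi
    done
}
```

Running decode_cor_status 00000001 on the status word from the question reports only bit 0 (RxErr), which matches the "[ 0] RxErr" line the kernel logged: a receiver error on the physical link, not a fault in network traffic.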

If this network interface is an add-on card, make sure the card is undamaged, firmly in the slot, and the card-edge connector surfaces are clean. Test with another network card of the same model if possible.

Otherwise, and particularly if this network interface is integrated into the motherboard, the errors might be a "known issue" of this particular chip and/or motherboard design, and you may not be able to eliminate the root cause. However, as long as the errors are of the severity=Corrected type, they should cause no problems.

If a component that has previously worked without errors suddenly starts producing multiple severity=Corrected errors, that might be an early sign of imminent hardware failure. The kernel is reporting them so that the system administrator can judge whether or not proactive maintenance might be appropriate.

In other words: if the system previously worked without these warnings, it might be a good idea to have a spare network card close at hand, in case this one fails.
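To judge whether the errors are new or have always been there, one option is to count them per device over different time windows. This is a minimal sketch, assuming the log lines look like the ones in the question; count_corrected is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print
# "<count> <pci-address>" for each device that logged a corrected error.
count_corrected() {
    grep 'PCIe Bus Error: severity=Corrected' |
        sed -n 's/.* \([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): PCIe Bus Error.*/\1/p' |
        sort | uniq -c | awk '{print $1, $2}'
}
```

For example, piping journalctl -k --since today into count_corrected, and then doing the same for an older boot with journalctl -k -b -5, would show whether the count has grown. How far back you can go depends on how much history your journal retains.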

The severity=Corrected messages are emitted at the KERN_WARNING log level (numeric value 4). To get rid of those messages in your journal, you could configure systemd-journald to store only messages more severe than that:

Create a file named /etc/systemd/journald.conf.d/silence-kernel-warnings.conf with the following contents:

    [Journal]
    MaxLevelKMsg=err

This will not affect errors of severity=Uncorrected, which would indicate actual data corruption in the respective PCIe link. Such errors are reported at the KERN_ERR log level (numeric value 3).
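The drop-in can be created by hand in an editor, or with a few shell commands. The sketch below wraps them in a hypothetical helper, write_journald_dropin, that takes an optional root prefix so it can be tried in a scratch directory first. The file name and contents are the ones given above.

```shell
#!/bin/sh
# Hypothetical helper: write the journald drop-in described above.
# With no argument it writes under /etc (needs root); with an argument
# it writes under that directory instead, which is handy for a dry run.
write_journald_dropin() {
    dir="${1:-}/etc/systemd/journald.conf.d"
    mkdir -p "$dir" &&
        printf '[Journal]\nMaxLevelKMsg=err\n' \
            > "$dir/silence-kernel-warnings.conf"
}
```

After running write_journald_dropin as root, restart the journal daemon with systemctl restart systemd-journald so the new setting is read.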

If you are building your own custom kernels, consider disabling the CONFIG_PCIEAER kernel configuration option.
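To confirm whether a given kernel was built with this option, you can search its configuration. This is a sketch that assumes the running kernel exposes its configuration at /proc/config.gz (which depends on how the kernel was built); aer_enabled is a hypothetical helper name, and it also accepts a plain-text config file path such as the .config of a kernel source tree.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given kernel config has
# CONFIG_PCIEAER=y. Defaults to /proc/config.gz, which is an assumption
# about the running kernel; pass a path to check another config file.
aer_enabled() {
    cfg="${1:-/proc/config.gz}"
    case "$cfg" in
        *.gz) gzip -dc "$cfg" ;;   # compressed config
        *)    cat "$cfg" ;;        # plain-text config
    esac | grep -qx 'CONFIG_PCIEAER=y'
}
```

For example: aer_enabled && echo "AER reporting is built in".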

Answered By: telcoM