OpenGL fails to load due to nvidia driver

I have 2 video cards and installed the nvidia driver:

❯ lspci -nnk | grep -iA3 -E "(vga|NVIDIA).*(controller|GeForce)"
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)
        DeviceName:  Onboard IGD
        Subsystem: Hewlett-Packard Company HD Graphics 620 [103c:82c1]
        Kernel driver in use: i915
--
01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
        Subsystem: Hewlett-Packard Company GM108M [GeForce 940MX] [103c:82c1]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

and modules loaded:

❯ lsmod | grep -iE '(iris|965|915|nouveau|nvidia)'
nvidia_drm             94208  4
nvidia_modeset       1556480  2 nvidia_drm
nvidia_uvm           3481600  2
nvidia              62734336  87 nvidia_uvm,nvidia_modeset
i915                 4108288  39
i2c_algo_bit           20480  1 i915
drm_buddy              20480  1 i915
ttm                   110592  1 i915
intel_gtt              28672  1 i915
drm_display_helper    229376  1 i915
video                  77824  2 i915,nvidia_modeset
cec                    86016  2 drm_display_helper,i915

and lshw reports:

❯ sudo lshw -c video | grep 'configuration'
       configuration: depth=32 driver=i915 latency=0 resolution=3840,2160
       configuration: driver=nvidia latency=0

For some reason, OpenGL (EGL) crashes, while OpenGL (GLX) reports:

❯ glxinfo | grep "OpenGL renderer"   
libGL error: glx: failed to create dri3 screen
libGL error: failed to load driver: nouveau
OpenGL renderer string: Mesa Intel(R) HD Graphics 620 (KBL GT2)

Qt5 (e.g. kwalletd5) fails:

❯ kwalletd5 
kf.wallet.kwalletd: Lacking a socket, pipe: 0 env: 0
libGL error: glx: failed to create dri3 screen
libGL error: failed to load driver: nouveau

I do not understand why libGL is looking for nouveau when nouveau is not installed. My understanding is that nouveau is for legacy NVIDIA cards, and this NVIDIA card (i.e. GM108M [GeForce 940MX] with NV118) should use the nvidia driver.
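
To see where a nouveau userspace driver could even come from on this system (paths assume a standard Arch layout):

$ ls /usr/lib/dri/ | grep -i nouveau          # the userspace DRI drivers Mesa can dlopen
$ pacman -Qo /usr/lib/dri/nouveau_dri.so      # if it exists, this shows which package ships it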

eglinfo creates a crashdump, which makes me guess that the driver was not properly compiled for this kernel … ?!? (wondering …)

❯ coredumpctl info eglinfo
           PID: 3006 (eglinfo)
           UID: 1026 (alex)
           GID: 1000 (alex)
        Signal: 6 (ABRT)
     Timestamp: Sun 2023-10-08 10:50:39 EDT (1h 3min ago)
  Command Line: /usr/bin/eglinfo
    Executable: /usr/bin/eglinfo
 Control Group: /user.slice/user-1026.slice/user@1026.service/app.slice/app-org.kde.kinfocenter-1e5614d213e84a2fac7e745b95873f3b.scope
          Unit: user@1026.service
     User Unit: app-org.kde.kinfocenter-1e5614d213e84a2fac7e745b95873f3b.scope
         Slice: user-1026.slice
     Owner UID: 1026 (alex)
       Boot ID: 0e078812604c40b896a2926936fed0ed
    Machine ID: 5e088a0fd5f24ea3ba800ad0886bc587
      Hostname: azx360
       Storage: /var/lib/systemd/coredump/core.eglinfo.1026.0e078812604c40b896a2926936fed0ed.3006.1696776639000000.zst (present)
  Size on Disk: 2.0M
       Message: Process 3006 (eglinfo) of user 1026 dumped core.
                
                Stack trace of thread 3006:
                #0  0x00007f8f4878483c n/a (libc.so.6 + 0x8e83c)
                #1  0x00007f8f48734668 raise (libc.so.6 + 0x3e668)
                #2  0x00007f8f4871c4b8 abort (libc.so.6 + 0x264b8)
                #3  0x00007f8f4871d390 n/a (libc.so.6 + 0x27390)
                #4  0x00007f8f4878e7b7 n/a (libc.so.6 + 0x987b7)
                #5  0x00007f8f4878f30e n/a (libc.so.6 + 0x9930e)
                #6  0x00007f8f4878f480 n/a (libc.so.6 + 0x99480)
                #7  0x00007f8f48791a38 n/a (libc.so.6 + 0x9ba38)
                #8  0x00007f8f48793dc1 __libc_calloc (libc.so.6 + 0x9ddc1)
                #9  0x00007f8f46733bb1 n/a (libnvidia-eglcore.so.535.113.01 + 0x1533bb1)
                #10 0x00007f8f46741a91 n/a (libnvidia-eglcore.so.535.113.01 + 0x1541a91)
                #11 0x00007f8f46741b12 n/a (libnvidia-eglcore.so.535.113.01 + 0x1541b12)
                #12 0x00007f8f46741ce0 n/a (libnvidia-eglcore.so.535.113.01 + 0x1541ce0)
                #13 0x00007f8f48242f72 n/a (libEGL_nvidia.so.0 + 0x42f72)
                #14 0x00007f8f482485a4 n/a (libEGL_nvidia.so.0 + 0x485a4)
                #15 0x000055846b68f824 n/a (eglinfo + 0x8824)
                #16 0x000055846b6932f5 n/a (eglinfo + 0xc2f5)
                #17 0x000055846b68b2b6 n/a (eglinfo + 0x42b6)
                #18 0x00007f8f4871dcd0 n/a (libc.so.6 + 0x27cd0)
                #19 0x00007f8f4871dd8a __libc_start_main (libc.so.6 + 0x27d8a)
                #20 0x000055846b68b6e5 n/a (eglinfo + 0x46e5)
                ELF object binary architecture: AMD x86-64
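
To check the "not properly compiled for this kernel" guess, one can compare the running kernel against what the nvidia module was built for (package names assume the stock Arch linux kernel and the repo nvidia package):

$ uname -r                              # running kernel
$ pacman -Q linux nvidia nvidia-utils   # packaged kernel and driver versions
$ modinfo -F vermagic nvidia            # kernel the nvidia module was built against
$ sudo dmesg | grep -i -m1 nvrm         # version reported by the loaded module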

For more context, here is what inxi reports:

❯ inxi -Gx
Graphics:
  Device-1: Intel HD Graphics 620 vendor: Hewlett-Packard driver: i915
    v: kernel arch: Gen-9.5 bus-ID: 00:02.0
  Device-2: NVIDIA GM108M [GeForce 940MX] vendor: Hewlett-Packard
    driver: nvidia v: 535.113.01 arch: Maxwell bus-ID: 01:00.0
  Device-3: Suyin HP TrueVision FHD RGB-IR driver: uvcvideo type: USB
    bus-ID: 1-5:2
  Display: x11 server: X.Org v: 21.1.8 driver: X: loaded: intel,nvidia
    unloaded: modesetting dri: i965 gpu: i915 resolution: 3840x2160
  API: EGL Message: No EGL data available.
  API: OpenGL v: 4.6 vendor: intel mesa v: 23.2.1-arch1.2 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel HD Graphics 620 (KBL GT2)
  API: Vulkan v: 1.3.264 drivers: nvidia surfaces: xcb,xlib devices: 1

Here are the packages installed for the video drivers:

❯ pacman -Q | grep -iE '(nvidia|mesa|intel|cuda|vulkan|vdpau)'  
intel-gmmlib 22.3.11-1
intel-gpu-tools 1.27-2
intel-media-driver 23.3.3-1
intel-media-sdk 23.2.2-2
libvdpau 1.5-2
mesa 1:23.2.1-2
mesa-utils 9.0.0-3
nvidia 535.113.01-4
nvidia-settings 535.113.01-1
nvidia-utils 535.113.01-2
vulkan-headers 1:1.3.264-2
vulkan-icd-loader 1.3.263-1
vulkan-tools 1.3.263-1
xf86-video-intel 1:2.99.917+923+gb74b67f0-1

Any guidance on fixing OpenGL is much appreciated!

Update: after some troubleshooting, I identified that the problem was not directly related to nvidia, although it was triggered by installing the NVIDIA driver.

I identified that pinentry, used by gpg-agent, had a problem reaching the X or plasmashell display, probably when trying to pop up the passphrase dialog. A log of the issue:

[USER@MACHINE ~]$ Unsupported return type 65 QPixmap in method "grab"
Unsupported return type 65 QPixmap in method "grab"
Unsupported return type 65 QPixmap in method "grab"
Unsupported return type 65 QPixmap in method "grab"

[USER@MACHINE ~]$ journalctl -xe
Oct 13 17:00:17 MACHINE systemd-timesyncd[279]: Contacted time server [REDACTED]:123 ([REDACTED].arch.pool.ntp.org).
Oct 13 17:04:55 MACHINE plasmashell[976]: Could not find the Plasmoid for Plasma::FrameSvgItem(0x562417c320e0) QQmlContext(0x562413f2ad10) QUrl("file:///usr/share/pla>
Oct 13 17:04:55 MACHINE plasmashell[976]: Could not find the Plasmoid for Plasma::FrameSvgItem(0x562417c320e0) QQmlContext(0x562413f2ad10) QUrl("file:///usr/share/pla>
Oct 13 17:09:04 MACHINE systemd-timesyncd[279]: Timed out waiting for reply from [REDACTED]:123 ([REDACTED].arch.pool.ntp.org).
Oct 13 17:09:07 MACHINE plasmashell[976]: trying to show an empty dialog
Oct 13 17:09:07 MACHINE plasmashell[976]: file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/Task.qml:286: Unable to assign [undefined] to QStr>
Oct 13 17:09:07 MACHINE plasmashell[976]: file:///usr/share/plasma/plasmoids/org.kde.plasma.taskmanager/contents/ui/Task.qml:286: Unable to assign [undefined] to QStr>
Oct 13 17:09:07 MACHINE systemd[828]: Started System Settings - System Settings.
-- Subject: A start job for unit UNIT has finished successfully
-- Defined-By: systemd
-- Support: [REDACTED]
-- 
-- A start job for unit UNIT has finished successfully.
-- 
-- The job identifier is 590.
Oct 13 17:09:08 MACHINE systemsettings[11476]: file:///usr/lib/qt/qml/org/kde/kirigami.2/ScrollablePage.qml:200:9: QML MouseArea: Binding loop detected for property ">
Oct 13 17:09:08 MACHINE systemsettings[11476]: file:///usr/lib/qt/qml/org/kde/kirigami.2/ScrollablePage.qml:200:9: QML MouseArea: Binding loop detected for property ">
Oct 13 17:09:08 MACHINE systemsettings[11476]: QQmlEngine::setContextForObject(): Object already has a QQmlContext
Oct 13 17:09:14 MACHINE systemd-timesyncd[279]: Timed out waiting for reply from [REDACTED]:123 ([REDACTED].arch.pool.ntp.org).
Oct 13 17:09:14 MACHINE systemd-timesyncd[279]: Contacted time server [REDACTED]:123 ([REDACTED].arch.pool.ntp.org).
Oct 13 17:09:20 MACHINE kwalletd5[3655]: kf.wallet.backend: Setting useNewHash to true
Oct 13 17:09:20 MACHINE kwalletd5[3655]: kf.wallet.backend: Wallet new enough, using new hash
Oct 13 17:09:20 MACHINE kwalletd5[3655]: kf.wallet.backend: Error decrypting message:  No secret key , code  17 , source  GPGME
Oct 13 17:09:24 MACHINE kwin_x11[933]: kwin_core: XCB error: 152 (BadDamage), sequence: 12493, resource id: 8467472, major code: 143 (DAMAGE), minor code: 3 (Subtract)
Oct 13 17:09:28 MACHINE kwalletd5[3655]: kf.wallet.backend: Setting useNewHash to true
Oct 13 17:09:28 MACHINE kwalletd5[3655]: kf.wallet.backend: Wallet new enough, using new hash
Oct 13 17:09:28 MACHINE kwalletd5[3655]: kf.wallet.backend: Error decrypting message:  No secret key , code  17 , source  GPGME
Oct 13 17:09:31 MACHINE kwalletd5[3655]: kf.wallet.backend: Error decrypting message:  No secret key , code  17 , source  GPGME
Oct 13 17:09:32 MACHINE kwalletd5[3655]: kf.wallet.backend: Error decrypting message:  No secret key , code  17 , source  GPGME
Oct 13 17:09:32 MACHINE kwalletd5[3655]: kf.wallet.backend: Error decrypting message:  No secret key , code  17 , source  GPGME
Oct 13 17:09:32 MACHINE kwin_x11[933]: kwin_core: XCB error: 152 (BadDamage), sequence: 15002, resource id: 8467610, major code: 143 (DAMAGE), minor code: 3 (Subtract)
Oct 13 17:09:33 MACHINE kwin_x11[933]: kwin_core: XCB error: 152 (BadDamage), sequence: 15530, resource id: 8467635, major code: 143 (DAMAGE), minor code: 3 (Subtract)
Oct 13 17:09:33 MACHINE kwalletd5[3655]: kf.wallet.backend: Error decrypting message:  No secret key , code  17 , source  GPGME
Oct 13 17:09:33 MACHINE kwin_x11[933]: kwin_core: XCB error: 152 (BadDamage), sequence: 16073, resource id: 8467650, major code: 143 (DAMAGE), minor code:
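
For anyone following the same trail: the checks below are a rough sketch of how to see which pinentry gpg-agent is configured to use and whether the agent behaves after a restart (the gpg-agent.conf file may not exist if the default pinentry is in use):

$ grep pinentry-program ~/.gnupg/gpg-agent.conf   # which pinentry is configured, if any
$ gpgconf --kill gpg-agent                        # restart the agent so it re-reads its config
$ gpg-connect-agent 'GETINFO pid' /bye            # confirm the agent comes back up
$ echo test | gpg --clearsign >/dev/null          # should trigger the pinentry passphrase dialog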

Update:

  • I upgraded the HP driver from F.10 to F.42
  • Re-installed X, nvidia, gnupg, pinentry.
  • Every time I start X with the nvidia driver, it fails, so I am falling back to Intel. The following is a diff between a working xorg.conf and the one that fails with nvidia (a way to check the Xorg log for the failure follows the diff):
$ diff xorg.conf xorg.conf.2023-10-23-a-failure.bak 
38a39
>     Driver         "nvidia"
40,43c41
<     # Driver         "nvidia"
<     ChipId          0x0
<     ChipRev         0x0
<     IRQ             0
---
>     BusID          "PCI:1:0:0"
48a47
>     Option "AllowEmptyInitialConfiguration"
53a53,63
> EndSection
> 
> Section "Device"
>     Identifier "intel"
>     Driver "modesetting"
>     BusID "PCI:0:2.0" # e.g. PCI:0:2:0
> EndSection
> 
> Section "Screen"
>     Identifier "intel"
>     Device "intel"
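
When X fails to come up with the nvidia config, the Xorg log is usually the quickest way to see which section it rejects; depending on whether X runs as root or rootless, the log is in one of these two places:

$ grep -E '\(EE\)|\(WW\)' /var/log/Xorg.0.log             # X started as root
$ grep -E '\(EE\)|\(WW\)' ~/.local/share/xorg/Xorg.0.log  # rootless X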

Update:

  • I was able to trace the problem to eglinfo -B, which produces a core dump (an isolation check follows the dump):
$ eglinfo -B    
GBM platform:
EGL API version: 1.5
EGL vendor string: NVIDIA
EGL version string: 1.5
EGL client APIs: OpenGL_ES OpenGL
OpenGL core profile vendor: NVIDIA Corporation
OpenGL core profile renderer: NVIDIA GeForce 940MX/PCIe/SSE2
OpenGL core profile version: 4.6.0 NVIDIA 535.113.01
OpenGL core profile shading language version: 4.60 NVIDIA
OpenGL compatibility profile vendor: NVIDIA Corporation
OpenGL compatibility profile renderer: NVIDIA GeForce 940MX/PCIe/SSE2
OpenGL compatibility profile version: 4.6.0 NVIDIA 535.113.01
OpenGL compatibility profile shading language version: 4.60 NVIDIA
malloc(): invalid next size (unsorted)
coredumpctl info 
(...)
  Signal: 6 (ABRT)
(...)
  Command Line: eglinfo -B
  Executable: /usr/bin/eglinfo
(...)
  Size on Disk: 2.1M
  Message: Process 3735 (eglinfo) of user 1026 dumped core.

                #0  0x00007fc48d23083c n/a (libc.so.6 + 0x8e83c)
                #1  0x00007fc48d1e0668 raise (libc.so.6 + 0x3e668)
                #2  0x00007fc48d1c84b8 abort (libc.so.6 + 0x264b8)
                #3  0x00007fc48d1c9390 n/a (libc.so.6 + 0x27390)
                #4  0x00007fc48d23a7b7 n/a (libc.so.6 + 0x987b7)
                #5  0x00007fc48d23db04 n/a (libc.so.6 + 0x9bb04)
                #6  0x00007fc48d23fdc1 __libc_calloc (libc.so.6 + 0x9ddc1)
                #7  0x00007fc48b133bb1 n/a (libnvidia-eglcore.so.535.113.01 + 0x1533bb1)
                #8  0x00007fc48b141ccc n/a (libnvidia-eglcore.so.535.113.01 + 0x1541ccc)
                #9  0x00007fc48cc42f72 n/a (libEGL_nvidia.so.0 + 0x42f72)
                #10 0x00007fc48cc485a4 n/a (libEGL_nvidia.so.0 + 0x485a4)
                #11 0x0000555f5f076c7d n/a (eglinfo + 0x6c7d)
                #12 0x0000555f5f07c279 n/a (eglinfo + 0xc279)
                #13 0x0000555f5f0742b6 n/a (eglinfo + 0x42b6)
                #14 0x00007fc48d1c9cd0 n/a (libc.so.6 + 0x27cd0)
                #15 0x00007fc48d1c9d8a __libc_start_main (libc.so.6 + 0x27d8a)
                #16 0x0000555f5f0746e5 n/a (eglinfo + 0x46e5)
                ELF object binary architecture: AMD x86-64
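
Since both stack traces end inside libnvidia-eglcore / libEGL_nvidia, one way to confirm the crash is specific to the NVIDIA EGL stack is to point the GLVND loader at a single vendor ICD at a time (the JSON paths below are the usual Arch locations; adjust if yours differ):

$ __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json eglinfo -B     # Mesa only
$ __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json eglinfo -B   # NVIDIA only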

Note: I’m adding all of this in case others troubleshoot similar issues and my journey helps them.

Asked By: azbarcea


Similar problems are solved by reinstalling the Xorg or Wayland packages, and some by removing the nvidia drivers, as in:

[SOLVED] OpenGL / GLX not working with nouveau

As I understand it, you want to keep the nvidia drivers, so the goal is to make libGL work with one of them.

So, if the reinstall proposal does not work, I think you should rather be testing config files, e.g. /etc/X11/xorg.conf. And for the latter, make sure to restart your graphical environment after each test.
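
For the "restart after each test" part: on a KDE setup the display manager is typically SDDM, so restarting it (or rebooting) is enough to make X re-read xorg.conf. Assuming SDDM:

$ sudo systemctl restart sddm   # restarts X and the login screen; save your work first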

I came to realize that the "error" can be ignored.

❯ glxinfo | grep "OpenGL renderer"   
libGL error: glx: failed to create dri3 screen
libGL error: failed to load driver: nouveau

GnuPG was the broken component. I cleaned up everything related to GnuPG and created a completely new kwallet.
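
For anyone reproducing this, the cleanup can look roughly like the sketch below (paths are the usual defaults; this discards access to anything encrypted with the old key or wallet, so keep the backups):

$ mv ~/.gnupg ~/.gnupg.bak                                # old GnuPG keys and agent config
$ mv ~/.local/share/kwalletd ~/.local/share/kwalletd.bak  # old wallet files (*.kwl, *.salt)
$ gpg --full-generate-key                                 # only if a GPG-backed wallet is wanted again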

In KDE System Settings -> KDE Wallet, in Automatic Wallet Selection: Select wallet to use: New

Note: then continue through the dialogs and the process from there.

Answered By: azbarcea

Some terminology first: graphics drivers comprise a kernel driver and a userspace driver. Also, the tech for offloading rendering from the primary GPU to the non-primary one on X11 is called PRIME (I don't know whether it's relevant on XWayland-less Wayland, apparently not; leave a comment if you know something). Typically, the primary GPU is the integrated one, not the NVIDIA one (which makes sense, because you don't want to waste dGPU power on lightweight tasks), so in the code below I use DRI_PRIME=1 to explicitly offload rendering to the dGPU. But it is configurable (in BIOS/UEFI and whatnot), and in your case the system tries to use the dGPU first, as you can see from the errors.
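
On X11 you can see which GPU is currently the primary provider (and whether an offload provider is exposed) with:

$ xrandr --listproviders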

With that out of the way, if you add LIBGL_DEBUG=verbose you might get a bit more insight into what's going on:

$ LIBGL_DEBUG=verbose DRI_PRIME=1 glxinfo | grep "OpenGL renderer"
[…]
libGL: pci id for fd 5: 10de:25a0, driver nouveau
libGL: MESA-LOADER: dlopen(/usr/lib/dri/nouveau_dri.so)
[…]
libGL error: failed to load driver: nouveau
[…]
libGL: pci id for fd 4: 8086:9bc4, driver iris
libGL: MESA-LOADER: dlopen(/usr/lib/dri/iris_dri.so)
[…]
OpenGL renderer string: Mesa Intel(R) UHD Graphics (CML GT2)

What happens is that the userspace nouveau driver (the nouveau_dri.so) fails to load because it needs the nouveau kernel driver to be loaded. But as you show in the lsmod output, you have the nvidia kernel driver loaded instead (which conflicts with nouveau). As a result, it falls back to the iGPU.
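
On Arch the conflict is normally made explicit by a modprobe blacklist shipped with nvidia-utils, which is why the nouveau kernel driver never gets a chance to load (file name as typically shipped; adjust if yours differs):

$ cat /usr/lib/modprobe.d/nvidia-utils.conf   # should contain "blacklist nouveau"
$ lsmod | grep '^nouveau'                     # no output confirms nouveau is not loaded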

An nvidia_dri.so does not currently exist, so the DRI_PRIME environment variable is a no-op if you want to offload rendering to the NVIDIA GPU.

However, NVIDIA has supported PRIME render offload since version 435.17, but you need to use it via the prime-run wrapper (on your Arch Linux it's in the nvidia-prime package).

So:

$ prime-run glxinfo | grep "OpenGL renderer"
OpenGL renderer string: NVIDIA GeForce RTX 3050 Ti Laptop GPU/PCIe/SSE2
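
If you don't want to install nvidia-prime, prime-run is, as far as I know, just a thin wrapper around NVIDIA's documented render-offload environment variables, so the equivalent by hand is roughly:

$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"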
Answered By: Hi-Angel