getdents() syscall appears to be returning different results within a container

I’m trying to read what type of file /dev/null is. If I use stat() it reports correctly that it’s a character device.

If I use getdents(), it also reports that it’s a character device – unless I run it in a container, in which case it says it’s a regular file!

Why does running it in a container give different results?

This was tested on recent versions of docker and podman giving the same results, using the ubuntu:22.04 image.

Below is reproduction code – the stat() approach always works, but the getdents() approach makes the assert fail when run inside a container. Also worth noting that the problem doesn’t always reproduce – on some systems / containers it still works fine.

(Tested on linux 6.8.2-arch2-1 and podman 5.0.0)

#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define BUF_SIZE 1024

struct linux_dirent {
    long           d_ino;
    off_t          d_off;
    unsigned short d_reclen;
    char           d_name[];
};

int main() {
    // stat approach

    struct stat st;
    stat("/dev/null", &st);

    printf("stat type: %d\n", st.st_mode & S_IFMT);

    assert((st.st_mode & S_IFMT) == S_IFCHR);

    // getdents approach

    int fd, nread;
    char buf[BUF_SIZE];
    struct linux_dirent *d;
    int bpos;
    char d_type;

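    fd = open("/dev", O_RDONLY | O_DIRECTORY);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    for (;;) {
        nread = syscall(SYS_getdents, fd, buf, BUF_SIZE);
        if (nread == -1) {
            perror("getdents");
            exit(EXIT_FAILURE);
        }
        if (nread == 0)  // end of directory: "null" was not found
            break;

        for (bpos = 0; bpos < nread;) {
            d = (struct linux_dirent *)(buf + bpos);
            if (strcmp(d->d_name, "null") == 0) {
                // d_type is stored in the last byte of each record
                d_type = *(buf + bpos + d->d_reclen - 1);
                printf("getdents type: %d\n", d_type);
                assert(d_type == DT_CHR);
                exit(EXIT_SUCCESS);
            }
            bpos += d->d_reclen;
        }
    }
    close(fd);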

    exit(EXIT_SUCCESS);
}
Asked By: Colourful

||

It turns out that getdents is telling you the truth!

If we enter a rootless podman container and run mount, we see that /dev/null is actually a bind mount (the -v ... here is just so that I have access to your sample code from inside the container):

$ podman run -it --rm  -v $PWD:/src:z fedora:39
[root@00af7efc8781 /]# mount |grep /dev/null
devtmpfs on /dev/null type devtmpfs (rw,nosuid,noexec,seclabel,size=4096k,nr_inodes=8186582,mode=755,inode64)
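
The same check can be made from C without the mount binary. The following is only a minimal sketch: the function names mountinfo_line_matches and is_mount_point are made up for illustration, and it assumes /proc is mounted and that the mount point contains no escaped characters (mountinfo writes a space as \040, which this sketch does not decode).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if the mount-point field (the 5th field)
 * of one /proc/self/mountinfo line equals `path`, else 0.
 * Escaped characters such as \040 (space) are not decoded. */
int mountinfo_line_matches(const char *line, const char *path) {
    char mount_point[4096];

    /* Fields: mount ID, parent ID, major:minor, root, mount point, ... */
    if (sscanf(line, "%*s %*s %*s %*s %4095s", mount_point) != 1)
        return 0;
    return strcmp(mount_point, path) == 0;
}

/* Hypothetical helper: return 1 if `path` is listed as a mount point in
 * this process's mount namespace, 0 if it is not, and -1 if
 * /proc/self/mountinfo cannot be opened. */
int is_mount_point(const char *path) {
    FILE *f = fopen("/proc/self/mountinfo", "r");
    if (f == NULL)
        return -1;

    char line[8192];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = mountinfo_line_matches(line, path);

    fclose(f);
    return found;
}
```

Given the mount output above, is_mount_point("/dev/null") should return 1 inside the container. On a host where /dev/null is a plain devtmpfs node it would be expected to return 0.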

What do we see if we unmount that bind mount? Let’s find out:

  • First, we need the container pid:

    $ podman container inspect -l | jq .[0].State.Pid
    50502
    
  • With that, we can use nsenter to enter the associated mount and pid namespaces:

    $ sudo nsenter -t 50502 -m -p
    
  • And finally we can unmount the /dev/null bind mount:

    [root@fizzgig /]# umount /dev/null
    

Now, we see:

[root@fizzgig /]# ls -l /dev/null
-rwx------. 1 21937 21937 0 Apr  2 20:03 /dev/null

Surprise, it’s a file!


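getdents reads directory entries from /dev itself, so it doesn’t know about anything bind-mounted on top of them: you see the d_type of the underlying entry. stat(), by contrast, resolves the path through the mount and reports the character device that is mounted there.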

Answered By: larsks