How does Linux store the mapping folder -> file_name -> inode?

I just started reading a bit about the Linux file system. In several places I found quotes like this one:

Unix directories are lists of association structures, each of which contains one filename and one inode number.

So I expected to find that each directory would contain the names of the files under it, with each file name mapped to an inode. But when I run vim directory_name on Ubuntu, I get something like this:

" ============================================================================
" Netrw Directory Listing                                        (netrw v156)
"   /Users/user/workspace/folder
"   Sorted by      name
"   Sort sequence: [/]$,<core%(.d+)=>,.h$,.c$,.cpp$,~=*$,*,.o$,.obj$,.info$,.swp$,.bak$,~$
"   Quick Help: <F1>:help  -:go up dir  D:delete  R:rename  s:sort-by  x:special
" ==============================================================================
../
./
folder1/
folder2/
file1
file2

I expected to see an inode number next to each file name. Why isn’t this the case?

Asked By: Hamster

||

That quote is about how Unix filesystems work logically; the actual on-disk structures are often very different nowadays. You can see the inode numbers with, for example, the -i flag to ls:

$ ls -li
total 8
532028 -rw-r--r-- 1 anthony anthony 115 Apr 25 12:07 a
532540 -rw-r--r-- 1 anthony anthony  70 Apr 25 12:07 b

That number on the left is the inode number. If I then run ln b c (creating a hard link), the listing becomes:

$ ls -li
total 12
532028 -rw-r--r-- 1 anthony anthony 115 Apr 25 12:07 a
532540 -rw-r--r-- 2 anthony anthony  70 Apr 25 12:07 b
532540 -rw-r--r-- 2 anthony anthony  70 Apr 25 12:07 c

The permissions and size are part of the inode, not the directory entry. That is easy to see from what happens after chmod 0600 c:

$ ls -li
total 12
532028 -rw-r--r-- 1 anthony anthony 115 Apr 25 12:07 a
532540 -rw------- 2 anthony anthony  70 Apr 25 12:07 b
532540 -rw------- 2 anthony anthony  70 Apr 25 12:07 c

Both b and c changed, because they share the same inode.
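
The same experiment can be repeated from a program. The following is a minimal sketch in Python, assuming a filesystem that supports hard links; the function name hardlink_demo and the temporary directory are only for illustration. It creates b, links it as c, changes the mode through c, and then asks what b looks like.

```python
import os
import stat
import tempfile

def hardlink_demo():
    """Create b, hard-link it as c, chmod c, and report what b sees."""
    with tempfile.TemporaryDirectory() as d:
        b = os.path.join(d, "b")
        c = os.path.join(d, "c")
        with open(b, "w") as f:
            f.write("hello\n")
        os.chmod(b, 0o644)
        os.link(b, c)        # same effect as `ln b c`
        os.chmod(c, 0o600)   # same effect as `chmod 0600 c`
        sb, sc = os.stat(b), os.stat(c)
        return {
            "same_inode": sb.st_ino == sc.st_ino,
            "link_count": sb.st_nlink,
            "mode_of_b": stat.S_IMODE(sb.st_mode),
        }

if __name__ == "__main__":
    print(hardlink_demo())
```

The mode reported for b changes even though only c was passed to chmod, because both names lead to the same inode.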

However, the kernel only exposes the filesystem to userspace over a well-defined API (except for the raw devices like /dev/sda1). It gives userspace access to a bunch of syscalls to do things like create and remove links, change permissions, read and write to files, rename, etc. It does not expose the raw, underlying filesystem data structures to userspace. That’s for a bunch of good reasons: it allows network file systems, it means the kernel can enforce permissions and keep the filesystem data structures correct, it means you can use different filesystems (with different data structures) without having to change user space.

So, basically, vim dir is just showing you a directory listing—more or less just like ls does. It’s done via a vim module called Netrw, as it says up top (try :help netrw in vim). You can’t actually edit the underlying filesystem data structures.

Answered By: derobert

I suspect you may be reading a really, really old exposition of how the Unix file system works. What you describe would have been true in the late 1970s or so, but it is no longer true on any modern file system.

On many modern platforms, there are several file systems in common use, and each of them hides its internals from user space. You can find out what they look like and play around with them, but unless you want to specialize in designing file systems, it’s perhaps better to just trust the book’s author to give you enough to have a basic understanding of the design, without getting into too much detail (some of which will be obsolete by the time you need it again anyway).

Answered By: tripleee

A directory is, semantically speaking, a mapping from file name to inode. This is how the directory tree abstraction is designed, corresponding to the interface between applications and filesystems. Applications can designate files by name and enumerate the list of files in a directory, and each file has a unique designator which is called an “inode”.

How this semantics is implemented depends on the filesystem type. It’s up to each filesystem how the directory is encoded. In most Unix filesystems, a directory is a mapping from filenames to inode numbers, and there’s a separate table mapping inode numbers to inode data. (The inode data contains file metadata such as permissions and timestamps, the location of file contents, etc.) The mapping can be a list, a hash table, a tree…
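
To make the two-table idea concrete, here is a toy model in Python. It is only a sketch: the class ToyFilesystem and its fields are invented for illustration, it has a single directory, and it says nothing about how any real filesystem lays this out on disk.

```python
class ToyFilesystem:
    """One directory (name -> inode number) plus an inode table."""

    def __init__(self):
        self.inodes = {}     # inode number -> {"mode", "nlink", "data"}
        self.root = {}       # the directory: file name -> inode number
        self._next_ino = 1   # hypothetical inode number allocator

    def create(self, name, mode, data=b""):
        ino = self._next_ino
        self._next_ino += 1
        self.inodes[ino] = {"mode": mode, "nlink": 1, "data": data}
        self.root[name] = ino
        return ino

    def link(self, existing, new):
        # A hard link is just a second name for the same inode number.
        ino = self.root[existing]
        self.root[new] = ino
        self.inodes[ino]["nlink"] += 1

    def unlink(self, name):
        # Removing a name frees the inode only when no names are left.
        ino = self.root.pop(name)
        self.inodes[ino]["nlink"] -= 1
        if self.inodes[ino]["nlink"] == 0:
            del self.inodes[ino]

    def chmod(self, name, mode):
        # Metadata lives in the inode, so every name sees the change.
        self.inodes[self.root[name]]["mode"] = mode

    def stat(self, name):
        ino = self.root[name]
        return ino, self.inodes[ino]

if __name__ == "__main__":
    fs = ToyFilesystem()
    fs.create("b", 0o644, b"some data")
    fs.link("b", "c")
    fs.chmod("c", 0o600)
    print(fs.stat("b"))
```

In this model the directory holds nothing but names and numbers, which is why permissions and sizes never appear in it.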

You can’t see this mapping with Vim. Vim doesn’t show the storage area that represents the directory. Linux, like many other modern Unix systems, doesn’t allow applications to see the directory representation directly. Directories act like ordinary files when it comes to their directory entry and to their metadata, but not when it comes to their content. Applications read from ordinary files with system calls such as open, read, write, close; for directories there are other calls: opendir, readdir, closedir (C library functions which, on Linux, are built on the getdents system call), and modifying a directory is done by creating, moving and deleting files. An application like cat uses open, read, close to read a file’s content; an application like ls uses opendir, readdir, closedir to read a directory’s content. Vim normally works like cat to read a file’s content, but if you ask it to open a directory, it works like ls and prints the data in a nicely-formatted way.
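
The difference between the two access paths can be sketched in Python, where os.scandir plays the role of opendir/readdir/closedir. The function names list_directory and read_directory_as_file are made up for this example, and the behaviour of read on a directory is described as it is on Linux.

```python
import os
import tempfile

def list_directory(path):
    """Return sorted (name, inode number) pairs, much like `ls -i`.

    os.scandir does not report the "." and ".." entries.
    """
    with os.scandir(path) as entries:
        return sorted((e.name, e.inode()) for e in entries)

def read_directory_as_file(path):
    """Try to read a directory's raw bytes the way cat reads a file."""
    fd = os.open(path, os.O_RDONLY)   # opening a directory read-only is allowed
    try:
        return os.read(fd, 4096)      # on Linux this fails with EISDIR
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "file1"), "w").close()
        os.mkdir(os.path.join(d, "folder1"))
        for name, ino in list_directory(d):
            print(ino, name)
        try:
            read_directory_as_file(d)
        except IsADirectoryError as exc:
            print("read() on a directory failed:", exc)
```

The listing call returns names and inode numbers, while the plain read is refused by the kernel, so the on-disk encoding of the directory never reaches the application.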

If you want to see what a directory looks like under the hood, you can use a tool such as debugfs for ext2/ext3/ext4. Make sure you don’t modify anything! A tool like debugfs bypasses the filesystem and can destroy it utterly. The ext2/ext3/ext4 debugfs is safe because it’s in read-only mode unless you explicitly allow writing through a command line option.

# debugfs /dev/root
debugfs 1.42.12 (29-Aug-2014)
debugfs: dump / /tmp/root.bin
debugfs: quit
# od -t x1 /tmp/root.bin

You’ll see the names of the directory entries in / amidst a bunch of other characters, some unprintable. To make sense of it, you’d need to know the details of the filesystem format.
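
As a rough illustration of what knowing the format involves, here is a sketch in Python that walks the classic ext2 linear directory-entry layout: a 4-byte inode number, a 2-byte record length, a 1-byte name length, a 1-byte file type, then the name. It assumes little-endian fields and a filesystem created with the filetype feature; the function name parse_ext2_dir_block is invented for this example, and it makes no attempt to interpret hash-tree index data, inline directories, or other filesystem types.

```python
import struct

def parse_ext2_dir_block(block):
    """Return (inode, name) for each in-use entry in linear directory data."""
    entries = []
    offset = 0
    while offset + 8 <= len(block):
        # Fixed 8-byte header: inode, rec_len, name_len, file_type.
        inode, rec_len, name_len, _file_type = struct.unpack_from(
            "<IHBB", block, offset
        )
        if rec_len < 8 or offset + rec_len > len(block):
            break  # not a plausible entry; stop rather than guess
        if inode != 0:  # inode 0 marks an unused slot
            raw = block[offset + 8 : offset + 8 + name_len]
            entries.append((inode, raw.decode("utf-8", errors="replace")))
        offset += rec_len  # rec_len points at the next entry
    return entries

def parse_dump(path):
    """Parse a file produced by the debugfs dump command shown above."""
    with open(path, "rb") as f:
        return parse_ext2_dir_block(f.read())
```

Calling parse_dump("/tmp/root.bin") on the dump from the transcript should then give name and inode pairs that can be compared against ls -ia /, provided the assumptions above hold for that filesystem.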
