Why can't hard links reference files on other filesystems?

I’m aware that this article exists:
Why are hard links only valid within the same filesystem?
But it unfortunately didn’t click with me.

https://www.kernel.org/doc/html/latest/filesystems/ext4/directory.html
I’m reading Operating System Concepts by Galvin and found some very useful resources, such as the Linux kernel documentation.

There can be many directory entries across the filesystem that reference the same inode number–these are known as hard links, and that is why hard links cannot reference files on other filesystems.

In the very beginning the author says this. But I don’t understand the reason behind it.

Information contained in an inode:

  • Mode/permission (protection)
  • Owner ID
  • Group ID
  • Size of file
  • Number of hard links to the file
  • Time last accessed
  • Time last modified
  • Time inode last modified

https://www.grymoire.com/Unix/Inodes.html

Now, since the inode contains this information, what’s the problem with letting hard links reference files on other filesystems?

What problem would occur if a hard link referenced a file on another filesystem?

About hard link:

The term "hard link" is misleading, and a better term is "directory entry".

A directory is a type of file that contains (at least) a pair consisting of a file name and an inode. Every entry in a directory is a "hard link", including symbolic links. When you create a new "hard link", you’re just adding a new entry to some directory that refers to the same inode as the existing directory entry.

[Image: the asker’s diagram of a directory and its entries]

This is how I visualize a directory in an operating system. Each entry is a hard link, according to the text quoted above. The only problem that I can see is that multiple filesystems could have the same range of inode numbers (but I don’t think so, as inodes are limited in an operating system).

Also, why wouldn’t it be nice to add information about the filesystem to the inode itself? Wouldn’t that be really convenient?

Asked By: achhainsan

||

A "hard link" is just the circumstance that two (or more) entries in the hierarchy of your file system refer to the same underlying data structure. Your figure illustrates that quite nicely!

That’s it; that’s all there is to it. It’s like if you have a cookbook with an index at the end, and the index says "Bread: see page 3", and "Bakery: see page 3". Now there are two names for what is on page 3.

You can have as many index entries that point to the same page as you want. What does not work is an index entry for something in another book. The other book simply doesn’t exist within your current book, so referring to pages in it just can’t work, especially because different versions of the other book could number pages differently over time.

Because a single filesystem can only guarantee consistency for itself, you cannot refer to "underlying storage system details" like inodes of other filesystems without it breaking all the time. So, if you want to refer to a directory entry that’s stored on a different file system, you’ll have to do that by the path. UNIX helps you with that through the existence of symlinks.
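As a rough illustration of "refer to it by path instead", here is a minimal Python sketch. The helper name link_or_symlink is made up for this example. It assumes a POSIX-like system where os.link fails with errno EXDEV when the two paths are on different filesystems. The demo below runs inside one temporary directory, so it only exercises the hard-link branch.

```python
import errno
import os
import tempfile


def link_or_symlink(src, dst):
    """Hypothetical helper: try a hard link, fall back to a symlink across filesystems."""
    try:
        os.link(src, dst)  # add a second directory entry for the same inode
        return "hard"
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV: src and dst are on different filesystems
            raise
        os.symlink(os.path.abspath(src), dst)  # refer to the file by its path instead
        return "symbolic"


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "bread.txt")
    dst = os.path.join(tmp, "bakery.txt")
    with open(src, "w") as f:
        f.write("see page 3\n")

    kind = link_or_symlink(src, dst)
    same = os.path.samefile(src, dst)  # both names lead to the same file
    print(kind, same)
```

If src and dst were on different mounted filesystems, the same call would be expected to take the symlink branch, under the assumption stated above.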

The only problem that I can see is that multiple filesystems could have the same range of inode numbers (but I don’t think so, as inodes are limited in an operating system).

That’s both untrue and illogical: I can ship you my hard drive, right? How would I ensure that the file system on my hard drive uses no inode numbers that you have already used in one of the many file systems that your computer might have?
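To make that concrete: on a running system, stat reports a device number (st_dev) next to the inode number (st_ino), and it is the pair that identifies a file. The inode number alone is only meaningful within one filesystem. A small Python sketch follows; the function name file_identity is made up for this example.

```python
import os
import tempfile


def file_identity(path):
    """Return the (device, inode) pair that identifies a file on a running system."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original")
    alias = os.path.join(tmp, "alias")
    other = os.path.join(tmp, "other")
    for path in (original, other):
        with open(path, "w") as f:
            f.write(path)
    os.link(original, alias)  # second name for the same inode

    same_file = file_identity(original) == file_identity(alias)
    different_file = file_identity(original) != file_identity(other)
    print(same_file, different_file)
```

Python’s os.path.samefile performs this kind of comparison for you. Two files on two different drives can report the same st_ino and are still told apart by st_dev.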

Also, why wouldn’t it be nice to add information about the filesystem to the inode itself? Wouldn’t that be really convenient?

No. Think of a file system as an abstraction of "bytes on storage media": a file system in itself is an independent data structure containing data organized into files; it must not depend on any external data to be complete. Breaking that will just lead to inconsistencies, because independence means that I can change inode numbers on file system A without having to know about file system B. Now, if B depended on A, it would be broken afterwards.

Answered By: Marcus Müller

What is a hard link

You are mixing up inodes and inode-references. A hard-link is an inode-reference.

A hard-link does not exist as a thing of its own, at least not in the way symbolic links exist. Every file has at least one. Hard-links are just file references.

Could you have a hard-link to another file-system?

No

As @MarcusMüller said, a page number refers to a page in this book.

Yes

But it would not be the same thing.

You could use a symbolic-link. Or, someone could implement a new file type that links using a UUID/inode-number, or UUID/file-path. I don’t know if this already exists, but have not seen it (I think NTFS may have it).

Notes on your question

Your diagram looks correct.

However, your assertion that inode numbers must be unique is wrong. Imagine moving a USB-connected device from one computer to another. It must work, but it will probably use the same inode numbers as an existing device.

Answered By: ctrl-alt-delor

The challenge with this question is that it’s based on a falsehood. It’s based on the idea that such a thing would be impossible under any circumstances. It’s easy to imagine how this might work, so it’s useless trying to explain why it’s impossible.

There are two problems you’d need to overcome. And these problems are enough to put off OS developers from trying to implement it.


The first would be how you reference which other file system a hard link points to.

In a running OS, each mounted filesystem can be allocated a unique number. This lets the OS know which mounted filesystem is responsible for which inode. But these numbers are only valid for the duration of the mount. If the OS is rebooted or the filesystem is unmounted (moved, unplugged, …) then the number can change.

Hypothetically, you could use the UUID of the filesystem, but the reliability of that would be questionable. Duplicate UUIDs happen for file systems because of cloning and migration.

While it’s not impossible, doing this would result in undue coupling of design between filesystem drivers for different file systems, and many developers would be strongly opposed to that.


The second problem is that the filesystem itself needs to know how many links to a file exist. Filesystems only delete a file when there are no links left. Filesystem checks need to ensure reference counts are correct, so the filesystem would need to store a count of inbound external references. But there’s no guarantee that both file systems will always be mounted together to maintain the count.

If the filesystem stores those references, holding onto files and not just deleting them because another filesystem holds the hard link, what happens when the other filesystem is erased or otherwise destroyed?

Now you’d need an administrative action to remove them without access to the external filesystem. And that would be a very dangerous tool indeed.

In general, filesystem checks that audit and correct external links would be hard to achieve, and mismatches would be common because one filesystem can be mounted without the other.

Answered By: Philip Couling

In a POSIX-like file system, you have two distinct things: You have real files – some data stored on your hard drive, and an inode describing the location and size of the data. And directory entries – describing a path, and a reference to a real file.

When you create a file in the file system, you actually create a real file, and a directory entry with a reference to that file’s inode, in other words, a hard link. You can then create additional hard links that refer to the same inode. You can also create references to inodes by opening a file; that reference will stay as long as the file is open. You use a directory entry to locate the inode when you open a file; if there is not just one hard link but multiple directory entries for the same real file, then you can use any of them.
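Here is a small Python sketch of that, using a temporary directory so nothing else is touched. The file names are arbitrary. It creates one file, adds a second hard link, and then lists the directory as name and inode pairs.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first_name")
    second = os.path.join(tmp, "second_name")

    with open(first, "w") as f:  # creates the real file and its first directory entry
        f.write("some data\n")
    os.link(first, second)  # adds a second directory entry for the same inode

    # A directory is essentially a table of name -> inode number.
    entries = {entry.name: entry.inode() for entry in os.scandir(tmp)}
    for name, inode in sorted(entries.items()):
        print(f"{name:12} -> inode {inode}")

    with open(second) as f:  # either name can be used to reach the data
        content_via_second = f.read()
```

Both names should print the same inode number, and neither of them is "the original" in any technical sense.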

Now, these references to an inode are counted. The real file and its inode are destroyed when the last reference goes away. The unlink function deletes a directory entry and removes the reference to its inode; the real file and the inode get deleted when the last directory entry is deleted, unless the file is open: the inode and data for an open file are only deleted when the file is closed. You can even create a file without a directory entry, write to it, read from it, and it will be deleted when the file is closed.
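The counting can be observed from user space through st_nlink. The following Python sketch assumes a typical local POSIX filesystem (network filesystems may behave differently when an open file is unlinked).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a")
    b = os.path.join(tmp, "b")
    with open(a, "w") as f:
        f.write("payload")

    os.link(a, b)
    links_after_link = os.stat(a).st_nlink  # two directory entries now

    os.unlink(a)
    links_after_unlink = os.stat(b).st_nlink  # one entry left, data still present

    fd = os.open(b, os.O_RDONLY)  # an open file descriptor is a reference too
    os.unlink(b)  # the last directory entry is gone
    links_while_open = os.fstat(fd).st_nlink  # expected to be 0 on a local POSIX filesystem
    data_while_open = os.read(fd, 100)  # the data is still readable through fd
    os.close(fd)  # only now can the inode and data be freed

    print(links_after_link, links_after_unlink, links_while_open, data_while_open)
```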

What does this have to do with external file systems? The file system containing an inode and file MUST know about all references to the inode. If you created a file on your external hard drive, then created two additional hard links on your built-in hard drive, then unplugged the external drive, then the inode on the external drive cannot know about the references. I can create an additional hard link on my external drive. I can delete the two additional hard links. The external drive won’t know the correct reference count, and things will go wrong.
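To see how the count goes wrong, here is a toy model in Python. It is purely hypothetical: ToyFilesystem and everything in it are invented for this illustration, no real filesystem works this way, and the "cross-filesystem hard link" is exactly the thing that does not exist.

```python
class ToyFilesystem:
    """Hypothetical model of a filesystem: link counts plus directory entries."""

    def __init__(self, name):
        self.name = name
        self.link_counts = {}  # inode number -> number of references it knows about
        self.entries = {}      # entry name -> (filesystem name, inode number)
        self.next_inode = 1

    def create(self, entry_name):
        inode = self.next_inode
        self.next_inode += 1
        self.link_counts[inode] = 1
        self.entries[entry_name] = (self.name, inode)
        return inode

    def unlink(self, entry_name):
        fs_name, inode = self.entries.pop(entry_name)
        # Only a local inode can be updated; a foreign filesystem may be unplugged.
        if fs_name == self.name:
            self.link_counts[inode] -= 1
            if self.link_counts[inode] == 0:
                del self.link_counts[inode]  # last reference gone: data is freed


external = ToyFilesystem("external")
internal = ToyFilesystem("internal")

inode = external.create("report")

# Imagined cross-filesystem hard link: the entry lives on the internal drive,
# but the reference count lives on the external drive.
internal.entries["report_link"] = ("external", inode)
external.link_counts[inode] += 1

# The external drive is unplugged, and the internal entry is deleted meanwhile.
internal.unlink("report_link")  # the count on the external drive is not updated

# The external drive is plugged in again and its own entry is deleted.
external.unlink("report")
leaked = external.link_counts.get(inode)
print(leaked)  # the count is stuck at 1, so the data is never freed
```

The opposite mistake (a count that is too low) is worse, because then the data would be freed while an entry elsewhere still points at it.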

That’s what you have soft links for. A soft link contains a description telling you how to find a directory entry for a file: a path. It can be an absolute path, or a relative path, which is interpreted relative to the directory containing the soft link. Either kind can lead to a file on an external file system without any problems, because the soft link stores only the path and not a reference to the inode. So you have no problem if the external drive is unplugged or if the file got deleted (soft links don’t keep a file alive); accessing the file through the soft link will just fail.
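A short Python sketch of both points, that a soft link stores only a path and that it does not keep the file alive. The file names are arbitrary and everything happens in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    link = os.path.join(tmp, "link")
    with open(target, "w") as f:
        f.write("hello")

    os.symlink("target", link)  # relative path, resolved from the link's directory
    stored_path = os.readlink(link)  # the link stores only this string
    links_to_target = os.stat(target).st_nlink  # the soft link is not counted

    os.unlink(target)  # the file is deleted despite the soft link
    link_still_there = os.path.lexists(link)  # the soft link itself remains
    target_reachable = os.path.exists(link)   # following it now fails

    print(stored_path, links_to_target, link_still_there, target_reachable)
```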

Answered By: gnasher729

I’m not trying to answer your question, but…

Every entry in a directory is a "hard link", including symbolic links.

That’s wrong.

Actually, the first part is right. Every entry in a directory is a hard link. But, a symbolic link is not an entry in a directory. A symbolic link is a file, just like any other file, except a flag in its inode tells the OS that when a program tries to access the symlink, the OS should treat the access specially.*

The author probably knew that, but they were trying to explain hard links, not symbolic links, and they got lazy. If they were less lazy, they might have said,

Every entry in a directory is a hard link. Even the entry for a symbolic link (a symbolic link actually is a special kind of file) is a "hard link" to the symbolic link file.


* The special treatment, when a program tries to access a symlink,** is:

  1. open the symlink file,
  2. read a pathname from the symlink file,
  3. close the symlink file,
  4. access the target file (i.e., the file named by the pathname that was read from the symlink file) as if that was the file that the program asked to open in the first place.

If the target file turns out to be another symlink, then the process repeats, either until a non-symlink file is found, or until the ELOOP error code is returned.
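The steps above can be imitated in user space. This is only a simplified Python sketch, not what the kernel actually does: it follows symlinks on the final path component only, it normalizes ".." textually, and the hop limit of 40 is an assumption of this sketch (real limits are system-specific). The function name resolve is made up for the example.

```python
import errno
import os
import tempfile

MAX_HOPS = 40  # assumed limit for this sketch


def resolve(path, max_hops=MAX_HOPS):
    """Follow symlinks on the final path component until a non-symlink is found."""
    for _ in range(max_hops):
        if not os.path.islink(path):
            return path  # a non-symlink (or a missing file) ends the chain
        target = os.readlink(path)  # read the pathname stored in the symlink
        # A relative target is interpreted relative to the symlink's own directory;
        # an absolute target replaces the path entirely.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
    raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "real"), "w") as f:
        f.write("data")
    os.symlink("real", os.path.join(tmp, "hop1"))
    os.symlink("hop1", os.path.join(tmp, "hop2"))
    resolved_name = os.path.basename(resolve(os.path.join(tmp, "hop2")))

    # Two symlinks pointing at each other never reach a real file.
    os.symlink("loop_b", os.path.join(tmp, "loop_a"))
    os.symlink("loop_a", os.path.join(tmp, "loop_b"))
    try:
        resolve(os.path.join(tmp, "loop_a"))
        loop_errno = None
    except OSError as exc:
        loop_errno = exc.errno

    print(resolved_name, loop_errno == errno.ELOOP)
```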


** There are a few system calls and/or options in system calls that don’t "follow" symbolic links. Those are necessary because otherwise, how could any program (e.g., ls) ever give you information about symbolic links, or create or destroy symbolic links?
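In Python the pair os.stat and os.lstat shows the difference: the first follows the symlink, the second does not. A minimal sketch, with arbitrary file names in a temporary directory:

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target")
    link = os.path.join(tmp, "link")
    with open(target, "w") as f:
        f.write("x" * 1000)
    os.symlink(target, link)

    followed = os.stat(link)       # follows the symlink: describes the target
    not_followed = os.lstat(link)  # does not follow: describes the symlink file itself

    stat_sees_regular_file = stat.S_ISREG(followed.st_mode)
    lstat_sees_symlink = stat.S_ISLNK(not_followed.st_mode)
    symlink_has_own_inode = not_followed.st_ino != followed.st_ino

    os.unlink(link)  # removes the symlink, not the file it names
    target_survives = os.path.exists(target)

    print(stat_sees_regular_file, lstat_sees_symlink,
          symlink_has_own_inode, target_survives)
```

That the symlink has an inode number of its own fits the point made above: the directory entry for a symlink is itself an ordinary hard link to the symlink file.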

Answered By: Solomon Slow