Why doesn't a hard link get corrupted if we remove the original file?

If I remove the original file, the soft link becomes corrupt but the hard link doesn't. Why doesn't the hard link get corrupted?

Asked By: Rahul Sharma


It is because hardlinks are essentially references to the same file, and there's no "original" file in terms of hardlinks. They all point to the same data structure on the disk (the inode, which contains nearly all of the file's metadata).

Softlinks, by contrast, point to a filename and not to the data structure describing the file.

Answered By: Danila Vershinin

On Linux, what uniquely identifies a file filesystem-wide is the inode number. This is nothing more than a numeric ID, guaranteed to be unique on the entire filesystem (note: inodes can be recycled, but no two "live" files can have the same inode in the same filesystem).

A file name is nothing more than a "convenience label" attached to this inode inside a specific directory. Hardlinking a file is nothing more than adding another such convenience label, inside the same or a different directory (in the first case, the new hardlink must have a different name).

You can see the inode number via ls -i. For example:

# ls -alni
total 4
 68329917 drwxr-xr-x.  2 1000 1000   37 Feb  3 15:25 .
101396179 drwx------. 25 1000 1000 4096 Feb  3 15:24 ..
 68329918 -rw-r--r--.  2 1000 1000    0 Feb  3 15:25 test.txt
 68329918 -rw-r--r--.  2 1000 1000    0 Feb  3 15:25 zzz.txt

Please note how zzz.txt and test.txt, having the same inode number, really are the same file – referenced via two different names. Removing only one of these two names does not really remove (unlink) the inode from the filesystem, leaving the other unaffected.
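As a rough sketch of the point above, the following script creates a file, adds a second name for it in a different directory, compares the two inode numbers, and then removes the first name. The scratch directory from mktemp and the sub/ directory are assumptions made for illustration; the file names are borrowed from the listing above.

```shell
set -eu

# Scratch directory so the demo does not touch real files (assumed location).
workdir=$(mktemp -d)
mkdir "$workdir/sub"

echo "hello" > "$workdir/test.txt"
# Add a second name for the same inode, this time in another directory.
ln "$workdir/test.txt" "$workdir/sub/zzz.txt"

# ls -i prints the inode number in the first column.
inode_a=$(ls -di "$workdir/test.txt" | awk '{print $1}')
inode_b=$(ls -di "$workdir/sub/zzz.txt" | awk '{print $1}')
echo "test.txt    -> inode $inode_a"
echo "sub/zzz.txt -> inode $inode_b"

# Remove one name; the data stays reachable through the other.
rm "$workdir/test.txt"
content=$(cat "$workdir/sub/zzz.txt")
echo "after rm test.txt, sub/zzz.txt still contains: $content"

rm -rf "$workdir"
```

Both names should report the same inode number, and the content should still be readable after the first name is gone.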

A soft link is a completely different thing – it is not a real link to the original inode; rather, it is a special small file (with its own, different inode) pointing to the original file path/name. Removing the original file leaves a broken ("corrupted") pointer behind.
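A minimal sketch of that difference, assuming a scratch directory and the hypothetical names original.txt and soft.txt: the soft link gets its own inode number, and after the original name is removed the link still exists but no longer resolves to anything.

```shell
set -eu

workdir=$(mktemp -d)
echo "data" > "$workdir/original.txt"
# Relative target: resolved relative to the directory containing the link.
ln -s original.txt "$workdir/soft.txt"

# With -d, ls reports on the link itself rather than on what it points to.
inode_file=$(ls -di "$workdir/original.txt" | awk '{print $1}')
inode_link=$(ls -di "$workdir/soft.txt" | awk '{print $1}')
echo "original.txt -> inode $inode_file"
echo "soft.txt     -> inode $inode_link"

rm "$workdir/original.txt"

# -L tests "is a symlink"; -e follows the link and fails if the target is gone.
if [ -L "$workdir/soft.txt" ] && [ ! -e "$workdir/soft.txt" ]; then
    link_state="dangling"
else
    link_state="ok"
fi
echo "soft.txt is now: $link_state"

rm -rf "$workdir"
```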

Answered By: shodanshok

You seem to misunderstand what a hardlink and a file is in Unix.

The basis of a Unix filesystem is the file. A file is an unstructured anonymous bytestream. A file does not have a name. It only has a file serial number, basically a unique (for that filesystem) identifier. (The file serial number is often called the inode number.)

There are several different kinds of special files which are standardized by POSIX: directories, symbolic links, device files (block and character), FIFOs, and sockets.

Operating Systems are allowed to add their own kinds of special files, for example, Solaris has doors.

Device files provide an interface for interacting with devices, e.g., traditionally /dev/sda to interact with the first hard disk. FIFO special files work like shell pipes, but since they have a name, the two processes reading and writing them can be started at different times and in different security contexts. Sockets allow for interprocess communication similar to a network socket but only on the local machine.

Now we get to the two kinds of special files that are relevant to your question: directories and symlinks.

A directory special file actually works very much like a directory in real life. For example, think about a phone directory: it lists the names of people together with their phone numbers. That's exactly what a directory does in a Unix filesystem: it lists the names of files together with their file serial numbers.

This pairing of name and file serial number is what we call a hardlink (or just link).

When you "delete a file" in Unix using the rm utility, you are not actually "deleting the file". You are removing the entry for that name from the directory, in other words, you are removing the hardlink, not the file. This is called unlinking and in fact the POSIX library function used by rm is called unlink.

So, when you do something like

touch foo

You have not created a file named foo. You have created a file without a name but with some particular file serial number and you have added a directory entry to the current directory which links the name foo to the file serial number of the file you just created.

Now, when you use the ln utility to create a second hardlink:

ln foo bar

you have created a second directory entry in the current directory which links the name bar to the same file serial number that foo links to.

It is important to realize that neither of those two links is special. They are exactly the same.

If you now unlink foo:

rm foo

All you did was remove the directory entry which links the name foo to the file serial number. You did not remove the file. Therefore, you can still access the file using the name bar since this directory entry was not touched at all.
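The touch / ln / rm sequence above can be followed through the file's link count, which is the second column of ls -l. This is only a sketch: it runs in a scratch directory (an assumption for illustration) and expects the counts you would normally see on typical Linux filesystems.

```shell
set -eu

workdir=$(mktemp -d)

touch "$workdir/foo"
# Second column of ls -l is the number of directory entries (links) for the file.
count_before=$(ls -l "$workdir/foo" | awk '{print $2}')

ln "$workdir/foo" "$workdir/bar"
count_linked=$(ls -l "$workdir/bar" | awk '{print $2}')

rm "$workdir/foo"
count_after=$(ls -l "$workdir/bar" | awk '{print $2}')

echo "links after touch foo:  $count_before"   # expected: 1
echo "links after ln foo bar: $count_linked"   # expected: 2
echo "links after rm foo:     $count_after"    # expected: 1

rm -rf "$workdir"
```

The count only drops back to 1 after rm foo. It never reaches 0, so the file itself is still there under the name bar.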

In fact, you cannot delete files in Unix. You can only remove links. The filesystem itself will remove the file once it has no more links pointing to it, and is no longer open.
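The "and is no longer open" part can be seen with a small sketch like the one below. It opens a file on file descriptor 3, removes the file's only name, and then reads the data through the still-open descriptor. The scratch directory and the choice of descriptor 3 are assumptions for illustration.

```shell
set -eu

workdir=$(mktemp -d)
echo "still here" > "$workdir/foo"

# Open the file for reading on file descriptor 3 and keep it open.
exec 3< "$workdir/foo"

# Remove the only link; the name is gone from the directory.
rm "$workdir/foo"
if [ -e "$workdir/foo" ]; then name_exists="yes"; else name_exists="no"; fi

# The data is still readable through the open descriptor.
content=$(cat <&3)

# Closing the descriptor drops the last reference; only now can the
# filesystem actually reclaim the file.
exec 3<&-

echo "name still in directory: $name_exists"
echo "read through open descriptor: $content"

rm -rf "$workdir"
```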

A symlink, however, is a special file which contains a path. I.e., when you do

ln -s /path/to/quux baz

You are literally writing the string /path/to/quux into the file. More precisely, you are creating a symlink special file with the content /path/to/quux and you are creating a directory entry in the current directory which links the name baz with the file serial number of the file you just created.
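One way to look at the stored string is readlink, which prints the content of a symlink without trying to resolve it. The sketch below creates the link in a scratch directory (an assumption for illustration) and assumes /path/to/quux does not exist on the machine running it.

```shell
set -eu

workdir=$(mktemp -d)

# Creating the link succeeds even though the target path does not exist.
ln -s /path/to/quux "$workdir/baz"

# readlink prints the string stored in the symlink, unresolved.
target=$(readlink "$workdir/baz")
echo "baz contains: $target"

# -e follows the link, so it reports whether the target can be resolved.
if [ -e "$workdir/baz" ]; then resolves="yes"; else resolves="no"; fi
echo "baz resolves to a file: $resolves"

rm -rf "$workdir"
```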

It doesn't actually matter whether /path/to/quux resolves to a file serial number or not. In fact, there are programs which use this for some clever configuration. E.g., the fnord and gatling web servers use symlinks to represent HTTP redirects, so when you do:

ln -s https://www.google.com/ /var/www/search.html

Then navigating to http://mydomain/search.html will redirect you to https://www.google.com/.

So, in short:

Why the hardlink doesn't corrupt if we remove the original file?

Because you didn’t remove the original file. You only removed one of multiple links. The file is no longer accessible using that specific name, but the file still exists and can still be accessed using other names.

if I remove the original file then the softlink gets corrupt

Again, you are not removing the original file. You are removing the name. But the symlink points to the name and not the file. Therefore, the symlink now points to a name that can no longer be resolved to a file.
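A small experiment makes the "name versus file" distinction visible. In this sketch (scratch directory and the names target, hard and soft are assumptions for illustration), a name gets both a hard link and a symlink, is removed, and is then re-created as a brand-new file. The hard link still reaches the old file, while the symlink follows the name to the new one.

```shell
set -eu

workdir=$(mktemp -d)

echo "first file" > "$workdir/target"
ln "$workdir/target" "$workdir/hard"     # second name for the same file
ln -s target "$workdir/soft"             # stores the name "target"

# Remove the name, then create a brand-new file under the same name.
rm "$workdir/target"
echo "second file" > "$workdir/target"

via_hard=$(cat "$workdir/hard")   # still the old file
via_soft=$(cat "$workdir/soft")   # whatever "target" names now

echo "through hard link: $via_hard"
echo "through symlink:   $via_soft"

rm -rf "$workdir"
```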


Sidenote: You may have noticed something interesting: directories provide a mapping of names to files. But directories are themselves also files. Therefore, directories automatically also provide a mapping of names to directories.

In other words: directories can be arbitrarily nested in Unix!

This may not sound very exciting today, since every filesystem in widespread use allows for nested hierarchies that are arbitrarily deep. But that was not at all the case when Unix was created almost 60 years ago. Several filesystems at the time either didn’t have directories at all or had a fixed level of nesting (e.g. 2 levels).

Making directories special files gives you a hierarchical filesystem for free without having to add any special constructs. This is a very elegant design.

Answered By: Jörg W Mittag