To what extent does Linux support file names longer than 255 bytes?

I asked about Linux's 255-byte file name limit yesterday, and the answer was that it is a limitation that cannot, or will not, be easily changed. But then I remembered that most Linux distributions support NTFS, whose maximum file name length is 255 UTF-16 characters.

So I created an NTFS partition and tried to give a file a 160-character Japanese name, which is 480 bytes in UTF-8. I expected it to fail, but it worked. How does this work when the file name is 480 bytes? Is the 255-byte limit imposed only by certain file systems, and can Linux itself handle file names longer than 255 bytes?


P.S.

The string is the opening of a famous classical Japanese essay titled "方丈記" (Hōjōki). Here is the string:

ゆく河の流れは絶えずして、しかももとの水にあらず。よどみに浮かぶうたかたは、かつ消えかつ結びて、久しくとどまりたるためしなし。世の中にある人とすみかと、またかくのごとし。たましきの都のうちに、棟を並べ、甍を争へる、高き、卑しき、人の住まひは、世々を経て尽きせぬものなれど、これをまことかと尋ぬれば、昔ありし家はまれなり。

I used a web application to count the UTF-8 bytes.
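The same count can be done locally. The following is a minimal C sketch, not part of the original question; the helper names utf8_bytes and utf8_codepoints are hypothetical, and the code assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes: this is what byte-oriented limits such as
 * ext4's 255-byte file name limit count. */
size_t utf8_bytes(const char *s)
{
    return strlen(s);
}

/* Number of Unicode code points, found by counting every byte that is
 * NOT a continuation byte (continuation bytes look like 10xxxxxx).
 * Assumes valid UTF-8 input. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

Every character in the essay excerpt, including the Japanese punctuation, is encoded as 3 bytes in UTF-8, so for the 160-character string these helpers should give 480 bytes and 160 code points, matching the web tool.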


Asked By: Damn Vegetables


The limit on the length of a file name is indeed coded into the filesystem. For example, for ext4 (from https://en.wikipedia.org/wiki/Ext4):

Max. filename length 255 bytes

From https://en.wikipedia.org/wiki/XFS :

Max. filename length 255 bytes

From https://en.wikipedia.org/wiki/Btrfs :

Max. filename length 255 ASCII characters (fewer for multibyte character encodings such as Unicode)

From https://en.wikipedia.org/wiki/NTFS :

Max. filename length 255 UTF-16 code units

An overview of these limits for a number of file systems can be found at https://en.wikipedia.org/wiki/Comparison_of_file_systems#Limits . There you can also see that ReiserFS has a higher limit (almost 4 KiB), but the kernel itself (inside the VFS, the kernel's virtual filesystem layer) limits names to 255 bytes.

Your text is 160 UTF-16 characters long, and UTF-16 units are what NTFS counts:

echo ゆく河の流れは絶えずして、しかももとの水にあらず。よどみに浮かぶうたかたは、かつ消えかつ結びて、久しくとどまりたるためしなし。世の中にある人とすみかと、またかくのごとし。たましきの都のうちに、棟を並べ、甍を争へる、高き、卑しき、人の住まひは、世々を経て尽きせぬものなれど、これをまことかと尋ぬれば、昔ありし家はまれなり。 > jp.txt
iconv -f utf-8 -t utf-16 jp.txt > jp16.txt
ls -ld jp*.txt
hexdump -C jp16.txt

This shows 0x140 = 320 bytes, not counting the 2-byte byte order mark (BOM) that iconv prepends or the newline that echo appends. In other words, the name is 160 UTF-16 characters: below the 255-UTF-16-character limit of NTFS, but well over 255 bytes.

Answered By: Ned64

So, here’s what I’ve found out.

Coreutils don't particularly care about file name length: they simply pass the user's input to the kernel regardless of its length, i.e. they perform no checks of their own.

For example, this works (the file name is 462 bytes long!):

name="和総坂裁精座回資国定裁出観産大掲記労。基利婚岡第員連聞余枚転屋内分。妹販得野取戦名力共重懲好海。要中心和権瓦教雪外間代円題気変知。貴金長情質思毎標豊装欺期権自馬。訓発宮汚祈子報議広組歴職囲世階沙飲。賞携映麻署来掲給見囲優治落取池塚賀残除捜。三売師定短部北自景訴層海全子相表。著漫寺対表前始稿殺法際込五新店広。"
cd /mnt/ntfs
touch "$name"

Even this works:

echo 123 > "$name"
cat "$name"
123

However, once you try to copy the file to any of the classic Linux filesystems, the operation fails:

cp "$name" /tmp
cp: cannot stat '/tmp/和総坂裁精座回資国定裁出観産大掲記労。基利婚岡第員連聞余枚転屋内分。妹販得野取戦名力共重懲好海。要中心和権瓦教雪外間代円題気変知。貴金長情質思毎標豊装欺期権自馬。訓発宮汚祈子報議広組歴職囲世階沙飲。賞携映麻署来掲給見囲優治落取池塚賀残除捜。三売師定短部北自景訴層海全子相表。著漫寺対表前始稿殺法際込五新店広。': File name too long

That is, cp actually attempted to create the file in /tmp, but the filesystem behind /tmp doesn't allow file names longer than 255 bytes.
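To reproduce the same ENAMETOOLONG without cp, a small C sketch can try to create a file whose single name component has a given length. The helper try_create_name is hypothetical and uses plain ASCII 'a' bytes. On a byte-limited filesystem such as the one usually behind /tmp, a 300-byte name should fail with ENAMETOOLONG while a 200-byte name should succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try to create "<dir>/aaa...a" with a name component of 'len' bytes.
 * Returns 0 on success (the file is removed again) or the errno value
 * from open(), e.g. ENAMETOOLONG when the filesystem rejects the name. */
int try_create_name(const char *dir, size_t len)
{
    char path[8192];
    if (len == 0 || strlen(dir) + 1 + len >= sizeof path)
        return EINVAL;                /* would not fit in our buffer */

    int off = snprintf(path, sizeof path, "%s/", dir);
    memset(path + off, 'a', len);     /* build the long component */
    path[off + len] = '\0';

    int fd = open(path, O_CREAT | O_WRONLY | O_EXCL, 0644);
    if (fd < 0)
        return errno;
    close(fd);
    unlink(path);                     /* clean up the test file */
    return 0;
}
```

Note that on an ntfs-3g mount a 300-byte ASCII name would also fail, because it is 300 UTF-16 units; to reproduce the question's case, the name would need multi-byte characters.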

I've also managed to open this file in mousepad (a GTK application), edit it, and save it. It all worked, which means the 255-byte restriction applies only to certain Linux filesystems.

This doesn't mean everything will work, though. For instance, my favorite console file manager, Midnight Commander (a Norton Commander clone), cannot list this file properly (it shows the file size as 0), open it, or do anything else with it:

Error
No such file or directory (2)
Answered By: Artem S. Tashkinov

The answer, as often, is “it depends”.

Looking at the NTFS implementation in particular, it reports a maximum file name length of 255 to statvfs callers, so callers which interpret that as a 255-byte limit might pre-emptively avoid file names which would be valid on NTFS. However, most programs don't check this (or even NAME_MAX) ahead of time, and instead rely on ENAMETOOLONG errors. In most cases the important limit is PATH_MAX, not NAME_MAX: that's what is typically used to allocate buffers when manipulating file names (in programs that don't allocate path buffers dynamically, as expected by OSes such as the Hurd, which has no arbitrary limits).
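For reference, here is a minimal C sketch of the two usual ways a program can ask for that advertised limit: the f_namemax field filled in by statvfs() and the POSIX pathconf(_PC_NAME_MAX) query. The wrapper names are hypothetical. Both return only what the filesystem reports, and that number is not necessarily a byte count.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/statvfs.h>
#include <unistd.h>

/* Maximum file name length reported via statvfs(), or -1 on error. */
long name_max_statvfs(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0)
        return -1;
    return (long)sv.f_namemax;
}

/* The same limit via pathconf(); -1 means an error or "no limit". */
long name_max_pathconf(const char *path)
{
    return pathconf(path, _PC_NAME_MAX);
}
```

A program that treats the result as a byte count would refuse names that NTFS accepts. The more robust pattern described above is to just attempt the operation and handle ENAMETOOLONG.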

The NTFS implementation itself doesn't check file name lengths in bytes, but always in 2-byte characters; file names which can't be represented in an array of 255 2-byte elements cause an ENAMETOOLONG error.

Note that NTFS is generally handled by a FUSE driver on Linux. The kernel driver currently only supports UCS-2 characters, but the FUSE driver supports UTF-16 surrogate pairs (with the corresponding reduction in character length).
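To make the difference between bytes and 2-byte units concrete, here is a C sketch that counts how many UTF-16 code units a UTF-8 name needs and applies a 255-unit check. The helper names are hypothetical and assume valid UTF-8. The sketch models the surrogate-pair counting described for the FUSE driver; it does not model the kernel driver's UCS-2-only restriction.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of UTF-16 code units needed for a valid UTF-8 string.
 * Lead bytes below 0xF0 start code points up to U+FFFF (one unit);
 * 4-byte sequences (U+10000 and above) need a surrogate pair. */
size_t utf16_units(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;                 /* continuation byte */
        n += (*p >= 0xF0) ? 2 : 1;
    }
    return n;
}

/* Hypothetical NTFS-style check: at most 255 UTF-16 code units,
 * no matter how many UTF-8 bytes the name occupies. */
bool fits_ntfs_name(const char *s)
{
    return utf16_units(s) <= 255;
}
```

For the 160-character name from the question, utf16_units should give 160 while strlen gives 480, which is why NTFS accepts a name that ext4 rejects.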

Answered By: Stephen Kitt

But I remembered that most Linux supports NTFS, whose maximum file name length is 255 UTF-16 characters.

Are we talking filename length or pathname length?

The maximum length for NTFS pathnames has always been 64K bytes (32K UTF-16 code units).

The Win32 API imposed stricter limits because (editorial comment) idiot programmers liked to declare char filename[MAX_PATH], but there were syntactic kludges around that.
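The same fixed-buffer habit exists on POSIX systems with PATH_MAX. As a sketch of the alternative (this uses POSIX, not Win32, and the helper name canonical_length is hypothetical), POSIX.1-2008 realpath() accepts a NULL buffer and allocates a result of whatever size it needs, so the caller never declares a char buf[PATH_MAX].

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Length of the canonical absolute form of 'path', or (size_t)-1 on
 * error. realpath(path, NULL) allocates the result buffer itself,
 * so no fixed-size buffer is declared here. */
size_t canonical_length(const char *path)
{
    char *resolved = realpath(path, NULL);
    if (resolved == NULL)
        return (size_t)-1;
    size_t len = strlen(resolved);
    free(resolved);                   /* caller-side cleanup */
    return len;
}
```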

Answered By: user442032

TL;DR:

There was, and still is, some limit: for example, readdir_r() can't read file names longer than 255 bytes. However, Linux is aware of this, and modern APIs can read long file names without problems.


There's this line in the ReiserFS article on Wikipedia:

Max. filename length: 4032 bytes, limited to 255 by Linux VFS

so there may be some real limits in the VFS, although I don't know enough about the Linux VFS to tell. The VFS functions all work on struct dentry, which stores the name in struct qstr d_name:

extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
extern int vfs_symlink(struct inode *, struct dentry *, const char *);
extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
extern int vfs_rmdir(struct inode *, struct dentry *);
extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
extern int vfs_whiteout(struct inode *, struct dentry *);

struct qstr stores a hash, a length, and a pointer to the name, so I don't think there are any physical limits unless the VFS functions explicitly truncate the name on creation or opening. I didn't check the implementation, but I think long names should work fine.

Update:

The length check is done in linux/fs/libfs.c, where ENAMETOOLONG is returned if the name is too long:

/*
 * Lookup the data. This is trivial - if the dentry didn't already
 * exist, we know it is negative.  Set d_op to delete negative dentries.
 */
struct dentry *simple_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
{
    if (dentry->d_name.len > NAME_MAX)
        return ERR_PTR(-ENAMETOOLONG);
    if (!dentry->d_sb->s_d_op)
        d_set_d_op(dentry, &simple_dentry_operations);
    d_add(dentry, NULL);
    return NULL;
}

The limit is defined in linux/limits.h:

#define NAME_MAX         255    /* # chars in a file name */

But I have no idea how files with long names can then be opened without hitting that error.
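The important detail in simple_lookup() is that NAME_MAX is compared against one dentry, i.e. one path component, not the whole path. Below is a hypothetical user-space analogue of that check, written only as an illustration; it is not how the kernel is structured, since the kernel does this per lookup in each filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>   /* NAME_MAX (255 on Linux) */
#include <string.h>

/* Return ENAMETOOLONG if any single component of 'path' is longer than
 * NAME_MAX bytes, else 0. The whole path is bounded separately (by
 * PATH_MAX), so a long path made of short components passes. */
int check_components(const char *path)
{
    const char *p = path;
    while (*p) {
        while (*p == '/')
            p++;                      /* skip separators */
        size_t len = strcspn(p, "/"); /* bytes up to next '/' or NUL */
        if (len > NAME_MAX)
            return ENAMETOOLONG;
        p += len;
    }
    return 0;
}
```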


However, there are a few library interfaces that do have limits. struct dirent has the following members:

struct dirent {
   ino_t          d_ino;       /* Inode number */
   off_t          d_off;       /* Not an offset; see below */
   unsigned short d_reclen;    /* Length of this record */
   unsigned char  d_type;      /* Type of file; not supported
                                  by all filesystem types */
   char           d_name[256]; /* Null-terminated filename */
};

Since d_name is a fixed-size array, functions like readdir_r(), which fill in a caller-supplied struct dirent, will never be able to return names longer than 255 bytes. For example:

DIR *dir = opendir("/");
struct dirent entry;    /* caller-supplied: d_name holds at most 255 bytes + NUL */
struct dirent *result;
int return_code = readdir_r(dir, &entry, &result);

That's why readdir_r() was deprecated:

On some systems, readdir_r() can’t read directory entries with very long names. When the glibc implementation encounters such a name, readdir_r() fails with the error ENAMETOOLONG after the final directory entry has been read. On some other systems, readdir_r() may return a success status, but the returned d_name field may not be null terminated or may be truncated.

readdir_r(3) — Linux manual page

readdir(), on the other hand, allocates the struct dirent itself, so the name can actually be longer than 255 bytes, and you must not use sizeof(d_name) or sizeof(struct dirent) to get the name and struct lengths:

Note that while the call

fpathconf(fd, _PC_NAME_MAX)

returns the value 255 for most filesystems, on some filesystems (e.g., CIFS, Windows SMB servers), the null-terminated filename that is (correctly) returned in d_name can actually exceed this size. In such cases, the d_reclen field will contain a value that exceeds the size of the glibc dirent structure shown above.

readdir(3) — Linux manual page
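In practice, this means you should measure each name with strlen(). Here is a C sketch (the helper longest_name is hypothetical) that scans a directory with readdir() and reports the longest entry name in bytes:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>

/* Length in bytes of the longest entry name in 'dirpath', or -1 if the
 * directory cannot be opened. The length comes from strlen(d_name),
 * never from sizeof(d_name), because readdir() may return names that
 * do not fit the nominal 256-byte field. */
long longest_name(const char *dirpath)
{
    DIR *d = opendir(dirpath);
    if (d == NULL)
        return -1;

    long longest = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {  /* NULL: end of dir (or error) */
        long len = (long)strlen(e->d_name);
        if (len > longest)
            longest = len;
    }
    closedir(d);
    return longest;
}
```

Pointed at the NTFS directory from the question, this should report the 480-byte name, which readdir_r() could not return.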

Other interfaces, such as the getdents() and getdents64() system calls, use struct linux_dirent and struct linux_dirent64, which don't suffer from the fixed-length issue:

struct linux_dirent {
   unsigned long  d_ino;     /* Inode number */
   unsigned long  d_off;     /* Offset to next linux_dirent */
   unsigned short d_reclen;  /* Length of this linux_dirent */
   char           d_name[];  /* Filename (null-terminated) */
                     /* length is actually (d_reclen - 2 -
                        offsetof(struct linux_dirent, d_name)) */
   /*
   char           pad;       // Zero padding byte
   char           d_type;    // File type (only since Linux
                             // 2.6.4); offset is (d_reclen - 1)
   */
};

struct linux_dirent64 {
   ino64_t        d_ino;    /* 64-bit inode number */
   off64_t        d_off;    /* 64-bit offset to next structure */
   unsigned short d_reclen; /* Size of this dirent */
   unsigned char  d_type;   /* File type */
   char           d_name[]; /* Filename (null-terminated) */
};

strace ls shows that ls uses the getdents family of system calls (getdents64() on current systems) to list files, so it can handle file names of arbitrary length.
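For comparison, here is a C sketch that calls getdents64 directly through syscall(), so it does not depend on whether the C library provides a wrapper, and walks the variable-length records by d_reclen. The struct name linux_dirent64_rec and the helper count_entries_getdents64 are hypothetical; the field layout follows the struct linux_dirent64 shown above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by the getdents64 system call. */
struct linux_dirent64_rec {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;   /* total size of this record */
    unsigned char  d_type;
    char           d_name[];   /* NUL-terminated, variable length */
};

/* Count the entries in 'dirpath' with raw getdents64 calls, or return -1
 * on error. Names are never copied into a fixed 256-byte field. */
long count_entries_getdents64(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    char buf[8192] __attribute__((aligned(8)));
    long count = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread < 0) {
            count = -1;
            break;
        }
        if (nread == 0)
            break;                    /* end of directory */
        for (long off = 0; off < nread;) {
            struct linux_dirent64_rec *d =
                (struct linux_dirent64_rec *)(buf + off);
            count++;                  /* strlen(d->d_name) gives its length */
            off += d->d_reclen;       /* records are variable-length */
        }
    }
    close(fd);
    return count;
}
```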

Answered By: phuclv