Number of files per directory

I have a directory with about 100000 small files (each file is from 1-3 lines, each file is a text file). In size the directory isn’t very big (< 2GB). This data lives in a professionally administered NFS server. The server runs Linux. I think the filesystem is ext3, but I don’t know for sure. Also, I don’t have root access to the server.

These files are the output of a large scale scientific experiment, over which I don’t have control. However, I have to analyze the results.

Any I/O operation/processing in this directory is very, very slow. Opening a file (fopen in python), reading from an open file, closing a file, are all very slow. In bash ls, du, etc. don’t work.

The question is:

What is the maximum number of files in a directory in Linux in such a way that it is practical to do processing, fopen, read, etc? I understand that the answer depends on many things: fs type, kernel version, server version, hardware, etc. I just want a rule of thumb, if possible.

Asked By: carlosdc

||

As you surmise, it does depend on many things, mostly the filesystem type and options and to some extent the kernel version. In the ext2/ext3/ext4 series, there was a major improvement when the dir_index option appeared (some time after the initial release of ext3): it makes directories be stored as search trees (logarithmic time access) rather than linear lists (linear time access). This isn’t something you can see over NFS, but if you have some contact with the admins you can ask them to run tune2fs -l /dev/something |grep features (perhaps even convince them to upgrade?). Only the number of files matters, not their size.

Even with dir_index, 100000 feels large. Ideally, get the authors of the program that creates the files to add a level of subdirectories. For no performance degradation, I would recommend a limit of about 1000 files per directory for ext2 or ext3 without dir_index and 20000 with dir_index or reiserfs. If you can’t control how the files are created, move them into separate directories before doing anything else.

Categories: Answers Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.