"traversal failed: u: Bad message" when deleting an extremely large directory in Linux

I am trying to remove some extremely large directories, but without success. Here are some observations:

# cwd contains the two larger directories
$ ls -lhF
drwxrwxr-x 2 hongxu hongxu 471M Oct 16 18:52 J/
drwxr-xr-x 2 hongxu hongxu 5.8M Oct 16 17:21 u/
# Note that these are the sizes `ls` reports for the directories themselves, so they should be *huge*
# J/ seems much larger than u/ (it contains more files), so take u/ as an example

$ rm -rf u/
# hangs for a very long time, and finally reports
rm: traversal failed: u: Bad message

$ cd u/
# can cd into u/ without problems

$ ls -lhF
# hangs for a long time; Ctrl-C cancels it successfully

$ rm *
# hangs for a long time; Ctrl-C fails to cancel it
# however, `ps aux` reports no process associated with `rm`

These two directories mostly contain lots of small files (none exceeding 10k, I suppose). I have to remove these two directories to free up disk space. What should I do?

UPDATE1:
Please see the output of rm -rf u/ above: it reports rm: traversal failed: u: Bad message after quite a long time (> 2 hours). Therefore, the problem does not seem to be about efficiency.

UPDATE2:
When applying fsck, it reports as follows (seems fine):

$ sudo fsck -A -y /dev/sda2
fsck from util-linux 2.31.1
fsck.fat 4.1 (2017-01-24)
/dev/sda1: 13 files, 1884/130812 clusters

$ df /dev/sda2
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sda2      244568380 189896000  43628648  82% /

UPDATE3:
In case it may be relevant (but probably not), these two directories (J/ and u/) contain terminfo files generated by the tic command; unlike regular compiled terminfo files (e.g., those inside /lib/terminfo), these were generated with some fuzzing techniques, so they may not be “legal” terminfo files. (This turned out to be irrelevant.)

UPDATE4:
Some more observations:

$ find u/ -type f | while read f; do echo $f; rm -f $f; done
# hangs for a long time; IUsed (`df -i /dev/sda2`) does not decrease
$ mkdir emptyfolder && rsync -r --delete emptyfolder/ u/
# hangs for a long time; IUsed (`df -i /dev/sda2`) does not decrease
$ strace rm -rf u/
execve("/bin/rm", ["rm", "-rf", "u"], 0x7fffffffc550 /* 121 vars */) = 0                                                                                                                       
brk(NULL)                               = 0x555555764000                                                                                                                                       
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)                                                                                                                
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)                                                                                                                
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3                                                                                                                                   
fstat(3, {st_mode=S_IFREG|0644, st_size=125128, ...}) = 0                                                                                                                                      
mmap(NULL, 125128, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ffff7fd8000                                                                                                                              
close(3)                                = 0                                                                                                                                                    
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)                                                                                                                
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3                                                                                                                    
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0                                                                                                                                     
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffff7fd6000                                                                                                      
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ffff79e4000                                                                                                     
mprotect(0x7ffff7bcb000, 2097152, PROT_NONE) = 0                                                                                                                                               
mmap(0x7ffff7dcb000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7ffff7dcb000                                                                           
mmap(0x7ffff7dd1000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ffff7dd1000                                                                                 
close(3)                                = 0                                                                                                                                                    
arch_prctl(ARCH_SET_FS, 0x7ffff7fd7540) = 0                                                                                                                                                    
mprotect(0x7ffff7dcb000, 16384, PROT_READ) = 0                                                                                                                                                 
mprotect(0x555555762000, 4096, PROT_READ) = 0                                                                                                                                                  
mprotect(0x7ffff7ffc000, 4096, PROT_READ) = 0                                                                                                                                                  
munmap(0x7ffff7fd8000, 125128)          = 0                                                                                                                                                    
brk(NULL)                               = 0x555555764000                                                                                                                                       
brk(0x555555785000)                     = 0x555555785000                                                                                                                                       
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3                                                                                                                     
fstat(3, {st_mode=S_IFREG|0644, st_size=1683056, ...}) = 0                                                                                                                                     
mmap(NULL, 1683056, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ffff7e3b000                                                                                                                             
close(3)                                = 0                                                                                                                                                    
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0                                                                                                                                      
lstat("/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0                                                                                                                                      
newfstatat(AT_FDCWD, "u", {st_mode=S_IFDIR|0755, st_size=6045696, ...}, AT_SYMLINK_NOFOLLOW) = 0                                                                                               
openat(AT_FDCWD, "u", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_DIRECTORY) = 3                                                                                                                 
fstat(3, {st_mode=S_IFDIR|0755, st_size=6045696, ...}) = 0                                                                                                                                     
fcntl(3, F_GETFL)                       = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW|O_DIRECTORY)                                                                               
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0                                                                                                                                                    
getdents(3, /* 2 entries */, 32768)     = 48                                                                                                                                                   
getdents(3, /* 1 entries */, 32768)     = 24                                                                                                                                                   
... (repeated lines)                                                                                                                                                                           
getdents(3, /* 1 entries */, 32768)     = 24                                                                                                                                                   
getdents(3strace: Process 5307 detached                                                                                                                                                        
 <detached ...>
# (manually killed)
$ ls -f1 u/
./ 
../
../
../
../
... (repeated lines)
../
$ sudo journalctl -ex
Oct 17 16:00:16 CSLRF03AU kernel: JBD2: Spotted dirty metadata buffer (dev = sda2, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error: 6971 callbacks suppressed
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm find: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm zsh: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm rm: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm find: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm rsync: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm zsh: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm zsh: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm rm: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm find: Directory index failed checksum
Oct 17 16:00:20 CSLRF03AU kernel: EXT4-fs error (device sda2): ext4_htree_next_block:948: inode #9789534: block 1020: comm find: Directory index failed checksum
# #9789534 is the inode of `u/` as reported by `ls -i`

So this looks like filesystem corruption.
But rebooting does not fix it 🙁
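
The kernel messages above all name the same inode. As a rough sketch (the function name ext4_error_inodes is made up for illustration), the distinct inode numbers can be pulled out of a kernel log like this, and then mapped back to a path with find -inum:

```shell
# Sketch: list the distinct inode numbers named in "EXT4-fs error" lines.
# Reads kernel log text on stdin, e.g. from `journalctl -k` or `dmesg`.
ext4_error_inodes() {
    grep 'EXT4-fs error' \
        | grep -o 'inode #[0-9][0-9]*' \
        | sed 's/inode #//' \
        | sort -un
}

# Usage (the lookup assumes the affected filesystem is mounted at /):
#   sudo journalctl -k | ext4_error_inodes
#   sudo find / -xdev -inum 9789534
```

If a single directory inode accounts for every error, as it does here, the damage is probably confined to that directory's index rather than to the files inside it.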

Asked By: Hongxu Chen

||

You can try find u/ -type f | while IFS= read -r f; do rm -f "$f"; done
This will take a while but might work. For some reason, loops in bash sometimes work when other approaches fail.

Answered By: Eran Ben-Natan

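Removing a huge number of files with rm can fail or be very slow, for example when a glob such as rm * expands to more arguments than the kernel accepts. You can either do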

find u/ -type f -print0 | xargs -r -0 rm -f

This deletes only files; to delete everything, use

find u/ -print0 | xargs -r -0 rm -rf

You can probably use the -delete action of find, if your system has it:

find u/ -type f -delete

or the funky method with rsync:

mkdir emptyfolder
rsync -r --delete emptyfolder/ u/

rsync is often reported to be faster than rm when deleting very large numbers of files, though the gain depends on the filesystem.
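
To illustrate the find -delete variant on a healthy filesystem, here is a small self-contained sketch. It works on a throwaway directory created with mktemp (demo_dir is a stand-in for u/, not the real directory), so it is safe to run as-is. Note that none of these approaches can help while the directory index itself is corrupted, as in the question.

```shell
# Sketch: delete many files without building a giant argument list.
demo_dir=$(mktemp -d)

# Create 5000 empty files as a stand-in for the contents of u/.
( cd "$demo_dir" && seq 1 5000 | xargs touch )

before=$(find "$demo_dir" -type f | wc -l)

# -delete removes each match as find walks the tree; no rm process is spawned
# and no file names are passed on a command line.
find "$demo_dir" -type f -delete

after=$(find "$demo_dir" -type f | wc -l)
echo "files before: $before, after: $after"

# The directory is now empty, so rmdir succeeds.
rmdir "$demo_dir"
```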

Answered By: darxmurf

Okay, I finally solved the issue. It was caused by filesystem errors, which made ls display wrong output and other utilities malfunction.
I’m sorry that the question title is misleading (although there are indeed many files inside u/, the directory is not extremely large).

I solved the problem by booting from a live USB, since the corrupted filesystem is / and should not be repaired while it is mounted. The fix was simply running sudo fsck -cfk /dev/sda2, where /dev/sda2 is the corrupted partition.
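
For anyone repeating this, a guarded version of the repair might look like the sketch below. The helper names is_mounted and repair_ext4 are invented for illustration, DEVICE is taken from the question, and calling e2fsck directly assumes the filesystem is ext4 (which the kernel log in the question indicates). Nothing runs until repair_ext4 is called explicitly.

```shell
# Sketch of a guarded repair, meant to be run from a live/rescue system.
DEVICE=/dev/sda2   # assumption: the partition from the question; adjust first

# Succeeds if the device appears as a mount source in the given mounts table
# (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

repair_ext4() {
    if is_mounted "$1"; then
        echo "$1 is mounted; refusing to run e2fsck on it" >&2
        return 1
    fi
    # Dry run first. -f: check even if marked clean; -n: read-only, answer "no".
    sudo e2fsck -fn "$1"
    # Actual repair. -c: scan for bad blocks with badblocks(8);
    # -k: keep the existing bad-block list when combined with -c.
    sudo e2fsck -cfk "$1"
}

# Usage: repair_ext4 "$DEVICE"
```

The dry run is optional, but it shows what would be changed before anything is written to the disk.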

Answered By: Hongxu Chen