Find and remove large files that are open but have been deleted

How does one find large files that have been deleted but are still open in an application? How can one remove such a file, even though a process has it open?

The situation is that we are running a process that is filling up a log file at a terrific rate. I know the reason, and I can fix it. Until then, I would like to rm or empty the log file without shutting down the process.

Simply doing rm output.log removes only references to the file, but it continues to occupy space on disk until the process is terminated. Worse: after rming I now have no way to find where the file is or how big it is! Is there any way to find the file, and possibly empty it, even though it is still open in another process?

I specifically refer to Linux-based operating systems such as Debian or RHEL.

Asked By: dotancohen

||

If you can’t kill your application, you can truncate instead of deleting the log file to reclaim the space. If the file was not open in append mode (with O_APPEND), then the file will appear as big as before the next time the application writes to it (though with the leading part sparse and looking as if it contained NUL bytes), but the space will have been reclaimed (that does not apply to HFS+ file systems on Apple OS/X that don’t support sparse files though).

To truncate it:

: > /path/to/the/file.log

If it was already deleted, on Linux, you can still truncate it by doing:

: > "/proc/$pid/fd/$fd"

Where $pid is the process id of the process that has the file opened, and $fd one file descriptor it has it opened under (which you can check with lsof -p "$pid".

If you don’t know the pid, and are looking for deleted files, you can do:

lsof -nP | grep '(deleted)'

lsof -nP +L1, as mentioned by @user75021 is an even better (more reliable and more portable) option (list files that have fewer than 1 link).

Or (on Linux):

find /proc/*/fd -ls | grep  '(deleted)'

Or to find the large ones with zsh:

ls -ld /proc/*/fd/*(-.LM+1l0)

An alternative, if the application is dynamically linked is to attach a debugger to it and make it call close(fd) followed by a new open("the-file", ....).

Answered By: Stéphane Chazelas

It’s up to the file system driver to actually free the allocated space, and that will usually happen only once all file descriptors referring to that file are released. So you can’t really reclaim the space, unless you make the application close the file. Which means either terminating it or playing with it “a bit” in a debugger (e.g. closing the file and making sure it is not opened/written to again, or opening /dev/null instead). Or you could hack the kernel, but I would advise against that.

Truncating the file as Stephane suggests might help, but the real outcome will also depend on your file system (for example pre-allocated blocks will likely be freed only after you close the file in any case).

The rationale behind this behaviour is that the kernel wouldn’t know what to do with data requests (both read and write, but reading is actually more critical) targeting such a file.

Answered By: peterph

Check out the quickstart here: lsof Quickstart

I’m surprised no one mentioned the lsof quickstart file (included with lsof). Section “3.a” shows how to find open, unlinked files:

lsof -a +L1 *mountpoint*

E.g.:

[root@enterprise ~]# lsof -a +L1 /tmp
COMMAND   PID   USER   FD   TYPE DEVICE    SIZE NLINK  NODE NAME
httpd    2357 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /tmp/ibOrELhG (deleted)
mysqld   2588  mysql    6u   REG 253,17       0     0  1497 /tmp/ibmDFAW8 (deleted)
mysqld   2588  mysql    7u   REG 253,17       0     0 11387 /tmp/ib2CSACB (deleted)
mysqld   2588  mysql   11u   REG 253,17       0     0 11388 /tmp/ibQpoZ94 (deleted)
httpd    3457   root   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8437 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8438 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8439 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8440 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8441 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8442 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8443 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd    8444 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd   16990 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd   19595 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd   27495 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd   28142 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
httpd   31478 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)

On Red Hat systems to find the local copy of the quick-start file, I usually do this:

[root@enterprise ~]# locate -i quickstart |grep lsof
/usr/share/doc/lsof-4.78/00QUICKSTART

… or this:

[root@enterprise ~]# rpm -qd lsof
/usr/share/doc/lsof-4.78/00.README.FIRST
/usr/share/doc/lsof-4.78/00CREDITS
/usr/share/doc/lsof-4.78/00DCACHE
/usr/share/doc/lsof-4.78/00DIALECTS
/usr/share/doc/lsof-4.78/00DIST
/usr/share/doc/lsof-4.78/00FAQ
/usr/share/doc/lsof-4.78/00LSOF-L
/usr/share/doc/lsof-4.78/00MANIFEST
/usr/share/doc/lsof-4.78/00PORTING
/usr/share/doc/lsof-4.78/00QUICKSTART
/usr/share/doc/lsof-4.78/00README
/usr/share/doc/lsof-4.78/00TEST
/usr/share/doc/lsof-4.78/00XCONFIG
/usr/share/man/man8/lsof.8.gz
Answered By: user75021

Instead of deleting a file held by a process that you can’t kill now, you can empty that log file with echo:

echo -n > /var/log/myapp.log

You can check after that with df command – the "Used bytes" column will be decreased.

Answered By: Omar Aladdin