tar --exclude doesn't exclude. Why?

I have this very simple line in a bash script which executes successfully (i.e. it produces the _data.tar file), except that it doesn’t exclude the sub-directories it is told to exclude via the --exclude option:

/bin/tar -cf /home/_data.tar  --exclude='/data/sub1/*'  --exclude='/data/sub2/*' --exclude='/data/sub3/*'  --exclude='/data/sub4/*'  --exclude='/data/sub5/*'  /data

Instead, it produces a _data.tar file that contains everything under /data, including the files in the subdirectories I wanted to exclude.

Any idea why, and how to fix this?

Update: I applied what I learned from the link provided in the first answer below (top-level dir first, no whitespace after the last exclude):

/bin/tar -cf /home/_data.tar  /data  --exclude='/data/sub1/*'  --exclude='/data/sub2/*'  --exclude='/data/sub3/*'  --exclude='/data/sub4/*'  --exclude='/data/sub5/*'

But that didn’t help. All “excluded” sub-directories are present in the resulting _data.tar file.

This is puzzling. Whether this is a bug in current tar (GNU tar 1.23, on a CentOS 6.2, Linux 2.6.32) or “extreme sensitivity” of tar to whitespaces and other easy-to-miss typos, I consider this a bug. For now.

This is horrible: I tried the insight suggested below (no trailing /*) and it still doesn’t work in the production script:

/bin/tar -cf /home/_data.tar  /data  --exclude='/data/sub1'  --exclude='/data/sub2'  --exclude='/data/sub3'  --exclude='/data/sub4'

I can’t see any difference between what I tried and what @Richard Perrin tried, except for the quotes and 2 spaces instead of 1. I am going to try this (must wait for the nightly script to run as the directory to be backed up is huge) and report back.

/bin/tar -cf /home/_data.tar  /data --exclude=/data/sub1 --exclude=/data/sub2 --exclude=/data/sub3 --exclude=/data/sub4

I am beginning to think that all these tar --exclude sensitivities aren’t tar’s but something in my environment, but then what could that be?

It worked! The last variation tried (no single-quotes and single-space instead of double-space between the --excludes) tested working. Weird but accepting.

Unbelievable! It turns out that an older version of tar (1.15.1) would only exclude if the top-level dir is last on the command line. This is the exact opposite of what version 1.23 requires. FYI.

Asked By: ateiob


This link might be helpful.
http://answers.google.com/answers/threadview/id/739467.html

Two immediate differences between the non-working line and some tips in the link:

  1. All excludes come after the top-level directory.
  2. Cannot have ANY spaces after the last --exclude.
Answered By: BigG

If you want to exclude an entire directory, your pattern should match that directory, not files within it. Use --exclude=/data/sub1 instead of --exclude='/data/sub1/*'

Be careful with quoting the patterns to protect them from shell expansion.

See this example, with trouble in the final invocation:

$ for i in 0 1 2; do mkdir -p /tmp/data/sub$i; echo foo > /tmp/data/sub$i/foo; done
$ find /tmp/data
/tmp/data
/tmp/data/sub2
/tmp/data/sub2/foo
/tmp/data/sub0
/tmp/data/sub0/foo
/tmp/data/sub1
/tmp/data/sub1/foo
$ tar -zvcf /tmp/_data.tar /tmp/data --exclude='/tmp/data/sub[1-2]'
tar: Removing leading `/' from member names
/tmp/data/
/tmp/data/sub0/
/tmp/data/sub0/foo
$ tar -zvcf /tmp/_data.tar /tmp/data --exclude=/tmp/data/sub[1-2]
tar: Removing leading `/' from member names
/tmp/data/
/tmp/data/sub0/
/tmp/data/sub0/foo
$ echo tar -zvcf /tmp/_data.tar /tmp/data --exclude=/tmp/data/sub[1-2]
tar -zvcf /tmp/_data.tar /tmp/data --exclude=/tmp/data/sub[1-2]
$ tar -zvcf /tmp/_data.tar /tmp/data --exclude /tmp/data/sub[1-2]
tar: Removing leading `/' from member names
/tmp/data/
/tmp/data/sub2/
/tmp/data/sub2/foo
/tmp/data/sub0/
/tmp/data/sub0/foo
/tmp/data/sub2/
tar: Removing leading `/' from hard link targets
/tmp/data/sub2/foo
$ echo tar -zvcf /tmp/_data.tar /tmp/data --exclude /tmp/data/sub[1-2]
tar -zvcf /tmp/_data.tar /tmp/data --exclude /tmp/data/sub1 /tmp/data/sub2
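
The difference between a pattern that matches the directory and one that matches only its contents can be checked on a throwaway tree. The following is a minimal sketch rather than part of the session above, assuming GNU tar and a scratch directory from mktemp; the names data, sub0 and sub1 are only illustrative:

```shell
# Build a small throwaway tree: data/sub0/foo and data/sub1/foo.
work=$(mktemp -d)
mkdir -p "$work/data/sub0" "$work/data/sub1"
echo foo > "$work/data/sub0/foo"
echo foo > "$work/data/sub1/foo"

# This pattern matches the files inside sub1, but not sub1 itself.
tar -C "$work" -cf "$work/contents.tar" --exclude='data/sub1/*' data
contents_listing=$(tar -tf "$work/contents.tar")

# This pattern matches the directory, so tar never descends into it.
tar -C "$work" -cf "$work/dir.tar" --exclude='data/sub1' data
dir_listing=$(tar -tf "$work/dir.tar")

printf 'with data/sub1/*:\n%s\n\nwith data/sub1:\n%s\n' \
    "$contents_listing" "$dir_listing"
```

With the first pattern the archive should still carry an empty data/sub1/ entry; with the second, sub1 should be absent altogether.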
Answered By: R Perrin

A workaround may be to use a combination of find ... -prune and tar to exclude the specified directories.

On Mac OS X the --exclude option of GNU tar seems to work as it should though.

In the following test case the directories /private/var/log/asl and /private/var/log/DiagnosticMessages are to be excluded from a compressed archive of the /private/var/log directory.

# all successfully tested in Bash shell on Mac OS X (using gnutar and gfind)

# sudo port install findutils  # for gfind from MacPorts

sudo gnutar -czf ~/Desktop/varlog.tar.gz /private/var/log --exclude "/private/var/log/asl" --exclude "/private/var/log/DiagnosticMessages"

sudo gnutar -czf ~/Desktop/varlog.tar.gz  --exclude "/private/var/log/asl" --exclude "/private/var/log/DiagnosticMessages" /private/var/log

set -f # disable file name globbing
sudo gnutar -czf ~/Desktop/varlog.tar.gz  --exclude "/private/var/log/asl" --exclude "/private/var/log/Diagnostic*" /private/var/log

# combining GNU find and tar (on Mac OS X)

sudo gfind /private/var/log -xdev -type d \( -name "asl" -o -name "DiagnosticMessages" \) -prune -o -print0 |
   sudo gnutar --null --no-recursion -czf ~/Desktop/varlog.tar.gz --files-from -

# exclude even more dirs
sudo gfind /private/var/log -xdev -type d \( -name "asl" -o -name "[Dacfks]*" \) -prune -o -print0 |
    sudo gnutar --null --no-recursion -czf ~/Desktop/varlog.tar.gz --files-from -


# testing the compressed archive

gnutar -C ~/Desktop -xzf ~/Desktop/varlog.tar.gz

sudo gfind /private/var/log ~/Desktop/private \( -iname DiagnosticMessages -or -iname asl \)

sudo rm -rf ~/Desktop/varlog.tar.gz ~/Desktop/private
Answered By: jon

It may be that your version of tar requires that the --exclude options have to be placed at the beginning of the tar command.

See: https://stackoverflow.com/q/984204

tar --exclude='./folder' --exclude='./upload/folder2' \
    -zcvf /backup/filename.tgz .

See: http://mandrivausers.org/index.php?/topic/8585-multiple-exclude-in-tar/

tar --exclude=<first> --exclude=<second> -cjf backupfile.bz2 /home/*

Alternative:

EXCLD='first second third'
tar -X <(for i in ${EXCLD}; do echo $i; done) -cjf backupfile.bz2 /home/*
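
The same idea can be sketched without process substitution by writing the patterns to a file first. This is only an illustration, assuming GNU tar; the directory and file names (home, first, second, keep, exclude.txt) are made up for the example, and plain -cf is used instead of -cjf so that bzip2 is not required:

```shell
# Throwaway tree with two directories to skip and one to keep.
work=$(mktemp -d)
mkdir -p "$work/home/first" "$work/home/second" "$work/home/keep"
touch "$work/home/first/a" "$work/home/second/b" "$work/home/keep/c"

# One pattern per line; each line behaves like a separate --exclude.
printf '%s\n' 'home/first' 'home/second' > "$work/exclude.txt"

# -X (--exclude-from) is given before the directory to be archived.
tar -C "$work" -X "$work/exclude.txt" -cf "$work/backup.tar" home

# List the archive to confirm what ended up in it.
xfile_listing=$(tar -tf "$work/backup.tar")
printf '%s\n' "$xfile_listing"
```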

Yet another tar command tip is from here:

tar cvfz myproject.tgz --exclude='path/dir_to_exclude1' \
                       --exclude='path/dir_to_exclude2' myproject
Answered By: carlo

Perhaps you can try the command with another option:

--wildcards

And check if it’s running as intended.

Answered By: Luis

For excluding multiple files, try

--exclude=/data/{sub1,sub2,sub3,sub4}

This saves some typing. It relies on the shell's brace expansion (bash, zsh and similar), so it works for any program or option, as long as the braces are left unquoted. If you also want the parent directory itself to appear in the expansion (in this case /data/), add a trailing comma. For example:

umount /data/{sub1,sub2,}
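
Because the expansion happens in the shell before tar starts, it is easy to check what tar will actually receive. A small sketch, assuming bash is installed (plain POSIX sh does not perform brace expansion, hence the explicit bash -c):

```shell
# Ask bash to expand the braces and print one resulting argument per line.
expanded=$(bash -c 'printf "%s\n" --exclude=/data/{sub1,sub2,sub3,sub4}')
printf '%s\n' "$expanded"

# A trailing comma adds an empty alternative, which yields the parent itself.
with_parent=$(bash -c 'printf "%s\n" /data/{sub1,sub2,}')
printf '%s\n' "$with_parent"
```

The first command prints four separate --exclude=/data/subN arguments. Quoting the braces would suppress the expansion and pass the literal string to tar instead.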
Answered By: tolga9009

I am using a Mac, and found that excludes weren’t working unless the top-level folder is the last argument.

example of working command:

tar czvf tar.tgz --exclude='Music' dir

FYI:

$: tar --version
bsdtar 2.8.3 - libarchive 2.8.3
Answered By: jars99

In my case, it didn’t exclude for a different reason.

The cause was a full path vs. a relative path.

Both the exclude and the directory must use the same path format (i.e. both full paths or both relative paths).

Example:

tar -cvf ctms-db-sync.tar --exclude='/home/mine/tmp/ctms-db-sync/sql' ctms-db-sync

This will not work because the exclude uses a full path whereas the target uses a relative path.

tar -cvf ctms-db-sync.tar --exclude='/home/mine/tmp/ctms-db-sync/sql' /home/mine/tmp/ctms-db-sync

This works because both use the full path

tar -cvf ctms-db-sync.tar --exclude='ctms-db-sync/sql' ctms-db-sync

This works because both use the relative path
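
The mismatch can be reproduced on a scratch copy. The following is a rough sketch, assuming GNU tar; the directory layout mirrors the example above, while the file names dump.sql and main.c are invented for the test:

```shell
work=$(mktemp -d)
mkdir -p "$work/ctms-db-sync/sql" "$work/ctms-db-sync/src"
touch "$work/ctms-db-sync/sql/dump.sql" "$work/ctms-db-sync/src/main.c"

# Relative target, absolute exclude: the pattern never matches
# the member name "ctms-db-sync/sql", so nothing is left out.
tar -C "$work" -cf "$work/mismatch.tar" \
    --exclude="$work/ctms-db-sync/sql" ctms-db-sync
mismatch_listing=$(tar -tf "$work/mismatch.tar")

# Relative target, relative exclude: the sql directory is skipped.
tar -C "$work" -cf "$work/match.tar" \
    --exclude='ctms-db-sync/sql' ctms-db-sync
match_listing=$(tar -tf "$work/match.tar")

printf 'mismatched forms:\n%s\n\nmatching forms:\n%s\n' \
    "$mismatch_listing" "$match_listing"
```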

Answered By: hbt

Additional notes to R Perrin’s excellent answer:

Suppose you want to archive relative rather than absolute paths, e.g. ‘data’ instead of ‘/tmp/data’.
The arguments needed to exclude sub-directories then differ depending on the tar implementation (GNU tar vs. bsdtar) you use:

$ for i in 0 1 2; do
    for j in 0 1 2; do 
      mkdir -p /tmp/data/sub$i/sub$j
      echo foo > /tmp/data/sub$i/sub$j/foo
    done
  done

$ find /tmp/data/
/tmp/data/
/tmp/data/sub2
/tmp/data/sub2/sub2
/tmp/data/sub2/sub2/foo
/tmp/data/sub2/sub1
/tmp/data/sub2/sub1/foo
/tmp/data/sub2/sub0
/tmp/data/sub2/sub0/foo
/tmp/data/sub1
/tmp/data/sub1/sub2
/tmp/data/sub1/sub2/foo
/tmp/data/sub1/sub1
/tmp/data/sub1/sub1/foo
/tmp/data/sub1/sub0
/tmp/data/sub1/sub0/foo
/tmp/data/sub0
/tmp/data/sub0/sub2
/tmp/data/sub0/sub2/foo
/tmp/data/sub0/sub1
/tmp/data/sub0/sub1/foo
/tmp/data/sub0/sub0
/tmp/data/sub0/sub0/foo

$ cd /tmp/data; tar -zvcf /tmp/_data.tar --exclude './sub[1-2]' .
./
./sub0/
./sub0/sub2/
./sub0/sub2/foo
./sub0/sub1/
./sub0/sub1/foo
./sub0/sub0/
./sub0/sub0/foo

# ATTENTION: bsdtar's behaviour differs from traditional tar (without a leading '^')!
$ cd /tmp/data; bsdtar -zvcf /tmp/_data.tar --exclude './sub[1-2]' .
a .
a ./sub0
a ./sub0/sub0
a ./sub0/sub0/foo

# FIX: Adding a leading '^' anchors the pattern, so bsdtar matches only top-level files and folders.
$ cd /tmp/data; bsdtar -zvcf /tmp/_data.tar --exclude '^./sub[1-2]' .
# ALTERNATIVE: bsdtar -C /tmp/data -zvcf /tmp/_data.tar --exclude '^./sub[1-2]' .
a .
a ./sub0
a ./sub0/sub2
a ./sub0/sub1
a ./sub0/sub0
a ./sub0/sub0/foo
a ./sub0/sub1/foo
a ./sub0/sub2/foo
Answered By: Jakob

I tried all sorts of combinations including a few of the answers listed and just couldn’t get it to exclude the listed files.

So being fed up of chasing the answer to what was meant to be a five minute job I did the opposite: created an archive of the folders I wanted to include.

I did this by creating an archive then adding to it:

tar -cvpf /path/to/mybackup.tar ./bin
tar rvf /path/to/mybackup.tar ./boot
tar rvf /path/to/mybackup.tar ./etc
tar rvf /path/to/mybackup.tar ./home
tar rvf /path/to/mybackup.tar ./lib
tar rvf /path/to/mybackup.tar ./sbin
tar rvf /path/to/mybackup.tar ./usr
tar rvf /path/to/mybackup.tar ./var

A few notes:

  • I used the relative instead of absolute paths (which were also giving trouble) by running from the root of the filesystem.
  • You must create a plain tar (and not zipped tar .tgz / .tar.gz) archive – you can zip it later using gzip mybackup.tar
  • Make sure you don’t put the archive in any folder you are including or you’ll get some recursion (a partial backup also included in the backup itself).
  • Note the difference in the first command (create) from the others (add).
  • You can check that files are being added rather than the backup overwritten (e.g. after the second command) if you are paranoid by using tar tvf mybackup.tar.
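
The create-then-append sequence can be tried safely on a scratch tree first. A minimal sketch, assuming GNU tar and gzip; the root, bin, etc and tmp directories and their files are placeholders rather than the real filesystem:

```shell
work=$(mktemp -d)
mkdir -p "$work/root/bin" "$work/root/etc" "$work/root/tmp"
touch "$work/root/bin/tool" "$work/root/etc/app.conf" "$work/root/tmp/scratch"

# Create the archive with the first directory (-c), then append the next (-r).
tar -C "$work/root" -cpf "$work/mybackup.tar" ./bin
tar -C "$work/root" -rf "$work/mybackup.tar" ./etc

# List the members to confirm that -r added to the archive
# instead of overwriting it.
append_listing=$(tar -tf "$work/mybackup.tar")
printf '%s\n' "$append_listing"

# Appending needs an uncompressed archive, so compress as the last step.
gzip "$work/mybackup.tar"
```

Only the directories named explicitly end up in the archive; the tmp directory is left out simply because it was never added.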
Answered By: SharpC

Just now detected on tar (GNU tar) 1.29

This call does not exclude the files specified with --exclude-from from the archive:

/bin/tar --files-from ${datafile} --exclude-from ${excludefile} -jcf ${backupfile}

This call works correctly:

/bin/tar --exclude-from ${excludefile} --files-from ${datafile} -jcf ${backupfile}

Order of parameters is important!
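
A small sketch of the working order on a scratch directory, assuming GNU tar. The file names (files.txt, exclude.txt, keep.txt, junk.bin) are hypothetical, and plain -cf replaces -jcf so that bzip2 is not needed:

```shell
work=$(mktemp -d)
mkdir -p "$work/data/cache"
touch "$work/data/keep.txt" "$work/data/cache/junk.bin"
printf '%s\n' 'data' > "$work/files.txt"          # what to archive
printf '%s\n' 'data/cache' > "$work/exclude.txt"  # what to leave out

# The exclusions are given before the list of files to archive.
( cd "$work" &&
  tar --exclude-from exclude.txt --files-from files.txt -cf backup.tar )

# List the archive to confirm that data/cache was skipped.
order_listing=$(tar -tf "$work/backup.tar")
printf '%s\n' "$order_listing"
```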

Answered By: Alexander

Success Case:

1) When backing up with a full path, the exclude patterns must also use the full path.

tar -zcvf /opt/ABC/BKP_27032020/backup_27032020.tar.gz \
    --exclude='/opt/ABC/csv/*' --exclude='/opt/ABC/log/*' /opt/ABC

2) When backing up with a path relative to the current directory, the exclude patterns must also be relative.

tar -zcvf backup_27032020.tar.gz \
    --exclude='ABC/csv/*' --exclude='ABC/log/*' ABC

Failure Case:

1) Backing up a directory given as a relative path while the excludes use full paths won't work.

tar -zcvf /opt/ABC/BKP_27032020/backup_27032020.tar.gz \
    --exclude='/opt/ABC/csv/*' --exclude='/opt/ABC/log/*' ABC

Note: placing the excludes before or after the backup directory is fine either way.

Answered By: Sridhar Kumar N

The exclude family of parameters apply to the internal relative names of the files in the tarball. The absolute path you specify will never exist within the tarball since it only has relative paths from the provided root.

Answered By: Tigerware