How to rsync only new files
I am trying to set up rsync to synchronize my main web server to the remote server by adding newly generated files to the latter.
Here is the command that I use:
rsync -avh --update -e "ssh -i /path/to/thishost-rsync-key" remoteuser@remotehost:/foo/bar /foo/bar
But it seems that the web server actually transfers all files despite the --update flag. I have tried different flag combinations (e.g. omitting -a and using -uv instead) but none helped. How can I modify the rsync command to send out only newly added files?
From man rsync:
--ignore-existing skip updating files that exist on receiver
--update does something slightly different, which is probably why you are getting unexpected results (see man rsync):
This forces rsync to skip any files which exist on the destination and have a modified time that is newer than the source file. (If an existing destination file has a modification time equal to the source file’s, it will be updated if the sizes are different.)
In my experience with rsync, copying an entire 1 TB partition in a single run is too large to be efficient; rsync takes forever to process it. Instead, split the job by subdirectory, that is, run rsync once for each main subdirectory. It goes a lot faster when rsync doesn't have to juggle tens of thousands of files at once.
The issue might be caused by different user/group IDs on source and target servers. My case was similar, having all files transferred instead of only the modified/new ones. The solution was to use parameters -t (instead of -a), and -P (equivalent to --partial --progress):
rsync -h -v -r -P -t <source> <target>
or shorter (thanks @Manngo):
rsync -hvrPt <source> <target>
This transfers only new files, and files that already exist but have been modified. Parameter -a does too much, such as syncing user and group IDs, which cannot work in my case because I have different users and groups on my source and target systems. Therefore, with -a all my source and target files were always regarded as "different".
The parameters in detail:
-h: human readable numbers
-v: verbose
-r: recurse into directories
-P: --partial (keep partially transferred files) + --progress (show progress during transfer)
-t: preserve modification times
The problem you describe is probably because you are missing the trailing / on the source directory. Without it, rsync copies the bar directory itself into the destination, so the files end up in /foo/bar/bar instead of /foo/bar. As a result, all the files are copied a second time even though /foo/bar already holds them. Thereafter rsync will efficiently copy updates to /foo/bar/bar.
The correct command should be this:
rsync -avh -e "ssh -i /path/to/thishost-rsync-key" remoteuser@remotehost:/foo/bar/ /foo/bar
Notice that
rsync --ignore-existing -raz --progress /var/www/88021064/var xx@server.tld:/usr/home/xxx/public_html/var/
will not do anything. The way it worked for me is
rsync --ignore-existing -raz --progress /var/www/88021064/var/* xx@server.tld:/usr/home/xxx/public_html/var/
Also notice that this will not update hidden files, because the shell's * does not match names starting with a dot. For hidden files only, execute the command one more time with /var/www/88021064/var/.[^.]* as the source.