Why is this binary file transferred over "ssh -t" being changed?

I am trying to copy files over SSH, but cannot use scp due to not knowing the exact filename that I need. Although small binary files and text files transfer fine, large binary files get altered. Here is the file on the server:

remote$ ls -la
-rw-rw-r--  1 user user 244970907 Aug 24 11:11 foo.gz
remote$ md5sum foo.gz 
9b5a44dad9d129bab52cbc6d806e7fda foo.gz

Here is the file after I’ve moved it over:

local$ time ssh me@server.com -t 'cat /path/to/foo.gz' > latest.gz

real    1m52.098s
user    0m2.608s
sys     0m4.370s
local$ md5sum latest.gz
76fae9d6a4711bad1560092b539d034b  latest.gz

local$ ls -la
-rw-rw-r--  1 dotancohen dotancohen 245849912 Aug 24 18:26 latest.gz

Note that the downloaded file is bigger than the one on the server! However, if I do the same with a very small file, then everything works as expected:

remote$ echo "Hello" | gzip -c > hello.txt.gz
remote$ md5sum hello.txt.gz
08bf5080733d46a47d339520176b9211  hello.txt.gz

local$ time ssh me@server.com -t 'cat /path/to/hello.txt.gz' > hi.txt.gz

real 0m3.041s
user 0m0.013s
sys 0m0.005s

local$ md5sum hi.txt.gz
08bf5080733d46a47d339520176b9211  hi.txt.gz

Both file sizes are 26 bytes in this case.

Why might small files transfer fine, but large files get some bytes added to them?

Asked By: dotancohen


When I copy a file using that method, the local copy comes out different from the original.

Remote server

ls -l | grep vim_cfg
-rw-rw-r--.  1 slm slm 9783257 Aug  5 16:51 vim_cfg.tgz

Local server

Running your ssh ... cat command:

$ ssh dufresne -t 'cat ~/vim_cfg.tgz' > vim_cfg.tgz

Results in this file on the local server:

$ ls -l | grep vim_cfg.tgz 
-rw-rw-r--. 1 saml saml 9820481 Aug 24 12:13 vim_cfg.tgz

Investigating why?

Investigating the resulting file on the local side shows that it’s been corrupted. If you take the -t switch out of your ssh command then it works as expected.

$ ssh dufresne 'cat ~/vim_cfg.tgz' > vim_cfg.tgz

$ ls -l | grep vim_cfg.tgz
-rw-rw-r--. 1 saml saml 9783257 Aug 24 12:17 vim_cfg.tgz

Checksums now work too:

# remote server
$ ssh dufresne "md5sum ~/vim_cfg.tgz"
9e70b036836dfdf2871e76b3636a72c6  /home/slm/vim_cfg.tgz

# local server
$ md5sum vim_cfg.tgz 
9e70b036836dfdf2871e76b3636a72c6  vim_cfg.tgz
Answered By: slm


Don’t use -t. -t involves a pseudo-terminal on the remote host and should only be used to run visual applications from a terminal.


The linefeed character (also known as newline or \n) is the one that, when sent to a terminal, tells the terminal to move its cursor down.

Yet when you run seq 3 in a terminal, that is, when seq writes 1\n2\n3\n to something like /dev/pts/0, you don’t see:

1
 2
  3

Why is that?

Actually, when seq 3 (or ssh host seq 3 for that matter) writes 1\n2\n3\n, the terminal sees 1\r\n2\r\n3\r\n. That is, the line-feeds have been translated to carriage-return (upon which terminals move their cursor back to the left of the screen) followed by line-feed.

That is done by the terminal device driver. More exactly, by the line-discipline of the terminal (or pseudo-terminal) device, a software module that resides in the kernel.

You can control the behaviour of that line discipline with the stty command. The translation of LF -> CRLF is turned on with

stty onlcr

(which is generally enabled by default). You can turn it off with:

stty -onlcr

Or you can turn all output processing off with:

stty -opost

If you do that and run seq 3, you’ll then see:

$ stty -onlcr; seq 3
1
 2
  3

as expected.

Now, when you do:

seq 3 > some-file

seq is no longer writing to a terminal device. It’s writing into a regular file, so no translation is done, and some-file does contain 1\n2\n3\n. The translation is only done when writing to a terminal device, and only for display.
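A quick way to check this, as a small sketch assuming a shell with seq, od and tr available (seq_bytes is a made-up helper name): dump the bytes seq produces when its stdout is not a terminal. The output goes to a pipe here rather than to some-file, but the point is the same. No line discipline is involved, so each line ends in a bare 0a.

```shell
# seq's stdout is a pipe here, not a terminal, so no LF -> CRLF translation occurs.
# od -An drops the offset column; -tx1 prints one hex byte at a time.
# tr strips the spaces and newlines od adds, leaving a compact hex string.
seq_bytes() {
    seq 3 | od -An -tx1 | tr -d ' \n'
}

seq_bytes; echo    # prints 310a320a330a
```

There is no 0d before any of the 0a bytes.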

Similarly, when you do:

ssh host seq 3

ssh is writing 1\n2\n3\n regardless of where ssh’s output goes.

What actually happens is that the seq 3 command is run on host with its stdout redirected to a pipe. The ssh server on host reads the other end of the pipe and sends it over the encrypted channel to your ssh client and the ssh client writes it onto its stdout, in your case a pseudo-terminal device, where LFs are translated to CRLF for display.

Many interactive applications behave differently when their stdout is not a terminal. For instance, if you run:

ssh host vi

vi doesn’t like it: its output is going to a pipe, so it thinks it’s not talking to a device that is able to understand cursor positioning escape sequences, for instance.
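How a program notices this can be sketched in shell with the standard test [ -t 1 ], which is true only when file descriptor 1 (stdout) is a terminal device. This is only an illustration of the idea, not what vi itself does internally; describe_stdout is a made-up name.

```shell
# Report whether this function's stdout is connected to a terminal device.
# The message goes to stderr so it stays visible when stdout is redirected.
describe_stdout() {
    if [ -t 1 ]; then
        echo "stdout is a terminal" >&2
    else
        echo "stdout is NOT a terminal" >&2
    fi
}

describe_stdout          # result depends on where this script's stdout goes
describe_stdout | cat    # stdout is a pipe: reports "stdout is NOT a terminal"
```

A command started by ssh host without -t is in the second situation, since its stdout is a pipe to the ssh server.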

So ssh has the -t option for that. With that option, the ssh server on host creates a pseudo-terminal device and makes that the stdout (and stdin, and stderr) of vi. What vi writes on that terminal device goes through that remote pseudo-terminal line discipline and is read by the ssh server and sent over the encrypted channel to the ssh client. It’s the same as before except that instead of using a pipe, the ssh server uses a pseudo-terminal.

The other difference is that on the client side, the ssh client sets the terminal in raw mode (and disables local echo). That means that no translation is done there (opost is disabled, along with the input-side processing). For instance, when you type Ctrl-C, instead of interrupting ssh, that ^C character is sent to the remote side, where the line discipline of the remote pseudo-terminal sends the interrupt to the remote command.

When you do:

ssh -t host seq 3

seq 3 writes 1\n2\n3\n to its stdout, which is a pseudo-terminal device. Because of onlcr, that gets translated on host to 1\r\n2\r\n3\r\n and sent to you over the encrypted channel. On your side there is no translation (the terminal is in raw mode, so onlcr is disabled), so 1\r\n2\r\n3\r\n is passed through untouched and displayed correctly on the screen of your terminal emulator.

Now, if you do:

ssh -t host seq 3 > some-file

There’s no difference from above. ssh will write the same thing: 1\r\n2\r\n3\r\n, but this time into some-file.

So basically all the LFs in the output of seq have been translated to CRLF in some-file.

It’s the same if you do:

ssh -t host cat remote-file > local-file

All the LF characters (0x0a bytes) are being translated into CRLF (0x0d 0x0a).
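The sizes in the question fit this: 245849912 - 244970907 = 879005 extra bytes, which would be one added CR per LF byte if this translation is the only alteration. A rough sketch for checking that against a file, assuming standard tr and wc (count_lf, predicted_size and sample.bin are made-up names):

```shell
# Count the LF (0x0a) bytes in a file: delete every other byte, count what is left.
# LC_ALL=C makes tr treat the input as raw bytes rather than multibyte text.
count_lf() {
    LC_ALL=C tr -cd '\n' < "$1" | wc -c
}

# Size the file would have if each LF were expanded to CRLF (one extra byte per LF).
predicted_size() {
    size=$(wc -c < "$1")
    echo $(( size + $(count_lf "$1") ))
}

# Small demonstration on a throwaway file: 6 bytes, 3 of which are LF.
printf 'a\nb\n\nc' > sample.bin
count_lf sample.bin          # 3
predicted_size sample.bin    # 9
rm -f sample.bin
```

Running count_lf on the server against the original file and comparing the result with the size difference is one way to confirm the diagnosis.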

That’s probably the reason for the corruption in your file. In the case of the second smaller file, it just so happens that the file doesn’t contain 0x0a bytes, so there is no corruption.

Note that you could get different types of corruption with different tty settings. Another potential type of corruption associated with -t is if your startup files on host (~/.bashrc, ~/.ssh/rc…) write things to their stderr, because with -t the stdout and stderr of the remote shell end up being merged into ssh’s stdout (they both go to the pseudo-terminal device).

You don’t want the remote cat to output to a terminal device there.

You want:

ssh host cat remote-file > local-file

You could do:

ssh -t host 'stty -opost; cat remote-file' > local-file

That would work (except in the writing to stderr corruption case discussed above), but even that would be sub-optimal as you’d have that unnecessary pseudo-terminal layer running on host.

Some more fun:

$ ssh localhost echo | od -tx1
0000000 0a


$ ssh -t localhost echo | od -tx1
0000000 0d 0a

LF translated to CRLF

$ ssh -t localhost 'stty -opost; echo' | od -tx1
0000000 0a

OK again.

$ ssh -t localhost 'stty olcuc; echo x'
X

That’s another form of output post-processing that can be done by the terminal line discipline.

$ echo x | ssh -t localhost 'stty -opost; echo' | od -tx1
Pseudo-terminal will not be allocated because stdin is not a terminal.
stty: standard input: Inappropriate ioctl for device
0000000 0a

ssh refuses to tell the server to use a pseudo-terminal when its own input is not a terminal. You can force it with -tt though:

$ echo x | ssh -tt localhost 'stty -opost; echo' | od -tc
0000000   x  \r  \n  \n

The line discipline does a lot more on the input side.

Here, echo doesn’t read its input, nor was it asked to output that x\r\n\n, so where does that come from? That’s the local echo of the remote pseudo-terminal (stty echo). The ssh server is feeding the x\n it read from the client to the master side of the remote pseudo-terminal, and the line discipline of that pseudo-terminal echoes it back (before stty -opost is run, which is why we see a CRLF and not an LF). That’s independent of whether the remote application reads anything from stdin or not.

$ (sleep 1; printf '\03') | ssh -tt localhost 'trap "echo ouch" INT; sleep 2'
^Couch

The 0x3 character is echoed back as ^C (^ and C) because of stty echoctl, and the shell and sleep receive a SIGINT because of stty isig.

So while:

ssh -t host cat remote-file > local-file

is bad enough, using

ssh -tt host 'cat > remote-file' < local-file

to transfer files the other way across is a lot worse. You’ll get some CR -> LF translation, but also problems with all the special characters (^C, ^Z, ^D, ^?, ^S…). On top of that, the remote cat will not see EOF when the end of local-file is reached, only when ^D is sent after a \r, a \n or another ^D, like when doing cat > file in your terminal.

Answered By: Stéphane Chazelas