SSH connections running in the background don't exit if multiple connections have been started by the same shell
Apparently, if the same shell launches multiple ssh connections to the same server, they won’t return after executing the command they’re given but will hang (Stopped (tty input)) forever. To illustrate:
#!/bin/bash
ssh localhost sleep 2
echo "$$ DONE!"
If I run the script above more than once in the background, it never exits:
$ for i in {1..3}; do foo.sh & done
[1] 28695
[2] 28696
[3] 28697
$ ## Hit enter
[1] Stopped foo.sh
[2]- Stopped foo.sh
[3]+ Stopped foo.sh
$ ## Hit enter again
$ jobs -l
[1] 28695 Stopped (tty input) foo.sh
[2]- 28696 Stopped (tty input) foo.sh
[3]+ 28697 Stopped (tty input) foo.sh
Details
- I found this because I was ssh’ing in a Perl script to run a command. The same behavior occurs when using Perl’s system() call to launch ssh.
- The same issue occurs when using Perl modules instead of system(). I tried Net::SSH::Perl, Net::SSH2 and Net::OpenSSH.
- If I run the multiple ssh commands from different shells (open multiple terminals) they work as expected.
- Nothing obviously useful in the ssh connection debugging info:
OpenSSH_7.5p1, OpenSSL 1.1.0f 25 May 2017
debug1: Reading configuration data /home/terdon/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug2: resolving "localhost" port 22
debug2: ssh_connect_direct: needpriv 0
debug1: Connecting to localhost [::1] port 22.
debug1: Connection established.
debug1: identity file /home/terdon/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /home/terdon/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/terdon/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/terdon/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/terdon/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/terdon/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/terdon/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/terdon/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.5
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.5
debug1: match: OpenSSH_7.5 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to localhost:22 as 'terdon'
debug3: hostkeys_foreach: reading file "/home/terdon/.ssh/known_hosts"
debug3: record_hostkey: found key type ECDSA in file /home/terdon/.ssh/known_hosts:47
debug3: load_hostkeys: loaded 1 keys from localhost
debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ext-info-c
debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,rsa-sha2-512,rsa-sha2-256,ssh-rsa
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,aes128-cbc,aes192-cbc,aes256-cbc
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
debug2: host key algorithms: ssh-rsa,rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:uxhkh+gGPiCJQPaP024WXHth382h3BTs7QdGMokB9VM
debug3: hostkeys_foreach: reading file "/home/terdon/.ssh/known_hosts"
debug3: record_hostkey: found key type ECDSA in file /home/terdon/.ssh/known_hosts:47
debug3: load_hostkeys: loaded 1 keys from localhost
debug1: Host 'localhost' is known and matches the ECDSA host key.
debug1: Found key in /home/terdon/.ssh/known_hosts:47
debug3: send packet: type 21
debug2: set_newkeys: mode 1
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug3: receive packet: type 21
debug1: SSH2_MSG_NEWKEYS received
debug2: set_newkeys: mode 0
debug1: rekey after 134217728 blocks
debug2: key: /home/terdon/.ssh/id_rsa (0x555a5e4b5060)
debug2: key: /home/terdon/.ssh/id_dsa ((nil))
debug2: key: /home/terdon/.ssh/id_ecdsa ((nil))
debug2: key: /home/terdon/.ssh/id_ed25519 ((nil))
debug3: send packet: type 5
debug3: receive packet: type 7
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
debug3: receive packet: type 51
debug1: Authentications that can continue: publickey,password
debug3: start over, passed a different list publickey,password
debug3: preferred publickey,keyboard-interactive,password
debug3: authmethod_lookup publickey
debug3: remaining preferred: keyboard-interactive,password
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/terdon/.ssh/id_rsa
debug3: send_pubkey_test
debug3: send packet: type 50
debug2: we sent a publickey packet, wait for reply
debug3: receive packet: type 60
debug1: Server accepts key: pkalg rsa-sha2-512 blen 279
debug2: input_userauth_pk_ok: fp SHA256:OGvtyUIFJw426w/FK/RvIhsykeP8kIEAtAeZwYBIzok
debug3: sign_and_send_pubkey: RSA SHA256:OGvtyUIFJw426w/FK/RvIhsykeP8kIEAtAeZwYBIzok
debug3: send packet: type 50
debug3: receive packet: type 52
debug1: Authentication succeeded (publickey).
Authenticated to localhost ([::1]:22).
debug2: fd 6 setting O_NONBLOCK
debug1: channel 0: new [client-session]
debug3: ssh_session2_open: channel_new: 0
debug2: channel 0: send open
debug3: send packet: type 90
debug1: Requesting no-more-sessions@openssh.com
debug3: send packet: type 80
debug1: Entering interactive session.
debug1: pledge: network
debug3: receive packet: type 80
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug3: receive packet: type 91
debug2: callback start
debug2: fd 3 setting TCP_NODELAY
debug3: ssh_packet_set_tos: set IPV6_TCLASS 0x08
debug2: client_session2_setup: id 0
debug1: Sending command: sleep 2
debug2: channel 0: request exec confirm 1
debug3: send packet: type 98
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel 0: rcvd adjust 2097152
debug3: receive packet: type 99
debug2: channel_input_status_confirm: type 99 id 0
debug2: exec request accepted on channel 0
- This doesn’t depend on my ~/.ssh/config setup. Renaming the file doesn’t change anything.
- This happens on multiple machines. I’ve tried 4 or 5 different machines running updated Ubuntu and Arch distros.
- The command (sleep in the dummy example but something a good deal more complex in real life) exits successfully and does what it’s supposed to do. This doesn’t depend on the command you’re running, it’s an ssh issue.
- This is the worst of them: it isn’t consistent. Every now and then, one of the instances will exit and return control to the parent script. But not always, and there is no pattern I’ve been able to discern.
- Renaming ~/.bashrc makes no difference. Also, I’ve run this on machines running Ubuntu (default login shell dash) and Arch (default login shell bash, called as sh).
- Interestingly, the issue only occurs if I hit any key (for example Enter, but any seems to work) after launching the loop but before the first script exits. If I leave the terminal alone, they finish as expected.
What’s going on? Is this a bug in ssh? Is there an option I need to set? How can I launch multiple instances of a script that runs a command over ssh from the same shell?
You may find help in the man page:
-n Redirects stdin from /dev/null (actually, prevents reading from stdin). This must be used when ssh is run in the
background. A common trick is to use this to run X11 programs on a remote machine. For example, ssh -n
shadows.cs.hut.fi emacs & will start an emacs on shadows.cs.hut.fi, and the X11 connection will be automatically
forwarded over an encrypted channel. The ssh program will be put in the background. (This does not work if ssh
needs to ask for a password or passphrase; see also the -f option.)
If that still doesn’t help, I’d try -T (disable pseudo-tty allocation), just on a whim.
Apparently, if the same shell launches multiple ssh connections to the same server, they won’t return after executing the command they’re given but will hang (Stopped (tty input)) for ever.
This is the usual behavior for concurrent access to a TTY. The whole process is already in the background, and when it tries to read input from the TTY it is not allowed to access it and receives a signal (SIGTTIN, which is what “Stopped (tty input)” reports). The bash process does not catch that signal, so the default action is performed (Stop).
The -n option as explained in the other answer or redirection of the IO to some files will help you. Not sure if there is something more to describe, but if so, please clarify.
Foreground processes and terminal access control
To understand what is going on, you need to know a little about sharing terminals. What happens when two programs try to read from the same terminal at the same time? Each input byte goes randomly to one of the programs. (Not random as in the kernel uses an RNG to decide, just random as in unpredictable in practice.) The same thing happens when two programs read from a pipe, or any other file type which is a stream of bytes being moved from one place to another (socket, character device, …), rather than a byte array where any byte can be read multiple times (regular file, block device). For example, run a shell in a terminal, figure out the name of the terminal and run cat.
$ tty
/dev/pts/18
$ cat
Then from another terminal, run cat /dev/pts/18. Now type in the terminal, and watch as lines sometimes go to one of the cat processes and sometimes to the other. Lines are dispatched as a whole when the terminal is in cooked mode. If you put the terminal in raw mode then each byte would be dispatched independently.
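The same effect can be reproduced without a terminal, using a pipe. The following is a minimal sketch, assuming a POSIX shell with seq and mktemp available; the file names a and b and the variable names are made up for the illustration. Two cat processes read from the same pipe, and every byte ends up in exactly one of the two output files, though how the bytes are split between them is unpredictable.

```shell
#!/bin/sh
# Two readers share one pipe: each byte is delivered to exactly one of them.
tmp=$(mktemp -d)

seq 1 200000 | {
    # Keep the pipe open on fd 3: a non-interactive shell would otherwise
    # redirect the stdin of a background command from /dev/null.
    exec 3<&0
    cat <&3 >"$tmp/a" &   # first reader, in the background
    cat <&3 >"$tmp/b"     # second reader, on the same pipe
    wait                  # let the background reader finish
}

# Count what was written and what each reader received.
expected=$(( $(seq 1 200000 | wc -c) ))
size_a=$(( $(wc -c <"$tmp/a") ))
size_b=$(( $(wc -c <"$tmp/b") ))
total=$(( size_a + size_b ))

echo "a: $size_a bytes, b: $size_b bytes"
echo "total: $total of $expected bytes"
rm -rf "$tmp"
```

The two sizes typically change from run to run, but they always add up to the number of bytes written. Nothing is duplicated and nothing is lost; the data is just split between the readers.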
That’s messy. Surely there should be a mechanism to decide that one program gets the terminal, and the others don’t. Well, there is! It triggers in typical cases, but not in the scenario I set up above. That scenario is unusual because cat /dev/pts/18 wasn’t started from /dev/pts/18. It’s unusual to access a terminal from a program that wasn’t started inside this terminal. In the usual case, you run a shell in a terminal, and you run programs from that shell. Then the rule is that the program in the foreground gets the terminal, and programs in the background don’t. This is known as terminal access control. The way it works is:
- Each process has a controlling terminal (or doesn’t have one, typically because it doesn’t have any open file descriptor that’s a terminal).
- When a process tries to access its controlling terminal, if the process is not in the foreground, then the kernel blocks it. (Conditions apply. Access to other terminals is not regulated.)
- The shell decides who is the foreground process. (Foreground process group, actually.) It calls tcsetpgrp to let the kernel know who should be in the foreground.
This works in typical cases. Run a program in a shell, and that program gets to be the foreground process. Run a program in the background (with &), and the program doesn’t get to be in the foreground. When the shell is displaying a prompt, the shell puts itself in the foreground. When you resume a suspended job with fg, the job gets to be in the foreground. With bg, it doesn’t.
If a background process tries to read from the terminal, the kernel sends it a SIGTTIN signal. The default action of the signal is to suspend the process (like SIGSTOP). The parent of the process can know about this by calling waitpid with the WUNTRACED flag (WSTOPPED is the equivalent flag for waitid); when a child process receives a signal that suspends it, the waitpid call in the parent returns and lets the parent know what the signal was. This is how the shell knows to print “Stopped (tty input)”. What it’s telling you is that this job is suspended due to a SIGTTIN.
Since the process is suspended, nothing will happen to it until it’s resumed or killed (with a signal that the process doesn’t catch, because if the process has set a signal handler, it won’t run since the process is suspended). You can resume the process by sending it a SIGCONT, but that won’t achieve anything if the process is reading from the terminal: it’ll receive another SIGTTIN immediately. If you resume the process with fg, it goes to the foreground and so the read succeeds.
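The suspend-and-resume cycle can be watched from a script. This is a rough sketch, assuming a ps that accepts -o stat= and -p (procps or BSD ps). It sends SIGSTOP by hand as a stand-in for SIGTTIN, because a genuine SIGTTIN needs a controlling terminal and a background read; the variable names are only illustrative.

```shell
#!/bin/sh
# Start a harmless background process to experiment on.
sleep 30 &
pid=$!

# Suspend it. SIGSTOP stands in for SIGTTIN here: by default both stop the process.
kill -STOP "$pid"
sleep 1   # give the signal a moment to take effect
stopped_state=$(ps -o stat= -p "$pid")
echo "after SIGSTOP: $stopped_state"   # the state starts with T (stopped)

# Resume it, which is what fg or bg do by sending SIGCONT.
kill -CONT "$pid"
sleep 1
resumed_state=$(ps -o stat= -p "$pid")
echo "after SIGCONT: $resumed_state"   # the state no longer starts with T

kill "$pid"   # clean up
```

A process stopped by a real SIGTTIN shows the same T state in ps. The difference is that after a plain SIGCONT it would be stopped again as soon as it retried the read, as described above.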
Now you understand what happens when you run cat in the background:
$ cat &
$
[1] + Stopped (tty input) cat
$
The case of SSH
Now let’s do the same thing with SSH.
$ ssh localhost sleep 999999 &
$
$
$
[1] + Stopped (tty input) ssh localhost sleep 999999
$
Pressing Enter sometimes goes to the shell (which is in the foreground), and sometimes to the SSH process (at which point it gets stopped by SIGTTIN). Why? If ssh was reading from the terminal, it should receive SIGTTIN immediately, and if it wasn’t then why does it receive SIGTTIN?
What’s happening is that the SSH process calls the select system call to know when input is available on any of the files it’s interested in (or if an output file is ready to receive more data). The input sources include at least the terminal and the network socket. Unlike read, select is not forbidden to background processes, and ssh doesn’t receive a SIGTTIN when it calls select. The intent of select is to find out whether data is available, without disrupting anything. Ideally select would not change the system state at all, but in fact this isn’t completely true. When select tells the SSH process that input is available on the terminal file descriptor, the kernel has to commit to sending input if the process calls read afterwards. (If it didn’t, and the process called read, then there might be no input available at this point, so the return value from select would have been a lie.) So if the kernel decides to route some input to the SSH process, it decides by the time the select system call returns. Then SSH calls read, and at that point the kernel sees that a background process tried to read from the terminal and suspends it with SIGTTIN.
Note that you don’t need to launch multiple connections to the same server. One is enough. Multiple connections merely increase the probability that the problem arises.
The solution: don’t read from the terminal
If you need the SSH session to read from the terminal, run it in the foreground.
If you don’t need the SSH session to read from the terminal, make sure that its input is not coming from the terminal. There are two ways to do this:
- You can redirect the input:
ssh … </dev/null
- You can instruct SSH not to forward a terminal connection with -n or -f. (-n is equivalent to </dev/null; -f allows SSH itself to read from the terminal, e.g. to read a password, but the command itself won’t have the terminal open.)
ssh -n …
Note that the disconnection between the terminal and SSH has to happen on the client. The sleep process running on the server will never read from the terminal, but SSH has no way to know that. If the client receives input on standard input, it must forward it to the server, which will make the data available in a buffer in case the application ever decides to read it (and if the application calls select, it’ll be informed that data is available).
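The effect of the redirection can be checked without an SSH server. Below is a small sketch in POSIX shell; run_detached and stdin_kind are hypothetical helper names invented for this example, with stdin_kind standing in for ssh so the result is visible locally.

```shell
#!/bin/sh
# Hypothetical helper: run any command with its stdin detached from the terminal.
run_detached() {
    "$@" </dev/null
}

# Stand-in for ssh: report what kind of stdin the command was given.
stdin_kind() {
    if [ -t 0 ]; then
        echo "terminal"
    else
        echo "not-a-terminal"
    fi
}

run_detached stdin_kind   # prints "not-a-terminal"
```

Applied to the script from the question, this amounts to writing ssh -n localhost sleep 2, or ssh localhost sleep 2 </dev/null. With either form the background ssh has no terminal input to read, so it should not be stopped by SIGTTIN.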