Tuesday, December 31, 2019

SSH Timeout Across a Firewall

Abstract:

Firewalls represent a "choke point" between a Data Center and the Internet. The function of the firewall is to protect the Data Center from unauthorized and malicious access. This "choke point" also typically terminates socket connections which have been idle for extended periods of time, to reduce the number of connections which need to be statefully inspected. Terminating idle sockets can interrupt administrative sessions over SSH that run longer interactive jobs (e.g. backups, software installs, manual data loads) and, in the worst case, leave a database corrupt when a job is cut off partway. KeepAlive functionality in SSH can be engaged to inhibit this behavior.

What is the Error?

When a firewall terminates the connection, the client connecting to the Solaris Server in the remote Data Center may exhibit the following error message:

Received disconnect from 153.74.10.10: 2: Timeout, your session not responding.

And the connection is terminated.

In an outsourced Data Center environment, a controlling TTY over an SSH connection was being terminated after 10 minutes of idle time while an interactive backup script was being run manually. The script produced no output to the controlling TTY while it copied a significant quantity of data across a WAN connection, so the session looked idle.

What is a KeepAlive?

A KeepAlive is a small message, carrying no application data, sent along an open SSH session at a regular interval to keep a firewall from assuming the connection is idle during longer periods of non-interactivity.

A TCP-level keepalive (the KeepAlive option) is normally an empty TCP segment, while a ClientAlive message is a small request sent through the encrypted channel, which the client must answer. In both cases nothing is added to the text sent or received by the application, but the packets and their headers are seen by the firewall, which keeps it from terminating the session because no traffic was seen.

How to Configure KeepAlive

Under Solaris 11.3, there is a system-wide configuration file, /etc/ssh/sshd_config, which can be updated on the server receiving the connections. In the configuration shown here, the KeepAlive directive is commented out:

SUN0101/root# egrep '(KeepAlive|ClientAlive)' /etc/ssh/sshd_config
# KeepAlive specifies whether keep alive messages are sent to the client.
#KeepAlive yes
ClientAliveCountMax 0
ClientAliveInterval 600


The following adjustment enables KeepAlives to be sent every 120 seconds and forces a disconnection after 240 seconds without a response (2 unanswered messages at 120-second intervals), so the firewall always sees traffic and a truly unresponsive connection will be terminated by the SSH server instead.

SUN0101/root# egrep '(KeepAlive|ClientAlive)' /etc/ssh/sshd_config
KeepAlive yes
ClientAliveCountMax 2
ClientAliveInterval 120
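
The arithmetic behind these two values can be sketched in a few lines of shell. This is only an illustration: the function names are hypothetical, and the 600 second firewall timeout is an assumption taken from the 10 minute idle disconnect described earlier.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Solaris or SSH.

# Seconds before sshd drops a client that answers no ClientAlive messages:
# ClientAliveInterval * ClientAliveCountMax.
dead_client_timeout() {
    echo $(( $1 * $2 ))
}

# Succeeds (exit 0) when the keepalive interval is shorter than the
# firewall's idle timeout, so the firewall sees traffic in time.
keepalive_beats_firewall() {
    [ "$1" -lt "$2" ]
}

dead_client_timeout 120 2                     # prints 240
keepalive_beats_firewall 120 600 && echo "120s interval fits inside a 600s firewall timeout"
```

If the firewall's idle timeout is different in another environment, the interval should stay comfortably below it.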


A backup of the original file should be made, in case the change needs to be rolled back.
Console access to the system will be needed to perform the rollback if ssh is misconfigured.
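
One possible way to take that backup is sketched below. The helper name backup_config and the timestamped file naming are assumptions made for illustration, not a Solaris convention.

```shell
#!/bin/sh
# backup_config is a hypothetical helper: it copies a file to a timestamped
# backup beside the original, preserving mode and modification time, and
# prints the name of the backup it created.
backup_config() {
    src=$1
    dest="$src.$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dest" && echo "$dest"
}

# Example (as root): backup_config /etc/ssh/sshd_config
```

To roll back, copy the printed backup file over /etc/ssh/sshd_config from the console and refresh the service.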

How to Enable KeepAlive

Changing the configuration file alone does not affect new sessions or sessions that are already open. To enable the change for new sessions, refresh the configuration through the service:

SUN0101/root# svcs ssh
STATE          STIME    FMRI
online         May_03   svc:/network/ssh:default


SUN0101/root# svcadm refresh svc:/network/ssh:default


Any existing ssh sessions will be timed out by the firewall, within the configured limit.
Any new ssh session will not be timed out by the firewall, with the keep alive enabled.
Any new ssh session which goes into an abnormal state where the client does not respond will be terminated by the SSH service after about 240 seconds (2 unanswered messages, 120 seconds apart).
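
After the refresh, it can help to confirm which directives are actually active in the file, since a plain egrep also matches commented-out lines. A minimal sketch, assuming a grep that supports -E; the helper name active_alive_settings is hypothetical:

```shell
#!/bin/sh
# active_alive_settings is a hypothetical helper: it lists only the
# uncommented KeepAlive/ClientAlive directives in the given file.
active_alive_settings() {
    grep -E '^[[:space:]]*(KeepAlive|ClientAlive)' "$1"
}

# Example (as root): active_alive_settings /etc/ssh/sshd_config
```

This only inspects the file on disk; it does not prove that the running service has reread it.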

Conclusions

Network Management of Solaris Systems in Cloud based Data Centers remains quite usable, even when firewalls deployed by Cloud Providers clean up idle connections. These types of environments have long been used in mission critical arenas, with secured servers residing in DMZs and ISZs - so a remote data center with a perimeter firewall is just "old hat".