How To Expose Multiple Containers On The Same Port
Disclaimer: In 2021, there is still a place for simple setups with just one machine serving all
traffic. So, no Kubernetes and no cloud load balancers in this post. Just good old Docker
and Podman.
Even when you have just one physical or virtual server, it's often a good idea to run multiple
instances of your application on it. Luckily, when the application is containerized, it's
actually relatively simple. With multiple application containers, you get horizontal scaling
and much-needed redundancy at very little cost. Thus, if there is a sudden need for
handling more requests, you can adjust the number of containers accordingly. And if one of
the containers dies, there are others to handle its traffic share, so your app isn't a SPOF
anymore.
The tricky part here is how to expose such a multi-container application to the clients.
Multiple containers mean multiple listening sockets. But most of the time, clients just want
to have a single point of entry.
Surprisingly or not, neither Docker nor Podman supports exposing multiple containers on the
same host port out of the box.
The most widely suggested workaround is to use an extra container with a reverse proxy
like Nginx, HAProxy, Envoy, or Traefik. Such a proxy should know the exact set of
application containers and load balance the client traffic between them. The only port that
needs to be exposed on the host in such a setup is the port of the proxy itself. Additionally,
since modern reverse proxies usually come with advanced routing capabilities, you can get
canary and blue-green deployments or even coalesce diverse backend apps into a
single frontend app almost for free.
For any production setup, I'd recommend going with a reverse proxy approach first. But in
this post, I want to explore the alternatives. Are there other ways to expose multiple Docker
or Podman containers on the same host's port?
The goal of the exercise below is twofold. First of all, to solidify the knowledge obtained
while working on my container networking and iptables write-ups. But also to show that a
proxy is not the only way to achieve decent load balancing. So, if you are in a
restricted setup where you can't use a proxy for some reason, the techniques from this post
may come in handy.
Let's forget about containers for a second and talk about sockets in general.
To make a server socket listen() on a certain address, you need to explicitly bind() it to
an interface and port. For a long time, binding a socket to an (interface, port) pair was an
exclusive operation. If you bound a socket to a certain address from one process, no other
processes on the same machine would be able to use the same address for their sockets
until the original process closes its socket (hence, releases the port). And it's kind of
reasonable behavior - an interface and port define a packet destination on a machine.
Having ambiguous receivers would be bizarre.
But... modern servers may need to handle tens of thousands of TCP connections per
second. A single process accept()-ing all the client connections quickly becomes a
bottleneck with such a high connection rate. So, starting from Linux 3.9, you can bind an
arbitrary number of sockets to exactly the same (interface, port) pair as long as all of them
use the SO_REUSEPORT socket option. The operating system then will make sure that TCP
connections are evenly distributed between all the listening processes (or threads).
Apparently, the same technique can be applied to containers. However, the SO_REUSEPORT
option works only if all the sockets reside in the same network stack. And that's obviously
not the case for the default Docker/Podman approach, where every container gets its own
network namespace, hence an isolated network stack.
The simplest way to overcome this is to sacrifice the isolation a bit and run all the
containers in the host's network namespace with docker run --network host.
But there is a more subtle way to share a single network namespace between multiple
containers. Docker allows reusing a network namespace of an already existing container
while launching a new one. So, we can start a sandbox container that will do nothing but
sleep. This container will originate a network namespace and also expose the target port to
the host (other namespaces will also be created, but it doesn't really matter). All the
application containers will then be attached to this network namespace using the
docker run --network container:<sandbox_name> syntax.
Fun fact - We've just reinvented Kubernetes Pods here. Want to learn more?
Check out the Kubernetes CRI spec and my post explaining the difference between
Pods and regular containers.
Of course, all the instances of the application server need to set the SO_REUSEPORT option,
so there won't be a port conflict, and the incoming requests will be evenly distributed
between the containers listening on the same port.
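Setting the option from Go could look roughly like the sketch below. It is only an illustration under a few assumptions: it targets Linux, the listenReusePort helper is a hypothetical name, and the option number is hard-coded as 0xf (the SO_REUSEPORT value on x86 and ARM Linux) rather than taken from a library. The demo in main() binds two listeners to the same loopback port to show that the second bind() no longer fails.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// soReusePort is SO_REUSEPORT on Linux for x86 and ARM. This value is an
// assumption of the sketch: a few architectures (e.g. MIPS) use a different
// number, and golang.org/x/sys/unix exposes a portable constant.
const soReusePort = 0xf

// listenReusePort opens a TCP listener with SO_REUSEPORT set before bind(),
// so several sockets in the same network namespace can share addr.
func listenReusePort(addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		// Control runs after the socket is created but before it is bound.
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, soReusePort, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	// Let the kernel pick a free port for the first listener...
	first, err := listenReusePort("127.0.0.1:0")
	if err != nil {
		fmt.Println("first listen failed:", err)
		return
	}
	defer first.Close()

	// ...and bind a second listener to exactly the same address.
	second, err := listenReusePort(first.Addr().String())
	if err != nil {
		fmt.Println("second listen failed:", err)
		return
	}
	defer second.Close()

	fmt.Println("two listeners share", second.Addr())
}
```

In an actual application server, each container would call listenReusePort once with the shared port and pass the resulting listener to http.Serve.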
Here is a step by step instruction on how to launch the sandbox and the application
containers:
In actuality, the userland docker-proxy is rarely used. Instead, a single iptables rule in the
NAT table does all the heavy lifting. Whenever a packet destined to the (host,
published_port) arrives, a destination address translation to (container, target_port)
happens. So, the port publishing boils down to adding this iptables rule upon the container
startup.
The iptables trick doesn't cover all the scenarios - for instance, traffic from
localhost cannot be NAT-ed. So the docker-proxy is not fully useless.
The problem with the above DNAT is that it can only do a one-to-one translation. Thus, if
we want to support multiple containers behind a single host's port, we need a more
sophisticated solution. Luckily, Kubernetes uses a similar trick in kube-proxy for the
Service ClusterIP to Pod IPs translation while implementing a built-in service discovery.
Long story short, iptables rules can be applied with some probability. So, if you have ten
potential destinations for a packet, try applying the destination address translation to the
first one with a 1/10 chance, to the second one with a 1/9 chance, and so on, down to a 1/2
chance for the ninth. The chances grow because each rule is evaluated only if all the
previous ones were skipped. And if none of them worked out, apply a fallback for the very
last destination with a 100% chance. As a result, you'll get ten equally loaded destinations.
Of course, iptables is smart enough to apply the DNAT only to new connections.
For an already established connection, the existing address mapping is looked up on
the fly.
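The arithmetic behind the probabilistic rules is easy to get wrong, so here is a small Go sketch of it. A rule is evaluated only when all earlier rules were skipped, so for n destinations, rule i (counting from zero) has to fire with probability 1/(n-i) for every destination to end up with an equal 1/n share. The function names and the container addresses are made up for the illustration; the generated commands use the iptables statistic match in random mode, and ports 80 and 9090 mirror FRONT_PORT and CONT_PORT from this post.

```go
package main

import "fmt"

// ruleProbabilities returns the per-rule match probability for n backends.
// Rule i is reached only if rules 0..i-1 were skipped, hence 1/(n-i).
func ruleProbabilities(n int) []float64 {
	probs := make([]float64, n)
	for i := range probs {
		probs[i] = 1 / float64(n-i)
	}
	return probs
}

// effectiveShares converts per-rule probabilities into the overall share of
// new connections that each backend receives.
func effectiveShares(probs []float64) []float64 {
	shares := make([]float64, len(probs))
	remaining := 1.0
	for i, p := range probs {
		shares[i] = remaining * p
		remaining -= shares[i]
	}
	return shares
}

// dnatRules renders one iptables command per backend. The last rule has no
// statistic match, so it acts as the 100% fallback.
func dnatRules(chain string, frontPort int, backends []string) []string {
	probs := ruleProbabilities(len(backends))
	rules := make([]string, 0, len(backends))
	for i, backend := range backends {
		match := ""
		if i < len(backends)-1 {
			match = fmt.Sprintf(" -m statistic --mode random --probability %.4f", probs[i])
		}
		rules = append(rules, fmt.Sprintf(
			"iptables -t nat -A %s -p tcp --dport %d%s -j DNAT --to-destination %s",
			chain, frontPort, match, backend))
	}
	return rules
}

func main() {
	// Hypothetical container addresses, for illustration only.
	backends := []string{"172.17.0.2:9090", "172.17.0.3:9090", "172.17.0.4:9090"}
	for _, rule := range dnatRules("PREROUTING", 80, backends) {
		fmt.Println(rule)
	}
	fmt.Println(effectiveShares(ruleProbabilities(len(backends))))
}
```

Note that using the same fixed probability for every rule (say, 10% for each of the first nine) would not balance the load: the later rules would see fewer packets, and the fallback destination would receive the largest share.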
The huge advantage of this approach compared to the SO_REUSEPORT option is that it's
absolutely transparent to the application.
Example Go server.
$ CONT_PORT=9090
http_server
Configure iptables DNAT rules - for local ( OUTPUT ) and external ( PREROUTING ) traffic:
$ FRONT_PORT=80
Testing it on a Vagrant box with curl gave me the following request distribution:
Luckily, there is a more modern proxy called Traefik with built-in support for many service
discovery mechanisms, including labeled Docker containers. If you want to see Traefik in
action, check out my write-up on canary container deployments.
Resources
Historical overview of SO_REUSEADDR and SO_REUSEPORT options
The SO_REUSEPORT socket option - LWN article by Michael Kerrisk
The docker-proxy
Cracking Kubernetes node proxy (aka kube-proxy)
Dynamic Nginx configuration for Docker with Python
Automated Nginx Reverse Proxy for Docker
nginx-proxy/nginx-proxy and nginx-proxy/docker-gen GitHub projects.
Configuring Envoy to Auto-Discover Pods on Kubernetes
p.s.
Found another solution for exposing multiple containers on the same port - just use
Docker Swarm with the following docker-compose.yml:
version: "3.9"
services:
  test-server:
    image: nginxdemos/hello
    ports:
      - "80:80"
    deploy:
      replicas: 2
Indeed, it seems like Node.js intentionally doesn't allow setting SO_REUSEPORT because
it wouldn't be portable otherwise.
Regarding your docker-compose solution, that's great that you found an alternative! But
it's actually Docker Swarm that does all the magic, not the compose part. In other
words, you could have solved it the same way with a local Kubernetes cluster. And any
solution involving a full-blown orchestrator would actually defeat the purpose of this
article :) Also, beware that Docker Swarm has been deprecated since ~2020.