Docker over multiple network interfaces

So in my previous post, we walked through how you would configure and bring up a new interface on an AWS Ubuntu EC2 instance. This was just the pre-requisite to the problem that will be addressed in this post.

We were running a docker container on an instance that had an eth0 and an eth1 interface. Everything was working fine with the software that was installed on the host level communicating over either IP. However, as soon as I tried to communicate with a Docker container over the eth1 (non default interface), I began to have issues. This is because of what Docker does at the ip tables level.

Now this is completely acceptable and needed however, what we were seeing was that when the packets would come in over eth1 they would be appropriately routed through the docker0 interface to the correct docker container listening on that port (let’s just 4000 for our example). The docker container application would service the request and attempt to respond back out the docker0 interface. Since it lost the source IP (our eth1 IP) it would assume the default interface and therefore use the default route table which would attempt to respond using the eth0 IP. From the requester’s point of view, there would never be a response.

We solved this by adding to the work we had already done to configure and ready another network interface. At a high level, we added a FW mark (firewall mark) to any connection coming in over the eth1 interface so that as the docker0 serviced the request and was about to send it back out, it would restore the connection mark. Then we have a rule that would run to say, if there is a connection mark, it is to use the eth1 route table to figure out which IP to respond with.

We added another upstart script that would do the iptables commands we needed when we knew the docker0 interface would be available. It would simple add/remove to/from the mangle table in the PREROUTING section for any connection coming from the docker0 interface to restore any connection marks that might have been set. The script would then also remove the rule if the docker service was stopped (machine rebooting or shutting down).

# etc/init/docker-finish.conf

# docker-finish - update routing tables after docker is up

description "update routing tables after docker is up"

start on started docker
stop on stopped docker

pre-start script
  /sbin/modprobe nf_conntrack
  /sbin/iptables -w -t mangle -A PREROUTING -i docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
end script

post-stop script
  /sbin/iptables -t mangle -D PREROUTING -i docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
end script

Next, when we were configuring our additional interface, eth1 in our example, we would also add the connection settings to set the mark. Please see the code below:

# excerpt from ec2net-functions

rewrite_primary() {
  if [ "${INTERFACE}" == "eth0" ]; then
    return
  fi
  cidr=$(get_cidr)
  if [ -z ${cidr} ]; then
    return
  fi
  network=$(echo ${cidr}|cut -d/ -f1)
  router=$(( $(echo ${network}|cut -d. -f4) + 1))
  gateway="$(echo ${network}|cut -d. -f1-3).${router}"
  cat <<- EOF > ${config_file}
# This file is automaticatically generated
# See https://github.com/jessecollier/ubuntu-ec2net for source
auto ${INTERFACE}
iface ${INTERFACE} inet dhcp
  metric ${RTABLE}
post-up ip route add default via ${gateway} dev ${INTERFACE} table ${RTABLE} || true
# use fwmark for connections with masquerading (docker support)
post-up iptables -t mangle -A PREROUTING -i ${INTERFACE} -j MARK --set-xmark 0x${RTABLE}/0xffffffff
post-up iptables -t mangle -A PREROUTING -i ${INTERFACE} -j CONNMARK --save-mark --nfmask 0xffffffff --ctmask 0xffffffff
post-up sysctl -wq net.ipv4.conf.${INTERFACE}.rp_filter=2
post-up bash -c 'bash -c '"'"'until [ "\$(ip route show dev docker0 2>/dev/null)" != "" ]; do sleep 1; done; ip route add \$(ip route show dev docker0) dev docker0 table ${RTABLE}'"'"' &'
pre-down bash -c 'ip route del \$(ip route show dev docker0) dev docker0 table ${RTABLE}' || true
EOF
  # Use broadcast address instead of unicast dhcp server address.
  # Works around an issue with two interfaces on the same subnet.
  # Unicast lease requests go out the first available interface,
  # and dhclient ignores the response. Broadcast requests go out
  # the expected interface, and dhclient accepts the lease offer.
  cat <<- EOF > ${dhclient_file}
  supersede dhcp-server-identifier 255.255.255.255;
EOF
}

Now the rule that was added, would say that for any connection with a FW mark that we set coming in from eth1, it would use the new route table that we configured for the eth1 interface.

# excerpt from ec2net-functions line 230

/sbin/ip rule add from all fwmark 0x${RTABLE} lookup ${RTABLE}

With these tweaks, we now have the ability to communicate over either eth0 or eth1 to docker containers running on an EC2 instance.

comments powered by Disqus