Oracle Linux Automation Manager 2.0 in the Oracle Cloud – A Story about 10.0.2.0/24

no route to host

For some days after an OLAM2 – Oracle Linux Automation Manager 2.0 – setup, I was struggling to connect to specific hosts. As you can see in the picture, I was able to execute OLAM2 job templates (Ansible playbooks) against a host in the same subnet, 10.0.1.0/24, but not against the other one in subnet 10.0.2.0/24. The message was always the same: no route to host. But in OCI, each subnet can basically reach every other subnet within a VCN (Virtual Cloud Network); there is no special subnet routing required. The setup on an OL8 machine was executed as described in the installation guide: Installation Guide (oracle.com). The setup type was Single Host.

For testing and debugging purposes, I used a small playbook in a locally defined project in the folder /var/lib/awx/projects/pause. The gather_facts parameter ensures that OLAM tries to connect to the target servers. (A manual test run of the same playbook is sketched right after it.)

---
- hosts: all
  gather_facts: yes

  tasks:
  - name: Pause for 100 minutes
    ansible.builtin.pause:
       minutes: 100
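
As a side note, the same playbook can also be run by hand from the OLAM2 host to separate OLAM issues from plain SSH issues. This is a hypothetical debugging call, not part of the original setup; the file name pause.yml is an assumption, while user, key, and target are taken from the SSH session further below.

# Run the playbook directly against one target via an inline inventory
# (note the trailing comma after the IP address).
$ ansible-playbook -i 10.0.2.70, -u opc \
    --private-key ~/.ssh/id_rsa_devops_2023 \
    /var/lib/awx/projects/pause/pause.yml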

Test Run

ci-linux-server-alpha – same subnet 10.0.1.0/24


ci-linux-server-fox – other subnet 10.0.2.0/24


Connectivity

OS Host-to-Target Connectivity Check – OK

OLAM2 was not able to connect to the target server 10.0.2.70. A few short basics about OLAM2:

  • In this setup, Tower, the repository database, and the execution node are running on the same host
  • When a job template is executed, a Podman container is started in the background, and the job runs inside it
  • The Podman configuration is “rootless”, which means the OS user awx handles the container and everything around it (a quick check is sketched after this list)
  • To connect from the container to an external host, Podman’s rootless networking is used
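
By the way, the “rootless” part can be verified directly with podman info (a hypothetical quick check, not part of the original session):

# Switch to the awx user and ask Podman whether it runs rootless.
$ sudo su -l awx -s /bin/bash
$ podman info --format '{{.Host.Security.Rootless}}'
true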

Let’s check SSH connectivity from the OLAM2 host to the target server to rule out OCI network and firewall settings. This works.

[opc@ci-olam2 ~]$ ssh -i ~/.ssh/id_rsa_devops_2023 10.0.2.70
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Mon Feb 6 20:46:04 2023 from 192.168.1.45
[opc@ci-linux-server-fox ~]$ hostname
ci-linux-server-fox

OS Container-to-Target Connectivity Check – NOT OK

Therefore, the Ansible playbook was adapted to not gather facts, so the job does not fail immediately on the connection attempt. In the background, a Podman container is started and keeps running for 100 minutes.

---
- hosts: all
  gather_facts: no

  tasks:

  - name: Pause for 100 minutes
    ansible.builtin.pause:
       minutes: 100

Job Template is running


Login to the Podman Container

We get the container ID to connect to it and start Bash. As Podman runs “rootless”, we have to use the OS user awx. As you can see here, the OLAM-EE image is used for the container.

# sudo su -l awx -s /bin/bash

$ podman ps
CONTAINER ID  IMAGE                                                                          COMMAND               CREATED        STATUS            PORTS  NAMES
da61ceeb81c1  container-registry.oracle.com/oracle_linux_automation_manager/olam-ee:latest  ssh-agent sh -c t...  5 minutes ago  Up 5 minutes ago         ansible_runner_24
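
If you do not want to copy the container ID by hand, Podman can print it directly (a convenience one-liner, not part of the original session; the name filter matches the ansible_runner_24 container above):

# Print only the ID of the running job container.
$ podman ps --filter name=ansible_runner --format '{{.ID}}'
da61ceeb81c1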

Bash Connect to Gather Container Network Information – no route to host

We verify the IP address settings of the container. 

$ podman exec -it da61ceeb81c1 /bin/bash
[root@da61ceeb81c1 project]#
[root@da61ceeb81c1 project]# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether aa:a4:28:f4:f8:13 brd ff:ff:ff:ff:ff:ff
    inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0
       valid_lft forever preferred_lft forever
    inet6 fd00::a8a4:28ff:fef4:f813/64 scope global dynamic mngtmpaddr
       valid_lft 86308sec preferred_lft 14308sec
    inet6 fe80::a8a4:28ff:fef4:f813/64 scope link
       valid_lft forever preferred_lft forever

[root@da61ceeb81c1 project]# ip route
default via 10.0.2.2 dev tap0
10.0.2.0/24 dev tap0 proto kernel scope link src 10.0.2.100
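
This routing table already hints at the problem: the target 10.0.2.70 falls into the on-link route 10.0.2.0/24 on tap0, so packets towards it never leave the user-mode network. The kernel’s route selection can be confirmed as follows (not captured in the original session; the output is what the routing table above implies):

# Ask the kernel which route it would pick for the target server.
[root@da61ceeb81c1 project]# ip route get 10.0.2.70
10.0.2.70 dev tap0 src 10.0.2.100 uid 0
    cache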

SSH fails – no route to host

[root@da61ceeb81c1 project]# ssh 10.0.2.70
ssh: connect to host 10.0.2.70 port 22: No route to host

Here we are – no route to host. The facts:

  1. SSH from the OLAM2 host to the target works
  2. SSH from the Podman container to the target fails
  3. OCI networking can be ruled out

Network Subnet Comparison – 10.0.2.100?

I was wondering where the IP address 10.0.2.100 was coming from. This IP is from the same network range as the one the OCI compute instance ci-linux-server-fox lives in (10.0.2.0/24). But where does it come from? After some research on the web, I found the answer in the way Podman is used: “rootless”. From the podman-run — Podman documentation:

slirp4netns[:OPTIONS,…]: use slirp4netns(1) to create a user network stack. This is the default for rootless containers. It is possible to specify these additional options, they can also be set with network_cmd_options in containers.conf:

  • allow_host_loopback=true|false: Allow slirp4netns to reach the host loopback IP (default is 10.0.2.2 or the second IP from slirp4netns cidr subnet when changed, see the cidr option below). The default is false.
  • mtu=MTU: Specify the MTU to use for this network. (Default is 65520).
  • cidr=CIDR: Specify ip range to use for this network. (Default is 10.0.2.0/24).

Looks like Podman, via slirp4netns, uses the same IP address range that I use in Oracle Cloud Infrastructure…
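
According to the documentation above, the CIDR can be overridden per container run. A quick manual test could look like this (a hypothetical sketch, not executed here; 192.168.252.0/24 is just an example of an unused range):

# Start a throwaway container with a non-conflicting slirp4netns CIDR;
# tap0 should then get 192.168.252.100/24 instead of 10.0.2.100/24.
$ podman run --rm --network slirp4netns:cidr=192.168.252.0/24 \
    container-registry.oracle.com/oracle_linux_automation_manager/olam-ee:latest \
    ip -4 address show tap0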

Oracle Cloud Infrastructure Test with Subnet 10.0.3.0/24

For another test, a new subnet with the range 10.0.3.0/24 was created, and a new machine is running: ci-linux-server-golf with IP 10.0.3.87. Let’s see what happens when we add this host to OLAM and enable fact gathering in the Ansible playbook again. Tataaaaaa – the job template runs fine!


Solution & Summary

Take care when running OLAM2: internally, the subnet range 10.0.2.0/24 is used for the rootless Podman networking. If the same subnet is in use in OCI, we have a conflict. As a workaround: DO NOT USE 10.0.2.0/24 for a target server subnet. And now I have to look into the slirp4netns documentation – let’s see if there is a chance to change the internally used CIDR in “rootless” mode. #networkfun #research
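
PS: The podman-run documentation quoted above mentions that the slirp4netns options can also be set persistently via network_cmd_options in containers.conf, so the OLAM job containers would pick the new CIDR up automatically. An untested sketch – the user-level path for the awx user and the [engine] section are assumptions based on the Podman docs:

# ~awx/.config/containers/containers.conf
# Use a CIDR that does not collide with any OCI subnet
# (192.168.252.0/24 is just an example).
[engine]
network_cmd_options = ["cidr=192.168.252.0/24"]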