[microstack][k8s][calico] Cannot ping between pods in different subnets in k8s

Two VMs were created with microstack to deploy k8s with the Calico CNI:
VM1(extest-1): internal:192.168.122.204/external:128.224.157.145
VM2(extest-2): internal:192.168.122.72/external:128.224.157.139

calico config:

ubuntu@extest-1:~$ cat custom-resources.yaml 
# This section includes base Calico installation configuration.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - blockSize: 26
      cidr: 172.22.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()

---

# This section configures the Calico API server.
# For more information, see: https://docs.tigera.io/calico/latest/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}


ubuntu@extest-1:~$ ip route
default via 128.224.157.1 dev ens4 proto static 
default via 192.168.122.1 dev ens3 proto dhcp src 192.168.122.204 metric 100 
default via 192.168.122.1 dev ens3 proto dhcp metric 100 
128.224.157.0/24 dev ens4 proto kernel scope link src 128.224.157.145 
128.224.160.11 via 192.168.122.1 dev ens3 proto dhcp src 192.168.122.204 metric 100 
128.224.160.12 via 192.168.122.1 dev ens3 proto dhcp src 192.168.122.204 metric 100 
169.254.0.0/16 dev ens4 scope link metric 1000 
169.254.169.254 via 192.168.122.2 dev ens3 proto dhcp src 192.168.122.204 metric 100 
169.254.169.254 via 192.168.122.2 dev ens3 proto dhcp metric 100 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
blackhole 172.22.184.128/26 proto 80 
172.22.184.129 dev cali06f33a9668e scope link 
172.22.184.130 dev cali91df5b91d11 scope link 
172.22.184.131 dev cali3082f3602b7 scope link 
172.22.184.132 dev cali24ee372a81b scope link 
172.22.184.133 dev cali2a89713a9c5 scope link 
172.22.184.134 dev cali0393bfc615a scope link 
172.22.184.135 dev cali8c163d3f0a7 scope link 
172.22.246.192/26 via 128.224.157.139 dev ens4 proto 80 onlink 
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.204 metric 100 
192.168.122.1 dev ens3 proto dhcp scope link src 192.168.122.204 metric 100 
192.168.122.2 dev ens3 proto dhcp scope link src 192.168.122.204 metric 100
ubuntu@extest-1:~$ kubectl get pods -A -owide
NAMESPACE          NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE       NOMINATED NODE   READINESS GATES
calico-apiserver   calico-apiserver-6dc9d48f8b-j294c          1/1     Running   0          3h39m   172.22.184.133    extest-1   <none>           <none>
calico-apiserver   calico-apiserver-6dc9d48f8b-wqglr          1/1     Running   0          3h39m   172.22.184.134    extest-1   <none>           <none>
calico-system      calico-kube-controllers-74895d748f-kb5x8   1/1     Running   0          3h44m   172.22.184.132    extest-1   <none>           <none>
calico-system      calico-node-c76ps                          1/1     Running   0          3h36m   192.168.122.72    extest-2   <none>           <none>
calico-system      calico-node-qm6zf                          1/1     Running   0          3h44m   192.168.122.204   extest-1   <none>           <none>
calico-system      calico-typha-57bb44dfd5-pvmsd              1/1     Running   0          3h44m   192.168.122.204   extest-1   <none>           <none>
calico-system      csi-node-driver-k274s                      2/2     Running   0          3h36m   172.22.246.193    extest-2   <none>           <none>
calico-system      csi-node-driver-n6wv2                      2/2     Running   0          3h44m   172.22.184.131    extest-1   <none>           <none>
default            pingtest-7b5d44b647-dlf4w                  1/1     Running   0          142m    172.22.184.135    extest-1   <none>           <none>
default            pingtest-7b5d44b647-wgnht                  1/1     Running   0          142m    172.22.246.194    extest-2   <none>           <none>
kube-system        coredns-76f75df574-gfmvg                   1/1     Running   0          3h49m   172.22.184.129    extest-1   <none>           <none>
kube-system        coredns-76f75df574-pbmt2                   1/1     Running   0          3h49m   172.22.184.130    extest-1   <none>           <none>
kube-system        etcd-extest-1                              1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-apiserver-extest-1                    1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-controller-manager-extest-1           1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-proxy-8zqpm                           1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
kube-system        kube-proxy-dpb6s                           1/1     Running   0          3h36m   192.168.122.72    extest-2   <none>           <none>
kube-system        kube-scheduler-extest-1                    1/1     Running   0          3h49m   192.168.122.204   extest-1   <none>           <none>
tigera-operator    tigera-operator-55585899bf-84997           1/1     Running   0          3h47m   192.168.122.204   extest-1   <none>           <none>

From inside pod pingtest-7b5d44b647-dlf4w (172.22.184.135):

/ # ping 172.22.246.194
PING 172.22.246.194 (172.22.246.194): 56 data bytes
^C
--- 172.22.246.194 ping statistics ---
10 packets transmitted, 0 packets received, 100% packet loss
/ # nslookup www.google.com 10.96,0,10
nslookup: bad address '10.96,0,10'
/ # nslookup www.google.com 10.96.0.10
Server:     10.96.0.10
Address:    10.96.0.10:53

Non-authoritative answer:
Name:   www.google.com
Address: 199.96.62.21

Non-authoritative answer:
Name:   www.google.com
Address: 2a03:2880:f129:83:face:b00c:0:25de

The pod can reach services and other pods in the same subnet, but not pods in another subnet (on the other node), such as pingtest-7b5d44b647-wgnht (172.22.246.194).

ubuntu@extest-1:~$ ifconfig
...
ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1442
        inet 192.168.122.204  netmask 255.255.255.0  broadcast 192.168.122.255
        inet6 fe80::f816:3eff:fe72:3ab6  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:72:3a:b6  txqueuelen 1000  (Ethernet)
        RX packets 31201  bytes 20398464 (20.3 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 30892  bytes 6099260 (6.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ens4: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 128.224.157.145  netmask 255.255.255.0  broadcast 128.224.157.255
        inet6 fe80::f816:3eff:feb0:23a0  prefixlen 64  scopeid 0x20<link>
        ether fa:16:3e:b0:23:a0  txqueuelen 1000  (Ethernet)
        RX packets 542425  bytes 1055748029 (1.0 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 224454  bytes 71843285 (71.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
...

Debugging with tcpdump shows that the packets reach ens4 on VM extest-1:

ubuntu@extest-1:~$ sudo tcpdump -i ens4 icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on ens4, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:58:40.134615 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 27, length 64
10:58:41.134940 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 28, length 64
10:58:42.135462 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 29, length 64
10:58:43.135831 IP 172.22.184.135 > 172.22.246.194: ICMP echo request, id 30, seq 30, length 64

ubuntu@extest-1:~$ ip route get 172.22.246.194
172.22.246.194 via 128.224.157.139 dev ens4 src 128.224.157.145 uid 1001 
    cache 
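
The `ip route get` result above follows from longest-prefix matching. As a rough illustration only, the Python sketch below replays that lookup over a hand-copied subset of the routes from the `ip route` output. The `ROUTES` list and the `lookup` helper are hypothetical names for this example, and the kernel's real lookup also considers metrics and policy rules that are ignored here.

```python
import ipaddress

# Hand-copied subset of the routing table shown earlier (assumption: the
# remaining routes do not affect this lookup).
ROUTES = [
    ("0.0.0.0/0", "128.224.157.1", "ens4"),
    ("128.224.157.0/24", None, "ens4"),
    ("172.22.184.128/26", None, "blackhole"),
    ("172.22.246.192/26", "128.224.157.139", "ens4"),
    ("192.168.122.0/24", None, "ens3"),
]


def lookup(dest):
    """Return (prefix, gateway, device) of the most specific matching route."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for prefix, gateway, device in ROUTES:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network, gateway, device))
    if not matches:
        return None
    # The longest prefix (largest prefixlen) wins.
    network, gateway, device = max(matches, key=lambda m: m[0].prefixlen)
    return str(network), gateway, device


print(lookup("172.22.246.194"))
```

The remote pod's /26 block is reached through the peer node's address directly on ens4 rather than through a tunnel device, which matches the plain pod-to-pod ICMP packets captured on ens4.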

ubuntu@extest-1:~$ ping 128.224.157.139
PING 128.224.157.139 (128.224.157.139) 56(84) bytes of data.
64 bytes from 128.224.157.139: icmp_seq=1 ttl=64 time=3.89 ms
64 bytes from 128.224.157.139: icmp_seq=2 ttl=64 time=2.09 ms

Running tcpdump on extest-2 shows that no packets arrive.
In OpenStack I have also opened almost all security group rules, including BGP protocol 4.

Has anyone run into this problem before?

Asked By: Johnny Sun

||

The problem is solved. The root cause was the Calico IP pool configuration.
It seems the routes added during the initial Calico installation were misconfigured.
I changed ipipMode to Never and applied the change, then changed ipipMode to Always and applied it again. After that, the problem was solved.
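
One possible reading of why forcing encapsulation helps, not confirmed in this thread: the pool in the question uses VXLANCrossSubnet, and Calico's CrossSubnet modes encapsulate only traffic between nodes in different subnets. Both VMs share 128.224.157.0/24, so pod traffic would leave ens4 unencapsulated, which is consistent with the tcpdump output in the question. The Python sketch below is a simplified model of that decision, not Calico's implementation, and the function names are hypothetical.

```python
import ipaddress


def same_subnet(local_cidr, peer_ip):
    """Return True if peer_ip lies inside the network of local_cidr."""
    network = ipaddress.ip_interface(local_cidr).network
    return ipaddress.ip_address(peer_ip) in network


def needs_encapsulation(mode, local_cidr, peer_ip):
    """Simplified model of the Never / Always / CrossSubnet modes."""
    if mode == "Never":
        return False
    if mode == "Always":
        return True
    if mode == "CrossSubnet":
        # Encapsulate only when the peer node is outside the local subnet.
        return not same_subnet(local_cidr, peer_ip)
    raise ValueError(f"unknown mode: {mode}")


# Node addresses taken from the question (extest-1 -> extest-2).
print(needs_encapsulation("CrossSubnet", "128.224.157.145/24", "128.224.157.139"))
print(needs_encapsulation("Always", "128.224.157.145/24", "128.224.157.139"))
```

Under this model, CrossSubnet sends pod packets between the two VMs as-is, while Always wraps them in packets addressed between the node IPs.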

Answered By: Johnny Sun