OCP4-IPI-libvirt

Introduction

What I Assume

  • You are familiar and comfortable with the libvirt CLI and its XML format.
  • You are familiar and comfortable with the qemu-img tool.
  • You understand the different types of network interfaces on Linux and different libvirt networks.
  • You are familiar and comfortable with NetworkManager and the nmcli tool.
  • You know how OpenShift installation works and what the difference between IPI and UPI is.
  • You know about the OpenShift Machine API and various underlying mechanisms.

Outcomes

The installation described here is a fully managed IPI deployment of OpenShift Container Platform v4.14, initially with three master and two worker nodes.

At a later point, I will add a couple of steps needed to grow the cluster by one extra worker node.

OpenShift Container Platform IPI Installation Using Libvirt

Prerequisites

Hardware requirements for the cluster:

  • 136 GiB RAM (32 GiB per control plane, 20 GiB per compute node), max overcommit ratio of 1.5 (make sure enough swap is available)
  • 52 vCPUs (12 per control plane, 8 per compute node), max overcommit ratio of 1.3 (higher might work, but will slow down the installation horribly and may ultimately fail)
  • one physical network interface that will be used for the public bridged network
  • a physical or virtual network interface that will be used for the provisioning network bridge

Hardware requirements for the installation client (provisioner) machine:

  • a minimum of 8 GiB RAM and 4 CPUs
  • a network connection to both the public bridged network and the provisioning network

Because the provisioner needs access to both networks, and the provisioning network in this guide is a virtual one, it is probably best to define the provisioner as a VM with the same network interface settings as the control/compute nodes.
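
For illustration only, a provisioner VM meeting the minimum requirements above could be created roughly like this; the VM name, disk size, OS variant and install media are placeholders rather than the author's exact setup, and bridge0/provbr0 are the bridges described under #Network Settings:

# --memory is in MiB: 8192 MiB = 8 GiB, matching the documented minimum
$ virt-install --name provisioner --memory 8192 --vcpus 4 \
      --disk size=120,pool=default \
      --network bridge=bridge0,model=virtio \
      --network bridge=provbr0,model=virtio \
      --os-variant rhel9.2 --cdrom /path/to/install-media.iso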

If you want to spread the workloads across several hypervisor hosts, a few extra steps are needed, but nothing big. More on that in #Network Settings below.

Software artifacts needed on the provisioner host:

  • oc, the command-line client, of the corresponding version - download from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/
  • the libvirt-client package, required for openshift-baremetal-install to be able to communicate with the hypervisor(s); a sketch of obtaining openshift-baremetal-install itself follows this list
  • ipmitool or some other IPMI client
  • a pull-secret file containing authentication credentials for OpenShift Container Platform registries - download from https://console.redhat.com/openshift/
  • an SSH keypair that can be used for accessing OpenShift nodes
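
The installer binary itself, openshift-baremetal-install, is typically obtained by extracting it from the release image, using the oc client and pull-secret listed above. A sketch, where the release tag and pull-secret file name are examples only:

$ oc adm release extract \
      --registry-config pull-secret.txt \
      --command=openshift-baremetal-install \
      --to=. \
      quay.io/openshift-release-dev/ocp-release:4.14.9-x86_64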

Host Configuration

Beyond the obvious requirement of having libvirt installed and running, here are the other configuration details for the hypervisor.
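
For completeness, on a Fedora/RHEL-like hypervisor that baseline might look like the following; the exact package set is an assumption, so adjust for your distribution:

$ sudo dnf install -y libvirt qemu-kvm virt-install
$ sudo systemctl enable --now libvirtd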

Network Settings

The first thing you definitely need to make sure of is that IP forwarding is enabled.

$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
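
If the value comes back as 0, one way to enable forwarding both immediately and persistently is shown below; the drop-in file name is arbitrary:

$ sudo sysctl -w net.ipv4.ip_forward=1
$ echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/90-ip-forward.conf
$ sudo sysctl --system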

The Linux network settings need to be configured with two Linux bridges: a public one and a private provisioning one. An nmcli sketch for creating both follows the list below.

  • public bridge, call it bridge0, needs to have the public network interface enslaved to it
  • private bridge, call it provbr0, can be a virtual bridge since it is only needed for the provisioning network, which is supposed to be isolated and without any infrastructure services (such as DHCP, DNS, etc.)
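
As a sketch, assuming NetworkManager manages the host interfaces, and using the interface names and addresses from the example configuration below (add gateway and DNS settings as appropriate for your network):

# public bridge with the physical NIC enslaved
$ nmcli con add type bridge con-name bridge0 ifname bridge0 \
      ipv4.method manual ipv4.addresses 172.25.35.2/24
$ nmcli con add type bridge-slave con-name bridge0-port0 ifname enp86s0 master bridge0

# isolated provisioning bridge, no physical port enslaved
$ nmcli con add type bridge con-name provbr0 ifname provbr0 \
      ipv4.method manual ipv4.addresses 10.1.1.2/24

$ nmcli con up bridge0 ; nmcli con up bridge0-port0 ; nmcli con up provbr0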

It would be wonderful if the bridges could be Open vSwitch ones, but unfortunately the Terraform bundled with openshift-baremetal-install currently does not include an Open vSwitch provider, so it's goodbye to that.

As an example, here is my host configuration.

Public bridge:

$ ip addr show bridge0
6: bridge0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 48:21:0b:57:0e:06 brd ff:ff:ff:ff:ff:ff
    inet 172.25.35.2/24 brd 172.25.35.255 scope global noprefixroute bridge0
       valid_lft forever preferred_lft forever
    inet6 fe80::4a21:bff:fe57:e06/64 scope link
       valid_lft forever preferred_lft forever

$ ip addr show enp86s0
2: enp86s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master bridge0 state UP group default qlen 1000
    link/ether 48:21:0b:57:0e:06 brd ff:ff:ff:ff:ff:ff

$ bridge link | grep "master bridge0"
2: enp86s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state forwarding priority 32 cost 100

Provisioning bridge:

$ ip addr show provbr0
5: provbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ce:70:26:9c:88:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.2/24 brd 10.1.1.255 scope global noprefixroute provbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::cc70:26ff:fe9c:88a4/64 scope link
       valid_lft forever preferred_lft forever

$ bridge link | grep "master provbr0"

Installation Spanning Multiple Hypervisors

If you want your cluster to span multiple hypervisors, make sure there is also a VXLAN connection between all the provisioning bridges.

You can do that by creating a vxlan-type interface as a slave connection of type bridge, with the master set to provbr0. Choose any VXLAN ID that is not already in use on your network, and make sure it is the same on all interconnected hosts.
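
As a sketch, this is roughly how such a connection could be created with nmcli on hypervisor A, using the values from the example output below; swap vxlan.local and vxlan.remote on host B:

$ nmcli con add type vxlan con-name provbr0-vxlan10 ifname provbr0-vxlan10 \
      vxlan.id 10 vxlan.local 172.25.35.2 vxlan.remote 172.25.35.3 \
      vxlan.destination-port 4790 \
      connection.master provbr0 connection.slave-type bridge
$ nmcli con up provbr0-vxlan10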

As an example, here is one VXLAN interface connecting hypervisor A to B.

$ nmcli con show provbr0-vxlan10 | grep -E '^(connection|vxlan)' | grep -vE '(default|uuid|--|-1|unknown)'
connection.id:                          provbr0-vxlan10
connection.type:                        vxlan
connection.interface-name:              provbr0-vxlan10
connection.autoconnect:                 yes
connection.autoconnect-priority:        0
connection.timestamp:                   1703164860
connection.read-only:                   no
connection.master:                      provbr0
connection.slave-type:                  bridge
connection.gateway-ping-timeout:        0
vxlan.id:                               10
vxlan.local:                            172.25.35.2
vxlan.remote:                           172.25.35.3
vxlan.source-port-min:                  0
vxlan.source-port-max:                  0
vxlan.destination-port:                 4790
vxlan.tos:                              0
vxlan.ttl:                              0
vxlan.ageing:                           300
vxlan.limit:                            0
vxlan.learning:                         yes
vxlan.proxy:                            no
vxlan.rsc:                              no
vxlan.l2-miss:                          no
vxlan.l3-miss:                          no

And this is the corresponding VXLAN interface definition connecting host B to A.

$ nmcli con show provbr0-vxlan10 | grep -E '^(connection|vxlan)' | grep -vE '(default|uuid|--|-1|unknown)'
connection.id:                          provbr0-vxlan10
connection.type:                        vxlan
connection.interface-name:              provbr0-vxlan10
connection.autoconnect:                 yes
connection.autoconnect-priority:        0
connection.timestamp:                   1697549049
connection.read-only:                   no
connection.master:                      provbr0
connection.slave-type:                  bridge
connection.gateway-ping-timeout:        0
vxlan.id:                               10
vxlan.local:                            172.25.35.3
vxlan.remote:                           172.25.35.2
vxlan.source-port-min:                  0
vxlan.source-port-max:                  0
vxlan.destination-port:                 4790
vxlan.tos:                              0
vxlan.ttl:                              0
vxlan.ageing:                           300
vxlan.limit:                            0
vxlan.learning:                         yes
vxlan.proxy:                            no
vxlan.rsc:                              no
vxlan.l2-miss:                          no
vxlan.l3-miss:                          no

In this case, bridge link will of course also show the provbr0-vxlan10 interface as a slave from the start, instead of the empty output shown above.

$ bridge link | grep "master provbr0"
7: provbr0-vxlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master provbr0 state forwarding priority 32 cost 100
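
To sanity-check the link, each host should be able to reach the other's provisioning bridge address across the VXLAN. The 10.1.1.3 address for host B is an assumption here, since only host A's 10.1.1.2 appears above:

$ ping -c 3 -I provbr0 10.1.1.3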

Virtual Machine Configuration

Installer Configuration

Installation

Post-Install Smoke Tests

Conclusion