Difference between revisions of "OCP4-IPI-libvirt"
(→Installer Configuration: added extract tools) |
(→Creating Configuration Manually: remove moaning about network interface matching, TBFO) |
(12 intermediate revisions by the same user not shown) | |||
Line 437: | Line 437: | ||
=== Preparing the Software === | === Preparing the Software === | ||
− | + | First, make sure your <code>pull-secret</code> file is on the provisioner. | |
+ | |||
+ | <pre> | ||
+ | $ ls -l pull-secret | ||
+ | -rw-r-----. 1 provisioner provisioner 2734 Oct 27 12:21 pull-secret | ||
+ | </pre> | ||
+ | |||
+ | Then after downloading <code>oc</code> (if you didn't do that already)... | ||
<pre> | <pre> | ||
Line 452: | Line 459: | ||
$ sudo cp ./oc /usr/local/bin/oc | $ sudo cp ./oc /usr/local/bin/oc | ||
− | $ rm -f ./oc | + | $ rm -f ./oc ./openshift-client-linux.tar.gz |
</pre> | </pre> | ||
Line 480: | Line 487: | ||
$ sudo cp ./openshift-baremetal-install /usr/local/bin/ | $ sudo cp ./openshift-baremetal-install /usr/local/bin/ | ||
+ | $ rm -f ./openshift-baremetal-install | ||
$ openshift-baremetal-install completion bash > oinst.completion | $ openshift-baremetal-install completion bash > oinst.completion | ||
Line 486: | Line 494: | ||
</pre> | </pre> | ||
− | ''NOTE'': The extraction process can take up to 5 minutes, depending on your network speed and Quay.io responsiveness. | + | '''NOTE''': The extraction process can take up to 5 minutes, depending on your network speed and Quay.io responsiveness. |
+ | |||
+ | === Creating Installer Configuration === | ||
+ | |||
+ | The installer configuration file can be created interactively with <code>openshift-baremetal-install</code>, or you can simply write it out with an editor. | ||
+ | |||
+ | ==== Using Interactive Mode ==== | ||
+ | |||
+ | '''IMPORTANT''': In interactive mode, the machine network (the network segment the nodes have their IPs allocated from) is expected to be <code>10.0.0.0/16</code>. If your VIPs are not on that network, the installer will fail, so for any other machine network the only way to install is to create <code>install-config.yaml</code> manually. | ||
+ | |||
+ | '''IMPORTANT''': The interactive installer expects at least three control plane nodes and three workers. It will fail if you add fewer than three nodes of either type to the cluster. | ||
+ | |||
+ | The interactive mode will ask you a series of questions and generate <code>install-config.yaml</code> at the end, but there are very few customisation options using this method. | ||
+ | |||
+ | Initially, you must select the SSH public key to be published on cluster nodes, and specify some general settings: | ||
+ | |||
+ | <pre> | ||
+ | $ openshift-baremetal-install --dir=cluster create install-config | ||
+ | ? SSH Public Key /home/provisioner/.ssh/id_rsa.pub | ||
+ | ? Platform baremetal | ||
+ | ? Provisioning Network Managed | ||
+ | ? Provisioning Network CIDR 172.22.0.0/24 | ||
+ | ? Provisioning bridge provbr0 | ||
+ | ? Provisioning Network Interface enp0s3 | ||
+ | ? External bridge bridge0 | ||
+ | ? Add a Host: [Use arrows to move, type to filter] | ||
+ | > control plane | ||
+ | worker | ||
+ | </pre> | ||
+ | |||
+ | As you can see above, after those have been answered, you may add any number of nodes, either control plane or worker. | ||
+ | |||
+ | For both types of node, the questions are the same: | ||
+ | |||
+ | <pre> | ||
+ | ? Add a Host: control plane | ||
+ | ? Name controlplane1 | ||
+ | ? BMC Address ipmi://192.168.1.1:6211 | ||
+ | ? BMC Username admin | ||
+ | ? BMC Password ******** | ||
+ | ? Boot MAC Address 52:54:00:00:fb:11 | ||
+ | ? Add another host? (y/N) y | ||
+ | </pre> | ||
+ | |||
+ | After all the hosts have been added, there is a final series of cluster-related questions: | ||
+ | |||
+ | <pre> | ||
+ | ? Add another host? No | ||
+ | ? Base Domain example.com | ||
+ | ? Cluster Name mycluster | ||
+ | ? Pull Secret [? for help] ****************************... | ||
+ | </pre> | ||
+ | |||
+ | At this point, some checks are performed against the DNS server to see if <code>api.mycluster.example.com</code> resolves, and if the resulting IP is part of the machine network. The same check is performed for the ingress VIP. | ||
+ | |||
+ | '''NOTE''': If your VIPs are not on the default machine network, the installer will fail at this point. | ||
+ | |||
+ | <pre> | ||
+ | FATAL failed to fetch Install Config: failed to generate asset "Install Config": invalid install config: [platform.baremetal.apiVIPs: Invalid value: "172.25.3.10": IP expected to be in one of the machine networks: 10.0.0.0/16, platform.baremetal.ingressVIPs: Invalid value: "172.25.3.11": IP expected to be in one of the machine networks: 10.0.0.0/16] | ||
+ | </pre> | ||
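The membership test behind this error can be tried out before running the installer. The following is a minimal sketch in plain bash, assuming IPv4 addresses only; the helper names <code>ip_to_int</code> and <code>in_cidr</code> are illustrative and not part of any OpenShift tooling.

```shell
#!/usr/bin/env bash

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if IPv4 address $1 lies inside CIDR block $2.
in_cidr() {
  local ip net bits mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Check the example VIPs against the default and the custom machine network.
for vip in 172.25.3.10 172.25.3.11; do
  for net in 10.0.0.0/16 172.25.3.0/24; do
    if in_cidr "$vip" "$net"; then
      echo "$vip is inside $net"
    else
      echo "$vip is outside $net"
    fi
  done
done
```

Both example VIPs fall outside <code>10.0.0.0/16</code>, which is why the interactive mode rejects them.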
+ | |||
+ | '''NOTE''': If you added fewer than three control plane nodes, or fewer than three workers, the installer will fail at this point. | ||
+ | |||
+ | <pre> | ||
+ | FATAL failed to fetch Install Config: failed to generate asset "Install Config": invalid install config: [...] | ||
+ | </pre> | ||
+ | |||
+ | ==== Creating Configuration Manually ==== | ||
+ | |||
+ | The installer configuration file consists of a number of mandatory sections: | ||
+ | |||
+ | * cluster domain and name | ||
+ | * network settings (such as machine and cluster IP ranges) | ||
+ | * control plane and compute node settings | ||
+ | * infrastructure platform settings | ||
+ | * pull secret and ssh key | ||
+ | |||
+ | There are also some optional sections which are useful in special cases such as disconnected installation or special install modes. | ||
+ | |||
+ | '''NOTE''': There is an <code>explain</code> subcommand in the installer binary that explains the structure of the <code>installconfig</code> resource. | ||
+ | |||
+ | <pre> | ||
+ | $ openshift-baremetal-install explain installconfig | ||
+ | KIND: InstallConfig | ||
+ | VERSION: v1 | ||
+ | |||
+ | RESOURCE: <object> | ||
+ | InstallConfig is the configuration for an OpenShift install. | ||
+ | |||
+ | FIELDS: | ||
+ | additionalTrustBundle <string> | ||
+ | AdditionalTrustBundle is a PEM-encoded X.509 certificate bundle that will be added to the nodes' trusted certificate store. | ||
+ | ... | ||
+ | |||
+ | $ openshift-baremetal-install explain installconfig.platform.baremetal | ||
+ | KIND: InstallConfig | ||
+ | VERSION: v1 | ||
+ | |||
+ | RESOURCE: <object> | ||
+ | BareMetal is the configuration used when installing on bare metal. | ||
+ | |||
+ | FIELDS: | ||
+ | apiVIP <string> | ||
+ | Format: ip | ||
+ | DeprecatedAPIVIP is the VIP to use for internal API communication Deprecated: Use APIVIPs | ||
+ | |||
+ | apiVIPs <[]string> | ||
+ | Format: ip | ||
+ | APIVIPs contains the VIP(s) to use for internal API communication. In dual stack clusters it contains an IPv4 and IPv6 address, otherwise only one VIP | ||
+ | ... | ||
+ | </pre> | ||
+ | |||
+ | An example <code>install-config.yaml</code> for baremetal IPI looks like this: | ||
+ | |||
+ | <pre> | ||
+ | apiVersion: v1 | ||
+ | baseDomain: example.com | ||
+ | metadata: | ||
+ | name: mycluster | ||
+ | networking: | ||
+ | networkType: OVNKubernetes | ||
+ | # These are the networks external IPs will be allocated from. | ||
+ | machineNetwork: | ||
+ | - cidr: 172.25.3.0/24 | ||
+ | # This is the pod network. | ||
+ | clusterNetwork: | ||
+ | - cidr: 10.200.0.0/14 | ||
+ | hostPrefix: 23 | ||
+ | # Only one entry is supported. | ||
+ | serviceNetwork: | ||
+ | - 172.30.0.0/16 | ||
+ | compute: | ||
+ | - name: worker | ||
+ | replicas: 2 | ||
+ | controlPlane: | ||
+ | name: master | ||
+ | replicas: 3 | ||
+ | platform: | ||
+ | baremetal: {} | ||
+ | platform: | ||
+ | baremetal: | ||
+ | apiVIPs: | ||
+ | - 172.25.3.10 | ||
+ | ingressVIPs: | ||
+ | - 172.25.3.11 | ||
+ | provisioningNetwork: Managed | ||
+ | provisioningNetworkCIDR: 10.1.1.0/24 | ||
+ | provisioningDHCPRange: 10.1.1.200,10.1.1.210 | ||
+ | # These settings are to configure the temporary bootstrap node as a VM | ||
+ | externalBridge: bridge0 | ||
+ | externalMACAddress: '52:54:00:00:fa:0f' | ||
+ | bootstrapProvisioningIP: 10.1.1.9 | ||
+ | provisioningBridge: provbr0 | ||
+ | provisioningMACAddress: '52:54:00:00:fb:0f' | ||
+ | # This needs to be done to avoid nested virtualisation if provisioner is a VM. | ||
+ | libvirtURI: qemu+ssh://root@hypervisor.example.com/system | ||
+ | hosts: | ||
+ | - name: controlplane1 | ||
+ | role: master | ||
+ | bmc: | ||
+ | address: ipmi://hypervisor.example.com:6211 | ||
+ | disableCertificateVerification: true | ||
+ | username: admin | ||
+ | password: password | ||
+ | # This is the provisioning network interface. | ||
+ | bootMACAddress: 52:54:00:00:fb:11 | ||
+ | bootMode: legacy | ||
+ | # We need this for proper targeting of root device (vda). | ||
+ | hardwareProfile: libvirt | ||
+ | - name: controlplane2 | ||
+ | role: master | ||
+ | bmc: | ||
+ | address: ipmi://hypervisor.example.com:6212 | ||
+ | disableCertificateVerification: true | ||
+ | username: admin | ||
+ | password: password | ||
+ | bootMACAddress: 52:54:00:00:fb:12 | ||
+ | bootMode: legacy | ||
+ | hardwareProfile: libvirt | ||
+ | - name: controlplane3 | ||
+ | role: master | ||
+ | bmc: | ||
+ | address: ipmi://hypervisor.example.com:6213 | ||
+ | disableCertificateVerification: true | ||
+ | username: admin | ||
+ | password: password | ||
+ | bootMACAddress: 52:54:00:00:fb:13 | ||
+ | bootMode: legacy | ||
+ | hardwareProfile: libvirt | ||
+ | - name: worker1 | ||
+ | role: worker | ||
+ | bmc: | ||
+ | address: ipmi://hypervisor.example.com:6221 | ||
+ | disableCertificateVerification: true | ||
+ | username: admin | ||
+ | password: password | ||
+ | bootMACAddress: 52:54:00:00:fb:21 | ||
+ | bootMode: legacy | ||
+ | hardwareProfile: libvirt | ||
+ | - name: worker2 | ||
+ | role: worker | ||
+ | bmc: | ||
+ | address: ipmi://hypervisor.example.com:6222 | ||
+ | disableCertificateVerification: true | ||
+ | username: admin | ||
+ | password: password | ||
+ | bootMACAddress: 52:54:00:00:fb:22 | ||
+ | bootMode: legacy | ||
+ | hardwareProfile: libvirt | ||
+ | pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"...","email":"..."},"quay.io":{"auth":"...","email":"..."},...}}' | ||
+ | sshKey: "ssh-rsa ..." | ||
+ | </pre> | ||
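With this many near-identical host entries, a copy-and-paste slip in the <code>role</code> field is easy to make. As a rough sanity check, and assuming the installer expects enough hosts of each role to cover the configured <code>replicas</code>, the entries can be counted per role. The helper name <code>count_role</code> is hypothetical, and the sample below is a shortened stand-in for a real <code>install-config.yaml</code>.

```shell
#!/usr/bin/env bash

# Count "role: <name>" lines in an install-config read from stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_role() {
  grep -cE "^[[:space:]]*role:[[:space:]]*$1[[:space:]]*\$" || true
}

# Shortened sample; in practice read the real file instead.
sample=$(cat <<'EOF'
hosts:
- name: controlplane1
  role: master
- name: controlplane2
  role: master
- name: worker1
  role: worker
EOF
)

echo "masters: $(count_role master <<< "$sample")"
echo "workers: $(count_role worker <<< "$sample")"
```

Against the real file this would be <code>count_role master < install-config.yaml</code>, to be compared by eye with <code>controlPlane.replicas</code> and <code>compute[].replicas</code>.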
== Installation == | == Installation == | ||
+ | |||
+ | With the installation configuration file created, place a copy of it in a subdirectory, such as <code>./mycluster/</code>, and run the installer. Keep the original, as the installer consumes the copy it finds in the target directory. | ||
+ | |||
+ | <pre> | ||
+ | $ mkdir mycluster | ||
+ | $ cp install-config.yaml ./mycluster/ | ||
+ | |||
+ | $ openshift-baremetal-install --dir=./mycluster/ --log-level=debug create cluster | ||
+ | DEBUG OpenShift Installer 4.14.9 | ||
+ | DEBUG Built from commit dfafb5ca972a6ed4677257aebfe4f284ac020830 | ||
+ | DEBUG Fetching Metadata... | ||
+ | DEBUG Loading Metadata... | ||
+ | ... | ||
+ | DEBUG Loading Install Config... | ||
+ | DEBUG Loading Bootstrap Ignition Config... | ||
+ | ... | ||
+ | INFO Consuming Install Config from target directory | ||
+ | ... | ||
+ | INFO Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.14-9.2/builds/414.92.202310210434-0/x86_64/rhcos-414.92.202310210434-0-qemu.x86_64.qcow2.gz?sha256=aab55f3ee088b88562f8fdcde5be78ace023e06fa01263e7cb9de2edc7131d6f' | ||
+ | ... | ||
+ | INFO Creating infrastructure resources... | ||
+ | ... | ||
+ | </pre> | ||
+ | |||
+ | When you see the above message, check the hypervisor for the presence of the temporary bootstrap VM. | ||
+ | |||
+ | <pre> | ||
+ | $ virsh list | ||
+ | Id Name State | ||
+ | --------------------------------------- | ||
+ | 4 provisioner running | ||
+ | 5 mycluster-tmkmv-bootstrap running | ||
+ | </pre> | ||
+ | |||
+ | Once you see the VM running, you can log into it with the SSH key configured in the install config and inspect the containers running on it. | ||
+ | |||
+ | <pre> | ||
+ | $ ssh -i ~/.ssh/id_rsa core@bootstrap.mycluster.example.com | ||
+ | The authenticity of host 'bootstrap.mycluster.example.com (172.25.3.9)' can't be established. | ||
+ | ... | ||
+ | Red Hat Enterprise Linux CoreOS 414.92.202310210434-0 | ||
+ | Part of OpenShift 4.14, RHCOS is a Kubernetes native operating system | ||
+ | managed by the Machine Config Operator (`clusteroperator/machine-config`). | ||
+ | |||
+ | WARNING: Direct SSH access to machines is not recommended; instead, | ||
+ | make configuration changes via `machineconfig` objects: | ||
+ | ... | ||
+ | [core@localhost ~]$ sudo -i | ||
+ | |||
+ | [root@localhost ~]# systemctl is-active crio | ||
+ | active | ||
+ | |||
+ | [root@localhost ~]# systemctl is-active bootkube | ||
+ | active | ||
+ | |||
+ | [root@localhost ~]# podman ps | ||
+ | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES | ||
+ | da3b3e74fc7f quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:... /bin/rundnsmasq About a minute ago Up About a minute dnsmasq | ||
+ | |||
+ | [root@localhost ~]# crictl pods | ||
+ | POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME | ||
+ | af9df933d3b91 49 seconds ago Ready etcd-bootstrap-member-localhost.localdomain openshift-etcd 0 (default) | ||
+ | |||
+ | [root@localhost ~]# crictl ps | ||
+ | CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD | ||
+ | 9ad74f6433a33 753b0a16ba606f4c579690ed0035.. 6 seconds ago Running etcd 0 af9df933d3b91 etcd-bootstrap-member-localhost.localdomain | ||
+ | 5c9371b25e646 quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:... 6 seconds ago Running etcdctl 0 af9df933d3b91 etcd-bootstrap-member-localhost.localdomain | ||
+ | </pre> | ||
+ | |||
+ | Once you see pods in namespaces such as <code>kube-system</code>, <code>openshift-kube-apiserver</code>, and <code>openshift-cluster-version</code> show up in the ready state, you can leave the shell and use the generated <code>kubeconfig</code> file to track the progress of the installation from the provisioner machine. | ||
+ | |||
+ | <pre> | ||
+ | $ export KUBECONFIG=$(pwd)/mycluster/auth/kubeconfig | ||
+ | |||
+ | $ oc get clusterversion | ||
+ | NAME VERSION AVAILABLE PROGRESSING SINCE STATUS | ||
+ | version False True 15m Unable to apply 4.14.9: an unknown error has occurred: MultipleErrors | ||
+ | |||
+ | $ oc get clusteroperators | ||
+ | $ oc get co | ||
+ | NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE | ||
+ | authentication | ||
+ | baremetal | ||
+ | cloud-controller-manager | ||
+ | cloud-credential True False False 15m | ||
+ | cluster-autoscaler | ||
+ | config-operator | ||
+ | console | ||
+ | control-plane-machine-set | ||
+ | csi-snapshot-controller | ||
+ | dns | ||
+ | etcd | ||
+ | image-registry | ||
+ | ingress | ||
+ | insights | ||
+ | kube-apiserver | ||
+ | kube-controller-manager | ||
+ | kube-scheduler | ||
+ | kube-storage-version-migrator | ||
+ | machine-api | ||
+ | machine-approver | ||
+ | machine-config | ||
+ | marketplace | ||
+ | monitoring | ||
+ | network | ||
+ | node-tuning | ||
+ | openshift-apiserver | ||
+ | openshift-controller-manager | ||
+ | openshift-samples | ||
+ | operator-lifecycle-manager | ||
+ | operator-lifecycle-manager-catalog | ||
+ | operator-lifecycle-manager-packageserver | ||
+ | service-ca | ||
+ | storage | ||
+ | </pre> | ||
+ | |||
+ | '''NOTE''': It is normal for cluster version and various cluster operators to report transient error states as the progress of one impacts the progress of others. Eventually all these errors should go away. | ||
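To see at a glance which operators are still settling, the table can be filtered down to the rows whose AVAILABLE column is not yet <code>True</code>. This is only a sketch: it assumes the fixed-width table layout shown above and locates the column by the position of its header, because the VERSION column may be empty. The function name <code>pending_operators</code> and the sample rows are made up for illustration.

```shell
#!/usr/bin/env bash

# Read `oc get co` table output on stdin and print the names of operators
# whose AVAILABLE column is anything other than "True" (including empty).
pending_operators() {
  awk '
    NR == 1 { col = index($0, "AVAILABLE"); next }   # header: remember column offset
    NF {
      split(substr($0, col), f, " ")                 # first token at that offset
      if (f[1] != "True") print $1
    }
  '
}

# Illustrative sample in the same layout as the real output.
sample=$(
  printf '%-19s%-10s%-12s%-14s%s\n' NAME VERSION AVAILABLE PROGRESSING DEGRADED
  printf '%-19s%-10s%-12s%-14s%s\n' authentication '' '' '' ''
  printf '%-19s%-10s%-12s%-14s%s\n' cloud-credential '' True False False
  printf '%-19s%-10s%-12s%-14s%s\n' etcd 4.14.9 False True False
)

pending_operators <<< "$sample"
```

On the provisioner this would be used as <code>oc get co | pending_operators</code>; an empty result means every operator reports as available.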
== Post-Install Smoke Tests == | == Post-Install Smoke Tests == | ||
= Conclusion = | = Conclusion = |
Latest revision as of 10:31, 5 February 2024
Introduction
What I Assume
- You know how OpenShift installation works and what the difference between IPI and UPI is.
- You know about the OpenShift Machine API and various underlying mechanisms.
- You understand the different types of network interfaces on Linux and different libvirt networks.
- You are familiar and comfortable with NetworkManager and the nmcli tool.
- You are familiar and comfortable with libvirt CLI and XML.
- You are familiar and comfortable with qemu-img tool.
- We are not talking about any firewall restrictions here - it is your responsibility to ensure traffic is not blocked.
Outcomes
The installation described is for a fully managed IPI running OpenShift Container Platform v4.14, initially with three master and two worker nodes.
At a later point, I will add a couple of steps needed to grow the cluster by one extra worker node.
OpenShift Container Platform IPI Installation Using Libvirt
Prerequisites
Hardware requirements for the cluster:
- 136 GiB RAM (32 GiB per control plane, 20 GiB per compute node), max overcommit ratio of 1.5 (make sure enough swap is available)
- 52 vCPUs (12 per control plane, 8 per compute node), max overcommit ratio of 1.3 (higher might work, but will slow down the installation horribly and may ultimately fail)
- one physical network interface that will be used for the public bridged network
- a physical or virtual network interface that will be used for the provisioning network bridge
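The totals above follow from three control plane and two compute nodes. The sketch below redoes the arithmetic and, assuming the overcommit ratio is simply virtual divided by physical, derives the smallest physical amounts that stay within the stated limits. The helper name <code>min_physical</code> is illustrative; ratios are passed in tenths to keep the calculation in integers.

```shell
#!/usr/bin/env bash

# Node counts and per-node sizes taken from the requirements above.
masters=3
workers=2
ram_total=$(( masters * 32 + workers * 20 ))   # GiB of guest RAM
cpu_total=$(( masters * 12 + workers * 8 ))    # guest vCPUs

# Smallest whole number of physical units that keeps the overcommit at or
# below the given ratio. $2 is the ratio times ten (1.5 -> 15); rounds up.
min_physical() {
  local total=$1 ratio_x10=$2
  echo $(( (total * 10 + ratio_x10 - 1) / ratio_x10 ))
}

echo "RAM:   ${ram_total} GiB virtual, at least $(min_physical "$ram_total" 15) GiB physical"
echo "vCPUs: ${cpu_total} virtual, at least $(min_physical "$cpu_total" 13) physical"
```

The provisioner VM, if you run one on the same host, comes on top of these figures.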
Hardware requirements for the installation client (provisioner) machine:
- a minimum of 8 GiB RAM and 4 CPUs
- a network connection to both the public bridged network and the provisioning network
Because the provisioner needs access to both networks, and the provisioning network in this guide is a virtual one, it is probably best to define the provisioner as a VM, with the same network interface settings as the control/compute nodes.
In the case you want to run the workloads spread across several hypervisor hosts, there are some extra steps, but nothing big. More on that in #Installation Spanning Multiple Hypervisors below.
Software artifacts needed on the provisioner host:
- oc, the command line client, of the corresponding version - download from https://mirror.openshift.com/pub/openshift-v4/clients/ocp/
- libvirt-client package is required for openshift-baremetal-install to be able to communicate to hypervisor(s)
- ipmitool or some other IPMI client
- a pull-secret file containing authentication credentials for OpenShift Container Platform registries - download from https://console.redhat.com/openshift/
- an SSH keypair that can be used for accessing OpenShift nodes
IMPORTANT: The external IP addresses of cluster nodes must be assigned by your infrastructure DHCP server.
Host Configuration
Beyond the logical requirement of having libvirt installed and started, here are the other configuration details for the hypervisor.
Network Settings
The first thing to make sure of is that IP forwarding is enabled.
$ sysctl net.ipv4.ip_forward 1
Linux network settings need to be configured to have two Linux bridges, a public and a private provisioning one.
- public bridge, call it bridge0, needs to have the public network interface enslaved to it
- private bridge, call it provbr0, can be a virtual bridge since it is only needed for the provisioning network, which is supposed to be isolated and without any infrastructure services (such as DHCP, DNS, etc.)
It would be wonderful if the bridges could be OpenVSwitch ones, but unfortunately the Terraform bundled with openshift-baremetal-install currently does not include an OpenVSwitch provider, so there's goodbye to that.
As an example, here is my host configuration.
Public bridge:
$ ip addr show bridge0
6: bridge0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 48:21:0b:57:0e:06 brd ff:ff:ff:ff:ff:ff
    inet 172.25.35.2/24 brd 172.25.35.255 scope global noprefixroute bridge0
       valid_lft forever preferred_lft forever
    inet6 fe80::4a21:bff:fe57:e06/64 scope link
       valid_lft forever preferred_lft forever

$ ip addr show enp86s0
2: enp86s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master bridge0 state UP group default qlen 1000
    link/ether 48:21:0b:57:0e:06 brd ff:ff:ff:ff:ff:ff

$ bridge link | grep "master bridge0"
2: enp86s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state forwarding priority 32 cost 100
Provisioning bridge:
$ ip addr show provbr0
5: provbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ce:70:26:9c:88:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.2/24 brd 10.1.1.255 scope global noprefixroute provbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::cc70:26ff:fe9c:88a4/64 scope link
       valid_lft forever preferred_lft forever

$ bridge link | grep "master provbr0"
Installation Spanning Multiple Hypervisors
If you want to have your cluster spanning multiple hypervisors, make sure there is also a VXLAN connection between all the provisioning bridges.
You can do that by creating a vxlan type interface, which is a slave connection of type bridge, and the master is set to provbr0. Choose any unique VXLAN ID, and make sure it is the same on all interconnected hosts.
As an example, here is one VXLAN interface connecting hypervisor A to B.
$ nmcli con show provbr0-vxlan10 | grep -E '^(connection|vxlan)' | grep -vE '(default|uuid|--|-1|unknown)'
connection.id: provbr0-vxlan10
connection.type: vxlan
connection.interface-name: provbr0-vxlan10
connection.autoconnect: yes
connection.autoconnect-priority: 0
connection.timestamp: 1703164860
connection.read-only: no
connection.master: provbr0
connection.slave-type: bridge
connection.gateway-ping-timeout: 0
vxlan.id: 10
vxlan.local: 172.25.35.2
vxlan.remote: 172.25.35.3
vxlan.source-port-min: 0
vxlan.source-port-max: 0
vxlan.destination-port: 4790
vxlan.tos: 0
vxlan.ttl: 0
vxlan.ageing: 300
vxlan.limit: 0
vxlan.learning: yes
vxlan.proxy: no
vxlan.rsc: no
vxlan.l2-miss: no
vxlan.l3-miss: no
And this is the corresponding VXLAN interface definition connecting host B to A.
$ nmcli con show provbr0-vxlan10 | grep -E '^(connection|vxlan)' | grep -vE '(default|uuid|--|-1|unknown)'
connection.id: provbr0-vxlan10
connection.type: vxlan
connection.interface-name: provbr0-vxlan10
connection.autoconnect: yes
connection.autoconnect-priority: 0
connection.timestamp: 1697549049
connection.read-only: no
connection.master: provbr0
connection.slave-type: bridge
connection.gateway-ping-timeout: 0
vxlan.id: 10
vxlan.local: 172.25.35.3
vxlan.remote: 172.25.35.2
vxlan.source-port-min: 0
vxlan.source-port-max: 0
vxlan.destination-port: 4790
vxlan.tos: 0
vxlan.ttl: 0
vxlan.ageing: 300
vxlan.limit: 0
vxlan.learning: yes
vxlan.proxy: no
vxlan.rsc: no
vxlan.l2-miss: no
vxlan.l3-miss: no
In this case, bridge link will of course initially also show the provbr0-vxlan10 interface as a slave, and will not show empty as above.
$ bridge link | grep "master provbr0"
7: provbr0-vxlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master provbr0 state forwarding priority 32 cost 100
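A connection like the two shown above could be created with a single nmcli command per host. The sketch below only prints the commands so they can be reviewed first; the function name vxlan_cmd is made up for this example, while the connection name, VXLAN ID, addresses, and destination port are the ones from the listings above.

```shell
#!/usr/bin/env bash

# Print (do not run) an nmcli command that would add a VXLAN port to provbr0.
# Arguments: VXLAN ID, local endpoint IP, remote endpoint IP.
vxlan_cmd() {
  local id=$1 local_ip=$2 remote_ip=$3
  printf 'nmcli connection add type vxlan slave-type bridge master provbr0 '
  printf 'con-name provbr0-vxlan%s ifname provbr0-vxlan%s ' "$id" "$id"
  printf 'vxlan.id %s vxlan.local %s vxlan.remote %s vxlan.destination-port 4790\n' \
    "$id" "$local_ip" "$remote_ip"
}

vxlan_cmd 10 172.25.35.2 172.25.35.3   # to be run on hypervisor A
vxlan_cmd 10 172.25.35.3 172.25.35.2   # to be run on hypervisor B
```

Run the printed command with sudo on the respective host once it looks right. Note that the two endpoints simply swap local and remote addresses and share the same ID.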
libvirt Settings
Your libvirt will of course need to know about those network bridges in order to be able to attach VMs to them.
For that, you will need two network definitions, looking a bit like the following XML. Make sure they are autostart for least headache.
$ sudo virsh net-dumpxml external
<network>
  <name>external</name>
  <uuid>whatever</uuid>
  <forward mode='bridge'/>
  <bridge name='bridge0'/>
</network>

$ sudo virsh net-dumpxml provisioning
<network>
  <name>provisioning</name>
  <uuid>whatever</uuid>
  <forward mode='bridge'/>
  <bridge name='provbr0'/>
</network>
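If the two networks do not exist yet, definitions of that shape can be generated and loaded with virsh. This is a sketch rather than the exact procedure used here: the function name bridge_net_xml is hypothetical, the uuid element is left out on the assumption that libvirt generates one, and the virsh steps are shown as comments because they need a running libvirt.

```shell
#!/usr/bin/env bash

# Print a libvirt network definition that forwards to an existing Linux bridge.
# Arguments: libvirt network name, bridge device name.
bridge_net_xml() {
  local name=$1 bridge=$2
  cat <<EOF
<network>
  <name>${name}</name>
  <forward mode='bridge'/>
  <bridge name='${bridge}'/>
</network>
EOF
}

bridge_net_xml external bridge0
bridge_net_xml provisioning provbr0

# To load a definition, save it to a file and then, for each network:
#   bridge_net_xml external bridge0 > external.xml
#   sudo virsh net-define external.xml
#   sudo virsh net-autostart external
#   sudo virsh net-start external
```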
Additionally, you want to ensure that the storage pool is big enough, but that is not directly related to the subject at hand.
$ sudo virsh pool-info default
Name: default
UUID: whatever
State: running
Persistent: yes
Autostart: yes
Capacity: 250.92 GiB
Allocation: 0 GiB
Available: 250.92 GiB
VirtualBMC
The most important part of IPI facilitation is to be able to simulate a baseboard management controller for your VMs. libvirt obviously doesn't do this, but luckily there's a small bit of Python code that does, and it's called virtualbmc.
In most Python environments you can install it using pip3, just make sure pip3 is up-to-date first.
$ pip3 install --upgrade pip
...
$ pip3 install virtualbmc
...
This gives you /usr/local/bin/vbmcd which you can control using the following systemd unit:
[Unit]
Description=vbmcd

[Service]
Type=forking
ExecStart=/usr/local/bin/vbmcd

[Install]
WantedBy=multi-user.target
Put the above content into /etc/systemd/system/vbmcd.service, reload systemd, and enable/start the service.
$ sudo systemctl daemon-reload
$ sudo systemctl enable --now vbmcd
You now have the ability to associate a TCP port with a virtual machine defined on the hypervisor host, and have it simulate an IPMI BMC for that VM!
Virtual Machine Configuration
VM Definitions
The virtual machines need to be configured with sufficient amount of compute resources, as per #Prerequisites above.
This section ties into the #Network Settings section above. You need two bridges on your hypervisor(s), bridge0 and provbr0.
An example control plane node definition in libvirt XML would look like this:
<domain type='kvm'>
  <name>controlplane1</name>
  <memory unit='GiB'>32</memory>
  <currentMemory unit='GiB'>32</currentMemory>
  <vcpu placement='static'>12</vcpu>
  <os>
    <type arch='x86_64' machine='q35'>hvm</type>
    <boot dev='hd'/>
    <boot dev='network'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/controlplane1-vda.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'/>
    <controller type='pci' index='0' model='pcie-root'/>
    <interface type='bridge'>
      <mac address='52:54:00:00:fb:11'/>
      <source bridge='provbr0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:00:fa:11'/>
      <source bridge='bridge0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>
    <console type='pty'/>
    <channel type='unix'>
      <source mode='bind'/>
      <target type='virtio' name='org.qemu.guest_agent.0'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' autoport='yes' listen='0.0.0.0'/>
    <video>
      <model type='virtio'/>
    </video>
    <memballoon model='virtio'/>
    <rng model='virtio'>
      <backend model='random'>/dev/urandom</backend>
    </rng>
  </devices>
</domain>
A couple of things to note:
- the first network interface is attached to provbr0, and its PCI address is lower (0x03), causing it to be the PXE default device
- the second network interface is attached to bridge0, and its PCI address is higher (0x09), making it the second interface (for external connections)
- boot order is set to hard disk first, network second, which means the host will only PXE boot if the disk image is unbootable
- the disk image needs to be 64GiB in size at the minimum, but you can make it larger and/or add more disk images if you intend to use the local storage operator
When configuring other nodes, simply remember to change the node name, disk image name, and MAC addresses to be unique. Adjust hardware resources accordingly for compute nodes.
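To keep names, MAC addresses, and BMC ports unique and predictable, it helps to derive them from the node type and index. The sketch below follows the numbering pattern visible in this guide's examples (controlplane1 uses ...:fb:11, ...:fa:11 and port 6211, worker1 uses ...:fb:21 and port 6221). The external MAC for workers and the function name node_plan are assumptions made for illustration, and the scheme only works for up to nine nodes of each kind.

```shell
#!/usr/bin/env bash

# Print one line per node: name, provisioning MAC, external MAC, BMC port.
# Arguments: node kind (controlplane or worker), number of nodes (1-9).
node_plan() {
  local kind=$1 count=$2 digit i
  case "$kind" in
    controlplane) digit=1 ;;
    worker)       digit=2 ;;
    *) echo "unknown node kind: $kind" >&2; return 1 ;;
  esac
  for (( i = 1; i <= count; i++ )); do
    printf '%s%d 52:54:00:00:fb:%d%d 52:54:00:00:fa:%d%d 62%d%d\n' \
      "$kind" "$i" "$digit" "$i" "$digit" "$i" "$digit" "$i"
  done
}

node_plan controlplane 3
node_plan worker 2
```

The resulting table can be used when editing each domain XML, and later when assigning BMC ports and filling in the hosts section of the installer configuration.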
IPMI BMC
What is important for OCP IPI to be able to perform the installation properly is to register each virtual machine with vbmcd and assign it with a port.
$ vbmc add --port=6211 controlplane1

$ vbmc list
+-------------------+---------+---------+------+
| Domain name       | Status  | Address | Port |
+-------------------+---------+---------+------+
| controlplane1     | down    | ::      | 6211 |
+-------------------+---------+---------+------+

$ vbmc start controlplane1

$ vbmc list
+-------------------+---------+---------+------+
| Domain name       | Status  | Address | Port |
+-------------------+---------+---------+------+
| controlplane1     | running | ::      | 6211 |
+-------------------+---------+---------+------+

$ sudo ss -aunp | grep 6211
UNCONN 0 0 *:6211 *:* users:(("vbmcd",pid=766290,fd=21))
There is no need to run the vbmc client as root as it is the daemon that is running as root and can see all the VMs accessible through the qemu:///system URL.
Of course there are options. For any VM you add, you can specify a custom set of IPMI admin credentials (options --username and --password, they default to admin and password), a custom libvirt URL and credentials if necessary (options --libvirt-uri, --libvirt-sasl-username, and --libvirt-sasl-password), and a custom IP address to listen on (defaults to all addresses, use option --address to restrict it).
$ vbmc show controlplane1
+-----------------------+-------------------+
| Property              | Value             |
+-----------------------+-------------------+
| active                | True              |
| address               | ::                |
| domain_name           | controlplane1     |
| libvirt_sasl_password | ***               |
| libvirt_sasl_username | None              |
| libvirt_uri           | qemu:///system    |
| password              | ***               |
| port                  | 6211              |
| status                | running           |
| username              | admin             |
+-----------------------+-------------------+
Once started (which just opens the port) you can test the BMC connection using ipmitool or similar.
$ ipmitool -I lanplus -H localhost -p 6211 -U admin -P password chassis status
System Power         : off
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : always-off
Last Power Event     :
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
That's it! We're ready to install OCP!
Installer Configuration
This is where it all comes together. You need to execute the steps in this section (and next) on the provisioner host, that is, a system that has access to both external and provisioning networks of the OpenShift cluster-to-be. The access need not be direct, it can be routed, but if you, as in our example, configured the provisioning network to be an isolated virtual bridge, you will be best off by creating an additional VM that is directly connected to both bridges.
Gathering the Bits Together
First step is to make sure the following artifacts are available:
- oc
- pull-secret
- an SSH public key for compute node access
The global cluster network settings that we will need to configure are:
- the parent DNS domain of the cluster (such as example.com)
- the name of the cluster (concatenated with DNS domain, for example mycluster will become mycluster.example.com)
- the provisioning network CIDR (in our case, it can be any IP address block not overlapping with the external network as the provisioning network is isolated)
- from the external network address space, a designated:
  - API server VIP (api.mycluster.example.com should point to it)
  - ingress load balancer VIP (any host within apps.mycluster.example.com should resolve to it, usually via a wildcard record)
Additionally, for each node you will need its:
- node name
- provisioning interface MAC address
- IPMI BMC address and port
- IPMI BMC credentials
- the name of the provisioning interface as seen from within the VM (re #Virtual Machine Configuration above - since the PCI address of the interface is bus 0x0 slot 0x3 it will be named enp0s3)
As already said, DHCP address assignment to external interfaces is not managed by the installer. It must be handled by your infrastructure.
Preparing the Software
First, make sure your pull-secret
file is on the provisioner.
$ ls -l pull-secret -rw-r-----. 1 provisioner provisioner 2734 Oct 27 12:21 pull-secret
Then after downloading oc (if you didn't do that already)...
$ curl -OL https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.14/openshift-client-linux.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 60.9M  100 60.9M    0     0  4287k      0  0:00:14  0:00:14 --:--:-- 5642k
$ tar xf openshift-client-linux.tar.gz oc
$ ./oc version
Client Version: 4.14.9
Kustomize Version: v5.0.1
$ sudo cp ./oc /usr/local/bin/oc
$ rm -f ./oc ./openshift-client-linux.tar.gz
...and optionally generating a bash completion file...
$ oc completion bash > oc.completion
$ sudo cp oc.completion /etc/bash_completion.d/oc
$ rm -f oc.completion
$ source /etc/bash_completion.d/oc
...you can use it to extract openshift-baremetal-install from the release image you intend to use.
$ curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.14/release.txt | grep "Pull From:"
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:f5eaf0248779a0478cfd83f055d56dc7d755937800a68ad55f6047c503977c44
$ oc adm release extract --registry-config=pull-secret --command=openshift-baremetal-install --to=. \
    quay.io/openshift-release-dev/ocp-release@sha256:f5eaf0248779a0478cfd83f055d56dc7d755937800a68ad55f6047c503977c44
$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.14.9
built from commit dfafb5ca972a6ed4677257aebfe4f284ac020830
release image quay.io/openshift-release-dev/ocp-release@sha256:f5eaf0248779a0478cfd83f055d56dc7d755937800a68ad55f6047c503977c44
release architecture amd64
$ sudo cp ./openshift-baremetal-install /usr/local/bin/
$ rm -f ./openshift-baremetal-install
$ openshift-baremetal-install completion bash > oinst.completion
$ sudo cp oinst.completion /etc/bash_completion.d/oinst
$ rm -f oinst.completion
NOTE: The extraction process can take up to 5 minutes, depending on your network speed and Quay.io responsiveness.
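To avoid copying the image digest by hand, the "Pull From:" lookup can be wrapped in a small filter. This is a sketch only: pull_from and RELEASE_IMAGE are names made up for illustration, and the URL and flags in the comments are the same ones used in the session above.

```shell
# Read release.txt on stdin and print the image reference from "Pull From:".
pull_from() {
    # the line has the form "Pull From: <image>", so the image is field 3
    awk '/Pull From:/ { print $3 }'
}

# Example:
#   RELEASE_IMAGE=$(curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable-4.14/release.txt | pull_from)
#   oc adm release extract --registry-config=pull-secret \
#       --command=openshift-baremetal-install --to=. "$RELEASE_IMAGE"
```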
Creating Installer Configuration
The installer configuration file can be created interactively, using openshift-baremetal-install, or you can simply write it out with an editor.
Using Interactive Mode
IMPORTANT: In interactive mode, machine network (the network segment the nodes have their IPs allocated from) is expected to be 10.0.0.0/16. If your VIPs are not on that network, the installer will fail. If your machine network is not 10.0.0.0/16, the only way to install is to create install-config.yaml manually.
IMPORTANT: The interactive installer expects three control plane nodes and three workers. It will fail if you add fewer than three nodes of either type to the cluster.
The interactive mode will ask you a series of questions and generate install-config.yaml at the end, but there are very few customisation options using this method.
Initially, you must select the SSH public key to be published on cluster nodes, and specify some general settings:
$ openshift-baremetal-install --dir=cluster create install-config
? SSH Public Key /home/provisioner/.ssh/id_rsa.pub
? Platform baremetal
? Provisioning Network Managed
? Provisioning Network CIDR 172.22.0.0/24
? Provisioning bridge provbr0
? Provisioning Network Interface enp0s3
? External bridge bridge0
? Add a Host:  [Use arrows to move, type to filter]
> control plane
  worker
As you can see above, after those have been answered, you may add any number of nodes, either control plane or worker.
For both types of node, the questions are the same:
? Add a Host: control plane
? Name controlplane1
? BMC Address ipmi://192.168.1.1:6211
? BMC Username admin
? BMC Password ********
? Boot MAC Address 52:54:00:00:fb:11
? Add another host? (y/N) y
After all the hosts have been added, there is a final series of cluster-related questions:
? Add another host? No
? Base Domain example.com
? Cluster Name mycluster
? Pull Secret [? for help] ****************************...
At this point, some checks are performed against the DNS server to see if api.mycluster.example.com resolves, and if the IP is part of the machine network. The same check is performed for the ingress VIP.
NOTE: If your VIPs are not on the default machine network, the installer will fail at this point.
FATAL failed to fetch Install Config: failed to generate asset "Install Config": invalid install config: [platform.baremetal.apiVIPs: Invalid value: "172.25.3.10": IP expected to be in one of the machine networks: 10.0.0.0/16, platform.baremetal.ingressVIPs: Invalid value: "172.25.3.11": IP expected to be in one of the machine networks: 10.0.0.0/16]
NOTE: If you added fewer than three control plane nodes, or fewer than three workers, the installer will fail at this point.
FATAL failed to fetch Install Config: failed to generate asset "Install Config": invalid install config: [...]
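The machine network condition behind the first failure above can be checked before starting the interactive installer. The following is a rough IPv4-only sketch; ip_to_int and ip_in_cidr are hypothetical helper names, not part of any OpenShift tool.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    # split the address into its four octets
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if IPv4 address $1 lies within CIDR block $2.
ip_in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # build the netmask from the prefix length
    mask=$(( (0xffffffff << (32 - $bits)) & 0xffffffff ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Example: will the interactive mode accept this API VIP?
#   ip_in_cidr 172.25.3.10 10.0.0.0/16 || echo "VIP outside 10.0.0.0/16, write install-config.yaml manually"
```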
Creating Configuration Manually
The installer configuration file consists of a number of mandatory sections:
- cluster domain and name
- network settings (such as machine and cluster IP ranges)
- control plane and compute node settings
- infrastructure platform settings
- pull secret and ssh key
There are also some optional sections which are useful in special cases such as disconnected installation or special install modes.
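A hand-written file can be given a crude completeness check before it is passed to the installer. The sketch below only looks for top-level keys by name and does not validate their contents; missing_keys is a hypothetical helper, and the key list is an assumption derived from the mandatory sections listed above.

```shell
# Print every expected top-level key that is absent from the given file.
# $1 = path to install-config.yaml
missing_keys() {
    for key in baseDomain metadata networking compute controlPlane platform pullSecret sshKey; do
        # top-level keys start in the first column
        grep -q "^$key:" "$1" || echo "$key"
    done
}

# Example:
#   missing_keys install-config.yaml
```

An empty result means all of the listed keys are present; the explain subcommand of the installer remains the authoritative reference for the structure.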
NOTE: There is an explain subcommand in the installer binary that explains the structure of the installconfig resource.
$ openshift-baremetal-install explain installconfig
KIND:     InstallConfig
VERSION:  v1
RESOURCE: <object>
  InstallConfig is the configuration for an OpenShift install.
FIELDS:
    additionalTrustBundle <string>
      AdditionalTrustBundle is a PEM-encoded X.509 certificate bundle that will be added to the nodes' trusted certificate store.
...
$ openshift-baremetal-install explain installconfig.platform.baremetal
KIND:     InstallConfig
VERSION:  v1
RESOURCE: <object>
  BareMetal is the configuration used when installing on bare metal.
FIELDS:
    apiVIP <string>
      Format: ip
      DeprecatedAPIVIP is the VIP to use for internal API communication Deprecated: Use APIVIPs
    apiVIPs <[]string>
      Format: ip
      APIVIPs contains the VIP(s) to use for internal API communication. In dual stack clusters it contains an IPv4 and IPv6 address, otherwise only one VIP
...
An example install-config.yaml for baremetal IPI looks like this:
apiVersion: v1
baseDomain: example.com
metadata:
  name: mycluster
networking:
  networkType: OVNKubernetes
  # These are the networks external IPs will be allocated from.
  machineNetwork:
  - cidr: 172.25.3.0/24
  # This is the pod network.
  clusterNetworks:
  - cidr: 10.200.0.0/14
    hostPrefix: 23
  # Only one entry is supported.
  serviceNetwork:
  - 172.30.0.0/16
compute:
- name: worker
  replicas: 2
controlPlane:
  name: master
  replicas: 3
  platform:
    baremetal: {}
platform:
  baremetal:
    apiVIPs:
    - 172.25.3.10
    ingressVIPs:
    - 172.25.3.11
    provisioningNetwork: Managed
    provisioningNetworkCIDR: 10.1.1.0/24
    provisioningDHCPRange: 10.1.1.200,10.1.1.210
    # These settings are to configure the temporary bootstrap node as a VM
    externalBridge: bridge0
    externalMACAddress: '52:54:00:00:fa:0f'
    bootstrapProvisioningIP: 10.1.1.9
    provisioningBridge: provbr0
    provisioningMACAddress: '52:54:00:00:fb:0f'
    # This needs to be done to avoid nested virtualisation if provisioner is a VM.
    libvirtURI: qemu+ssh://root@hypervisor.example.com/system
    hosts:
    - name: controlplane1
      role: master
      bmc:
        address: ipmi://hypervisor.example.com:6211
        disableCertificateVerification: true
        username: admin
        password: password
      # This is the provisioning network interface.
      bootMACAddress: 52:54:00:00:fb:11
      bootMode: legacy
      # We need this for proper targeting of root device (vda).
      hardwareProfile: libvirt
    - name: controlplane2
      role: master
      bmc:
        address: ipmi://hypervisor.example.com:6212
        disableCertificateVerification: true
        username: admin
        password: password
      bootMACAddress: 52:54:00:00:fb:12
      bootMode: legacy
      hardwareProfile: libvirt
    - name: controlplane3
      role: master
      bmc:
        address: ipmi://hypervisor.example.com:6213
        disableCertificateVerification: true
        username: admin
        password: password
      bootMACAddress: 52:54:00:00:fb:13
      bootMode: legacy
      hardwareProfile: libvirt
    - name: worker1
      role: worker
      bmc:
        address: ipmi://hypervisor.example.com:6221
        disableCertificateVerification: true
        username: admin
        password: password
      bootMACAddress: 52:54:00:00:fb:21
      bootMode: legacy
      hardwareProfile: libvirt
    - name: worker2
      role: worker
      bmc:
        address: ipmi://hypervisor.example.com:6222
        disableCertificateVerification: true
        username: admin
        password: password
      bootMACAddress: 52:54:00:00:fb:22
      bootMode: legacy
      hardwareProfile: libvirt
pullSecret: '{"auths":{"cloud.openshift.com":{"auth":"...","email":"..."},"quay.io":{"auth":"...","email":"..."},...}}'
sshKey: "ssh-rsa ..."
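Host entries are usually produced by copying and editing, so it is worth counting the hosts per role and comparing the numbers with the replicas values of controlPlane and compute. A minimal grep-based sketch, assuming one role: line per host as in the example; count_role is a hypothetical helper name.

```shell
# Count the host entries with the given role.
# $1 = path to install-config.yaml, $2 = role (master or worker)
count_role() {
    # -c prints the number of matching lines; "|| true" hides the non-zero
    # exit status grep returns when the count is 0
    grep -c "^ *role: $2 *\$" "$1" || true
}

# Example:
#   count_role install-config.yaml master
#   count_role install-config.yaml worker
```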
Installation
With the above installation configuration file created, place a copy of it in a subdirectory, such as ./mycluster/, and run the installer.
$ mkdir mycluster
$ cp install-config.yaml ./mycluster/
$ openshift-baremetal-install --dir=./mycluster/ --log-level=debug create cluster
DEBUG OpenShift Installer 4.14.9
DEBUG Built from commit dfafb5ca972a6ed4677257aebfe4f284ac020830
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
...
DEBUG Loading Install Config...
DEBUG Loading Bootstrap Ignition Config...
...
INFO Consuming Install Config from target directory
...
INFO Obtaining RHCOS image file from 'https://rhcos.mirror.openshift.com/art/storage/prod/streams/4.14-9.2/builds/414.92.202310210434-0/x86_64/rhcos-414.92.202310210434-0-qemu.x86_64.qcow2.gz?sha256=aab55f3ee088b88562f8fdcde5be78ace023e06fa01263e7cb9de2edc7131d6f'
...
INFO Creating infrastructure resources...
...
When you see the above message, check the hypervisor for the presence of the temporary bootstrap VM.
$ virsh list
 Id   Name                        State
---------------------------------------
 4    provisioner                 running
 5    mycluster-tmkmv-bootstrap   running
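If you prefer not to scan the list by eye, the bootstrap VM can be picked out by its name suffix. A small sketch, assuming the domain name always ends in -bootstrap as it does above; bootstrap_domains is a hypothetical helper name.

```shell
# Read domain names on stdin (one per line) and print those ending in -bootstrap.
bootstrap_domains() {
    # "|| true" keeps the exit status zero when nothing matches yet
    grep -- '-bootstrap$' || true
}

# Example on the hypervisor:
#   virsh list --name | bootstrap_domains
```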
Once you see the VM running, you can log into it with the SSH key configured in the install config and inspect the containers running on it.
$ ssh -i ~/.ssh/id_rsa core@bootstrap.mycluster.example.com
The authenticity of host 'bootstrap.mycluster.example.com (172.25.3.9)' can't be established.
...
Red Hat Enterprise Linux CoreOS 414.92.202310210434-0
  Part of OpenShift 4.14, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).
WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
...
[core@localhost ~]$ sudo -i
[root@localhost ~]# systemctl is-active crio
active
[root@localhost ~]# systemctl is-active bootkube
active
[root@localhost ~]# podman ps
CONTAINER ID  IMAGE                                                       COMMAND          CREATED             STATUS             PORTS  NAMES
da3b3e74fc7f  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...  /bin/rundnsmasq  About a minute ago  Up About a minute         dnsmasq
[root@localhost ~]# crictl pods
POD ID         CREATED         STATE  NAME                                         NAMESPACE       ATTEMPT  RUNTIME
af9df933d3b91  49 seconds ago  Ready  etcd-bootstrap-member-localhost.localdomain  openshift-etcd  0        (default)
[root@localhost ~]# crictl ps
CONTAINER      IMAGE                                                       CREATED        STATE    NAME     ATTEMPT  POD ID         POD
9ad74f6433a33  753b0a16ba606f4c579690ed0035..                              6 seconds ago  Running  etcd     0        af9df933d3b91  etcd-bootstrap-member-localhost.localdomain
5c9371b25e646  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:...  6 seconds ago  Running  etcdctl  0        af9df933d3b91  etcd-bootstrap-member-localhost.localdomain
Once you see pods in namespaces such as kube-system, openshift-kube-apiserver, and openshift-cluster-version show up in the Ready state, you can leave the shell and use the generated kubeconfig file to track the progress of the installation from the provisioner machine.
$ export KUBECONFIG=$(pwd)/mycluster/auth/kubeconfig
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          15m     Unable to apply 4.14.9: an unknown error has occurred: MultipleErrors
$ oc get clusteroperators
$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager
cloud-credential                                     True        False         False      15m
cluster-autoscaler
config-operator
console
control-plane-machine-set
csi-snapshot-controller
dns
etcd
image-registry
ingress
insights
kube-apiserver
kube-controller-manager
kube-scheduler
kube-storage-version-migrator
machine-api
machine-approver
machine-config
marketplace
monitoring
network
node-tuning
openshift-apiserver
openshift-controller-manager
openshift-samples
operator-lifecycle-manager
operator-lifecycle-manager-catalog
operator-lifecycle-manager-packageserver
service-ca
storage
NOTE: It is normal for cluster version and various cluster operators to report transient error states as the progress of one impacts the progress of others. Eventually all these errors should go away.
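Because these errors are transient, polling is more useful than a single check. The following is a generic retry loop, given as a sketch: retry_until is a hypothetical helper, and the attempt count, delay and timeout in the example are arbitrary assumptions that you should adjust to your environment.

```shell
# Run a command repeatedly until it succeeds.
# $1 = maximum number of attempts, $2 = delay in seconds, rest = the command.
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        # stop as soon as the command succeeds
        "$@" && return 0
        n=$(($n + 1))
        sleep "$delay"
    done
    return 1
}

# Example: poll until the cluster version reports the Available condition
#   retry_until 120 30 oc wait clusterversion/version --for=condition=Available --timeout=10s
```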