Installation guide for Warewulf 4
Warewulf
Preface
In High Performance Computing (HPC), computing tasks are usually distributed among many compute threads which are spread across multiple cores, sockets and machines. These threads are tightly coupled together. Therefore, compute clusters consist of a number of largely identical machines that need to be managed to maintain a well-defined and identical setup across all nodes. Once clusters scale up, there are many scalability factors to overcome. Warewulf is there to address this ‘administrative scaling’.
Warewulf is an operating system-agnostic installation and management system
for HPC clusters.
It is quick and easy to learn and use, as many settings are pre-configured to sensible defaults, while still providing the flexibility to fine-tune the configuration to local needs.
It is released under the BSD license; its source code is available at https://github.com/warewulf/warewulf, which is also where development takes place.
This article gives an overview of how to set up Warewulf on openSUSE Leap 15.5 or openSUSE Tumbleweed.
Installing Warewulf
Compute clusters consist of at least one management (or head) node, which is usually multi-homed, connected both to an external network and a cluster-private network, as well as multiple compute nodes which reside solely on the private network. Other private networks dedicated to high-speed tasks like RDMA and storage access may exist as well. Warewulf is installed on one of the management nodes of a cluster to oversee the installation and management of the compute nodes. To install Warewulf on a cluster which is running openSUSE Leap 15.5 or openSUSE Tumbleweed, simply run:
zypper install warewulf
This package seamlessly integrates into a SUSE system and should therefore be preferred over packages provided on GitHub.
During the installation, the actual network configuration is written to /etc/warewulf/warewulf.conf. These settings should be verified, as a sensible pre-configuration is not always possible for multi-homed hosts. Check /etc/warewulf/warewulf.conf for the following values:
ipaddr: 172.16.16.250
netmask: 255.255.255.0
network: 172.16.16.0
where ipaddr should be the IP address of this management host. Also check the values of netmask and network; these should match this network.
Additionally, you may want to configure the IP address range for dynamic/unknown hosts:
dhcp:
  range start: 172.16.26.21
  range end: 172.16.26.50
If the ISC DHCP server (dhcpd) is used (the default on SUSE), make sure the value of DHCPD_INTERFACE in the file /etc/sysconfig/dhcpd has been set to the correct value.
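For example, if the cluster-private network is attached to the interface eth1 (the interface name here is only a placeholder for your actual cluster-facing interface), the file would contain:
DHCPD_INTERFACE="eth1"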
You are now ready to start the warewulfd service itself, which delivers the images to the nodes:
systemctl enable --now warewulfd.service
Now wwctl can be used to configure the remaining services needed by Warewulf. Run:
wwctl configure --all
which will configure all Warewulf-related services.
To conveniently log in to the compute nodes, you should now log out of and back into the Warewulf host. This creates an SSH key on the Warewulf host which allows password-less login to the compute nodes. Note, however, that this key is not yet passphrase-protected. If you require protecting your private key with a passphrase, it is probably a good idea to do so now:
ssh-keygen -p -f $HOME/.ssh/cluster
Adding nodes and profiles to Warewulf
Warewulf uses the concept of profiles which hold the generalized settings of the individual nodes. It comes with a predefined profile default, to which all new nodes will be assigned if not set otherwise. You may obtain the values of the default profile with:
wwctl profile list default
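To see all fields of the profile, including its network settings, you can add the -A flag:
wwctl profile list -A default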
Now a node can be added and assigned an IP address with the following command:
wwctl node add node01 -I 172.16.16.101
If the MAC address of this node is known, you can specify it as well:
wwctl node add node01 -I 172.16.16.101 -H cc:aa:ff:ff:ee
To add several nodes at once, you may also use a node range, e.g.
wwctl node add node[01-10] -I 172.16.16.101
This will add the nodes with IP addresses starting at the specified address, which is incremented by Warewulf for each subsequent node.
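Afterwards you can verify the assigned addresses, for example with (the node range argument is optional; without it, all nodes are listed):
wwctl node list node[01-10]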
Importing a container
Warewulf uses a special container [1] as the base to build OS images for the compute nodes. This container is self-contained and independent of the operating system installed on the Warewulf host.
To import an openSUSE Leap 15.5 container use the command
wwctl container import docker://registry.opensuse.org/science/warewulf/leap-15.5/containers/kernel:latest leap15.5 --setdefault
This will import the specified container and set it as the default container for the default profile.
Alternative container sources
Alternative containers are available from the openSUSE registry under the science project at:
https://registry.opensuse.org/cgi-bin/cooverview?srch_term=project%3D%5Escience%3A
or from the upstream Warewulf community repository:
https://github.com/orgs/warewulf/packages?repo_name=warewulf-node-images
It is also possible to import an image from a local installation in a directory (a chroot directory) by using the path to this directory as the argument for wwctl container import.
Booting nodes
As a final preparation, you should now rebuild the container image by running:
wwctl container build leap15.5
as well as all the configuration overlays with the command:
wwctl overlay build
in case an earlier build of the image failed due to an error. If you didn’t assign a hardware address to a node before, you should set the node to the discoverable state before powering it on. This is done with:
wwctl node set node01 --discoverable
You should also run:
wwctl configure hostlist
to add the new nodes to the file /etc/hosts.
Now make sure that the node(s) will boot over PXE from the network interface connected to the specified network, and power on the node(s) to boot into the assigned image.
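Depending on your Warewulf version, you can follow the provisioning progress of the nodes from the Warewulf host with:
wwctl node status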
Additional configuration
The configuration files for the nodes are managed as Golang text templates. The resulting files are overlaid over the node images. There are two types of overlays, depending on how they are added to the compute node:
- the system overlay, which is ‘baked’ into the image during boot as part of the wwinit process.
- the runtime overlay, which is updated on the nodes at a regular interval (1 minute by default) via the wwclient service.
In the default configuration, the overlay called wwinit is used as the system overlay. You may list the files in this overlay with the command:
wwctl overlay list wwinit -a
which will show a list of all the files in the overlay. Files ending with the suffix .ww are interpreted as templates by Warewulf; the suffix is removed in the rendered file.
To inspect the content of an overlay file use the command:
wwctl overlay show wwinit /etc/issue.ww
To render the template using the values for node01 use:
wwctl overlay show wwinit /etc/issue.ww -r node01
The overlay template itself may be edited using the command:
wwctl overlay edit wwinit /etc/issue.ww
Please note that after editing templates, the overlays aren’t updated automatically and you should trigger a rebuild with the command:
wwctl overlay build
The variables available in a template can be listed with
wwctl overlay show debug /warewulf/template-variables.md.ww
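As a rough illustration, a minimal template fragment for /etc/issue.ww might look like the line below; the variable {{.Id}} (assumed here to expand to the node name) should be verified against the variable listing above before use:
Welcome to {{.Id}}, provisioned by Warewulf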
Modifying the container
The node container is a self-contained operating system image. You can open a shell in the image with the command:
wwctl container shell leap15.5
After you have opened a shell, you may install additional software using zypper.
The shell command provides the option --bind, which allows mounting arbitrary host directories into the container during the shell session.
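For example, to make a host directory such as /data (only an illustration) available inside the shell session, you could run something like:
wwctl container shell --bind /data leap15.5
Check wwctl container shell --help for the exact bind syntax supported by your version.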
Please note that if a command exits with a non-zero status, the image won’t be rebuilt automatically. Therefore, it is advised to rebuild the container with:
wwctl container build leap15.5
after any change.
Network configuration
Warewulf allows configuring multiple network interfaces for the compute nodes. You can, for example, add another network interface for InfiniBand using the command:
wwctl node set node01 --netname infininet -I 172.16.17.101 --netdev ib0 --mtu 9000 --type infiniband
This will add the InfiniBand interface ib0 to the node node01. You can now list the network interfaces of the node:
wwctl node list -n
As changes in the settings are not propagated to all configuration files, the node overlays should be rebuilt after this change by running the command:
wwctl overlay build
After a reboot, these changes will be present on the nodes; in the above case, the InfiniBand interface will be active on the node.
A more elegant way to get the same result is to create a profile to hold those values which are identical for all interfaces. In this case, these are mtu and netdev.
Create a new profile for an InfiniBand network using the command:
wwctl profile add infiniband-nodes --netname infininet --netdev ib0 --mtu 9000 --type infiniband
You may now add this profile to a node and remove the node-specific settings which are now part of the common profile by executing:
wwctl node set node01 --netname infininet --netdev UNDEF --mtu UNDEF --type UNDEF --profiles default,infiniband-nodes
To list the data in a profile use the command:
wwctl profile list -A infiniband-nodes
Secure Boot
Switch to grub boot
By default, Warewulf boots nodes via iPXE, which isn’t signed by SUSE and can’t be used when Secure Boot is enabled. In order to switch to GRUB as the boot method, you must add or change the following value in /etc/warewulf/warewulf.conf:
warewulf:
  grubboot: true
After this change, you will have to reconfigure dhcpd and tftp by executing:
wwctl configure dhcp
wwctl configure tftp
and rebuild the overlays with the command:
wwctl overlay build
Also make sure that the packages shim and grub2-x86_64-efi (for x86-64) or grub2-arm64-efi (for aarch64) are installed in the container. shim is required by Secure Boot.
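Assuming the leap15.5 container imported earlier, these packages can, for example, be installed without opening an interactive shell:
wwctl container exec leap15.5 zypper -n install shim grub2-x86_64-efi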
Cross-product secure boot
If Secure Boot is enabled on the compute nodes and you need to boot different products, you have to make sure that the compute nodes boot with the so-called ‘http’ boot method: for Secure Boot, the signed shim needs to match the signatures of the other pieces of the boot chain, including the kernel. However, different products will have different sets of signatures in their boot chain.
The ‘http’ boot method is handled by warewulfd, which will look up the image to boot and pick the right shim from that image to deploy to a particular node. Therefore, you need to make sure that the node container contains the shim package.
The default boot method will extract the initial shim for PXE boot from the host running the warewulfd server. Note, however, that the host system’s shim will also be used for nodes which are in the discoverable state and therefore have no hardware address assigned yet.
Disk management
It is possible to manage the disks of the compute nodes with Warewulf. Here, Warewulf itself doesn’t manage the disks, but creates configuration and service files for ignition to do this job.
Prepare container
As ignition and its dependencies aren’t installed in most of the containers, you should install the packages ignition and gptfdisk in the container:
wwctl container exec <container_name> zypper -n in -y ignition gptfdisk
Add disk to configuration
Warewulf boots ephemeral systems, thus there is no need for local disk storage. Still, local disk storage may be useful to have, for instance as scratch storage for computational tasks. Warewulf is capable of setting up local disk storage. For this, it is necessary to configure the involved entities:
- physical storage device(s) to be used
- partition(s) on the disks
- filesystem(s) to be used
Warewulf doesn’t manage these entities itself, but creates configuration and service files for ignition to perform this task.
Therefore, you need to make sure to install ignition and gptfdisk on the compute node. Open a shell in the container and run:
zypper -n in -y ignition gptfdisk
Disks
The path to the device, e.g. /dev/sda, must be used for diskname.
The only valid configuration option for disks is diskwipe, which should be self-explanatory.
Partitions
The partname is the name of the partition which ignition uses as the path for the device file, e.g. /dev/disk/by-partlabel/$PARTNAME.
Additionally, the size and number of the partition need to be specified for all but the last partition (the one with the highest number), which will be extended to the maximum size possible.
You should also set the boolean option --partcreate so that a partition is created if it doesn’t exist.
Filesystems
Filesystems are defined by the partition which contains them, so the name should have the format /dev/disk/by-partlabel/$PARTNAME. A filesystem needs to have a path if it is to be mounted, but a path is not mandatory. Ignition will fail if no filesystem type is defined.
Examples
You can add a scratch partition with
wwctl node set node01 \
--diskname /dev/vda --diskwipe \
--partname scratch --partcreate \
--fsname scratch --fsformat btrfs --fspath /scratch --fswipe
This will be the only (and last) partition, therefore it does not require a size. To add another partition as a swap partition, you may run:
wwctl node set n01 \
--diskname /dev/vda \
--partname swap --partsize=1024 --partnumber 1 \
--fsname swap --fsformat swap --fspath swap
This adds partition number 1, which will be placed before the scratch partition.
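To verify the resulting disk, partition and filesystem settings of a node, you can, for example, print all of its fields with:
wwctl node list -a node01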
[1] This container is special only in that it is bootable, i.e. it contains a kernel and an init implementation (systemd).