If any file in the `overlay` directory changes, the `build-rootfs.sh`
script needs to be re-run in order to copy the changes into the
destination root and regenerate the SquashFS image.

Instead of copying the Portage configuration files to `/etc/portage` and
`/usr/${target}/etc/portage`, the build scripts now use the
configuration directories from the source directory. This avoids issues
with changes (especially removal of files) getting propagated to the
actual configuration paths.

For some reason, when OverlayFS is mounted at `/etc/ssh`, SELinux
prevents both `sshd` and `ssh-keygen` from accessing the files there.
The AVC denials indicate that (some part of) the process is running in
the `mount_t` domain, which is not allowed to read or write `sshd_key_t`
files.

To work around this issue, without granting `mount_t` overly-permissive
access, we now configure the SSH daemon to read host keys from the
persistent data volume directly, instead of "tricking" it with
OverlayFS. The `ssh-keygen` tool does not read the `HostKey` options
from `sshd_config`, though, so it has to be explicitly instructed to
create keys in this alternate location. By using a systemd template
unit with `ConditionPathExists`, we avoid regenerating the keys on every
boot, since the `ssh-keygen` command is only run if the file does not
already exist.
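
As a rough sketch of the template-unit approach described above, the following shell function writes out such a unit. The unit name `ssh-keygen@.service` and the key directory `/var/lib/ssh` are hypothetical stand-ins; the actual names and paths used here may differ.

```shell
# Write a systemd template unit that generates one SSH host key per instance.
# NOTE: the unit name and /var/lib/ssh are assumptions for illustration only.
write_keygen_unit() {
    # $1: directory to write the unit file into
    cat > "$1/ssh-keygen@.service" <<'EOF'
[Unit]
Description=Generate SSH %i host key
# The leading "!" negates the check: run only if the key does not exist yet.
ConditionPathExists=!/var/lib/ssh/ssh_host_%i_key

[Service]
Type=oneshot
# ssh-keygen ignores sshd_config, so the output path is given explicitly.
ExecStart=/usr/bin/ssh-keygen -q -t %i -N "" -f /var/lib/ssh/ssh_host_%i_key
EOF
}
```

With a unit like this, an instance such as `ssh-keygen@ed25519.service` would create the key once, and a matching `HostKey /var/lib/ssh/ssh_host_ed25519_key` line in `sshd_config` would point the daemon at the same (hypothetical) path.
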

Enabling SELinux on the target system needs build-time and run-time
configuration changes for the kernel and userspace. Additionally,
SELinux requires a policy that defines allowed operations. Gentoo
provides a reasonable baseline for all of these changes, but some
modifications are required.

First and foremost, the Gentoo SELinux policy is missing several
necessary rules for systemd-based systems. Notably, services that use
alternate namespaces will fail to start because the base policy does not
allow systemd components the necessary privileges, so these rules have
to be added. Similarly, `systemd-journald` needs additional privileges
in order to be able to capture all metadata for processes generating
syslog messages. Finally, additional rules are necessary in order to
allow systemd to create files and directories prior to launching
services.

Besides patching the policy, we also do some hackery to avoid shipping
the Python runtime in SELinux-enabled builds. Several SELinux-related
packages, including *libselinux* and *policycoreutils*, have dependencies
on Python modules for some of their functionality. Unfortunately, the
Python build system does NOT properly cross-compile native extension
modules, so this functionality is not available on the target system.
Fortunately, none of the features provided by these modules are actually
needed at runtime, so we can safely ignore them and thus omit the entire
Python runtime and all Python programs from the final image.

It is important to note that it is impossible to build an
SELinux-enabled image on a host that is itself SELinux-enabled.
Operations such as changing file labels are checked against the SELinux
policy in the running kernel, and may be denied if the target policy
differs significantly from the running policy. The `setfiles` command
fails, for example, when run on a Fedora host. As such, building an
SELinux-enabled system should be done in a virtual machine using a
kernel that does not have a loaded SELinux policy. The `ocivm` script
can be used to create a suitable runtime from a container image.

The Portage packages that need to be built and/or installed are now
specified in the `build.packages` and `install.packages` files,
respectively. Similarly, packages to be installed on the host system
are specified in `host-tools.packages`. Finally, the
`installonly.packages` file contains a list of packages that are
installed in the destination root, but not built in the sysroot
beforehand.
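
To illustrate how a script might consume one of these list files, here is a minimal sketch. It assumes one package atom per line with optional `#` comments; that format, and the helper name `read_packages`, are assumptions rather than something the build scripts are known to use.

```shell
# Print the package atoms from a list file, one per line.
# ASSUMPTION: one atom per line; "#" starts a comment; blank lines are ignored.
read_packages() {
    # $1: path to a list file such as build.packages
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}
```

The output can then be passed to the package manager, for example with `read_packages build.packages | xargs emerge --oneshot`.
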

This allows `make` to better track when the package sets change. It
will also make it easier to maintain different sets for different
variants in the future.

This script uses the `ocivm` tool to launch a QEMU microvm to build
the operating system. This is necessary to produce an SELinux-enabled
system, since container runtimes interfere with the SELinux policy
build and filesystem labeling processes.

Since we have to build *sys-libs/libcap* with the default Portage
configuration in order to avoid the circular dependency with PAM,
our configuration for binary package builds is not yet in place. We
need to explicitly specify where to put the built packages and enable
multi-instance packages.
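
One way to pass these settings for a single build is through the environment, as sketched below. `PKGDIR` and the `buildpkg` and `binpkg-multi-instance` values of `FEATURES` are standard Portage settings; the function name and both of its arguments are hypothetical.

```shell
# Build libcap before the binary-package configuration is in place.
build_libcap() {
    # $1: emerge command to run (for example, the wrapper generated by crossdev)
    # $2: directory that receives the built binary packages (hypothetical)
    PKGDIR="$2" \
    FEATURES="buildpkg binpkg-multi-instance" \
        "$1" --oneshot sys-libs/libcap
}
```
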

By default, `tar` copies file ownership UID/GID. This works fine when
the build is running in a rootless container, since the source UID/GID
numbers are mapped to 0/0 inside the container. In other scenarios,
though, such as building in a microvm with the source directory on a
shared filesystem, the original numbers are preserved. We need to
explicitly state that the files must be owned by root.
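
The effect can be sketched with GNU `tar`, whose `--owner` and `--group` options override the ownership recorded in the archive. The temporary directory and file names below are placeholders, not the real overlay contents.

```shell
# Placeholder source tree standing in for the real files.
workdir=$(mktemp -d)
mkdir -p "$workdir/overlay/etc"
echo 'example' > "$workdir/overlay/etc/motd"

# Record every entry as UID 0 / GID 0, regardless of who owns the source files.
tar -c -f "$workdir/overlay.tar" --owner=0 --group=0 -C "$workdir/overlay" .

# List the archive with numeric IDs to inspect the recorded ownership.
listing=$(tar -t -v --numeric-owner -f "$workdir/overlay.tar")
printf '%s\n' "$listing"
```

Each line of the listing should show `0/0` in the owner column, even when the script is run by an unprivileged user.
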

When running inside a QEMU microvm with the source directory shared
via 9pfs, the kernel build process fails with the following error:

> Error: Could not mmap file: vmlinux

Thus, we need to run the build in a path on a local filesystem. To
support this, the Makefile now supports an `O` option, and all the build
scripts have been adjusted to make use of it as needed.

Since building in a local filesystem would ultimately discard the final
artifacts when the VM terminates, we need yet a different location for
the files we want to keep. The `IMAGESDIR` option can be used to
specify this path. This path can be on a shared filesystem, thus
saving the artifacts outside the microvm.
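
A small wrapper shows how the two options might be combined. The `O` and `IMAGESDIR` variables come from the Makefile described above; the function name, the `MAKE` override, and the example paths are illustrative assumptions.

```shell
# Run the build in a VM-local directory while keeping artifacts elsewhere.
run_build() {
    # $1: build directory on a local filesystem (hypothetical path)
    # $2: artifact directory, possibly on a shared filesystem (hypothetical path)
    build_dir=$1
    images_dir=$2
    shift 2
    mkdir -p "$build_dir" "$images_dir"
    # Any remaining arguments are passed through as make targets or options.
    "${MAKE:-make}" O="$build_dir" IMAGESDIR="$images_dir" "$@"
}
```

For example, `run_build /var/tmp/build /mnt/shared/images` would keep intermediate files inside the VM and leave only the final images on the shared mount.
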

Several packages end up with circular dependencies, depending on which
Portage profile is selected. The default profiles have a circular
dependency between *sys-libs/pam* and *sys-libs/libcap*. Systemd and
SELinux profiles have even more issues.

We can break the circular dependencies by explicitly building *libcap*
with `USE=-pam` first, which happens to be the default configuration
generated by `crossdev`. Then, we need to switch to a more complete
profile in order to build *glibc* and *util-linux*. At this point, the
build root should be complete enough to build anything without circular
dependencies.

There's really no sense in creating a writable copy of the whole `/etc`
hierarchy at `/run/etc/rw`. Instead, let's just mount overlays at the
paths we want to make writable (which for now is only `/etc/ssh`).

In a "merged-usr" system, `/lib` is a symlink to `/usr/lib`. When
installing *sys-apps/systemd*, Portage checks to ensure this is the
case. If this happens after `make modules_install` is run, `/lib` is
a directory, which causes the installation to fail. To avoid this, we
need to explicitly install the modules into `/usr/lib` so that the
symlink can be created later.
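
The kernel build system appends `lib/modules/<release>` to `INSTALL_MOD_PATH`, so pointing that variable at the `usr` directory of the destination root places the modules under `/usr/lib`. A minimal sketch, with a hypothetical function name and arguments:

```shell
# Install kernel modules under <destdir>/usr/lib/modules instead of <destdir>/lib.
install_modules() {
    # $1: kernel source or build directory
    # $2: destination root (hypothetical path)
    "${MAKE:-make}" -C "$1" modules_install INSTALL_MOD_PATH="$2/usr"
}
```

Because nothing is written to `<destdir>/lib`, that path remains free to be created as a symlink afterwards.
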

Building the OS is now as simple as running `make` on a Gentoo system.

Interestingly, when `make` is executed as a (grand)child process of
another `make` process, it always prints an `Entering directory ...`
message. This breaks the `make kernelversion` command, by adding
extraneous text to the output.
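
One common way to keep the output clean is GNU make's `--no-print-directory` option, sketched below. Whether the build scripts here use this exact flag is an assumption, and the function name is hypothetical.

```shell
# Print only the kernel version, even when called from within another make.
kernel_version() {
    # $1: kernel source directory
    # -s silences command echoing; --no-print-directory suppresses the
    # "Entering directory" / "Leaving directory" messages.
    make -s --no-print-directory -C "$1" kernelversion
}
```

The result can then be captured safely, for example with `kver=$(kernel_version /usr/src/linux)`.
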

The *ldconfig.service* fails because `/etc` is not writable and thus
`/etc/ld.so.cache` cannot be generated.

The files specified in the `provision.d` *tmpfiles.d(5)* configuration
are unnecessary, and many of them cannot be created at runtime because
the root filesystem is immutable.

When running inside a rootless Podman container on an SELinux-enabled
host, the `patch` command fails because it cannot copy SELinux labels
from the original file to the patched file. This only happens when patching
files that are located in a bind mount.