We want processes to be able to have unique categories, in order to
enforce separation between container processes. Gentoo's SELinux policy
supports both Multi-Category Security (MCS) and Multi-Level Security
(MLS) modes, although the latter does not seem to work out of the box.
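
To illustrate how unique categories separate containers, here is a
minimal Python sketch of the allocation scheme container runtimes
commonly use: each container gets a random, unused pair of categories.
Roughly speaking, under MCS a process can only access objects whose
categories are a subset of its own, so two containers holding different
pairs cannot touch each other's files. The 1024-category range and the
function name are assumptions for illustration, not part of Gentoo's
policy or of this build.

```python
import random

# Assumption: 1024 categories (c0..c1023), a common MCS configuration;
# the real count is defined by the loaded policy.
NUM_CATEGORIES = 1024


def allocate_mcs_label(in_use: set, rng: random.Random) -> str:
    """Return an MCS label 's0:cX,cY' (X < Y) that is not in in_use.

    The label is recorded in in_use so later calls never hand it out again.
    """
    total_pairs = NUM_CATEGORIES * (NUM_CATEGORIES - 1) // 2
    if len(in_use) >= total_pairs:
        raise RuntimeError("all MCS category pairs are in use")
    while True:
        # Pick two distinct categories and put them in ascending order.
        a, b = sorted(rng.sample(range(NUM_CATEGORIES), 2))
        label = f"s0:c{a},c{b}"
        if label not in in_use:
            in_use.add(label)
            return label
```

With 1024 categories there are 523,776 distinct pairs, which is why a
pair (rather than a single category) is enough for many containers.
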
*systemd-tmpfiles* can create btrfs subvolumes with the `v` entry type.
Using this mechanism instead of the `init-storage` script will allow for
greater flexibility when adding other subvolumes later.
Unfortunately, the default configuration for *systemd-tmpfiles* already
includes an entry for `/var/log` with the `d` (directory) type. Since
*tmpfiles.d(5)* can only override whole configuration files, not
individual entries, we need to modify this entry.
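
A minimal Python sketch of such a modification, assuming the vendor file
contains a line like `d /var/log 0755 - - -`. One option is to install
the edited copy under the same file name in `/etc/tmpfiles.d/` in the
image, which *tmpfiles.d(5)* uses in place of the file with that name in
`/usr/lib/tmpfiles.d/`. The function name is hypothetical.

```python
def convert_var_log_entry(conf_text: str) -> str:
    """Return conf_text with the 'd /var/log ...' entry changed to type 'v'.

    All other lines are passed through unchanged.
    """
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        # Match only a plain directory entry for exactly /var/log.
        if len(fields) >= 2 and fields[0] == "d" and fields[1] == "/var/log":
            # 'v' creates a btrfs subvolume where possible, otherwise a
            # plain directory just like 'd'.
            line = " ".join(["v"] + fields[1:])
        out.append(line)
    return "\n".join(out) + "\n"
```
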
Some *tmpfiles.d(5)* entries specify paths in the immutable root
filesystem. These need to be created at build time to prevent
*systemd-tmpfiles-setup.service* from failing at runtime.
Using `tar` to copy files and directories from the overlay directory to
the destination root preserves their timestamps. This is not desirable,
particularly for directories, because the destination paths may then
appear older than the build itself. It is especially problematic for
`/usr`, because units with `ConditionNeedsUpdate=` settings compare the
modification time of `/usr` against a stamp file to decide whether to
run.
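
For illustration only, a rough Python approximation of that check: the
condition holds when `/usr` was modified more recently than the
`.updated` stamp file in the directory named by the condition (`/etc` or
`/var`). systemd performs additional checks that this sketch omits, and
the function name is hypothetical.

```python
import os


def needs_update(usr_dir: str, stamp_dir: str) -> bool:
    """Approximate ConditionNeedsUpdate=: compare /usr's mtime to a stamp.

    stamp_dir is the directory named in the condition (/etc or /var);
    systemd's stamp file there is called '.updated'.
    """
    stamp = os.path.join(stamp_dir, ".updated")
    try:
        stamp_mtime = os.stat(stamp).st_mtime_ns
    except FileNotFoundError:
        # No stamp yet: treat the directory as needing an update.
        return True
    return os.stat(usr_dir).st_mtime_ns > stamp_mtime
```

If a build leaves `/usr` with an older timestamp than an existing stamp,
this comparison is false and the update units never run.
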
To ensure the timestamps are set correctly, we now use `rsync` to copy
the overlay with the `-O` (`--omit-dir-times`) argument, so that
directory modification times are not copied from the overlay.
Additionally, we explicitly update the timestamp of `/usr` to ensure
that every new build triggers the "needs update" condition.
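
The exact `rsync` command line is not shown here. As a rough Python
equivalent of the behavior described, assuming file modification times
are still preserved (as with `rsync -a`), the sketch below copies
regular files with their timestamps, only creates directories (so they
never inherit the overlay's times), and then bumps `/usr`. Function
names are hypothetical; symlinks and special files are out of scope.

```python
import os
import shutil
import time
from pathlib import Path


def copy_overlay(overlay: Path, destroot: Path) -> None:
    """Copy regular files from overlay into destroot.

    File contents, permissions and mtimes are preserved (shutil.copy2);
    directories are only created, never given the overlay's mtimes.
    """
    for src_dir, _dirnames, filenames in os.walk(overlay):
        dst_dir = destroot / Path(src_dir).relative_to(overlay)
        dst_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src = Path(src_dir) / name
            if src.is_file() and not src.is_symlink():
                shutil.copy2(src, dst_dir / name)


def mark_usr_updated(destroot: Path) -> None:
    """Set the image's /usr mtime to now so ConditionNeedsUpdate= fires."""
    now = time.time_ns()
    os.utime(destroot / "usr", ns=(now, now))
```
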
Instead of copying the Portage configuration files to `/etc/portage` and
`/usr/${target}/etc/portage`, the build scripts now use the
configuration directories from the source directory. This avoids
problems with propagating changes (especially removal of files) to the
actual configuration paths.
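
One way to point Portage at a configuration directory without copying
it is the `PORTAGE_CONFIGROOT` environment variable, which makes
`emerge` read `etc/portage` relative to the given directory. Whether the
build scripts use this exact mechanism is an assumption; the Python
sketch below only builds such an environment, and the function name and
directory layout are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def portage_env(config_root: Path,
                base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return an environment in which emerge reads config_root/etc/portage.

    base_env defaults to the current process environment.
    """
    if not (config_root / "etc" / "portage").is_dir():
        raise FileNotFoundError(f"no etc/portage directory under {config_root}")
    env = dict(os.environ if base_env is None else base_env)
    env["PORTAGE_CONFIGROOT"] = str(config_root)
    return env
```

The result would be passed as `env=` to something like
`subprocess.run(["emerge", ...])`, so edits and deletions in the source
tree take effect immediately, with no stale copies left behind.
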
Enabling SELinux on the target system needs build-time and run-time
configuration changes to the kernel and userspace. Additionally,
SELinux requires a policy that defines allowed operations. Gentoo
provides a reasonable baseline for all of these changes, but some
modifications are required.
First and foremost, the Gentoo SELinux policy is missing several rules
that systemd-based systems need. Notably, services that use alternate
namespaces fail to start because the base policy does not grant systemd
components the necessary privileges, so these rules have to be added.
Similarly, `systemd-journald` needs additional privileges to capture all
metadata for processes that generate syslog messages. Finally,
additional rules are needed to allow systemd to create files and
directories before launching services.
Besides patching the policy, we also do some hackery to avoid shipping
the Python runtime in SELinux-enabled builds. Several SELinux-related
packages, including *libselinux* and *policycoreutils*, have dependencies
on Python modules for some of their functionality. Unfortunately, the
Python build system does NOT properly cross-compile native extension
modules, so this functionality is not available on the target system.
Fortunately, none of the features provided by these modules are actually
needed at runtime, so we can safely ignore them and thus omit the entire
Python runtime and all Python programs from the final image.
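
The exact hackery is not spelled out here. Purely as an illustration, a
post-install step that deletes Python directories and interpreters from
the destination root could look like this Python sketch (run on the
build host). The glob patterns and function name are hypothetical;
Portage's `INSTALL_MASK` is another plausible way to get the same
result.

```python
import shutil
from pathlib import Path

# Hypothetical patterns, relative to the destination root.
PYTHON_PATTERNS = ("usr/lib/python3*", "usr/lib64/python3*", "usr/bin/python*")


def prune_python(destroot: Path, patterns=PYTHON_PATTERNS) -> list:
    """Delete paths matching patterns under destroot; return what was removed."""
    removed = []
    for pattern in patterns:
        for path in sorted(destroot.glob(pattern)):
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)
            else:
                path.unlink()
            removed.append(path.relative_to(destroot).as_posix())
    return removed
```
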
It is important to note that it is generally not possible to build an
SELinux-enabled image on a host that is itself SELinux-enabled.
Operations such as changing file labels are checked against the SELinux
policy loaded in the running kernel, and may be denied if the target
policy differs significantly from the running policy. For example, the
`setfiles` command fails when run on a Fedora host. As such, an
SELinux-enabled system should be built in a virtual machine whose
kernel does not have an SELinux policy loaded. The `ocivm` script can
be used to create a suitable runtime from a container image.
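
A build script could refuse to run on such a host. The Python sketch
below uses a simple heuristic that is an assumption, not taken from the
build scripts: if `selinuxfs` appears in the mount table, SELinux is
treated as active. A fuller check could call libselinux's
`is_selinux_enabled()`. Function names are hypothetical.

```python
def selinuxfs_mounted(mounts_text: str) -> bool:
    """Return True if a mount table (e.g. /proc/self/mounts) lists selinuxfs."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] == "selinuxfs":
            return True
    return False


def check_build_host(mounts_path: str = "/proc/self/mounts") -> None:
    """Abort the build if the host appears to have SELinux enabled."""
    with open(mounts_path) as f:
        if selinuxfs_mounted(f.read()):
            raise SystemExit(
                "host has SELinux enabled; build in a VM without a loaded policy"
            )
```
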
The Portage packages that need to be built and/or installed are now
specified in the `build.packages` and `install.packages` files,
respectively. Similarly, packages to be installed on the host system
are specified in `host-tools.packages`. Finally, the
`installonly.packages` file contains a list of packages that are
installed in the destination root, but not built in the sysroot
beforehand.
This allows `make` to better track when the package sets change. It
will also make it easier to maintain different sets for different
variants in the future.
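
The format of these files is not described here. Assuming one Portage
atom per line, with `#` comments and blank lines ignored, a hypothetical
helper could read a package set like this Python sketch, dropping
duplicates while keeping the original order.

```python
def read_package_set(text: str) -> list:
    """Parse the contents of a *.packages file into a list of atoms.

    Assumed format: one atom per line, '#' starts a comment,
    blank lines are ignored, duplicates are dropped.
    """
    atoms = []
    seen = set()
    for line in text.splitlines():
        atom = line.split("#", 1)[0].strip()
        if atom and atom not in seen:
            seen.add(atom)
            atoms.append(atom)
    return atoms
```

Because each set lives in its own file, a `make` rule can simply list
`build.packages` (or the others) as a prerequisite and rebuild only when
that file changes.
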
By default, `tar` preserves each file's owner UID and GID. This works
fine when the build runs in a rootless container, since the source
UID/GID numbers are mapped to 0/0 inside the container. In other
scenarios, though, such as building in a microvm with the source
directory on a shared filesystem, the original numbers are preserved, so
we need to explicitly specify that the files must be owned by root.
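
With GNU tar, the `--owner` and `--group` options set the ownership
recorded when creating an archive (for example `--owner=0 --group=0`).
The same idea in Python's `tarfile` module, as a minimal sketch with
hypothetical function names, is an `add()` filter that rewrites every
member's ownership before it is written.

```python
import io
import tarfile
from pathlib import Path


def as_root(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """tarfile filter: record every member as owned by root:root."""
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


def archive_as_root(src: Path) -> bytes:
    """Return an uncompressed tar of src in which everything is root-owned."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # The filter is applied recursively to every member that is added.
        tar.add(src, arcname=".", filter=as_root)
    return buf.getvalue()
```

When such an archive is unpacked as root, the files end up owned by 0/0
regardless of who owned them in the source directory.
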
The *ldconfig.service* fails because `/etc` is not writable and thus
`/etc/ld.so.cache` cannot be generated.
The files specified in the `provision.conf` *tmpfiles.d(5)*
configuration are unnecessary, and many of them cannot be created at
runtime because the root filesystem is immutable.
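
*tmpfiles.d(5)* documents that a vendor configuration file can be
disabled by placing a symlink to `/dev/null` with the same file name in
`/etc/tmpfiles.d/`. Since `/etc` is read-only at runtime, such a symlink
would have to be created in the image at build time. A minimal Python
sketch, assuming the vendor file is named `provision.conf` (its name in
recent systemd releases); whether the build disables it this way is an
assumption, and the function name is hypothetical.

```python
import os
from pathlib import Path


def mask_tmpfiles_conf(destroot: Path, name: str) -> Path:
    """Mask a vendor tmpfiles.d file inside destroot.

    Creates destroot/etc/tmpfiles.d/<name> as a symlink to /dev/null,
    replacing any existing file or link of that name.
    """
    etc_dir = destroot / "etc" / "tmpfiles.d"
    etc_dir.mkdir(parents=True, exist_ok=True)
    link = etc_dir / name
    if link.is_symlink() or link.exists():
        link.unlink()
    os.symlink("/dev/null", link)
    return link
```

For example, `mask_tmpfiles_conf(Path(destroot), "provision.conf")`
during the image build.
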