By default, `systemd-tmpfiles` will create normal directories instead of
Btrfs subvolumes unless `/` is already a subvolume. According to
[Lennart][0], this has to do with subvolumes being too "heavy-weight,"
whatever that means.
Fortunately, we can override this nonsense with an environment variable.

[0]: https://github.com/systemd/systemd/pull/1915
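
The variable in question is `SYSTEMD_TMPFILES_FORCE_SUBVOL`, which systemd documents among its known environment variables. Below is a minimal sketch of one way to set it for the boot-time run. It assumes a drop-in for `systemd-tmpfiles-setup.service` is written into the target root at build time. The function name and the `force-subvol.conf` file name are hypothetical, and the actual Aimee OS build may set the variable differently.

```sh
# Write a systemd drop-in that makes `v`/`q`/`Q` tmpfiles.d entries always
# create Btrfs subvolumes.  $1 is the destination root of the image
# (an assumption of this sketch).
write_force_subvol_dropin() {
    root=$1
    dir="${root}/etc/systemd/system/systemd-tmpfiles-setup.service.d"
    mkdir -p "${dir}"
    # force-subvol.conf is a made-up name; any *.conf file in the drop-in
    # directory is read by systemd.
    cat > "${dir}/force-subvol.conf" <<'EOF'
[Service]
Environment=SYSTEMD_TMPFILES_FORCE_SUBVOL=1
EOF
}
```

For a one-off manual run, prefixing the command also works: `SYSTEMD_TMPFILES_FORCE_SUBVOL=1 systemd-tmpfiles --create`.
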
Somewhat expectedly, attempting to avoid installing *app-admin/setools*
by listing it in `/etc/portage/profile/package.provided` proved more
trouble than it's worth.
Custom builds of Aimee OS can now specify additional paths under `/etc`
that should be writable. This is accomplished by populating a file
named `/etc/aimee-os/writable-etc` with a list of paths. Each line must
indicate the type of file (regular file: `f`, directory: `d`) and the
*relative* path under `/etc`.
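
As an illustration of the file format, the sketch below reads such a list and creates a placeholder for each entry under a destination directory. It assumes each line is the type letter, whitespace, then the path. The function name and both arguments are hypothetical, and how the entries are then made writable is outside this sketch.

```sh
# Read a writable-etc list and create each entry below a destination
# directory.  $1 = list file, $2 = destination directory.
make_writable_etc() {
    list=$1
    dest=$2
    while read -r type path; do
        # Skip blank lines and entries without a path.
        [ -n "${type}" ] && [ -n "${path}" ] || continue
        case "${path}" in
            /*) echo "ignoring absolute path: ${path}" >&2; continue ;;
        esac
        case "${type}" in
            d) mkdir -p "${dest}/${path}" ;;
            f) mkdir -p "$(dirname "${dest}/${path}")"
               : >> "${dest}/${path}" ;;   # create if missing, keep content
            *) echo "unknown type '${type}': ${path}" >&2 ;;
        esac
    done < "${list}"
}
```

A list containing `f hostname` and `d ssh` would yield an empty `hostname` file and an `ssh` directory below the destination.
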
It seems the bug that caused udev rules to be installed in the wrong
location has been fixed. As such, we need to make this corrective
action step conditional, only moving rules files if any are found in the
wrong place.
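
A conditional move of this kind can be sketched as follows. The two directories are arguments because the commit message does not name them, and the function name is hypothetical.

```sh
# Move any *.rules files found in a misplaced directory to the correct one.
# $1 = directory where rules were wrongly installed, $2 = correct directory.
move_misplaced_rules() {
    wrong=$1
    right=$2
    for f in "${wrong}"/*.rules; do
        # With no matches the glob stays literal, so test for existence.
        [ -e "${f}" ] || continue
        mkdir -p "${right}"
        mv "${f}" "${right}/"
    done
}
```

When nothing matches, the loop body never moves anything and the function still succeeds, so the build step does not fail once the upstream bug is fixed.
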
If multiple patches are provided for the same package, we need to ensure
that they are all applied. Previously, only the last patch was applied,
because the ebuilds were copied from the main repository each time,
undoing all previous patches.
The base Aimee OS build does not need any post-installation tasks.
Custom builds can provide a `post-build.sh` script to implement the
tasks they need. For example, builds targeting Raspberry Pi devices
can use this script to install the firmware files.
The `build.packages` and `install.packages` files in the CONFIGDIR now
only need to include *additional* packages to install. The packages
*required* for Aimee OS are listed in the corresponding files in the
source directory and are always installed.
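
One way to combine the two lists is sketched below. The function name, the argument order, and the de-duplication step are assumptions of this example, not necessarily how the build scripts merge the files.

```sh
# Print the combined package list: the required packages from the source
# directory first, then any additions from CONFIGDIR.
# $1 = list name (e.g. install.packages), $2 = source dir, $3 = CONFIGDIR.
package_list() {
    name=$1
    srcdir=$2
    configdir=$3
    for f in "${srcdir}/${name}" "${configdir}/${name}"; do
        # A missing file simply contributes nothing.
        if [ -f "${f}" ]; then
            cat "${f}"
        fi
    done | awk 'NF && !seen[$0]++'   # drop blank lines and duplicates
}
```

Because a missing file is skipped, a custom build that needs no extra packages can omit the file from its CONFIGDIR entirely.
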
Since the container images we're using as a base for the build system
only contain stable packages, setting ACCEPT_KEYWORDS to allow unstable
packages globally can cause a lot of rebuilds and potentially break
things. Instead, we only set ~arch for the packages for which we
actually need recent versions on the host.
This does not affect packages installed in the target root, of course.
As the scope of Aimee OS grows, and other applications are added to it,
the `init-storage` command will have an ever-growing list of file and
directory types to copy from the rootfs image. Originally, I wanted to
explicitly allow it to only copy files that are found in `/var`, but
this will become untenable very quickly. As such, to avoid having to
constantly update the SELinux policy for every new application that
stores anything in `/var` at install time, the `aimee_storinit_t` domain
can now manage all "non-security" files, directories, and symbolic
links. This covers pretty much everything in `/var` except
`/var/log/audit`, while still excluding the most sensitive files (e.g.
`/etc/shadow`).
Rather than hard-code the GPT partition label into the `init-storage`
and `factory-reset` scripts, these now determine the block device by
reading `/etc/fstab` and using the device specified for `/var`.
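
The lookup can be sketched with `awk`. The function name and the optional second argument (an alternate fstab path, handy for testing) are assumptions of this example.

```sh
# Print the device field of the fstab entry whose mount point is $1.
# $2 is the fstab to read and defaults to /etc/fstab.
fstab_device() {
    awk -v mp="$1" '
        $1 ~ /^#/ { next }              # ignore comment lines
        $2 == mp  { print $1; exit }    # first field is the device spec
    ' "${2:-/etc/fstab}"
}
```

`fstab_device /var` prints whatever the first field contains. If that is a tag such as `LABEL=...` rather than a device node, a tool like util-linux `findfs` can resolve it to a block device.
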
The persistent journal is stored in a subdirectory of `/var/log/journal`
named for the current machine ID. Since `/etc/machine-id` is not
writable, the machine ID changes with every boot. This effectively
makes the journal for previous boots inaccessible, so there's really not
much point in keeping them around.
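
One possible way to discard the stale journals is sketched below. The function name and its arguments are hypothetical, and the commit itself may handle this differently (for example, by not persisting the journal at all).

```sh
# Remove journal directories that do not belong to the current machine ID.
# $1 = journal directory, $2 = file containing the machine ID.
prune_old_journals() {
    journal_dir=${1:-/var/log/journal}
    machine_id=$(cat "${2:-/etc/machine-id}") || return 1
    # Refuse to run without an ID, otherwise everything would be deleted.
    [ -n "${machine_id}" ] || return 1
    for d in "${journal_dir}"/*/; do
        [ -d "${d}" ] || continue
        if [ "$(basename "${d}")" != "${machine_id}" ]; then
            rm -rf "${d}"
        fi
    done
}
```
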
It turns out that we cannot use `systemd-tmpfiles` to create our Btrfs
subvolumes. Since the directories we are interested in, specifically
`/var/log` and `/var/tmp`, already exist in the rootfs image and are
therefore copied into the mutable filesystem, `systemd-tmpfiles` ignores
them.
To avoid having to explicitly specify the SELinux context for each
subvolume created on the persistent filesystem, `init-storage` now
executes `setfiles` to set the appropriate labels.
The `set-root-password` command sets up an alternate mount namespace
with a writable `/etc` directory and then runs `passwd` in it. This
allows `passwd` to create its lock files and backup files without
requiring the real `/etc` to be mutable. After `passwd` finishes
and has updated its private copy of `/etc/shadow`, the script rewrites
the real one with its contents.
In order for users to be able to log in locally or via SSH without an
authorized key, they will need to have passwords set in `/etc/shadow`.
We do not really want to make all of `/etc` writable, so we will store
the actual `shadow` file on the persistent data volume, in a separate
Btrfs subvolume, and then bind-mount it at `/etc/shadow`.
While this makes `/etc/shadow` mutable, it does not actually let the
`passwd` program modify it. This is because `passwd` creates lock files
and backup files in `/etc`. We will ultimately need a wrapper to
"trick" `passwd` into modifying `/etc/shadow`, without making the whole
`/etc` directory mutable.
Apparently, BusyBox's `cp` does NOT copy SELinux contexts when the `-a`
argument is specified. This differs from GNU coreutils's `cp`, and
explains why the files copied from the rootfs image to the persistent
storage volume were not being labelled correctly. The `-c` argument is
required.
Now that files are labelled correctly when they are copied, the step to
run `restorecon` is no longer necessary.
In an effort to support different builds of Aimee OS using the same
scripts, without necessarily having to fork this repository, the build
system now supports a `CONFIGDIR` setting. When this variable is set,
files defining the target environment, such as the lists of packages to
install, the kernel configuration, the Portage configuration, etc. are
found in the path it specifies.
The reference build, for the Home Assistant Yellow board, is configured
in the `yellow` directory. To build it, run:
```sh
CONFIGDIR=yellow ./vm-build.sh
```
We're going to want the ability for processes to have unique categories,
to enforce separation of container processes. Gentoo's SELinux policy
supports both Multi-Category Security and Multi-Level Security modes,
although the latter does not seem to work out of the box.
*systemd-tmpfiles* can create Btrfs subvolumes with the `v` entry type.
Using this mechanism instead of the `init-storage` script will allow for
greater flexibility when adding other subvolumes later.
Unfortunately, the default configuration for *systemd-tmpfiles* already
includes an entry for `/var/log` with the `d` (directory) type. Since
individual entries cannot be overridden, we need to modify this entry.
The `factory-reset` command provides a way to completely wipe the data
partition, thus erasing any local configuration and state. The command
itself simply enables a special systemd service unit that is activated
during the shutdown process. This unit runs a script after all
filesystems except the rootfs have been unmounted. The script erases the
signature of the filesystem on the data partition, so it will appear
blank the next time the system boots. This will trigger the
`init-storage` process, to create a new filesystem on the partition.
Gentoo uses GNU awk by default, but since we are using BusyBox for the
rest of the userspace utilities, it makes sense to use awk from BusyBox
as well.
Some *tmpfiles.d(5)* entries specify paths in the immutable root
filesystem. These need to be created at build time to prevent
*systemd-tmpfiles-setup.service* from failing at runtime.
This script can be used to rebuild a binary package in the SYSROOT and
reinstall it in the destination root.
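```sh
# Rebuild the binary package in the SYSROOT and reinstall it.
./rebuild-pkg sec-policy/selinux-aimee-os
# -W makes make treat /tmp/build/.built as just modified, so targets that depend on it are remade.
make -W /tmp/build/.built O=/tmp/build IMAGESDIR="${PWD}/images"
```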
The *aimee-os* SELinux policy module provides rules that are specific to
our custom commands and system configuration. These rules are not
suitable for including in the upstream policy, so we include them in a
separate package rather than patches to the base policy.
Currently, the policy module includes rules to allow the `init-storage`
and `system-update` programs to work. It also includes rules to allow
SSH host keys to be stored in `/var/lib/ssh` instead of `/etc/ssh`,
since our `/etc` is immutable.
There's no particular reason why the directory used as the temporary
mount point for the data volume needs to be random. Using a static
name, on the other hand, makes it easier for the SELinux policy to
apply the correct type transition and ensure the directory is labelled
correctly.
Using `tar` to copy files and directories from the overlay directory to
the destination root preserves their timestamps. This is not really
desirable, particularly for directories, because it may result in the
destination paths appearing older than the build. This is especially
problematic for `/usr`, since its timestamps are important for systemd
units that use `ConditionNeedsUpdate` settings.
To ensure the timestamps are set correctly, we now use `rsync` to copy
the overlay, with the `-O` (`--omit-dir-times`) argument, to avoid
changing the timestamps of directories. Additionally, we explicitly
update the timestamp of `/usr` to ensure that every new build triggers
the "needs update" condition.
If any file in the `overlay` directory changes, the `build-rootfs.sh`
script needs to be re-run in order to copy the changes into the
destination root and regenerate the SquashFS image.
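
The same check can be expressed outside of `make` with `find -newer`. This is a stand-in sketch, not the Makefile rule itself, and the function name and stamp file are hypothetical.

```sh
# Succeed when anything under overlay directory $1 is newer than stamp
# file $2, or when the stamp does not exist yet.
overlay_changed() {
    overlay=$1
    stamp=$2
    [ -e "${stamp}" ] || return 0
    # The overlay directory itself is included, so its mtime catches
    # files added to or removed from its top level.
    [ -n "$(find "${overlay}" -newer "${stamp}" -print | head -n 1)" ]
}
```

A caller would re-run `build-rootfs.sh` when the function succeeds and `touch` the stamp file afterwards.
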
Instead of copying the Portage configuration files to `/etc/portage` and
`/usr/${target}/etc/portage`, the build scripts now use the
configuration directories from the source directory. This avoids issues
with changes (especially removal of files) getting propagated to the
actual configuration paths.
For some reason, when OverlayFS is mounted at `/etc/ssh`, SELinux
prevents both `sshd` and `ssh-keygen` from accessing the files there.
The AVC denials indicate that (some part of) the process is running in
the `mount_t` domain, which is not allowed to read or write `sshd_key_t`
files.
To work around this issue, without granting `mount_t` overly permissive
access, we now configure the SSH daemon to read host keys from the
persistent data volume directly, instead of "tricking" it with
OverlayFS. The `ssh-keygen` tool does not read the `HostKey` options
from `sshd_config`, though, so it has to be explicitly instructed to
create keys in this alternate location. By using a systemd template
unit with `ConditionPathExists`, we avoid regenerating the keys on every
boot, since the `ssh-keygen` command is only run if the file does not
already exist.
Enabling SELinux on the target system needs build-time and run-time
configuration changes for the kernel and userspace. Additionally,
SELinux requires a policy that defines allowed operations. Gentoo
provides a reasonable baseline for all of these changes, but some
modifications are required.
First and foremost, the Gentoo SELinux policy is missing several
necessary rules for systemd-based systems. Notably, services that use
alternate namespaces will fail to start because the base policy does not
allow systemd components the necessary privileges, so these rules have
to be added. Similarly, `systemd-journald` needs additional privileges
in order to be able to capture all metadata for processes generating
syslog messages. Finally, additional rules are necessary in order to
allow systemd to create files and directories prior to launching
services.
Besides patching the policy, we also do some hackery to avoid shipping
the Python runtime in SELinux-enabled builds. Several SELinux-related
packages, including *libselinux* and *policycoreutils*, have dependencies
on Python modules for some of their functionality. Unfortunately, the
Python build system does NOT properly cross-compile native extension
modules, so this functionality is not available on the target system.
Fortunately, none of the features provided by these modules are actually
needed at runtime, so we can safely ignore them and thus omit the entire
Python runtime and all Python programs from the final image.
It is important to note that it is impossible to build an
SELinux-enabled image on a host that is itself SELinux-enabled.
Operations such as changing file labels are checked against the SELinux
policy in the running kernel, and may be denied if the target policy
differs significantly from the running policy. The `setfiles` command
fails, for example, when run on a Fedora host. As such, building an
SELinux-enabled system should be done in a virtual machine using a
kernel that does not have a loaded SELinux policy. The `ocivm` script
can be used to create a suitable runtime from a container image.
The Portage packages that need to be built and/or installed are now
specified in the `build.packages` and `install.packages` files,
respectively. Similarly, packages to be installed on the host system
are specified in `host-tools.packages`. Finally, the
`installonly.packages` file contains a list of packages that are
installed in the destination root, but not built in the sysroot
beforehand.
This allows `make` to better track when the package sets change. It
will also make it easier to maintain different sets for different
variants in the future.
This script uses the `ocivm` tool to launch a QEMU micro VM to build
the operating system. This is necessary to produce an SELinux-enabled
system, since container runtimes interfere with the SELinux policy
build and filesystem labeling processes.
Since we have to build *sys-libs/libcap* with the default Portage
configuration in order to avoid the circular dependency with PAM,
our configuration for binary package builds is not yet in place. We
need to explicitly specify where to put the built packages and enable
multi-instance packages.
By default, `tar` copies file ownership UID/GID. This works fine when
the build is running in a rootless container, since the source UID/GID
numbers are mapped to 0/0 inside the container. In other scenarios,
though, such as building in a microvm with the source directory on a
shared filesystem, the original numbers are preserved. We need to
explicitly state that the files must be owned by root.
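
With GNU tar, the ownership can be forced with `--owner` and `--group` when the archive stream is created. A minimal sketch follows; the function name is hypothetical and GNU tar is assumed.

```sh
# Write a tar stream of directory $1 to stdout with every entry recorded
# as UID 0 / GID 0, regardless of who owns the files on disk.
# Assumes GNU tar.
tar_as_root() {
    tar -C "$1" -cf - --owner=0 --group=0 .
}
```

The stream can then be unpacked into the destination root with something like `tar_as_root overlay | tar -C "${destdir}" -xf -`, where `destdir` stands for wherever the build unpacks the files.
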
When running inside a QEMU microvm with the source directory shared
via 9pfs, the kernel build process fails:

> Error: Could not mmap file: vmlinux

Thus, we need to run the build in a path on a local filesystem. To
support this, the Makefile now supports an `O` option, and all the build
scripts have been adjusted to make use of it as needed.
Since building in a local filesystem would ultimately discard the final
artifacts when the VM terminates, we need yet another location for
the files we want to keep. The `IMAGESDIR` option can be used to
specify this path. This path can be on a shared filesystem, thus
saving the artifacts outside the microvm.