Run systemd in an unprivileged (!!!) Docker container

Many people have tried running systemd inside a Docker container.
Some have succeeded to some extent. See http://developerblog.redhat.com/2014/05/05/running-systemd-within-docker-container/ for a reference.

The issues mentioned there are still problematic, and a rather drastic approach was used: files in /lib/systemd/ were deleted, and so on.

The biggest remaining issue, however, is that you need to run Docker in privileged mode, which adds a lot of kernel capabilities to the container and makes it less secure.
The root cause of this is D-BUS. For reference: https://bugzilla.redhat.com/show_bug.cgi?id=1115533
D-BUS wants to drop some capabilities which it did not have in the first place…

To work around that issue I created a small fake C library and injected it into D-BUS via LD_PRELOAD. It tells D-BUS that those system calls succeeded. The source code is just a few lines and can be found here: https://github.com/maci0/docker-systemd-unpriv/blob/master/fakecap.c
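Injecting the shim is a standard LD_PRELOAD trick; a minimal sketch (paths and the invocation are illustrative, the repository may wire it up differently):
# build the fake-capability shim as a shared library
gcc -shared -fPIC -o /usr/local/lib/fakecap.so fakecap.c
# preload it so D-BUS believes its capability syscalls succeeded
LD_PRELOAD=/usr/local/lib/fakecap.so dbus-daemon --system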

So after getting the message bus to work, the next step is to get the rest of systemd working.
As Daniel Walsh mentioned in his blog entry, we need to disable some systemd unit files.
So I disabled ALL of them 🙂
It was simple: just remove all files in /etc/systemd/system and mask everything in /lib/systemd/system/ via systemctl; a simple for-loop does the trick.
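A sketch of that loop (run inside the image; the actual script in the repository may differ in detail):
rm -rf /etc/systemd/system/*
# mask every unit shipped in /lib/systemd/system
for f in /lib/systemd/system/*; do
    systemctl mask "$(basename "$f")"
done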

After that I unmasked some mandatory systemd targets and unit files, and enabled a few as well.
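The exact selection is in the repository; a hypothetical subset could look like:
# unmask a minimal set of targets and units (hypothetical selection)
for u in systemd-journald.service systemd-journald.socket \
    dbus.service dbus.socket sysinit.target basic.target \
    sockets.target multi-user.target; do
    systemctl unmask "$u"
done
# enable what should start at boot, e.g. the ssh daemon used later
systemctl enable sshd.service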

In addition to that, systemd needs the cgroup filesystem mounted in the container. This is achieved with a simple option: -v /sys/fs/cgroup:/sys/fs/cgroup:ro.

So now I tried to run it, but without any success.
The only output I ever got was:
Failed to mount /run: Operation not permitted
There is a Bugzilla entry for this as well: https://bugzilla.redhat.com/show_bug.cgi?id=1033604
After trying to figure out a workaround for some time, I just mounted a directory into the container as /run.
To my surprise, after that it worked.
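With both mounts in place, the container starts without --privileged; roughly like this (the image name is illustrative):
# a host directory stands in for /run inside the container
mkdir -p /tmp/systemd-run
docker run -d --name systemd-unpriv \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /tmp/systemd-run:/run \
local/systemd-image /usr/lib/systemd/systemd
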
I started putting the pieces together and uploaded everything to GitHub:
https://github.com/maci0/docker-systemd-unpriv/

Simply execute build.sh and then run.sh.
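In full, assuming a plain git checkout:
git clone https://github.com/maci0/docker-systemd-unpriv.git
cd docker-systemd-unpriv
./build.sh
./run.sh
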
Now you should have a Docker container running systemd.
You can log in with user root, password root.
Or run docker inspect to find out the IP of the container and simply ssh into it 😀
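For example:
docker inspect -f '{{ .NetworkSettings.IPAddress }}' <container-id>
ssh root@<container-ip>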

Give it a spin!

My thanks go to Daniel Walsh from Red Hat, who did the groundwork for this.

Run any application on RHEL7 containerized with 3D acceleration and PulseAudio (Steam, Pidgin, VLC, …)

Red Hat Enterprise Linux 7 RC was released some time ago. Since the beta was already quite decent I wanted to toy around with the RC version.

My setup is quite unique: my root filesystem is on ZFS, which I had been using with Arch Linux. Since I didn’t want to reinstall my laptop, I just created another ZFS filesystem and manually bootstrapped RHEL7 with the RPMs from ftp://ftp.redhat.com and the version of yum found in the Arch Linux User Repository.
I will not go into the specifics of that, though.

The overall experience with RHEL7 has been great so far, but I was missing some applications like Pidgin (which has been removed in favour of Telepathy), Steam and VLC.
Since I know my way around systemd thanks to Arch Linux and Fedora, I knew about systemd-nspawn.
It’s a tool for lightweight containerisation of single applications, or for booting whole Linux distributions in their own namespace.

So getting Pidgin from my Arch Linux install running was quite easy:
sudo systemd-nspawn -M pidgin --bind=/home:/home -D /altroot/ARCH-ROOT su maci -c 'DISPLAY=:0 pidgin' &

Another great way to run applications in containers is Docker.
Docker has a great index of pre-built images to start applications in containers.
There is even an image to run Steam inside a SteamOS environment: https://index.docker.io/u/tianon/steam/
In theory a simple docker run tianon/steam should be sufficient to boot up Steam.
However, Steam needs access to the host’s X server, so we need to bind-mount /tmp/.X11-unix into the Docker container. Docker has a command-line option for that, -v; it’s the equivalent of systemd-nspawn’s --bind.
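For example, with just the X socket (audio and 3D acceleration come next):
docker run -it \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-e DISPLAY=${DISPLAY} \
tianon/steam
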
Now the Steam interface starts up just fine, but there is no audio output.
It took me some time to get audio working, since the PulseAudio FAQ is quite outdated: it lists a bunch of bind mounts required to get audio working inside a chroot environment.
In reality, all we need is /run/user/${UID}/pulse, /etc/machine-id and /dev/shm.

Now we have graphical output and sound, but 3D acceleration is still not working.
It’s easy to get working, though: first make sure the OS image you use has the OpenGL libs for your physical video card (in the Steam container the Mesa libs are pre-installed), then bind-mount /dev/dri into your container.

Update 2015-10-04: For some reason the image does not work out of the box anymore; I will investigate.

All combined, we get something like:
docker run --name=steam \
-v /dev/dri:/dev/dri \
-v /tmp/.X11-unix:/tmp/.X11-unix -v /dev/shm:/dev/shm \
-v /run/user/${UID}/pulse:/run/user/${UID}/pulse \
-v /etc/machine-id:/etc/machine-id \
-v ${HOME}/Downloads:/tmp/Downloads \
-e DISPLAY=${DISPLAY} tianon/steam

Further information about the parameters can be found in the docker documentation.

That’s all. Using the methods described above, you can get almost every application working in a container. VLC, for example, with systemd-nspawn:
sudo systemd-nspawn --bind /tmp/.X11-unix:/tmp/.X11-unix \
--bind=/dev/shm:/dev/shm --bind=/dev/dri:/dev/dri \
--bind=/run/user/${UID}/pulse:/run/user/${UID}/pulse \
--bind=/etc/machine-id:/etc/machine-id \
-D /altroot/ARCH-ROOT \
su vlc -c "DISPLAY=:0 QT_X11_NO_MITSHM=1 dbus-launch /usr/bin/vlc --no-qt-privacy-ask"

Other ideas for applications you can run in a container: Firefox, KeePassX, or basically every desktop application you want to separate from the system.
Just remember to bind-mount only those resources into the container that are mandatory for your application to work.

Dynamic multi-machine Vagrantfile

I often face the need to rapidly test a multi-machine environment.
My tool of choice for this kind of task is Vagrant. Vagrant is really powerful.
For those of you who have not yet heard about it:

Vagrant is a tool for building complete development environments. With an easy-to-use workflow and focus on automation, Vagrant lowers development environment setup time, increases development/production parity, and makes the “works on my machine” excuse a relic of the past.
http://www.vagrantup.com/about.html

Its usage is pretty straightforward. Simply install it on any of the supported platforms and run:

vagrant init
vagrant up

BAM, you got your VM.

What this will do is generate a default Vagrantfile in your current working directory and boot a virtual machine in VirtualBox.
This is fine for most purposes. However, in case you need something more sophisticated, please read on.

In my case I wanted to use a Fedora 19 VM instead of the default Ubuntu VM.
This was quite simple: there are plenty of good boxes on vagrantbox.es, which is somewhat easier and quicker than building your own using Veewee.

To add a box to vagrant just run the following command:

vagrant box add fedora19 https://dl.dropboxusercontent.com/u/86066173/fedora-19.box

The next step is to create a new Vagrantfile. This can be achieved in two ways. The first way is to write one by hand. The second way is to use this command:

vagrant init fedora19

After removing comments etc. from the Vagrantfile, we end up with something like:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "fedora19"
end

This Vagrantfile can be used to boot a fresh Fedora 19 VM.

I needed more customization, like being able to specify the number of CPUs and the amount of RAM for the VM.
In addition, I required a second network interface connected to an internal network.
Vagrant basically allows you to pass arbitrary vboxmanage commands, so I ended up with this:

config.vm.provider "virtualbox" do |v|
  v.customize ["modifyvm", :id, "--memory", 256]
  v.customize ["modifyvm", :id, "--cpus", 2]
  v.customize ["modifyvm", :id, "--nic2", "intnet", "--intnet2", "glusternet"]
end

The next step was to execute some commands on VM boot (installing packages etc…).
Just create a script bootstrap.sh next to the Vagrantfile and add the following to the Vagrantfile:

config.vm.provision :shell, :path => "bootstrap.sh"
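A minimal bootstrap.sh could look like this (the package choice is just an example, picked because I use the second NIC for a Gluster network):
#!/bin/sh
# executed inside each VM on first provisioning (example only)
yum -y install glusterfs glusterfs-server
systemctl enable glusterd.service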

There are other means of provisioning; Puppet and Chef are by far the most popular tools, and both are nicely integrated into Vagrant.

So far so good…

Since I needed more than one VM, my initial approach was to define multiple VMs in the Vagrantfile.
This became somewhat tedious and resulted in many duplicate lines.
Because a Vagrantfile is actually a Ruby script, this can be greatly simplified.
So I ended up defining the number of VMs I want and putting everything into a loop.

NODE_COUNT = 4
NODE_COUNT.times do |i|
  node_id = "node#{i}"
  config.vm.define node_id do |node|
    node.vm.box_url = "https://dl.dropboxusercontent.com/u/86066173/fedora-19.box"
    node.vm.box = "fedora19"
    node.vm.hostname = "#{node_id}.intranet.local"
  end
end

This will result in 4 running VMs named node0, node1, node2 and node3.
I didn’t want this environment to conflict with others, so I set the name of the internal network to a random string.

INTNET_NAME = [*('A'..'Z')].sample(8).join
v.customize ["modifyvm", :id, "--nic2", "intnet", "--intnet2", "#{INTNET_NAME}"]

This effectively gives the internal network its own namespace.

Putting it all together, the finished Vagrantfile looks something like this:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"
NODE_COUNT = 4

INTNET_NAME = [*('A'..'Z')].sample(8).join

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  config.vm.provider "virtualbox" do |v|
    v.customize ["modifyvm", :id, "--memory", 256]
    v.customize ["modifyvm", :id, "--cpus", 2]
    v.customize ["modifyvm", :id, "--nic2", "intnet", "--intnet2", "#{INTNET_NAME}"]
  end
  config.vm.provision :shell, :path => "bootstrap.sh"


  NODE_COUNT.times do |i|
    node_id = "node#{i}"
    config.vm.define node_id do |node|
      node.vm.box_url = "https://dl.dropboxusercontent.com/u/86066173/fedora-19.box"
      node.vm.box = "fedora19"
      node.vm.hostname = "#{node_id}.intranet.local"
    end
  end


end

If you want to know more about Vagrant, go ahead and visit their website.

Setting up Archipel agent on Fedora 18

Recently I became quite interested in Archipel.

However, as it turns out, their installation instructions cannot be used on Fedora 18 without some minor modifications.

Requirements:

First it is necessary to install all dependencies, activate libvirt and then install Archipel through easy_install:

yum -y install python-imaging python-numeric python-devel gcc
yum -y install python-setuptools numpy python-xmpp python-sqlalchemy
yum -y install qemu-kvm qemu-img libvirt libvirt-python psmisc
service libvirtd start
chkconfig libvirtd on
easy_install archipel-agent

There is a bug in the agent install routine preventing the systemd/init script from being deployed; we have to work around this:

cp /usr/lib/python2.7/site-packages/archipel_agent-0.6.0beta-py2.7.egg/install/etc/init.d/archipel /etc/init.d/
chmod +x /etc/init.d/archipel
chkconfig archipel on

The next step is to do the necessary configuration to prepare ejabberd for use with Archipel:

archipel-tagnode --jid=admin@example.com --password=changeme --create
archipel-rolesnode --jid=admin@example.com --password=changeme --create
archipel-adminaccounts --jid=admin@example.com --password=changeme --create
archipel-vmparkingnode --jid=admin@example.com --password=changeme --create
archipel-vmrequestnode --jid=admin@example.com --password=changeme --create
archipel-testxmppserver --jid=admin@example.com --password=changeme

I hope you will be able to get everything running.