Revisiting the Enough CI

Hello,

I’m frustrated that the CI cannot run all the tests because the OpenStack control plane (at OVH or elsewhere for that matter; or even GCP or AWS) is, by design, asynchronous and prone to failure. That works well in a production environment but it is too noisy for a CI: it is difficult to tell whether an error comes from OpenStack or from the Ansible / Python code under test.

About a year ago, discussions with @pilou convinced me it would be a good idea to be able to run Enough on a single Docker machine. It would be useful in general for people who don’t want to rely on the cloud and prefer bare metal (@dam :eyes:). And it would allow running almost all tests in the CI, because the failure rate of docker-compose up, docker run or docker build is very low and won’t confuse the developer.

The work done on Docker has regressed over the past six months because of:

  • backup restoration (the backup strategy is 100% OpenStack specific)
  • upgrade tests (not sure how difficult it would be to run them on docker)
  • VPN implementation (tox -e openvpn runs a VPN client in a docker container but I’m not sure what it would mean for a VPN server to run on docker)
  • every VM now has two interfaces (one public, one private) and the docker container should be configured to be attached to two networks instead of one

On the plus side:

  • the tests now run from a dedicated docker container that could be attached to a docker network dedicated to testing, so there would be no need to worry about exposing ports that are already in use
  • the ownca certificate authority is now well tested and exposing all web services via SSL is no longer a concern

In conclusion, I wonder if it would be worth resuming the work to run Enough on Docker. I got used to manually running tox -e wekan (and coping with environmental errors) before merging. But it may be too much for new contributors (@nqb :eyes:).

To be continued

Hello,

I’m interested in this topic.

@loic, could you try to describe what you ultimately want to do with the CI?

It’s super simple: I want a CI that runs all the tests every time a commit is added to a merge request or to master. There is one blocker preventing that from happening: the instability of the OpenStack control plane (by contrast, once a resource is provisioned, it is very stable).

Thanks!

We could also add a way to deploy using GitLab CI (manually if we want). That way, you don’t need to meet all the prerequisites on your workstation to deploy to hosts.

This would be great, yes :+1:

Having a CI helps a lot, especially for new contributors.

I consider this a high-priority issue/feature and would like to work on it in the coming weeks.

I would really appreciate having a meeting on this specific topic in order to:

  • get the big picture of the current situation
  • explain our needs (almost done in this topic)
  • compare the possible solutions we could use
  • choose a solution and implement it

@pilou and @loic, could we schedule that next week? I’m sure @dam is also interested in this topic.

I will ask some questions and write some notes below before the meeting.

OpenStack API is unstable

If I understand correctly, the OpenStack API is unstable. It’s not related to OVH but to the upstream software, so we will have the same problem with any OpenStack provider.

Perhaps we can overcome these limitations with a few tricks, such as retrying with delays when an error is detected.
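To make the retry idea concrete, here is a minimal sketch, assuming the flaky operation can be wrapped in a Python callable. The names `retry` and `TransientError` are hypothetical and are not part of Enough or of any OpenStack client; deciding which errors are really transient would have to be done case by case.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def retry(operation, attempts=5, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are exhausted.

    The wait between attempts starts at `delay` seconds and is multiplied
    by `backoff` after each failure. Only TransientError is retried; any
    other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff
```

A call could look like `retry(create_stack, attempts=3, delay=10)`, where `create_stack` is a hypothetical function raising `TransientError` on a temporary control plane failure. The `sleep` parameter only exists so that the waiting can be replaced in tests.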

I’m sure there are people out there who have a CI running on OpenStack instances without problems; we could see how they do it and reuse their code if it is free software.

On top of that, the Enough code currently depends on the OVH API because, as @loic mentioned, each OpenStack provider has its own custom OpenStack API. It means that if we stay with OpenStack provisioning and want to leave OVH at some point, we will have to adapt the Enough code. Another drawback is that people who want to use Enough for their own my.enough.community have to use OVH.

I know that Terraform has an OpenStack provider, which is mentioned by OVH in their official documentation. From my understanding (but @loic can confirm), Terraform can do the same as Enough while being more provider-agnostic.

Besides, we can run some commands at the end of provisioning to run an Ansible playbook using provisioners. However, it seems that there is no built-in Ansible provisioner.

I’m pretty sure Terraform is used in many CIs around the world and perhaps they have overcome the OpenStack API limitations. I didn’t see many open issues related to OpenStack on their bug tracker.

The Docker solution

What about applications that don’t run inside Docker? We still need a way to provision virtual machines. If everything is on one Docker host, we will have to use other virtualisation methods, perhaps on the same host.

Again, Terraform has a Docker provider.

Tests

In my previous notes, I only talked about provisioning machines or containers, but the testing strategy has to be taken into account, as @singuliere mentioned in the Using terraform to abstract OpenStack topic. Since that topic, Molecule has released a new major version and perhaps Terraform has progressed.

I also know that Testinfra can rely on connection backends.

The OpenNebula solution

Perhaps OpenNebula could be a good middle ground between OpenStack and a full Docker host. OpenNebula can run in the cloud or on bare-metal servers. There is native support for Docker, KVM and LXD.

I know that the DebOps maintainer used OpenNebula at work and relied only on DebOps roles to prepare the hypervisor.

I don’t know whether its API is stable.

Again, there is a Terraform provider available.

Yes. People will fight over the fact that some providers are better than others. But my personal experience is that even the most expensive ones are unstable.

This unfortunately does not work. I have yet to find a pattern in how instabilities manifest themselves… For instance, there have been recurring cases of stacks that could not be deleted, for weeks. This started about four months ago and had never happened before. Working around this would require coding specific strategies that are unlikely to be useful or even possible to realistically test. If we go down this path the code will eventually be littered with half-dead workarounds that we won’t be able to maintain. And the problem seems to be gone now: I ran tests daily in the past few weeks, creating and removing dozens of stacks without any problem.

I’m familiar with Terraform and, contrary to the impression it may give at the beginning, it is by no means generic. You need to customize the Terraform specifications for OpenStack and for the provider you’re targeting, in our case OVH. In the end you’re adding yet another layer, which does not help.

Excellent point :slight_smile: I think all applications we rely on run in Docker. Of course not all of them are Docker based, such as postfix or icinga. But when the Docker container runs ssh and systemd (which it does), postfix or icinga don’t know the difference. There are annoying details, of course, but I don’t see a blocker.

My experience with OpenNebula is more than five years old but I kind of remember that it is not a good fit. I don’t remember the reasons though. I’ve not heard about anyone running OpenNebula.

Libvirt

Since you’re exploring alternatives, there is one more: libvirt. It’s simple, it is well supported by Ansible, and it runs VMs and is therefore closer to OpenStack. I’m inclined to think that would be our easiest path to a stable CI. It would mean throwing away the work done on trying to implement Docker as a stable alternative to OpenStack but I’d be happy to work in that direction with you.

Today, with @pilou’s help, I revisited libvirt (my last visit was years ago :wink:). It should not be too complicated to replace OpenStack with libvirt, using the python module (see also the guide).
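As a rough sketch of what the Python module offers, assuming the libvirt-python bindings are installed and that a domain named deb exists on qemu:///system, the IP of a machine could be obtained as follows. The helper names `ipv4_addresses` and `domain_ipv4` are hypothetical; only the `libvirt` calls come from the bindings.

```python
def ipv4_addresses(interfaces):
    """Extract IPv4 addresses from the dict returned by interfaceAddresses().

    The dict maps an interface name to {'hwaddr': ..., 'addrs': [...]},
    where each address has 'addr', 'prefix' and 'type' (0 means IPv4).
    """
    found = []
    for name in sorted(interfaces):
        # 'addrs' can be None when the interface has no address yet
        for address in interfaces[name].get("addrs") or []:
            if address["type"] == 0:  # libvirt.VIR_IP_ADDR_TYPE_IPV4
                found.append(address["addr"])
    return found


def domain_ipv4(name, uri="qemu:///system"):
    """Ask libvirt for the DHCP leases of a domain and return its IPv4s."""
    import libvirt  # imported lazily so the helper above has no dependency

    conn = libvirt.open(uri)
    try:
        domain = conn.lookupByName(name)
        leases = domain.interfaceAddresses(
            libvirt.VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_LEASE
        )
        return ipv4_addresses(leases)
    finally:
        conn.close()
```

With a running domain, `domain_ipv4("deb")` would return a list of addresses; the list stays empty until the machine has obtained a DHCP lease.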

The most time-consuming task was figuring out how to create a virtual machine running Debian GNU/Linux buster with a debian user usable by Ansible:

# Download the Debian buster cloud image
wget https://cloud.debian.org/images/cloud/buster/20210129-530/debian-10-generic-amd64-20210129-530.qcow2
# Regenerate the SSH host keys, create a debian user with passwordless sudo and inject the SSH public key
virt-sysprep -a debian-10-generic-amd64-20210129-530.qcow2 --run-command 'dpkg-reconfigure --frontend=noninteractive openssh-server' --run-command 'useradd -s /bin/bash -m debian || true ; echo "debian ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-debian' --ssh-inject "debian:file:$HOME/.ssh/id_rsa.pub"
# Copy the image into the libvirt images directory
sudo cp debian-10-generic-amd64-20210129-530.qcow2 /var/lib/libvirt/images/deb.qcow2
sudo chown libvirt-qemu /var/lib/libvirt/images/deb.qcow2
# Start the default network, boot the VM from the image and look up its IP
virsh --connect qemu:///system net-start default
virt-install --connect qemu:///system --boot hd --name deb --memory 1024 --vcpus 1 --cpu host --disk path=/var/lib/libvirt/images/deb.qcow2,size=10,bus=virtio,format=qcow2 --os-type=linux --os-variant=debian10 --graphics spice --noautoconsole
virsh --connect qemu:///system domifaddr deb
ssh debian@<IP>

Adding a disk is easier:

sudo qemu-img create -f raw /var/lib/libvirt/images/a.img 1G
sudo chown libvirt-qemu /var/lib/libvirt/images/a.img
virsh --connect qemu:///system attach-disk deb --source /var/lib/libvirt/images/a.img --target vdb --persistent

Introspection from the command line is not easy because there is no JSON or XML output when, for instance, getting the IP of the machine with:

virsh --connect qemu:///system domifaddr deb
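One way around the missing structured output is to parse the table that the command prints. The sketch below assumes the usual four columns (Name, MAC address, Protocol, Address) with the address written as IP/prefix; `parse_domifaddr` and `domifaddr` are hypothetical helper names and the layout may differ between virsh versions.

```python
import subprocess


def parse_domifaddr(text):
    """Parse the table printed by `virsh domifaddr` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns and a protocol of ipv4 or ipv6;
        # the header and the dashed separator do not match and are skipped.
        if len(fields) == 4 and fields[2] in ("ipv4", "ipv6"):
            name, mac, protocol, cidr = fields
            address, _, prefix = cidr.partition("/")
            rows.append({
                "name": name,
                "mac": mac,
                "protocol": protocol,
                "address": address,
                "prefix": int(prefix) if prefix else None,
            })
    return rows


def domifaddr(domain, uri="qemu:///system"):
    """Run virsh and return the parsed addresses of a domain."""
    output = subprocess.run(
        ["virsh", "--connect", uri, "domifaddr", domain],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_domifaddr(output)
```

Calling `domifaddr("deb")` would then give a list that is easy to consume from Python or to dump as JSON. Parsing text output is fragile, so the libvirt Python bindings are probably the better long-term option.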

@nqb what about a meeting at 3pm on Friday 11th September? It would work for @pilou as well.

Hello guys,

Looks good to me for 11th September at 3pm.

Libvirt is a good solution for me too!

The OpenNebula solution

Something to keep in mind:

In early June 2020, OpenNebula announced the release of a new Enterprise Edition for corporate users, along with a Community Edition.[2] OpenNebula CE is free and open-source software, released under the Apache License version 2. OpenNebula CE comes with free access to maintenance releases but with upgrades to new minor/major versions only available for users with non-commercial deployments or with significant contributions to the OpenNebula Community.[3] OpenNebula EE is distributed under a closed-source license and requires a commercial Subscription.[4]

Source: Wikipedia

I did not realize OpenNebula became Open Core, it is good to know.

I went ahead and created an event in the agenda, and added a few topics in case you’re interested in hearing about them. We can keep it brief and focus on the main subject, the CI.

For the record, support for running tests with libvirt was merged today :tada:
