When we started to use Terraform to manage our Rancher clusters, we ran the terraform commands manually. That is not the way to go, so we looked into automation solutions, starting with Atlantis.
Atlantis is an application for automating Terraform via pull requests. It is deployed as a standalone application into your infrastructure. No third party has access to your credentials.
Atlantis listens for GitHub, GitLab or Bitbucket webhooks about Terraform pull requests. It then runs `terraform plan` and comments back on the pull request with the output.
When you want to apply, comment `atlantis apply` on the pull request, and Atlantis will run `terraform apply` and comment back with the output.
Even though the documentation is great, it usually is not as straightforward as it seems. So here is my situation:
atlantis shall run on the on-prem Kubernetes cluster
it uses a private endpoint, but with a direct connection
the code repo is an on-prem Bitbucket instance
we use self-signed certificates
Setup - the basics
I won’t go into network details, but it’s clear that you might need to open firewall holes so that Atlantis can talk to Bitbucket and Azure.
As a start, luckily, the Atlantis team offers a Helm chart, which is my preferred way to install apps on a Kubernetes platform. So the installation seems pretty easy:
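A minimal sketch, assuming the upstream chart repository and a prepared values.yaml:

```shell
# add the official Atlantis chart repository and install
helm repo add runatlantis https://runatlantis.github.io/helm-charts
helm repo update
helm install atlantis runatlantis/atlantis -f values.yaml
```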
However, the Atlantis pod may not start due to the following error:
This seems to be a known issue, and I just had to use an older chart (4.0.3). Unfortunately the communication with Bitbucket still did not work, because Bitbucket uses a self-signed server certificate, which by nature is not trusted.
Self-signed certificates
As we are using self-signed certificates, the above setup will not work. Unfortunately, the chart lacks a way to inject your self-signed certificates into the trust store. Sure, I could build my own image which includes the certs, but I prefer to rely on the official images.
So as a first approach, I extended charts/atlantis/templates/statefulset.yaml so that one can inject additional certificates from a secret into /etc/ssl/certs.
So you could add a secret, referenced via .Values.additionalTrustCerts.secretName, which contains the root and subordinate certificates:
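A sketch of that extension; the value keys follow the naming above, the surrounding chart structure is omitted:

```yaml
# sketch: additions to charts/atlantis/templates/statefulset.yaml
# under the atlantis container's volumeMounts:
{{- range $name, $cert := .Values.additionalTrustCerts.certs }}
- name: additional-trust-certs
  mountPath: /etc/ssl/certs/{{ $name }}
  subPath: {{ $name }}
  readOnly: true
{{- end }}
# under spec.template.spec.volumes:
- name: additional-trust-certs
  secret:
    secretName: {{ .Values.additionalTrustCerts.secretName }}
```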
And then, as part of the values.yaml, you provide the necessary .Values.additionalTrustCerts.certs:
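For example, with placeholder certificates:

```yaml
additionalTrustCerts:
  secretName: my-ca-certificates
  certs:
    root-ca.crt: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
    sub-ca.crt: |
      -----BEGIN CERTIFICATE-----
      ...
      -----END CERTIFICATE-----
```

A small templates/secret.yaml rendering these values into the secret named above would complete the picture.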
However, this does not work, as one also has to run update-ca-certificates, which will ultimately fail: the user running the pod is not root and hence cannot overwrite /etc/ssl/certs/ca-certificates.crt. My alternative approach is outlined in this PR. It allows you to override /etc/ssl/certs/ca-certificates.crt with your own PEM file.
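In essence, the idea is to bind-mount your own PEM bundle read-only over the system one, sketched here with the chart's extraVolumes/extraVolumeMounts values (not the exact PR). Since this is a mount and nothing is written at runtime, it works without root:

```yaml
# shadow the system bundle with the bundle from the secret
extraVolumes:
  - name: ca-certificates
    secret:
      secretName: my-ca-certificates
extraVolumeMounts:
  - name: ca-certificates
    mountPath: /etc/ssl/certs/ca-certificates.crt
    subPath: ca-certificates.crt
    readOnly: true
```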
Azure CLI
So once Atlantis trusted my self-signed certificates, I gave it another try, but unfortunately still no success. Atlantis could now communicate with Bitbucket, but `atlantis plan` failed due to the lack of azure-cli.
Just copying a binary does not work, as there is none. Unfortunately, azure-cli has a lot of dependencies on Python and its libraries. So this is where I had to start building my own container image. Here is the Dockerfile:
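A sketch of what it boils down to; the base image tag and the exact build dependencies are assumptions (the upstream image is Alpine-based and runs as the atlantis user):

```dockerfile
# assumption: a recent Alpine-based upstream image
FROM runatlantis/atlantis:v0.19.2

# azure-cli is a Python application, so we need Python plus the
# build dependencies for its native wheels
USER root
RUN apk add --no-cache python3 py3-pip gcc musl-dev python3-dev \
        libffi-dev openssl-dev cargo make \
    && pip3 install --no-cache-dir azure-cli

# drop back to the unprivileged user the image normally runs as
USER atlantis
```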
We then have to use the new image in the values.yaml:
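Repository and tag below are placeholders for wherever you push that image:

```yaml
image:
  repository: registry.example.com/platform/atlantis-azure
  tag: v0.19.2
  pullPolicy: IfNotPresent
```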
But this is not all: we also have to ensure that Python knows about the self-signed certificates. So I eventually map the certificate bundle from above to `/usr/lib/python3.9/site-packages/certifi/cacert.pem` as well.
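Reusing the ca-certificates volume from above, that is one more subPath mount; the path must match the Python version inside the image:

```yaml
extraVolumeMounts:
  - name: ca-certificates
    mountPath: /usr/lib/python3.9/site-packages/certifi/cacert.pem
    subPath: ca-certificates.crt
    readOnly: true
```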
At last, we have to authenticate against Azure, so the state can be read from the storage account. Following this guide, we need a service principal which has the following permissions:
Storage Blob Data Contributor - on the scope
Storage Queue Data Contributor - on the scope of the container
If not, Terraform will not be able to connect properly:
In addition to that, we also need client_id and client_secret, which we can inject as environment variables ARM_CLIENT_ID and ARM_CLIENT_SECRET. I use a separate secret and add this to the values.yaml:
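A sketch using the chart's environmentSecrets value; the key names inside the secret are my own convention:

```yaml
environmentSecrets:
  - name: ARM_CLIENT_ID
    secretKeyRef:
      name: atlantis-azure-credentials
      key: client_id
  - name: ARM_CLIENT_SECRET
    secretKeyRef:
      name: atlantis-azure-credentials
      key: client_secret
```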
For managing the Kubernetes resources we use the kubectl provider: “This provider is the best way of managing Kubernetes resources in Terraform, by allowing you to use the thing Kubernetes loves best - yaml!”
These require a proper configuration in the backend.tf:
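A sketch of such a backend.tf; the Azure resource names are placeholders:

```hcl
terraform {
  required_providers {
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }

  backend "azurerm" {
    resource_group_name  = "rg-terraform-state" # placeholder
    storage_account_name = "sttfstate"          # placeholder
    container_name       = "tfstate"
    key                  = "rancher.tfstate"
    # credentials come from ARM_CLIENT_ID / ARM_CLIENT_SECRET
    # (plus ARM_TENANT_ID / ARM_SUBSCRIPTION_ID) in the environment
  }
}

# the kubectl provider can likewise read its connection settings
# from environment variables such as KUBE_HOST and KUBE_TOKEN
provider "kubectl" {
  load_config_file = false
}
```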
As you can see, we use environment variables, which we inject the same way as we did for azurerm. So let’s extend the values.yaml:
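For example, feeding the kubectl provider from the rancher-credentials secret (the key names are assumptions):

```yaml
environmentSecrets:
  # ...ARM_* entries from above...
  - name: KUBE_HOST
    secretKeyRef:
      name: rancher-credentials
      key: host
  - name: KUBE_TOKEN
    secretKeyRef:
      name: rancher-credentials
      key: token
```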
At last, we also need to ensure traffic goes through the proxy where necessary - mainly traffic for public endpoints like the Azure portal. We simply inject the proxy values as environment variables:
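Using the chart's environment value; the proxy host is a placeholder, and NO_PROXY must cover the on-prem and private endpoints:

```yaml
environment:
  HTTP_PROXY: http://proxy.example.com:3128
  HTTPS_PROXY: http://proxy.example.com:3128
  NO_PROXY: bitbucket.example.com,rancher.example.com,.svc,.cluster.local
```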
Sum up
In my environment, there are some additional steps necessary to get Atlantis running. Here are all the steps to be performed:
Create a Webhook in the repo using the Secret from above
Create a Bitbucket token for user BBUSER
Add secret my-ca-certificates which contains the self-signed root and subordinate certificates in a single PEM file called ca-certificates.crt
Create a service principal in Azure which has access to the storage account where the tf state is stored. It shall have the permissions listed above
In Rancher, create a service user (admin) and a token for it
Add secret rancher-credentials with credentials for each Rancher instance
Add secret atlantis-azure-credentials with credentials for the Azure service principal (a sketch of the kubectl commands for these secrets follows below)
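For example; the key names must match whatever the values.yaml references, and hosts and tokens are placeholders:

```shell
# CA bundle for the self-signed trust chain
kubectl create secret generic my-ca-certificates \
  --from-file=ca-certificates.crt=./ca-certificates.crt

# Azure service principal used by the azurerm backend
kubectl create secret generic atlantis-azure-credentials \
  --from-literal=client_id='<appId>' \
  --from-literal=client_secret='<password>'

# one entry per Rancher instance
kubectl create secret generic rancher-credentials \
  --from-literal=host='https://rancher.example.com' \
  --from-literal=token='<rancher-token>'
```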