Manage Terraform with Atlantis in a restricted environment

Posted on July 18, 2022 by Adrian Wyssmann ‐ 8 min read

When we started to use Terraform to manage our Rancher clusters, we ran the terraform commands manually. This is not the way to go, so we started to look into solutions for automating this, starting with Atlantis.

What is Atlantis

atlantis looks quite promising:

Atlantis is an application for automating Terraform via pull requests. It is deployed as a standalone application into your infrastructure. No third-party has access to your credentials.

Atlantis listens for GitHub, GitLab or Bitbucket webhooks about Terraform pull requests. It then runs terraform plan and comments with the output back on the pull request.

When you want to apply, comment atlantis apply on the pull request and Atlantis will run `terraform apply` and comment back with the output.

Even though the documentation is great, it usually is not as straightforward as it seems. So here is my situation:

  1. atlantis shall run on our on-prem Kubernetes cluster

  2. I am sitting behind a corporate proxy

  3. we are using Azure, or more precisely an Azure Storage Account, to store the Terraform state

    • it uses a private endpoint, but with a direct connection

  4. the code repo is an on-prem Bitbucket instance

  5. we use self-signed certificates

Setup - the basics

I won’t go into network details, but it’s clear that you might need to open firewall holes so that atlantis can talk to Bitbucket and Azure.

As a start, luckily, the atlantis team offers a helm chart, which is my preferred way to install apps on a Kubernetes platform. So the installation seems pretty easy:

  1. Create a Webhook Secret (a way to generate one is sketched after this list)

  2. Create a Webhook in the repo using the Secret from above

  3. Create a Bitbucket token for user BBUSER

  4. Create a values.yaml

    bitbucket:
      user: BBUSER
      baseURL: https://git.intra
      # secret: passed via cli
      # token: passed via cli
    
    ingress:
      enabled: true
      path: /
      pathType: Prefix
      hosts:
        - host: atlantis.intra
          paths: ["/"]
      tls:
      - secretName: wildcard-ingress-cert
        hosts:
          - atlantis.intra
    
  5. Install atlantis using helm

    helm repo add remote https://docker.intra/remote-helm-rancher/
    helm install atlantis remote/atlantis -f values.yaml --set bitbucket.token=<TOKEN_FOR_BITBUCKET>  --set bitbucket.secret=<SECRET from step 1> -n atlantis
    
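Regarding step 1: the webhook secret is just a random string which you configure both in the Bitbucket webhook and in atlantis; a sketch of one way to generate it:

# any sufficiently random string works as webhook secret
openssl rand -hex 20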

However, the atlantis pod may not start due to the following error:

docker-entrypoint.sh: detected /atlantis-data wrong filesystem permissions
currently owned by root:atlantis, changing to atlantis:atlantis...
chown: /atlantis-data/lost+found: Operation not permitted
chown: /atlantis-data/lost+found: Operation not permitted
chown: /atlantis-data: Operation not permitted
chown: /atlantis-data: Operation not permitted

This seems to be a known issue and I just have to use an older chart (4.0.3). Unfortunately the communication to Bitbucket still did not work, because Bitbucket uses a self-signed server certificate, which by nature is not trusted.
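
Pinning the older chart version works with helm’s --version flag; a sketch, assuming the repo alias remote from above:

# chart version 4.0.3 does not yet show the /atlantis-data permission issue
helm install atlantis remote/atlantis --version 4.0.3 -f values.yaml \
   --set bitbucket.token=<TOKEN_FOR_BITBUCKET> --set bitbucket.secret=<SECRET from step 1> -n atlantis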

Self-signed certificates

And as we are using self-signed certificates, the above setup will not work. Unfortunately the chart lacks a way to inject your self-signed certificates into the trust store. Sure, I could build my own image which includes the certs, but I prefer to rely on the official images.
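
You can easily reproduce the trust problem from any machine without the corporate CA installed; a quick sketch with openssl against our Bitbucket host:

# a self-signed chain shows up as a non-zero verify return code, e.g.
# "Verify return code: 19 (self signed certificate in certificate chain)"
echo | openssl s_client -connect git.intra:443 2>&1 | grep -i "verify return"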

So as a first approach, I extended charts/atlantis/templates/statefulset.yaml so that one can inject additional certificates from a secret into /etc/ssl/certs:

apiVersion: apps/v1
kind: StatefulSet
metadata:
...
spec:
  ....
  template:
    ...
    spec:
      ...
      volumes:
      ...
      {{- if .Values.additionalTrustCerts.secretName }}
      - name: additional-trust-certs
        secret:
          secretName: {{ .Values.additionalTrustCerts.secretName }}
      {{- end }}
      containers:
        - ...
          volumeMounts:
          {{- if .Values.additionalTrustCerts.secretName }}
          {{- range .Values.additionalTrustCerts.certs }}
          - name: additional-trust-certs
            mountPath: "/etc/ssl/certs/{{ . }}"
            subPath: {{ . }}
          {{- end }}
          {{- end }}
      ...

So you can add a secret, referenced by .Values.additionalTrustCerts.secretName, which contains the root and subordinate certificates:

apiVersion: v1
data:
  self-signed-root.crt: XXXXX
  self-signed-subord.crt: XXXX
kind: Secret
metadata:
  name: my-certs
  namespace: atlantis
type: Opaque

And then, as part of the values.yaml, you provide the necessary .Values.additionalTrustCerts.certs:

additionalTrustCerts: 
  secretName: my-certs
  certs:
  - self-signed-root.crt
  - self-signed-subord.crt
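
Such a secret can be created directly from the certificate files; a sketch, assuming both .crt files are in the current directory (kubectl uses the file name as the key):

kubectl create secret generic my-certs -n atlantis \
   --from-file=self-signed-root.crt \
   --from-file=self-signed-subord.crt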

However, this does not work, as one also has to run update-ca-certificates, which ultimately will fail, as the user who is running the pod is not the root user and hence has no permission to overwrite /etc/ssl/certs/cacerts. My alternative approach is outlined in this PR. It allows you to override /etc/ssl/certs/cacerts with your own PEM file.
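
For that approach, the root and subordinate certificates are concatenated into a single PEM bundle; a sketch of how I prepare the secret used in the summary below (depending on your setup you may want to append them to an existing Mozilla CA bundle instead of replacing it):

# combine root and subordinate certificates into one PEM file
cat self-signed-root.crt self-signed-subord.crt > ca-certificates.crt
kubectl create secret generic my-ca-certificates -n atlantis \
   --from-file=ca-certificates.crt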

Azure cli

So once atlantis trusted my self-signed certificates, I gave it a try, but unfortunately still no success. atlantis could now communicate with Bitbucket, but atlantis plan failed due to the lack of azure-cli:

running "/usr/local/bin/terraform init -input=false -upgrade" in "/atlantis-data/repos/Kubernetes/terraform/37/default/my-kubernetes-cluster": exit status 1
╷

│ Error: Error building ARM Config: please ensure you have installed Azure CLI version 2.0.79 or newer. Error parsing json result from the Azure CLI: launching Azure CLI: exec: "az": executable file not found in $PATH.
Initializing the backend...

Just copying a binary file does not work, as there is none. Unfortunately azure-cli has a lot of dependencies on Python and its libraries. So this is where I had to start creating my own container image. Here is the Dockerfile:

FROM ghcr.io/runatlantis/atlantis:v0.19.3

ENV PYTHONUNBUFFERED=1
RUN apk add --update --no-cache python3 python3-dev musl-dev linux-headers gcc && ln -sf python3 /usr/bin/python
RUN python3 -m ensurepip
RUN pip3 install --no-cache --upgrade pip setuptools
RUN pip3 install wheel --no-cache

RUN pip install --upgrade  azure-cli --no-cache-dir
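
Build and push the image to the internal registry, matching the repository and tag referenced below:

docker build -t docker.intra/papanito/atlantis-azure:v0.19.3 .
docker push docker.intra/papanito/atlantis-azure:v0.19.3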

We then have to use the new image in the values.yaml:

image:
  repository: docker.intra/papanito/atlantis-azure
  tag: v0.19.3

But this is not all: we also have to ensure that Python knows about the self-signed certificates. So I eventually mount the certificate from above to /usr/lib/python3.9/site-packages/certifi/cacert.pem:

extraVolumeMounts:
  - name: additional-trust-certs
    mountPath: /usr/lib/python3.9/site-packages/certifi/cacert.pem
    subPath: ca-certificates.crt
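
To verify both pieces, you can exec into the running pod; a sketch, assuming the statefulset pod is named atlantis-0:

# az being on the PATH proves the CLI is baked into the image
kubectl exec -n atlantis atlantis-0 -- az --version
# certifi.where() shows which CA bundle python uses - it should be the mounted one
kubectl exec -n atlantis atlantis-0 -- python3 -c 'import certifi; print(certifi.where())'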

At last, we have to authenticate against Azure, so the state can be read from the storage account. Following this guide, we need a service principal which has the following permissions:

  • Storage Blob Data Contributor - on the scope
  • Storage Queue Data Contributor - on the scope of the container

If not, terraform will not be able to properly connect:


bash-5.1$ /usr/local/bin/terraform init -input=false -upgrade

Initializing the backend...
│ Error: Failed to get existing workspaces: containers.Client#ListBlobs: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationPermissionMismatch" Message="This request is not authorized to perform this operation using this permission.\nRequestId:71979092-601e-0049-1489-9abb93000000\nTime:2022-07-18T09:35:06.7502023
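
A sketch of how the service principal and the role assignments could be created with the Azure CLI (the service principal name is an example; scope values are derived from the backend configuration below):

# create the service principal used by atlantis; the output contains the
# appId and password later used as ARM_CLIENT_ID / ARM_CLIENT_SECRET
az ad sp create-for-rbac --name atlantis-terraform

# grant it access to the storage account holding the terraform state
# (assigned here on the storage account scope for simplicity)
SCOPE="/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/rg-atlantis/providers/Microsoft.Storage/storageAccounts/tfstate"
az role assignment create --assignee <APP_ID> --role "Storage Blob Data Contributor" --scope "$SCOPE"
az role assignment create --assignee <APP_ID> --role "Storage Queue Data Contributor" --scope "$SCOPE"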

Azure authentication

We already have azurerm configured in backend.tf:

terraform {
  backend "azurerm" {
    use_azuread_auth     = true
    tenant_id            = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    subscription_id      = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    resource_group_name  = "rg-atlantis"
    storage_account_name = "tfstate"
    container_name       = "xxxx"
    key                  = "xxxx"
  }
}

In addition to that, we also need client_id and client_secret, which we can inject as environment variables ARM_CLIENT_ID and ARM_CLIENT_SECRET. I use a separate secret and add this to the values.yaml:

environmentSecrets:
  - name: ARM_CLIENT_ID
    secretKeyRef:
      name: atlantis-azure-credentials
      key: ARM_CLIENT_ID
  - name: ARM_CLIENT_SECRET
    secretKeyRef:
      name: atlantis-azure-credentials
      key: ARM_CLIENT_SECRET
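
Whether the service principal credentials are valid can be verified manually; a sketch using az login (tenant id as configured in backend.tf):

az login --service-principal \
   --username "$ARM_CLIENT_ID" \
   --password "$ARM_CLIENT_SECRET" \
   --tenant "<TENANT_ID>"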

Rancher, kubectl and webproxy

We use the following two providers:

  • rancher2: Verified provider for Rancher
  • kubectl: This provider is the best way of managing Kubernetes resources in Terraform, by allowing you to use the thing Kubernetes loves best - yaml!

These require a proper configuration in the backend.tf:

terraform {
  required_providers {
      rancher2 = {
          source = "rancher/rancher2"
          version = "1.22.2"
      }
      kubernetes = {
        source = "hashicorp/kubernetes"
        version = "2.11.0"
      }
      kubectl = {
        source  = "gavinbunney/kubectl"
        version = "1.14.0"
      }
    }

  backend "azurerm" {
    ...
}

provider "rancher2" {
    api_url = "${var.RANCHER_API_URL}"
    access_key = "${var.RANCHER_TOKEN}"
    secret_key = "${var.RANCHER_SECRET}"
}

provider "kubernetes" {
  host = "${var.RANCHER_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
  token = "${var.RANCHER_TOKEN}:${var.RANCHER_SECRET}"
}

provider "kubectl" {
  load_config_file = "false"
  host = "${var.RANCHER_API_URL}/k8s/clusters/${rancher2_cluster.cluster.id}"
  token = "${var.RANCHER_TOKEN}:${var.RANCHER__SECRET}"
}

As you can see, we use Terraform variables, which we inject as TF_VAR_-prefixed environment variables, the same way as we did for azurerm. So let’s extend the values.yaml (the matching variable declarations are sketched after the snippet):

environmentSecrets:
  ...
  - name: TF_VAR_RANCHER_TOKEN
    secretKeyRef:
      name: rancher-credentials
      key: RANCHER_TOKEN
  - name: TF_VAR_RANCHER_SECRET
    secretKeyRef:
      name: rancher-credentials
      key: RANCHER_SECRET
  - name: TF_VAR_RANCHER_API_URL
    secretKeyRef:
      name: rancher-credentials
      key: RANCHER_API_URL
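
For completeness: Terraform only maps a TF_VAR_FOO environment variable onto a variable "FOO" that is actually declared. A minimal sketch of the matching declarations (the file name variables.tf is my choice):

# declare the variables matching the TF_VAR_* environment variables above
cat > variables.tf <<'EOF'
variable "RANCHER_API_URL" {
  type = string
}

variable "RANCHER_TOKEN" {
  type      = string
  sensitive = true
}

variable "RANCHER_SECRET" {
  type      = string
  sensitive = true
}
EOF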

At last, we also need to ensure traffic goes through the proxy where necessary - so mainly traffic for public endpoints like the Azure portal. Traffic to the storage account has to bypass the proxy, as it is reached via the private endpoint, hence the NO_PROXY entry. We simply inject the proxy values as environment variables (a quick verification is sketched after the snippet):

environmentRaw:
  - name: HTTP_PROXY
    value: http://webproxy.intra:8888
  - name: HTTPS_PROXY
    value: http://webproxy.intra:8888
  - name: NO_PROXY 
    value: localhost,127.0.0.1,.intra,tfstate.blob.core.windows.net
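
A quick way to double-check that the proxy settings actually end up in the container; a sketch, again assuming the pod is named atlantis-0:

# list all proxy-related environment variables inside the pod
kubectl exec -n atlantis atlantis-0 -- sh -c 'env | grep -i proxy'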

Sum up

In my environment, there are some additional steps necessary to get atlantis running. Here are all the steps to be performed:

  1. Create a Webhook Secret

  2. Create a Webhook in the repo using the Secret from above

  3. Create a Bitbucket token for user BBUSER

  4. Add secret my-ca-certificates which contains the self-signed root and subordinate certificates in a single PEM file called ca-certificates.crt

  5. Create a service principal in Azure which has access to the storage account where the tf state is stored. It shall have the permissions mentioned above (Storage Blob Data Contributor and Storage Queue Data Contributor)

  6. In Rancher, create a service user (admin) and a token for it

  7. Add secret rancher-credentials with credentials for each Rancher instance

    kubectl create secret generic rancher-credentials -n atlantis --context nop \
       --from-literal=RANCHER_TOKEN=<TOKEN> --from-literal=RANCHER_SECRET=<SECRET> --from-literal=RANCHER_API_URL=https://rancher.intra
    
  8. Add secret atlantis-azure-credentials with credentials for azure service principal

    kubectl create secret generic atlantis-azure-credentials -n atlantis --context nop --from-literal=ARM_CLIENT_ID=<CLIENT_ID> --from-literal=ARM_CLIENT_SECRET=<CLIENT_SECRET>
    
  9. Create a values.yaml

    # this specific image contains az plus dependencies
    image:
      repository: docker.intra/papanito/atlantis-azure
      tag: v0.19.3
    bitbucket:
      user: BBUSER
      baseURL: https://git.intra
      # secret: passed via cli
      # token: passed via cli
    
    ingress:
      enabled: true
      path: /
      pathType: Prefix
      hosts:
        - host: atlantis.intra
          paths: ["/"]
      tls:
      - secretName: wildcard-ingress-cert
        hosts:
          - atlantis.intra
    
    orgAllowlist: git.intra/tf/rancher
    
    customPem: my-ca-certificates
    
    environmentRaw:
      - name: HTTP_PROXY
        value: http://webproxy.intra:8888
      - name: HTTPS_PROXY
        value: http://webproxy.intra:8888
      - name: NO_PROXY 
        value: localhost,127.0.0.1,.intra,tfstate.blob.core.windows.net
    
    environmentSecrets:
      - name: ARM_CLIENT_ID
        secretKeyRef:
          name: atlantis-azure-credentials
          key: ARM_CLIENT_ID
      - name: ARM_CLIENT_SECRET
        secretKeyRef:
          name: atlantis-azure-credentials
          key: ARM_CLIENT_SECRET
      - name: TF_VAR_RANCHER_TOKEN
        secretKeyRef:
          name: rancher-credentials
          key: RANCHER_TOKEN
      - name: TF_VAR_RANCHER_SECRET
        secretKeyRef:
          name: rancher-credentials
          key: RANCHER_SECRET
      - name: TF_VAR_RANCHER_API_URL
        secretKeyRef:
          name: rancher-credentials
          key: RANCHER_API_URL
    
  10. Install atlantis using helm

Remarks: Currently I am using a modified chart until PR #163 is merged.

helm repo add remote https://docker.intra/remote-helm-rancher/
helm install atlantis remote/atlantis -f values.yaml --set bitbucket.token=<TOKEN_FOR_BITBUCKET>  --set bitbucket.secret=<SECRET from step 1> -n atlantis
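
After the installation, a quick sanity check that the pod came up and atlantis is reachable through the ingress (atlantis exposes a /healthz endpoint; -k because of the self-signed certificate):

kubectl get pods -n atlantis
curl -k https://atlantis.intra/healthz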

So now, atlantis runs a terraform plan upon a pull request:

[Screenshot: atlantis does a terraform plan for a PR]