As a reader of my blog, you may know that we use Rancher and MS Teams for alerting at my employer. Unfortunately, this was not working properly, so I had to start debugging.
What is the issue?
A while back we set up monitoring and alerting to MS Teams, which works fine on most of the clusters. However, on some clusters the alerting does not work and we don’t receive any alerts. While Alertmanager handles the alerts, with MS Teams there is an additional component involved: prom2teams. This component receives the alerts from Alertmanager and then forwards them to MS Teams.
Looking at the error prom2teams.app.exceptions.MicrosoftTeamsRequestException: Error performing request to: {}., the question is whether the notification sent by Alertmanager is fine or empty. To find out, I would like to use something like webhook.site:
A site to easily test HTTP webhooks with this handy tool that displays requests instantly.
Using webhook.site in an air-gapped environment
However, in our air-gapped environment I can’t connect to public endpoints. Luckily, this tool is available on GitHub, and the repository includes a Webhook K8s Configuration Sample. Cool, so I can install it on my local cluster. I changed the ingress.yml as follows:
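The exact ingress.yml depends on your cluster, so the following is only a minimal sketch of what such an Ingress could look like. The host webhook.intra is the one used in this post; the resource name, the TLS secret name `webhook-tls`, and the backend service name and port are assumptions and should be taken from the sample manifests you actually deploy.

```yaml
# Sketch only: names marked "assumed" are placeholders, not values from the sample repo.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: webhook                  # assumed resource name
spec:
  tls:
    - hosts:
        - webhook.intra
      secretName: webhook-tls    # assumed secret holding the self-signed certificate and key
  rules:
    - host: webhook.intra
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webhook    # assumed; use the service name from the sample's service manifest
                port:
                  number: 80     # assumed; use the port exposed by that service
```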
As we only allow HTTPS traffic and the ingress uses a self-signed certificate, we have to make the certificate available to Alertmanager. First we add the certificate as a secret:
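One way to do this is a plain Secret manifest, sketched below. The secret name `webhook-ca`, the key `ca.crt`, and the namespace `cattle-monitoring-system` are assumptions; the secret has to live in the same namespace as your Alertmanager, and the certificate body is a placeholder you need to replace with your own PEM data.

```yaml
# Sketch only: name, key and namespace are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: webhook-ca                      # assumed secret name
  namespace: cattle-monitoring-system   # assumed; must match the Alertmanager namespace
type: Opaque
stringData:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <paste the PEM-encoded self-signed certificate here>
    -----END CERTIFICATE-----
```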
Then we make the certificate known to Alertmanager as follows and update the monitoring deployment:
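With the Prometheus Operator, secrets listed in the Alertmanager spec are mounted into the Alertmanager pods under /etc/alertmanager/secrets/<secret-name>. Assuming the monitoring chart exposes this through `alertmanager.alertmanagerSpec` (as kube-prometheus-stack style values do) and assuming the secret is called `webhook-ca`, the values could look roughly like this:

```yaml
# Sketch of Helm values; "webhook-ca" is an assumed secret name.
alertmanager:
  alertmanagerSpec:
    secrets:
      - webhook-ca   # mounted at /etc/alertmanager/secrets/webhook-ca/
```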
After that, we can set up the receiver using the URL provided by the webhook.site deployment and the appropriate certificate:
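In native Alertmanager configuration, a receiver of this kind could be sketched as follows. The receiver name `webhook-debug`, the secret name `webhook-ca` and the key `ca.crt` in the path are assumptions, and `<token>` stands for the unique ID that your webhook.site instance generates.

```yaml
# Sketch only: receiver name, secret path and URL token are placeholders.
receivers:
  - name: webhook-debug
    webhook_configs:
      - url: https://webhook.intra/<token>
        send_resolved: true
        http_config:
          tls_config:
            # CA file mounted from the secret (assumed secret name and key)
            ca_file: /etc/alertmanager/secrets/webhook-ca/ca.crt
```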
Finally, set up a route using the Watchdog alert and the endpoint from above:
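The Watchdog alert fires permanently, which makes it a convenient test signal. A child route for it could be sketched like this, assuming a receiver named `webhook-debug` (a hypothetical name) and an Alertmanager version that supports the `matchers` syntax; older versions use `match` instead.

```yaml
# Sketch only: "webhook-debug" is an assumed receiver name.
route:
  # ... existing top-level settings (receiver, group_by, ...) stay unchanged ...
  routes:
    - receiver: webhook-debug
      matchers:
        - 'alertname = "Watchdog"'
```

Routes are evaluated in order, so this entry needs to come before any other child route that already matches Watchdog.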
Now we can check the UI at webhook.intra and confirm that the message sent from Alertmanager looks fine.
Where to go from here?
webhook.site is a really nice tool for testing HTTPS requests and I recommend giving it a try. As for my problem, we are now confident that it is related to prom2teams, so I have to follow up with SUSE Support to narrow it down.