Install Datadog in Kubernetes | Claus Witt dot com

First of all, we use Helm, so we start with the Helm tab of the Datadog docs: https://docs.datadoghq.com/agent/kubernetes/?tab=helm

Add the repository

helm repo add datadog https://helm.datadoghq.com
helm repo add stable https://charts.helm.sh/stable
helm repo update

Install the chart

helm install datadog -f values.yaml --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog --set targetSystem=linux
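
The <DATADOG_API_KEY> placeholder cannot be pasted into a shell as-is, since the angle brackets are read as redirections. A minimal sketch of one way around that, assuming the key is exported as an environment variable called DATADOG_API_KEY (our naming, not something the chart requires) and wrapping the call in a hypothetical install_datadog function:

```shell
# Sketch: install (or upgrade) the chart with the API key taken from the environment.
# DATADOG_API_KEY and install_datadog are names chosen for this example.
install_datadog() {
  # Abort with a message if the key is missing or empty.
  : "${DATADOG_API_KEY:?export DATADOG_API_KEY before installing}"

  helm upgrade --install datadog datadog/datadog \
    -f values.yaml \
    --set datadog.apiKey="$DATADOG_API_KEY" \
    --set targetSystem=linux
}

# Usage:
#   export DATADOG_API_KEY=...   # your real key
#   install_datadog
```

We use helm upgrade --install here so the same command works for the first install and for later changes to values.yaml; a plain helm install as above does the same job the first time around.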

Special case: AWS

Because we use EKS, and thus Amazon Linux 2 nodes, we need to disable AppArmor for the agent pods in values.yaml (source: https://artifacthub.io/packages/helm/datadog/datadog#configuration-required-for-amazon-linux-2-based-nodes)

agents:
  # (...)
  podSecurity:
    # (...)
    apparmor:
      # (...)
      enabled: false

# (...)

Service discovery

Then we need to tell the pods where the Datadog agent is. It is no longer on localhost inside the container, as we were used to from the Datadog buildpack, but on the node the pod is scheduled on. The admission controller can push this to us on pod creation. Existing pods won't get the value, but we deploy changes to all applications anyway, replacing a hardcoded localhost with a read of the environment variable DD_AGENT_HOST.

clusterAgent:
  # (...)
  admissionController:
    # clusterAgent.admissionController.enabled -- Enable the admissionController to be able to inject APM/Dogstatsd config and standard tags (env, service, version) automatically into your pods
    enabled: true

    # clusterAgent.admissionController.mutateUnlabelled -- Enable injecting config without having the pod label 'admission.datadoghq.com/enabled="true"'
    mutateUnlabelled: true
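
On the application side, the change is small. A rough sketch in shell of what "read DD_AGENT_HOST instead of hardcoding localhost" can look like; the agent_host function name and the fallback to localhost are our own choices for illustration, not something Datadog prescribes:

```shell
# Sketch: resolve the host that DogStatsD/APM traffic should go to.
# Prefers the DD_AGENT_HOST value injected by the admission controller and
# falls back to localhost (the old buildpack behaviour) when it is unset or empty.
agent_host() {
  printf '%s\n' "${DD_AGENT_HOST:-localhost}"
}

# Usage:
#   host="$(agent_host)"
```

The fallback keeps pods that were created before the admission controller was enabled working as they did before, at the cost of hiding a missing injection. Dropping the fallback makes that failure visible instead.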

Finding all nodes: taints and tolerations

Finally, we ran into a minor issue that took forever to figure out. Our cluster has two autoscaling groups: one for long-running pods, and one for pods that we allow to be rescheduled (most of our apps are OK with that). This is set up as a taint in Kubernetes, so we need to give the Datadog agent a matching toleration; otherwise only the nodes without the taint get a Datadog agent.

  tolerations:
    - key: "removable"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
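
What would have saved us time is comparing the number of nodes with the number of nodes the agent DaemonSet wants to run on. A minimal sketch, assuming the DaemonSet is called datadog (that depends on the release name, so treat it as a guess) and lives in the current namespace; agent_coverage_gap is a name made up for this example:

```shell
# Sketch: print how many nodes the agent DaemonSet is NOT scheduled on.
# 0 means every node is covered; anything else hints at an untolerated taint
# (or a node selector / affinity rule).
agent_coverage_gap() {
  daemonset="${1:-datadog}"   # assumed DaemonSet name, override as first argument

  # One line per node in the cluster.
  nodes=$(kubectl get nodes --no-headers | wc -l)

  # Number of nodes the DaemonSet controller thinks should run an agent pod.
  desired=$(kubectl get daemonset "$daemonset" \
    -o jsonpath='{.status.desiredNumberScheduled}')

  echo $((nodes - desired))
}

# Usage:
#   agent_coverage_gap            # uses "datadog"
#   agent_coverage_gap my-agent   # some other DaemonSet name
```

If the number is not 0, kubectl describe node on one of the uncovered nodes shows its taints, which is where our "removable" key turned up.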

A quick test

echo "some.new.to.send.to:1|c" | nc -w0 -u "$DD_AGENT_HOST" 8125

We ran this on a newly scheduled pod, and everything worked!
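
For reference, the string sent in the test is a DogStatsD datagram of the form name:value|type, where c means counter. A small sketch that wraps the test in two helper functions; statsd_datagram and send_metric are names invented here, and the one-second nc timeout is our choice:

```shell
# Sketch: build a DogStatsD datagram "<metric.name>:<value>|<type>".
# The type defaults to "c" (counter), as in the quick test.
statsd_datagram() {
  printf '%s:%s|%s\n' "$1" "$2" "${3:-c}"
}

# Send one metric over UDP to the agent on the node.
# Fails loudly if DD_AGENT_HOST was not injected into the pod.
send_metric() {
  : "${DD_AGENT_HOST:?DD_AGENT_HOST is not set in this pod}"
  statsd_datagram "$@" | nc -w1 -u "$DD_AGENT_HOST" 8125
}

# Usage:
#   send_metric some.new.to.send.to 1 c
```

Since UDP is fire-and-forget, a successful send only shows that the variable is set; the real confirmation is the metric showing up in Datadog.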