44 Swarmprom

Swarmprom is a starter kit for Docker Swarm monitoring with Prometheus, Grafana, cAdvisor, Node Exporter, Alert Manager and Unsee. And it’s damn sexy. See for yourself:

Swarmprom Screenshot

So what do all these components do?

  • Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.
  • Grafana is a tool to make data beautiful.
  • cAdvisor (Container Advisor) provides container users an understanding of the resource usage and performance characteristics of their running containers. It is a running daemon that collects, aggregates, processes, and exports information about running containers.
  • Node Exporter is a Prometheus exporter for hardware and OS metrics.
  • Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integrations such as email, Slack, etc.
  • Unsee is an alert dashboard for Alertmanager.

44.1 How does this magic work?

I’d encourage you to spend some time reading https://github.com/stefanprodan/swarmprom. Stefan has included detailed explanations of which elements perform which functions, as well as how to customize your stack. (This is only a starting point, after all.)

44.2 Ingredients

  1. Docker swarm cluster on 17.09.0 or newer (doesn’t work with CentOS Atomic, unfortunately) with persistent shared storage
  2. Traefik configured per design
  3. DNS entry for the hostnames you intend to use, pointed to your keepalived IP

44.3 Preparation

This is basically a rehash of stefanprodan’s instructions to match the way I’ve configured other recipes.

Setup oauth provider

Grafana includes decent login protections, but from what I can see, Prometheus, AlertManager, and Unsee perform no authentication at all. To expose these publicly for your own consumption (my assumption for the rest of this recipe), you’ll want to prepare to run an oauth_proxy container in front of each of the 4 web UIs in this recipe.

Setup metrics

Edit (or create, depending on your OS) /etc/docker/daemon.json, and add the following, to enable the experimental export of metrics to Prometheus:

{
  "metrics-addr" : "0.0.0.0:9323",
  "experimental" : true
}

Restart docker with systemctl restart docker
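
Once Docker has restarted, it may be worth confirming that the daemon is actually serving metrics before building the rest of the stack on top of it. The following is a minimal sketch, assuming curl is installed and the daemon listens on port 9323 as configured above; count_samples is a hypothetical helper name, not part of Docker.

```shell
# count_samples: count the metric sample lines (everything that is not a
# comment or a blank line) in Prometheus text-format output read from stdin.
# The function name is illustrative only.
count_samples() {
  awk '!/^#/ && !/^$/ { n++ } END { print n + 0 }'
}

# Usage, on a swarm node (requires the daemon.json change above):
#   curl -fsS http://127.0.0.1:9323/metrics | count_samples
```

A result of 0, or a connection error from curl, suggests the daemon.json change has not taken effect on that node.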

Setup and populate data locations

We’ll need several files to bind-mount into our containers, so create directories for them and get the latest copies:

mkdir -p /var/data/swarmprom/dockerd-exporter/
cd /var/data/swarmprom/dockerd-exporter/
wget https://raw.githubusercontent.com/stefanprodan/swarmprom/master/dockerd-exporter/Caddyfile

mkdir -p /var/data/swarmprom/prometheus/rules/
cd /var/data/swarmprom/prometheus/rules/
wget https://raw.githubusercontent.com/stefanprodan/swarmprom/master/prometheus/rules/swarm_task.rules.yml
wget https://raw.githubusercontent.com/stefanprodan/swarmprom/master/prometheus/rules/swarm_node.rules.yml

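# Directories for holding runtime data
# (-p creates missing parent directories; paths match the bind-mounts in the compose file)
mkdir -p /var/data/runtime/swarmprom/grafana/
mkdir -p /var/data/runtime/swarmprom/alertmanager/
mkdir -p /var/data/runtime/swarmprom/prometheus/

chown nobody:nogroup /var/data/runtime/swarmprom/prometheus/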
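
A failed or interrupted download is easy to miss in a long paste, so a quick check that each fetched file exists and is non-empty can save some head-scratching later. This is a rough sketch only; check_nonempty is a made-up helper name.

```shell
# check_nonempty: succeed only if every file given as an argument exists and
# is not empty; report the first offender on stderr. (Hypothetical helper.)
check_nonempty() {
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
}

# Usage, with the config paths populated above:
#   check_nonempty \
#     /var/data/swarmprom/dockerd-exporter/Caddyfile \
#     /var/data/swarmprom/prometheus/rules/swarm_task.rules.yml \
#     /var/data/swarmprom/prometheus/rules/swarm_node.rules.yml
```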

Prepare Grafana

Grafana will make all the data we collect from our swarm beautiful.

Create /var/data/config/swarmprom/grafana.env (the path the compose file's env_file entries point to), and populate it with the following variables:

OAUTH2_PROXY_CLIENT_ID=
OAUTH2_PROXY_CLIENT_SECRET=
OAUTH2_PROXY_COOKIE_SECRET=

# Disable basic auth (it conflicts with oauth_proxy)
GF_AUTH_BASIC_ENABLED=false

# Set this to the real-world URL to your grafana install (else you get screwy CSS thanks to oauth_proxy)
GF_SERVER_ROOT_URL=https://grafana.swarmprom.example.com
GF_SERVER_DOMAIN=grafana.swarmprom.example.com

# Set your default admin/pass here
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=ilovemybatmanunderpants
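
The three OAUTH2_PROXY_ values are left blank above. The client ID and secret come from your OAuth provider, but the cookie secret is simply a random string you generate yourself. As a sketch of one way to do that, the helper below base64-encodes 16 random bytes. gen_cookie_secret is a hypothetical name, and the accepted secret length and encoding vary between oauth2_proxy versions, so treat the 16-byte choice as an assumption and check the documentation for the image you run.

```shell
# gen_cookie_secret: print 16 random bytes, base64-encoded (24 characters).
# Hypothetical helper; the 16-byte length is an assumption, not a requirement
# taken from this recipe.
gen_cookie_secret() {
  head -c 16 /dev/urandom | base64
}

# Usage:
#   echo "OAUTH2_PROXY_COOKIE_SECRET=$(gen_cookie_secret)"
```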

Setup Docker Swarm

Create a docker swarm config file in docker-compose syntax (v3), based on the original swarmprom docker-compose.yml file:

      version: "3.3"

      volumes:
          prometheus: {}
          grafana: {}
          alertmanager: {}

      configs:
        dockerd_config:
          file: /var/data/swarmprom/dockerd-exporter/Caddyfile
        node_rules:
          file: /var/data/swarmprom/prometheus/rules/swarm_node.rules.yml
        task_rules:
          file: /var/data/swarmprom/prometheus/rules/swarm_task.rules.yml

      services:
        dockerd-exporter:
          image: stefanprodan/caddy
          networks:
            - internal
          environment:
            - DOCKER_GWBRIDGE_IP=172.18.0.1
          configs:
            - source: dockerd_config
              target: /etc/caddy/Caddyfile
          deploy:
            mode: global
            resources:
              limits:
                memory: 128M
              reservations:
                memory: 64M

        cadvisor:
          image: google/cadvisor
          networks:
            - internal
          command: -logtostderr -docker_only
          volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
            - /:/rootfs:ro
            - /var/run:/var/run
            - /sys:/sys:ro
            - /var/lib/docker/:/var/lib/docker:ro
          deploy:
            mode: global
            resources:
              limits:
                memory: 128M
              reservations:
                memory: 64M

        grafana:
          image: stefanprodan/swarmprom-grafana:5.3.4
          networks:
            - internal
          env_file: /var/data/config/swarmprom/grafana.env
          environment:
            - GF_USERS_ALLOW_SIGN_UP=false
            - GF_SMTP_ENABLED=${GF_SMTP_ENABLED:-false}
            - GF_SMTP_FROM_ADDRESS=${GF_SMTP_FROM_ADDRESS:-grafana@test.com}
            - GF_SMTP_FROM_NAME=${GF_SMTP_FROM_NAME:-Grafana}
            - GF_SMTP_HOST=${GF_SMTP_HOST:-smtp:25}
            - GF_SMTP_USER=${GF_SMTP_USER}
            - GF_SMTP_PASSWORD=${GF_SMTP_PASSWORD}
          volumes:
            - /var/data/runtime/swarmprom/grafana:/var/lib/grafana
          deploy:
            mode: replicated
            replicas: 1
            placement:
              constraints:
                - node.role == manager
            resources:
              limits:
                memory: 128M
              reservations:
                memory: 64M

        grafana-proxy:
          image: a5huynh/oauth2_proxy
          env_file : /var/data/config/swarmprom/grafana.env
          networks:
            - internal
            - traefik_public
          deploy:
            labels:
              - traefik.frontend.rule=Host:grafana.swarmprom.example.com
              - traefik.docker.network=traefik_public
              - traefik.port=4180
          volumes:
            - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
          command: |
            -cookie-secure=false
            -upstream=http://grafana:3000
            -redirect-url=https://grafana.swarmprom.example.com
            -http-address=http://0.0.0.0:4180
            -email-domain=example.com
            -provider=github
            -authenticated-emails-file=/authenticated-emails.txt

        alertmanager:
          image: stefanprodan/swarmprom-alertmanager:v0.14.0
          networks:
            - internal
          environment:
            - SLACK_URL=${SLACK_URL:-https://hooks.slack.com/services/TOKEN}
            - SLACK_CHANNEL=${SLACK_CHANNEL:-general}
            - SLACK_USER=${SLACK_USER:-alertmanager}
          command:
            - '--config.file=/etc/alertmanager/alertmanager.yml'
            - '--storage.path=/alertmanager'
          volumes:
            - /var/data/runtime/swarmprom/alertmanager:/alertmanager
          deploy:
            mode: replicated
            replicas: 1
            placement:
              constraints:
                - node.role == manager
            resources:
              limits:
                memory: 128M
              reservations:
                memory: 64M

        alertmanager-proxy:
          image: a5huynh/oauth2_proxy
          env_file : /var/data/config/swarmprom/alertmanager.env
          networks:
            - internal
            - traefik_public
          deploy:
            labels:
              - traefik.frontend.rule=Host:alertmanager.swarmprom.example.com
              - traefik.docker.network=traefik_public
              - traefik.port=4180
          volumes:
            - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
          command: |
            -cookie-secure=false
            -upstream=http://alertmanager:9093
            -redirect-url=https://alertmanager.swarmprom.example.com
            -http-address=http://0.0.0.0:4180
            -email-domain=example.com
            -provider=github
            -authenticated-emails-file=/authenticated-emails.txt

        unsee:
          image: cloudflare/unsee:v0.8.0
          networks:
            - internal
          environment:
            - "ALERTMANAGER_URIS=default:http://alertmanager:9093"
          deploy:
            mode: replicated
            replicas: 1

        unsee-proxy:
          image: a5huynh/oauth2_proxy
          env_file : /var/data/config/swarmprom/unsee.env
          networks:
            - internal
            - traefik_public
          deploy:
            labels:
              - traefik.frontend.rule=Host:unsee.swarmprom.example.com
              - traefik.docker.network=traefik_public
              - traefik.port=4180
          volumes:
            - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
          command: |
            -cookie-secure=false
            -upstream=http://unsee:8080
            -redirect-url=https://unsee.swarmprom.example.com
            -http-address=http://0.0.0.0:4180
            -email-domain=example.com
            -provider=github
            -authenticated-emails-file=/authenticated-emails.txt


        node-exporter:
          image: stefanprodan/swarmprom-node-exporter:v0.16.0
          networks:
            - internal
          environment:
            - NODE_ID={{.Node.ID}}
          volumes:
            - /proc:/host/proc:ro
            - /sys:/host/sys:ro
            - /:/rootfs:ro
            - /etc/hostname:/etc/nodename
          command:
            - '--path.sysfs=/host/sys'
            - '--path.procfs=/host/proc'
            - '--collector.textfile.directory=/etc/node-exporter/'
            - '--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|host|etc)($$|/)'
            # no collectors are explicitly enabled here, because the defaults are just fine,
            # see https://github.com/prometheus/node_exporter
            # disable the ipvs collector because it floods the node-exporter logs with errors on my CentOS 7 VMs
            - '--no-collector.ipvs'
          deploy:
            mode: global
            resources:
              limits:
                memory: 128M
              reservations:
                memory: 64M

        prometheus:
          image: stefanprodan/swarmprom-prometheus:v2.5.0
          networks:
            - internal
          command:
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--web.console.libraries=/etc/prometheus/console_libraries'
            - '--web.console.templates=/etc/prometheus/consoles'
            - '--storage.tsdb.path=/prometheus'
            - '--storage.tsdb.retention=24h'
          volumes:
            - /var/data/runtime/swarmprom/prometheus:/prometheus
          configs:
            - source: node_rules
              target: /etc/prometheus/swarm_node.rules.yml
            - source: task_rules
              target: /etc/prometheus/swarm_task.rules.yml
          deploy:
            mode: replicated
            replicas: 1
            placement:
              constraints:
                - node.role == manager
            resources:
              limits:
                memory: 2048M
              reservations:
                memory: 128M

        prometheus-proxy:
          image: a5huynh/oauth2_proxy
          env_file : /var/data/config/swarmprom/prometheus.env
          networks:
            - internal
            - traefik_public
          deploy:
            labels:
              - traefik.frontend.rule=Host:prometheus.swarmprom.example.com
              - traefik.docker.network=traefik_public
              - traefik.port=4180
          volumes:
            - /var/data/config/swarmprom/authenticated-emails.txt:/authenticated-emails.txt
          command: |
            -cookie-secure=false
            -upstream=http://prometheus:9090
            -redirect-url=https://prometheus.swarmprom.example.com
            -http-address=http://0.0.0.0:4180
            -email-domain=example.com
            -provider=github
            -authenticated-emails-file=/authenticated-emails.txt


      networks:
        traefik_public:
          external: true
        internal:
          driver: overlay
          ipam:
            config:
              - subnet: 172.16.29.0/24
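
The compose file above references a few files that the preparation steps never create: an env file for each of the prometheus-proxy, alertmanager-proxy, and unsee-proxy services, plus authenticated-emails.txt. A minimal sketch for stubbing them out follows. It assumes each proxy needs the same three OAUTH2_PROXY_ variables as grafana.env (whether one set of OAuth credentials can be shared across hostnames depends on your provider), and write_proxy_env is a hypothetical helper name.

```shell
# write_proxy_env: create an env file containing the three (empty) variables
# oauth2_proxy reads, unless the file already exists. (Hypothetical helper.)
write_proxy_env() {
  if [ -e "$1" ]; then
    return 0   # never clobber values that were already filled in
  fi
  printf '%s\n' \
    'OAUTH2_PROXY_CLIENT_ID=' \
    'OAUTH2_PROXY_CLIENT_SECRET=' \
    'OAUTH2_PROXY_COOKIE_SECRET=' > "$1"
}

# Usage, with the directory named in the env_file entries above:
#   mkdir -p /var/data/config/swarmprom
#   for svc in prometheus alertmanager unsee; do
#     write_proxy_env "/var/data/config/swarmprom/$svc.env"
#   done
#   # One permitted email address per line:
#   echo 'you@example.com' > /var/data/config/swarmprom/authenticated-emails.txt
```

Fill in the generated files by hand before deploying; the stack will not start cleanly if an env_file or bind-mounted file is missing.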

44.4 Serving

Launch Swarmprom stack

Launch the Swarm stack by running docker stack deploy swarmprom -c <path-to-docker-compose.yml>
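
After deploying, it can take a minute or two for every service to pull its image and start. As a sketch of how to spot stragglers, the helper below reads "name replicas" pairs, such as those printed by docker stack services with a --format template, and lists the services whose running count does not yet match the desired count. not_ready is a hypothetical name.

```shell
# not_ready: read lines of the form "<service> <running>/<desired>" on stdin
# and print the names of services that are not fully up. (Hypothetical helper.)
not_ready() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Usage, on a manager node:
#   docker stack services swarmprom --format '{{.Name}} {{.Replicas}}' | not_ready
```

Empty output means every service reports as many running tasks as it wants; for anything listed, docker stack ps swarmprom usually shows why a task failed.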

Log into your new Grafana instance and check out your beautiful graphs. Then move on to drooling over Prometheus, AlertManager, and Unsee.

44.5 Chef’s Notes

1. Pay close attention to the grafana.env config. If you encounter errors about failed basic auth, or broken CSS, it’s likely due to a misconfiguration of one of the Grafana environment variables.
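
One way to catch a likely culprit, a root URL whose hostname does not match the configured domain, is a small check like the sketch below. It only compares GF_SERVER_ROOT_URL against GF_SERVER_DOMAIN and confirms that basic auth is disabled; check_grafana_env is a hypothetical helper, not a Grafana tool.

```shell
# check_grafana_env: sanity-check a grafana.env file passed as $1.
# Hypothetical helper; it covers only the variables discussed in this recipe.
check_grafana_env() {
  file="$1"
  root=$(sed -n 's/^GF_SERVER_ROOT_URL=//p' "$file")
  domain=$(sed -n 's/^GF_SERVER_DOMAIN=//p' "$file")
  basic=$(sed -n 's/^GF_AUTH_BASIC_ENABLED=//p' "$file")

  # Reduce the URL to its bare hostname: strip scheme, then path, then port.
  host=${root#*://}
  host=${host%%/*}
  host=${host%%:*}

  if [ -z "$domain" ] || [ "$host" != "$domain" ]; then
    echo "GF_SERVER_ROOT_URL host '$host' does not match GF_SERVER_DOMAIN '$domain'" >&2
    return 1
  fi
  if [ "$basic" != "false" ]; then
    echo "GF_AUTH_BASIC_ENABLED is not set to false" >&2
    return 1
  fi
}

# Usage:
#   check_grafana_env /path/to/grafana.env && echo "grafana.env looks consistent"
```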