Architecture
The stack is built with the Prometheus Operator. A cluster-scoped Prometheus instance collects metrics from every namespace. High availability is managed through Thanos. Tenants query metrics through a central, multi-tenancy-enabled query endpoint built with kube-rbac-proxy and prom-label-proxy.
flowchart LR
A[Tenant] -->|1| grafana(Grafana)
subgraph Tenant Namespace
sm([ServiceMonitor]) --> dep(Deployments)
qsa([Query Service Account]) <-.-> grafana
end
sm <-.-> Prometheus
subgraph Monitoring Namespace
subgraph Thanos Query Frontend
grafana -->|2| krp(kube-rbac-proxy)
krp -->|6| plp(prom-label-proxy)
end
plp -->|7| Prometheus
subgraph Prometheus
thanos(Thanos) --> prom(Prometheus TSDB)
thanos --> s3[(BackBlaze B2)]
end
end
subgraph Kubernetes
krp -->|3| sar{{SubjectAccessReview}}
sar <-->|4| qsa
sar -->|5| krp
end
Metrics are stored within the Prometheus TSDB for 7 days. Thanos retains raw samples for 30 days, then serves them downsampled to 5-minute intervals until they are 60 days old and to 1-hour intervals after that.
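The retention tiers above could be expressed on the Thanos compactor roughly as follows. This is a minimal sketch, not the actual deployment: the container name is hypothetical, and the mapping of the stated periods onto the `--retention.resolution-*` flags is an assumption. The text does not state how long 1-hour data is kept, so that flag is left unset.

```yaml
# Hypothetical excerpt of a Thanos compactor container spec.
containers:
  - name: thanos-compact            # assumed name
    args:
      - compact
      - --wait                      # keep running and compact continuously
      - --retention.resolution-raw=30d   # raw samples kept for 30 days (assumed mapping)
      - --retention.resolution-5m=60d    # 5-minute downsamples kept for 60 days (assumed mapping)
      # --retention.resolution-1h is not set here; the retention for
      # 1-hour data is not given in the text above.
```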
Usage
Existing Metrics
By default, metrics from the kubelet and kube-state-metrics are collected cluster-wide. These metrics give insight into cluster performance, Kubernetes resources, and related areas.
Generating Metrics
To feed metrics into Prometheus, use ServiceMonitors to periodically scrape data from application exporter endpoints. There are a few other ways to achieve this, like PodMonitors or Annotations.
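As an illustration, a ServiceMonitor in a tenant namespace might look like the sketch below. The names, labels, port name, and scrape interval are all hypothetical; adjust them to match the Service that exposes your exporter.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical name
  namespace: my-tenant         # hypothetical tenant namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app   # must match the labels on the target Service
  endpoints:
    - port: metrics            # name of the Service port serving metrics (assumed)
      path: /metrics           # exporter path (assumed)
      interval: 30s            # scrape interval (assumed)
```

The selector matches Services, not Pods, so the application needs a Service with a named port for this to take effect.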
Visualizing Metrics
Create a Service Account as described in Observability Usage. Then create a Grafana instance and a Prometheus GrafanaDataSource.
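A data source pointing at the tenant query endpoint could be sketched as below. This assumes the Grafana Operator v4 API (`integreatly.org/v1alpha1`); newer operator versions use a different API group and kind. The URL, namespace, and token placeholder are hypothetical, and passing the tenant namespace as a query parameter is an assumption about how prom-label-proxy is configured here.

```yaml
apiVersion: integreatly.org/v1alpha1   # assumes Grafana Operator v4
kind: GrafanaDataSource
metadata:
  name: prometheus
  namespace: my-tenant                 # hypothetical tenant namespace
spec:
  name: prometheus.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      url: https://query.monitoring.example:9091   # hypothetical query endpoint
      isDefault: true
      jsonData:
        httpHeaderName1: Authorization
        # Assumption: prom-label-proxy reads the tenant from this parameter.
        customQueryParameters: namespace=my-tenant
      secureJsonData:
        # Token of the query Service Account; placeholder, not a real value.
        httpHeaderValue1: Bearer <service-account-token>
```

kube-rbac-proxy validates the bearer token through a SubjectAccessReview, so the Service Account must be authorized as described in Observability Usage.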