Architecture
Built with the Prometheus Operator. Metrics are collected from every namespace by a cluster-scoped Prometheus instance. High availability is managed through Thanos. Tenants query metrics through a central, multi-tenancy-enabled query endpoint built with kube-rbac-proxy and prom-label-proxy.
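The cluster-scoped instance is declared through the operator's Prometheus custom resource. A minimal sketch, assuming hypothetical resource and secret names (`cluster`, `thanos-objstore-secret`) and the `monitoring` namespace:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: cluster              # hypothetical name
  namespace: monitoring
spec:
  replicas: 2                           # HA pair, deduplicated by Thanos Query
  retention: 7d                         # local TSDB retention
  serviceMonitorNamespaceSelector: {}   # empty selector = watch every namespace
  serviceMonitorSelector: {}
  thanos:
    objectStorageConfig:                # Thanos sidecar ships blocks to object storage
      name: thanos-objstore-secret      # hypothetical secret holding the B2 config
      key: objstore.yml
```

The empty namespace and monitor selectors are what make the instance cluster-scoped: the operator picks up ServiceMonitors from all namespaces.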
```mermaid
flowchart LR
  A[Tenant] -->|1| grafana(Grafana)
  subgraph Tenant Namespace
    sm([ServiceMonitor]) --> dep(Deployments)
    qsa([Query Service Account]) <-.-> grafana
  end
  sm <-.-> Prometheus
  subgraph Monitoring Namespace
    subgraph Thanos Query Frontend
      grafana -->|2| krp(kube-rbac-proxy)
      krp -->|6| plp(prom-label-proxy)
    end
    plp -->|7| Prometheus
    subgraph Prometheus
      thanos(Thanos) --> prom(Prometheus TSDB)
      thanos --> s3[(BackBlaze B2)]
    end
  end
  subgraph Kubernetes
    krp -->|3| sar{{SubjectAccessReview}}
    sar <-->|4| qsa
    sar -->|5| krp
  end
```
Metrics are stored in the Prometheus TSDB for 7 days. Thanos retains raw samples for 30 days, downsamples them to 5-minute resolution until 60 days, and to 1-hour resolution after that.
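This retention schedule is enforced by the Thanos Compactor's retention flags. A sketch of the relevant container args, assuming the 1-hour downsampled series are kept indefinitely (the retention for that resolution is an assumption, not stated above):

```yaml
# Hypothetical excerpt of the Thanos Compactor container spec
args:
  - compact
  - --retention.resolution-raw=30d   # drop raw samples after 30 days
  - --retention.resolution-5m=60d    # drop 5m-downsampled samples after 60 days
  - --retention.resolution-1h=0d    # 0d = keep 1h-downsampled samples forever
```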
Usage
Existing Metrics
By default, kubelet and kube-state-metrics metrics are collected cluster-wide. These metrics give insight into cluster performance, the state of Kubernetes resources, and related data points.
Generating Metrics
To feed metrics into Prometheus, use ServiceMonitors to periodically scrape data from application exporter endpoints. There are other ways to achieve this, such as PodMonitors or scrape annotations.
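A minimal ServiceMonitor might look as follows, assuming a hypothetical application labeled `app: my-app` in a tenant namespace, exposing metrics on a named Service port:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app               # hypothetical name
  namespace: my-tenant       # hypothetical tenant namespace
spec:
  selector:
    matchLabels:
      app: my-app            # must match the labels on the target Service
  endpoints:
    - port: metrics          # name of the Service port serving /metrics
      interval: 30s
```

Because the cluster-scoped Prometheus instance watches every namespace, dropping this resource into the tenant namespace is enough; no changes in the monitoring namespace are needed.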
Visualizing Metrics
Create a Service Account as described in Observability Usage. Then create a Grafana instance and a Prometheus GrafanaDataSource.
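The data source points Grafana at the central query endpoint and authenticates with the Query Service Account's token. A sketch assuming the grafana-operator v4 API (`integreatly.org/v1alpha1`) and a hypothetical endpoint URL and token placeholder:

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus
  namespace: my-tenant       # hypothetical tenant namespace
spec:
  name: prometheus.yaml
  datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      # hypothetical address of the kube-rbac-proxy in front of Thanos Query Frontend
      url: https://thanos-query-frontend.monitoring.svc:9091
      jsonData:
        httpHeaderName1: Authorization
      secureJsonData:
        # token of the Query Service Account; kube-rbac-proxy runs a
        # SubjectAccessReview against it before forwarding the query
        httpHeaderValue1: Bearer <query-service-account-token>
```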