Architecture
Built with the Prometheus Operator. A cluster-scoped Prometheus instance collects metrics from every namespace. High availability is provided by Thanos. Tenants query metrics through a central, multi-tenancy-enabled query endpoint built with kube-rbac-proxy and prom-label-proxy.
flowchart LR
    A[Tenant] -->|1| grafana(Grafana)
    subgraph Tenant Namespace
        sm([ServiceMonitor]) --> dep(Deployments)
        qsa([Query Service Account]) <-.-> grafana
    end
    sm <-.-> Prometheus
    subgraph Monitoring Namespace
        subgraph Thanos Query Frontend
            grafana -->|2| krp(kube-rbac-proxy)
            krp -->|6| plp(prom-label-proxy)
        end
        plp -->|7| Prometheus
        subgraph Prometheus
            thanos(Thanos) --> prom(Prometheus TSDB)
            thanos --> s3[(BackBlaze B2)]
        end
    end
    subgraph Kubernetes
        krp -->|3| sar{{SubjectAccessReview}}
        sar <-->|4| qsa
        sar -->|5| krp
    end
Metrics are stored in the Prometheus TSDB for 7 days. Thanos retains the raw samples for 30 days, then keeps them downsampled to 5-minute resolution until day 60 and to 1-hour resolution after that.
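As a rough sketch of how these retention windows could be expressed, the fragment below sets the local retention on a Prometheus Operator `Prometheus` resource and the raw and 5-minute retention on a Thanos Compactor. The resource name and the surrounding layout are assumptions for illustration, not the configuration of this cluster. The retention of the 1-hour resolution is not stated here, so that flag is left at its default.

```yaml
# Prometheus custom resource; only the retention-related field is shown.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: cluster            # hypothetical name
  namespace: monitoring
spec:
  retention: 7d            # local TSDB retention
---
# Fragment of a Thanos Compactor container spec (not a complete manifest).
args:
  - compact
  - --retention.resolution-raw=30d   # raw samples kept for 30 days
  - --retention.resolution-5m=60d    # 5-minute resolution kept until day 60
  # --retention.resolution-1h is not set: value not given in this document
```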
Usage
Existing Metrics
By default, kubelet and kube-state-metrics metrics are collected cluster-wide. These metrics give insight into cluster performance, Kubernetes resources & other related data points.
Generating Metrics
To feed metrics into Prometheus, use ServiceMonitors to periodically scrape data from application exporter endpoints.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-monitoring
spec:
  selector:
    matchLabels:
      # scrapes all services which match this label
      application: example
  endpoints:
    # on the monitoring port, every 30s
    - port: monitoring
      interval: 30s
There are a few other ways to achieve this, like PodMonitors or Annotations.
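For comparison, a PodMonitor equivalent of the ServiceMonitor above might look like the following minimal sketch. It selects pods directly instead of going through a Service. The resource name is hypothetical, and the label and port name are simply carried over from the ServiceMonitor example.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-pod-monitoring   # hypothetical name
spec:
  selector:
    matchLabels:
      # scrapes all pods which carry this label
      application: example
  podMetricsEndpoints:
    # on the container port named "monitoring", every 30s
    - port: monitoring
      interval: 30s
```

A PodMonitor is mainly useful when the workload has no Service in front of it. The port must be a named container port on the pod.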
Visualizing Metrics
Create a Service Account as described in Observability Usage. Then create a Grafana instance & a Prometheus GrafanaDatasource.
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: metrics
  labels:
    app: grafana
spec:
  instanceSelector:
    matchLabels:
      app: grafana
  valuesFrom:
    - targetPath: secureJsonData.httpHeaderValue1
      valueFrom:
        secretKeyRef:
          key: token
          name: grafana-ds-sa-token
  datasource:
    name: metrics
    type: prometheus
    uid: prometheus1
    access: proxy
    url: "https://thanos-querier-frontend.monitoring.svc:9090/"
    isDefault: false
    editable: false
    jsonData:
      # set the query parameter to the correct namespace
      customQueryParameters: "namespace=<tenant>"
      # pass the Service Account JWT from the secret
      httpHeaderName1: Authorization
      httpMethod: GET
      manageAlerts: false
      queryTimeout: 5m
      timeInterval: 30s
      tlsSkipVerify: true
    secureJsonData:
      httpHeaderValue1: "Bearer ${token}"
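The datasource above expects a Secret named `grafana-ds-sa-token` with a `token` key, and a Grafana instance labelled `app: grafana`. The sketch below shows one way these could be provided. The Service Account name `grafana-ds-sa` and the Grafana resource name are hypothetical, and the procedure in Observability Usage takes precedence if it differs.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana-ds-sa            # hypothetical name
---
# Kubernetes fills in the "token" key for a Secret of this type.
apiVersion: v1
kind: Secret
metadata:
  name: grafana-ds-sa-token
  annotations:
    kubernetes.io/service-account.name: grafana-ds-sa
type: kubernetes.io/service-account-token
---
# Minimal Grafana instance; the label must match the instanceSelector above.
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana                  # hypothetical name
  labels:
    app: grafana
spec:
  config:
    log:
      mode: console
```

All three resources belong in the tenant namespace, next to the GrafanaDatasource, so that the `secretKeyRef` can resolve the Secret.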