# Prometheus Collector
The Prometheus Collector report gathers workload metrics from a Prometheus installation in order to provide fine-grained resource usage data. This can be used to gauge how much different workloads cost, understand cost trends and help set resource requests and limits.
Note: Prometheus Collector requires `kube-state-metrics` and `metrics-server` to be installed and running in the cluster.
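You can confirm both prerequisites from the command line before enabling the collector. The sketch below makes three assumptions. First, `kubectl` points at the target cluster. Second, `kubectl top nodes` succeeds only when metrics-server is serving the metrics API. Third, kube-state-metrics pods carry the `app.kubernetes.io/name=kube-state-metrics` label used by its upstream manifests, which may not hold for every installation. `check_prereqs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks that metrics-server and kube-state-metrics appear to be running.
check_prereqs() {
  # metrics-server: "kubectl top" fails when the metrics API is not being served.
  if ! kubectl top nodes >/dev/null 2>&1; then
    echo "metrics-server: metrics API not responding" >&2
    return 1
  fi
  # kube-state-metrics: assumes pods use the standard app.kubernetes.io/name label.
  local ksm_pods
  ksm_pods="$(kubectl get pods --all-namespaces \
    -l app.kubernetes.io/name=kube-state-metrics -o name 2>/dev/null)"
  if [ -z "$ksm_pods" ]; then
    echo "kube-state-metrics: no pods found" >&2
    return 1
  fi
  echo "prerequisites found"
}
```

If `check_prereqs` reports a missing component, install it before enabling `prometheus-metrics`.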
# Use an Existing Prometheus Installation
If you already have Prometheus installed, you can point Insights to the service endpoint of your installation. If you installed the Prometheus operator, the service endpoint will likely end in port `9090`, and if you only installed the prometheus-server, the service endpoint will probably end in port `80`. To configure this in the `values.yaml` file, use the following format:
```yaml
prometheus-metrics:
  enabled: true
  address: "http://<prometheus-service-name>.<namespace>.svc.cluster.local:<port>"
```
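The address is an ordinary in-cluster DNS name. You can assemble it from the Service name, namespace, and port, then check it against Prometheus's standard HTTP query API (`/api/v1/query`) from inside the cluster. In this sketch, `prom_address` and `probe_prometheus` are hypothetical helper names. The throwaway pod uses the public `curlimages/curl` image, which is an assumption. Any image that contains `curl` will work.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the values.yaml address from Service name, namespace and port.
prom_address() {
  local service="$1" namespace="$2" port="$3"
  printf 'http://%s.%s.svc.cluster.local:%s\n' "$service" "$namespace" "$port"
}

# Hypothetical helper: runs a one-off curl pod inside the cluster and queries the "up" metric.
# A JSON reply containing "status":"success" means the address is reachable in-cluster.
probe_prometheus() {
  local url
  url="$(prom_address "$1" "$2" "$3")/api/v1/query?query=up"
  kubectl run prom-probe --rm -i --restart=Never --image=curlimages/curl -- \
    curl -s "$url"
}
```

For example, `probe_prometheus prometheus-server monitoring 80` checks a prometheus-server installed in a namespace named `monitoring`. The namespace here is only an example.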
# Install a New Prometheus
The Insights Agent chart can also install a new Prometheus server in your cluster to use.
To install Prometheus alongside the Agent, add the following to your `values.yaml`:
```yaml
prometheus-metrics:
  enabled: true
  installPrometheusServer: true
```
# Sample Report
The Prometheus Collector report contains CPU and memory usage for individual workloads, for example:
```json
{
  "Values": [
    {
      "Container": "autoscaler",
      "ControllerKind": "Deployment",
      "ControllerName": "kube-dns-autoscaler",
      "ControllerNamespace": "kube-system",
      "LimitValue": 0,
      "Metric": "Memory",
      "PodName": "kube-dns-autoscaler-b48d96894-mjtkt",
      "Request": 10485760,
      "StartTime": "2021-02-01T13:20:00Z",
      "Value": 8777728
    },
    {
      "Container": "autoscaler",
      "ControllerKind": "Deployment",
      "ControllerName": "kube-dns-autoscaler",
      "ControllerNamespace": "kube-system",
      "LimitValue": 0,
      "Metric": "CPU",
      "PodName": "kube-dns-autoscaler-b48d96894-mjtkt",
      "Request": 20,
      "StartTime": "2021-02-01T13:21:00Z",
      "Value": 0
    }
  ]
}
```
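Each entry pairs a measured `Value` with the container's `Request`, so comparing the two shows how much of the request is actually used. In this sample, memory appears to be reported in bytes (`10485760` is 10 MiB) and CPU in millicores (`20` is 20m). Treat those units as an assumption. The helpers below are hypothetical and use only `awk`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints usage as a percentage of the request, to one decimal place.
# Prints "n/a" when no request is set (a Request of 0).
usage_pct() {
  awk -v value="$1" -v request="$2" 'BEGIN {
    if (request == 0) { print "n/a" } else { printf "%.1f\n", 100 * value / request }
  }'
}

# Hypothetical helper: converts a byte count to MiB (assumes Memory values are in bytes).
bytes_to_mib() {
  awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / 1048576 }'
}

# Applied to the sample entries above:
#   usage_pct 8777728 10485760   # memory: 83.7 (percent of the 10 MiB request)
#   bytes_to_mib 8777728         # 8.37
#   usage_pct 0 20               # CPU: 0.0
```

A workload that consistently sits far below its request, like the memory entry here at about 84%, or the idle CPU entry, is the kind of signal this report is meant to surface when tuning requests and limits.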
# Integration with GKE Autopilot / GCP Managed Prometheus
Insights requires a Prometheus server to collect metrics for workload usage. Typically, this is a Prometheus server that is already running in a Kubernetes cluster, or a Prometheus server that is installed directly via the Insights Agent Helm Chart.
In GKE Autopilot, you must use the GCP Managed Prometheus offering to collect the required container metrics. GCP Managed Prometheus may increase your overall GCP spend, and it requires additional configuration before the Insights Agent can read those metrics.
Follow the steps below to set up GCP Managed Prometheus and connect it to Fairwinds Insights.
# 1. Collect Kubelet/cAdvisor metrics
GCP Managed Prometheus must be configured to scrape the Kubelet for Kubelet and cAdvisor metrics. You can set this up by editing the `OperatorConfig` resource, as documented here: Install kubelet-cadvisor
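The edit adds a `kubeletScraping` interval to the `collection` section of the `OperatorConfig` named `config` in the `gmp-public` namespace. The Terraform example later on this page applies the same change non-interactively with `kubectl patch`. The sketch below does the same from a shell. `kubelet_patch` and `enable_kubelet_scraping` are hypothetical helper names, and `30s` is only an example interval.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the merge patch that turns on Kubelet/cAdvisor scraping.
kubelet_patch() {
  local interval="${1:-30s}"   # default to a 30s scrape interval
  printf '{"collection":{"kubeletScraping":{"interval":"%s"}}}\n' "$interval"
}

# Hypothetical helper: applies the patch, then prints the interval now stored on the resource.
enable_kubelet_scraping() {
  kubectl patch operatorconfig/config --namespace gmp-public --type merge \
    --patch "$(kubelet_patch "$1")" &&
  kubectl get operatorconfig/config --namespace gmp-public \
    -o jsonpath='{.collection.kubeletScraping.interval}'
}
```

Running `enable_kubelet_scraping 30s` should print the configured interval if the patch was accepted.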
# 2. Install kube-state-metrics
GCP Managed Prometheus needs a kube-state-metrics instance to get metrics from the Kubernetes API. To set this up, use the configuration in the "Install Kube State Metrics" section of the following guide: Configure kube-state-metrics
Note: Google's YAML does not include Job and CronJob metrics. You may need to update the `regex` in the `ClusterPodMonitoring` resource to include them:

```yaml
regex: kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?
```
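Prometheus relabeling regexes are anchored at both ends, so a metric is kept only if its whole name matches. To check which kube-state-metrics series the expression above keeps, you can test names against the same pattern with `grep -E`, adding `^...$` anchors to mimic Prometheus. `ksm_kept` is a hypothetical helper. The metric names in the note below are standard kube-state-metrics names.

```shell
#!/usr/bin/env bash
# The keep regex from the ClusterPodMonitoring resource, including job and cronjob.
KSM_KEEP_REGEX='kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?'

# Hypothetical helper: succeeds if a metric name would be kept by the relabel rule.
# Prometheus anchors relabel regexes at both ends, so ^ and $ are added here.
ksm_kept() {
  printf '%s\n' "$1" | grep -Eq "^(${KSM_KEEP_REGEX})\$"
}
```

With this pattern, `ksm_kept kube_job_status_failed` and `ksm_kept kube_pod_container_resource_requests` succeed. `ksm_kept kube_persistentvolumeclaim_info` fails, because `persistentvolume` must be followed by `_` or the end of the name.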
# 3. Create a Google service account to run Prometheus queries
- Go to IAM & Admin > Service Accounts.
- Click Create Service Account.
- Give the service account a name, then click "Create and Continue".
- Grant the roles "Monitoring Viewer" and "Service Account Token Creator", then click Done.
- Reference the new service account when configuring prometheus-metrics:
```yaml
prometheus-metrics:
  enabled: true
  installPrometheusServer: false
  address: https://monitoring.googleapis.com/v1/projects/gcp-prime/location/global/prometheus # managed prometheus address
  managedPrometheusClusterName: "my-autopilot-cluster"
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: <my-service-account>@gcp-prime.iam.gserviceaccount.com
```
- `address`: required when you are not using the Prometheus server installed by the chart. The example above uses the GCP Managed Prometheus address.
- `managedPrometheusClusterName`: required only when using Managed Prometheus, because Managed Prometheus may hold data from multiple clusters.
- Allow the Kubernetes service account `insights-agent-prometheus-metrics` to act as the Google service account by binding it to the Workload Identity User role:
```shell
gcloud iam service-accounts add-iam-policy-binding <my-service-account>@gcp-prime.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:gcp-prime.svc.id.goog[insights-agent/insights-agent-prometheus-metrics]"
```
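You can also script the console steps above with `gcloud`. This is a minimal sketch. It assumes the example project `gcp-prime` and a hypothetical service account named `prometheus-query`. `sa_email`, `create_query_sa`, and `query_managed_prometheus` are hypothetical helpers. The query sends a PromQL expression to the Prometheus-compatible `api/v1/query` path under the managed Prometheus address shown in the values above, using your own `gcloud` credentials.

```shell
#!/usr/bin/env bash
PROJECT_ID="gcp-prime"        # example project used throughout this section
SA_NAME="prometheus-query"    # hypothetical service account name

# Hypothetical helper: the email address Google assigns to a service account.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Creates the service account and grants the two roles listed in the steps above.
create_query_sa() {
  local email role
  email="$(sa_email "$SA_NAME" "$PROJECT_ID")"
  gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID" \
    --display-name "Prometheus query access" || return 1
  for role in roles/monitoring.viewer roles/iam.serviceAccountTokenCreator; do
    gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member "serviceAccount:${email}" --role "$role" >/dev/null || return 1
  done
}

# Runs a PromQL query against GCP Managed Prometheus, e.g. query_managed_prometheus up
query_managed_prometheus() {
  curl -sG "https://monitoring.googleapis.com/v1/projects/${PROJECT_ID}/location/global/prometheus/api/v1/query" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode "query=$1"
}
```

After running `create_query_sa`, use the printed email from `sa_email "$SA_NAME" "$PROJECT_ID"` for the `iam.gke.io/gcp-service-account` annotation and the Workload Identity binding.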
# Terraform
The following Terraform configuration automates the GKE Autopilot / GCP Managed Prometheus integration described above.
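```hcl
# versions.tf
terraform {
  required_version = ">= 0.13"
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    # kubectl_manifest comes from a community provider (gavinbunney/kubectl assumed here)
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }
}
```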
```hcl
# variables.tf
variable "project_name" {
  type = string
}

variable "config_path" {
  type = string
}

variable "gke_cluster_name" {
  type = string
}
```
```hcl
# gcp-managed-prometheus.auto.tfvars
project_name     = "my-gcp-project"
config_path      = "~/.kube/config"
gke_cluster_name = "gke_gcp-prime_us-central1_my_gcp_cluster"
```
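```hcl
# main.tf
# kubectl_manifest resources below are applied through the kubectl provider,
# so that is the provider pointed at the cluster's kubeconfig context.
provider "kubectl" {
  config_path    = var.config_path
  config_context = var.gke_cluster_name
}

# Turn on Kubelet/cAdvisor scraping in GCP Managed Prometheus
resource "null_resource" "prometheus_enable_cadvisor" {
  provisioner "local-exec" {
    command = <<EOF
kubectl --context ${var.gke_cluster_name} patch operatorconfig/config --namespace gmp-public --type merge --patch '{"collection": { "kubeletScraping": {"interval": "30s" }}}'
EOF
  }
}
```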
```hcl
# main.tf (continued)
# kube-state-metrics manifests for GCP Managed Prometheus, from Google's documentation.
# kubectl_manifest applies a single document, so the multi-document YAML is split
# with the kubectl_file_documents data source and applied one document at a time.
data "kubectl_file_documents" "kube_state_metrics" {
  content = <<YAML
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.8.2
  namespace: gmp-public
  name: kube-state-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  serviceName: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
        app.kubernetes.io/version: 2.8.2
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                - arm64
                - amd64
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      containers:
      - name: kube-state-metric
        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.2
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        args:
        - --pod=$(POD_NAME)
        - --pod-namespace=$(POD_NAMESPACE)
        - --port=8080
        - --telemetry-port=8081
        ports:
        - name: metrics
          containerPort: 8080
        - name: metrics-self
          containerPort: 8081
        resources:
          requests:
            cpu: 100m
            memory: 190Mi
          limits:
            memory: 250Mi
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          capabilities:
            drop:
            - all
          runAsUser: 1000
          runAsGroup: 1000
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: 8081
          initialDelaySeconds: 5
          timeoutSeconds: 5
      serviceAccountName: kube-state-metrics
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.8.2
  namespace: gmp-public
  name: kube-state-metrics
spec:
  clusterIP: None
  ports:
  - name: metrics
    port: 8080
    targetPort: metrics
  - name: metrics-self
    port: 8081
    targetPort: metrics-self
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: gmp-public
  name: kube-state-metrics
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.8.2
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gmp-public:kube-state-metrics
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.8.2
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gmp-public:kube-state-metrics
subjects:
- kind: ServiceAccount
  namespace: gmp-public
  name: kube-state-metrics
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gmp-public:kube-state-metrics
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.8.2
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  verbs:
  - get
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kube-state-metrics
  namespace: gmp-public
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: kube-state-metrics
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 60
  behavior:
    scaleDown:
      policies:
      - type: Pods
        value: 1
        # Under-utilization needs to persist for `periodSeconds` before any action can be taken.
        # Current supported max from https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2beta2/.
        periodSeconds: 1800
      # Current supported max from https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2beta2/.
      stabilizationWindowSeconds: 3600
---
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: kube-state-metrics
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: google-cloud-managed-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
  - port: metrics
    interval: 30s
    metricRelabeling:
    - action: keep
      # Curated subset of metrics to reduce costs while populating default set of sample dashboards at
      # https://github.com/GoogleCloudPlatform/monitoring-dashboard-samples/tree/master/dashboards/kubernetes
      # Change this regex to fit your needs for which objects you want to monitor
      regex: kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?
      sourceLabels: [__name__]
  targetLabels:
    metadata: [] # explicitly empty so the metric labels are respected
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  namespace: gmp-public
  name: kube-state-metrics
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: google-cloud-managed-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
  - port: metrics-self
    interval: 30s
YAML
}

resource "kubectl_manifest" "install_kube_state_metrics" {
  for_each  = data.kubectl_file_documents.kube_state_metrics.manifests
  yaml_body = each.value
}
```
```hcl
# main.tf (continued)
resource "google_service_account" "prometheusqueryaccess" {
  project      = var.project_name
  account_id   = "prometheusqueryaccess"
  display_name = "Prometheus query Access"
}

resource "google_project_iam_member" "prometheus_project_iam_viewer_member" {
  project = var.project_name
  role    = "roles/monitoring.viewer"
  member  = "serviceAccount:${google_service_account.prometheusqueryaccess.email}"
}

resource "google_project_iam_member" "prometheus_project_iam_token_creator_member" {
  project = var.project_name
  role    = "roles/iam.serviceAccountTokenCreator"
  member  = "serviceAccount:${google_service_account.prometheusqueryaccess.email}"
}

# Lets the insights-agent-prometheus-metrics Kubernetes service account act as the Google service account
resource "google_service_account_iam_binding" "prometheus_workload_identity" {
  service_account_id = google_service_account.prometheusqueryaccess.name
  role               = "roles/iam.workloadIdentityUser"
  members = [
    "serviceAccount:${var.project_name}.svc.id.goog[insights-agent/insights-agent-prometheus-metrics]",
  ]
}
```
# Integration with AKS / Azure Monitor
If your cluster uses Azure Monitor managed service for Prometheus, you can configure prometheus-metrics to pull from its API.
If Azure Monitor has not been enabled, follow this guide: Enable Azure Monitor in an existing cluster
# 1. Deploy a Prometheus authorization proxy
prometheus-metrics uses an authorization proxy to pull metrics from the Azure Monitor API. Follow this guide to configure and deploy the proxy to your AKS cluster: Deploy a Prometheus authorization proxy
# 2. Update the insights-agent values
Update the insights-agent values with the service name of the authorization proxy created in the previous step:
```yaml
prometheus-metrics:
  enabled: true
  installPrometheusServer: false
  address: http://<proxy-service-name>.<proxy-service-namespace>.svc.cluster.local
```
# Troubleshooting
If the current resource values of your workloads are missing or reported as 'unset' in the Efficiency section and you are installing your own Prometheus instance, kube-state-metrics (KSM) is probably not installed.
If you are installing with the kube-prometheus-stack chart, kube-state-metrics is enabled by default and is controlled by the top-level key `kube-state-metrics.enabled: true`.
It can also be installed with the dedicated kube-state-metrics chart: Install kube-state-metrics
If KSM appears to be running correctly, check for network policies that might prevent Prometheus from scraping `kube-state-metrics`.
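One way to confirm that Prometheus is receiving kube-state-metrics data is to query a kube-state-metrics series such as `kube_pod_container_resource_requests` and check whether the result is empty. This sketch assumes you can reach your Prometheus address from where you run it, for example through `kubectl port-forward`. It also assumes that Prometheus returns compact JSON, so an empty result appears as `"result":[]`. `ksm_has_data` and `check_ksm_series` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a Prometheus /api/v1/query JSON response on stdin and
# succeeds only if the query succeeded and returned at least one series.
ksm_has_data() {
  local body
  body="$(cat)"
  printf '%s' "$body" | grep -qF '"status":"success"' || return 1
  ! printf '%s' "$body" | grep -qF '"result":[]'
}

# Example, after: kubectl port-forward -n <namespace> svc/<prometheus-service-name> 9090:<port>
#   check_ksm_series http://localhost:9090
check_ksm_series() {
  local prom_url="$1"
  curl -sG "${prom_url}/api/v1/query" \
    --data-urlencode 'query=kube_pod_container_resource_requests' | ksm_has_data
}
```

If `check_ksm_series` fails while the kube-state-metrics pods are healthy, the scrape itself is the likely problem, which points back to scrape configuration or network policies.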