# Prometheus Collector

The Prometheus Collector report gathers workload metrics from a Prometheus installation in order to provide fine-grained resource usage data. This data can be used to gauge how much different workloads cost, understand cost trends, and help set resource requests and limits.

Note: Prometheus Collector requires kube-state-metrics and metrics-server to be installed and running in the cluster.

# Report schedule

Even though it's possible to change the report schedule, we recommend keeping the default value for this report, which is every 10 minutes. Changing this schedule may make Prometheus data and cost calculations much less accurate. A report that has been missing for more than 3 hours is considered Offline on the Clusters page.

# Use an Existing Prometheus Installation

If you already have Prometheus installed, you can point Insights to the service endpoint of your installation. If you installed the Prometheus operator, the service endpoint will likely use port 9090; if you only installed prometheus-server, the service endpoint will probably use port 80. To configure this in the values.yaml file, use the following format:

prometheus-metrics:
  enabled: true
  address: "http://<prometheus-service-name>.<namespace>.svc.cluster.local:<port>"

# Install a New Prometheus

The Insights Agent chart can also install a new Prometheus server in your cluster to use. To install Prometheus alongside the Agent, add the following to your values.yaml:

prometheus-metrics:
  enabled: true
  installPrometheusServer: true
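
As a sketch, if the agent is installed from the fairwinds-stable Helm repository, the updated values can be applied with a standard Helm upgrade (the release and namespace names here are assumptions):

helm upgrade --install insights-agent fairwinds-stable/insights-agent \
    --namespace insights-agent \
    --create-namespace \
    -f values.yaml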

# Sample Report

The Prometheus Collector report contains CPU and memory usage data for different workloads:

{
    "Values": [
        {
            "Container": "autoscaler",
            "ControllerKind": "Deployment",
            "ControllerName": "kube-dns-autoscaler",
            "ControllerNamespace": "kube-system",
            "LimitValue": 0,
            "Metric": "Memory",
            "PodName": "kube-dns-autoscaler-b48d96894-mjtkt",
            "Request": 10485760,
            "StartTime": "2021-02-01T13:20:00Z",
            "Value": 8777728
        },
        {
            "Container": "autoscaler",
            "ControllerKind": "Deployment",
            "ControllerName": "kube-dns-autoscaler",
            "ControllerNamespace": "kube-system",
            "LimitValue": 0,
            "Metric": "CPU",
            "PodName": "kube-dns-autoscaler-b48d96894-mjtkt",
            "Request": 20,
            "StartTime": "2021-02-01T13:21:00Z",
            "Value": 0
        }
    ]
}

# Steps to Install Insights when integrating with GKE Autopilot / GCP Managed Prometheus

Insights requires a Prometheus server to collect metrics for workload usage. Typically, this is a Prometheus server that is already running in a Kubernetes cluster, or a Prometheus server that is installed directly via the Insights Agent Helm Chart.

In GKE Autopilot, users are required to use the GCP Managed Prometheus offering to collect the required container metrics. GCP Managed Prometheus may increase your overall GCP spend and requires additional configuration for the Insights Agent to read those metrics.

Follow the steps below to set up GCP Managed Prometheus and connect it to Fairwinds Insights.

# 1. Collect Kubelet/cAdvisor metrics

GCP Managed Prometheus must be configured to scrape the Kubelet for Kubelet and cAdvisor metrics. This can be set up by editing the OperatorConfig resource as documented here: Install kubelet-cadvisor
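
The same change is applied in the Terraform example later in this document; as a sketch, the OperatorConfig can be patched directly with kubectl:

kubectl patch operatorconfig/config --namespace gmp-public --type merge \
    --patch '{"collection": {"kubeletScraping": {"interval": "30s"}}}'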

# 2. Update kube-state-metrics

Newer versions of Autopilot have kube-state-metrics installed by default (double-check whether that is the case for your cluster), but Insights needs some additional metrics. Update kube-state-metrics using this YAML:

apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: kube-state-metrics-fw
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: google-cloud-managed-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
  - port: metrics
    interval: 30s
    metricRelabeling:
    - action: keep
      # Curated subset of metrics to reduce costs while populating default set of sample dashboards at
      # https://github.com/GoogleCloudPlatform/monitoring-dashboard-samples/tree/master/dashboards/kubernetes
      # Change this regex to fit your needs for which objects you want to monitor    
      regex: kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?
      sourceLabels: [__name__]
  targetLabels:
    metadata: [] # explicitly empty so the metric labels are respected
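
Assuming the manifest above is saved to a file such as kube-state-metrics-fw.yaml (the filename is arbitrary), it can be applied with:

kubectl apply -f kube-state-metrics-fw.yaml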

# 3. Create Google service account to run Prometheus query

You can create the service account used to run Prometheus queries in one of three ways: manually, using Workload Identity Federation, or with Terraform.

# 3.a Create Google service account to run Prometheus query manually

You can create the service account either manually, as described below, or using Terraform; an example is provided in section 3.c Use Terraform.

  1. Go to IAM & Admin and select Service Accounts
  2. Click Create Service Account
  3. Give the service account a name, then click "Create and Continue"
  4. Grant the roles "Monitoring Viewer" and "Service Account Token Creator", then click Done
  5. Use the created service account when configuring prometheus-metrics

Example snippet of the prometheus-metrics configuration to provide in your values.yaml file in step 6 (Install insights-agent):

prometheus-metrics:
  enabled: true
  installPrometheusServer: false
  address: https://monitoring.googleapis.com/v1/projects/<project-name>/location/global/prometheus # managed prometheus address
  managedPrometheusClusterName: "my-autopilot-cluster"
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: <my-service-account>@<project-name>.iam.gserviceaccount.com  

  • address: required when you are not using our standard Prometheus installation; the example above uses the GCP Managed Prometheus address
  • managedPrometheusClusterName: required only when using Managed Prometheus, as Managed Prometheus may contain data from multiple clusters

  6. Make the Kubernetes insights-agent-prometheus-metrics service account a member of the Google service account and bind it to the workload identity user role:
gcloud iam service-accounts add-iam-policy-binding <my-service-account>@<project-name>.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:<project-name>.svc.id.goog[insights-agent/insights-agent-prometheus-metrics]"
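
To confirm the binding was created, you can inspect the Google service account's IAM policy (a sketch using the same placeholders as above):

gcloud iam service-accounts get-iam-policy <my-service-account>@<project-name>.iam.gserviceaccount.com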

# 3.b Use Workload Identity Federation to give permissions to Kubernetes service account

Run:

gcloud projects add-iam-policy-binding projects/<project-name> --role=roles/monitoring.viewer --member=principal://iam.googleapis.com/projects/<project-id-number>/locations/global/workloadIdentityPools/<project-name>.svc.id.goog/subject/ns/insights-agent/sa/insights-agent-prometheus-metrics --condition=None

gcloud projects add-iam-policy-binding projects/<project-name> --role=roles/iam.serviceAccountTokenCreator --member=principal://iam.googleapis.com/projects/<project-id-number>/locations/global/workloadIdentityPools/<project-name>.svc.id.goog/subject/ns/insights-agent/sa/insights-agent-prometheus-metrics --condition=None
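
To verify that both roles were granted, you can list the project's IAM bindings filtered to the Kubernetes service account (a sketch; placeholders as above):

gcloud projects get-iam-policy <project-name> \
    --flatten="bindings[].members" \
    --filter="bindings.members:insights-agent-prometheus-metrics" \
    --format="table(bindings.role)"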

Example snippet of the prometheus-metrics configuration to provide in your values.yaml file in step 6 (Install insights-agent):

prometheus-metrics:
  enabled: true
  installPrometheusServer: false
  address: https://monitoring.googleapis.com/v1/projects/<project-name>/location/global/prometheus # managed prometheus address
  managedPrometheusClusterName: "my-autopilot-cluster"

  • address: required when you are not using our standard Prometheus installation; the example above uses the GCP Managed Prometheus address
  • managedPrometheusClusterName: required only when using Managed Prometheus, as Managed Prometheus may contain data from multiple clusters

# 3.c Use Terraform

The following Terraform configuration sets up the integration with GKE Autopilot / GCP Managed Prometheus:

# versions.tf

terraform {
  required_version = ">= 0.13"
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    kubernetes = {
      source = "hashicorp/kubernetes"
    }
    # the kubectl_manifest resource in main.tf comes from this community provider
    kubectl = {
      source = "gavinbunney/kubectl"
    }
  }
}

# variables.tf

variable "project_name" {
  type = string
}

variable "config_path" {
  type = string
}

variable "gke_cluster_name" {
  type = string
}

# gcp-managed-prometheus.auto.tfvars

project_name = "my-gcp-project"
config_path = "~/.kube/config"
gke_cluster_name = "gke_myproject_us-central1_my_gcp_cluster"

# main.tf

provider "kubernetes" {
  config_path    = "${var.config_path}"
  config_context = "${var.gke_cluster_name}"
}

resource "null_resource" "prometheus_enable_cadvisor" {
  provisioner "local-exec" {
    command = <<EOF
kubectl patch operatorconfig/config --namespace gmp-public --type merge --patch '{"collection": { "kubeletScraping": {"interval": "30s" }}}'
EOF  
  }
}

resource "kubectl_manifest" "install_kube_state_metrics" {
    yaml_body = <<YAML
apiVersion: monitoring.googleapis.com/v1
kind: ClusterPodMonitoring
metadata:
  name: kube-state-metrics-fw
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: google-cloud-managed-prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
  - port: metrics
    interval: 30s
    metricRelabeling:
    - action: keep
      # Curated subset of metrics to reduce costs while populating default set of sample dashboards at
      # https://github.com/GoogleCloudPlatform/monitoring-dashboard-samples/tree/master/dashboards/kubernetes
      # Change this regex to fit your needs for which objects you want to monitor    
      regex: kube_(cronjob|daemonset|deployment|job|replicaset|pod|namespace|node|statefulset|persistentvolume|horizontalpodautoscaler|job_created)(_.+)?
      sourceLabels: [__name__]
  targetLabels:
    metadata: [] # explicitly empty so the metric labels are respected
YAML
}

resource "google_service_account" "prometheusqueryaccess" {
  account_id   = "prometheusqueryaccess"
  display_name = "Prometheus query Access"
}

resource "google_project_iam_member" "prometheus_project_iam_viewer_member" {
  role    = "roles/monitoring.viewer"
  member  = "serviceAccount:${google_service_account.prometheusqueryaccess.email}"
  project = "${var.project_name}"
}

resource "google_project_iam_member" "prometheus_project_iam_token_creator_member" {
  role    = "roles/iam.serviceAccountTokenCreator"
  member  = "serviceAccount:${google_service_account.prometheusqueryaccess.email}"
  project = "${var.project_name}"
}

resource "google_service_account_iam_binding" "prometheus_workload_identity" {
  service_account_id = "${google_service_account.prometheusqueryaccess.name}"
  role               = "roles/iam.workloadIdentityUser"
  members = [
    "serviceAccount:${var.project_name}.svc.id.goog[insights-agent/insights-agent-prometheus-metrics]",
  ]
}
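
With the files above in a single directory, the configuration can be applied with the usual Terraform workflow (variable values are read from the .auto.tfvars file):

terraform init
terraform plan
terraform apply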

# 4. Optionally, install the GCP Billing integration for more accurate costs. Instructions can be found here:

Google Cloud Provider (GCP) Billing Integration

# 5. Install cert-manager if you are enabling insights admission:

helm install cert-manager jetstack/cert-manager \
    --create-namespace \
    --namespace cert-manager \
    --set installCRDs=true \
    --set global.leaderElection.namespace=cert-manager
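
If the jetstack chart repository has not been added yet, it can be added first (this assumes the standard jetstack repository URL):

helm repo add jetstack https://charts.jetstack.io
helm repo update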

# 6. Install insights-agent. Instructions can be found here:

Install insights-agent

# Integration with AKS / Azure Monitor

If Azure Monitor managed service for Prometheus is being used in the cluster, prometheus-metrics can be configured to pull metrics from its API.

If Azure Monitor has not been enabled, follow the steps in this guide: Enable Azure Monitor in an existing cluster

# 1. Deploy a Prometheus authorization proxy

An authorization proxy is used for prometheus-metrics to pull metrics from the Azure Monitor API. Follow this guide to configure and deploy the proxy to your AKS cluster: Deploy a prometheus authorization proxy

# 2. Update the insights-agent values

Update the insights-agent values with the service name of the authorization proxy created in the previous step:

prometheus-metrics:
  enabled: true
  installPrometheusServer: false
  address: http://<proxy-service-name>.<proxy-service-namespace>.svc.cluster.local
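
To sanity-check the proxy before the agent runs, you can port-forward to it and issue a PromQL query against the Prometheus-compatible API (a sketch; the service name, namespace, and local port are placeholders, and the proxy is assumed to listen on port 80 as implied by the address above):

kubectl port-forward -n <proxy-service-namespace> svc/<proxy-service-name> 9090:80
curl "http://localhost:9090/api/v1/query?query=up"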

# Troubleshooting

If the current resource values of your workloads are missing or reporting as 'unset' in the Efficiency section and you are installing your own Prometheus instance, it's likely that kube-state-metrics (KSM) is not installed.

If you are installing with the kube-prometheus-stack chart, kube-state-metrics is enabled by default and is controlled with the top-level key kube-state-metrics.enabled: true

It can also be installed via the dedicated kube-state-metrics chart here: Install kube-state-metrics

If KSM appears to be running fine, check for any network policies that might prevent Prometheus from scraping kube-state-metrics.
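
As a quick check (assuming KSM was installed with its standard chart labels), you can confirm the KSM pods are running and look for network policies in the relevant namespaces:

kubectl get pods --all-namespaces -l app.kubernetes.io/name=kube-state-metrics
kubectl get networkpolicies --all-namespaces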