# Cloud Costs Report

The Fairwinds Cloud Costs report syncs your cloud billing data with Insights, so Insights knows precisely what you're spending on nodes and can use that information to infer accurate workload costs.

We currently support AWS, GCP (including GKE Standard and GKE Autopilot), and Azure (Alpha).

The Insights Agent Helm chart (insights-agent) exposes the cloud-costs options; see the chart README (opens new window) for the full configuration reference.

# AWS Billing Integration

The AWS Costs Report is built on the AWS Cost and Usage Report (CUR) (opens new window).

The first step is to create the Athena infrastructure using Terraform, CloudFormation, or similar tooling. The CUR report is created by AWS and stored in an S3 bucket.

The Athena process in AWS collects CUR data from S3 and makes it available as a SQL table that can be queried.
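To illustrate the kind of query this enables, the sketch below builds an Athena SQL statement that filters CUR data by a tag column and a date window. This is only an illustration (the actual query the Insights Agent runs may differ); `line_item_resource_id` and `line_item_unblended_cost` are standard CUR column names.

```python
def build_cur_query(database, table, tag_column, tag_value, days):
    """Illustrative only: build an Athena SQL query over CUR data.

    The real query the agent runs may differ; the line_item_* names
    are standard CUR columns.
    """
    return (
        f'SELECT line_item_resource_id, SUM(line_item_unblended_cost) AS cost '
        f'FROM "{database}"."{table}" '
        f"WHERE {tag_column} = '{tag_value}' "
        f"AND line_item_usage_start_date >= date_add('day', -{days}, current_date) "
        f"GROUP BY line_item_resource_id"
    )

print(build_cur_query("athena_cur_database", "fairwinds_insights_cur_report",
                      "resource_tags_user_kubernetes_cluster", "staging", 5))
```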

In AWS Glue, you can view the infrastructure created above, which connects the S3 CUR data to Athena.

Note: You will need to set up the CUR processing infrastructure below once per AWS account.

# CUR Processing Infrastructure Setup

Follow the steps below to set up your CUR processing infrastructure for each AWS account:

  • Ensure nodes for different clusters are tagged in a consistent way
    • E.g. nodes in your staging cluster have tag cluster=staging and your production cluster nodes have cluster=prod
  • Following the AWS CUR docs, create an S3 bucket where billing data can be stored
  • Create an Athena database for querying the S3 data
  • Create a Glue crawler to populate the data
  • Finally, configure the cloudcosts report within the values.yaml file of your Insights Agent

For convenience, we've provided some Terraform scripts below that create the necessary AWS resources.

# Configuring the Insights cloudcosts report for AWS

Once the AWS resources are in place, you'll need to configure the Insights Agent to start uploading your cost data from AWS.

Your Insights Agent values.yaml should include the section below, replacing any values with your own.

cloudcosts:
  enabled: true
  provider: aws
  # AWS credentials can be provided with either access keys or IRSA. Choose one of the following:

  # Credentials with AWS Access Keys:
  # The AWS credentials should come from the aws-costs-service-account created below.
  # We recommend creating the awscostssecret secret yourself and specifying secretName, but you can
  # also pass aws.accessKeyId and aws.secretAccessKey directly to the Helm chart (it will create the secret).
  secretName: awscostssecret

  # Credentials with IRSA (recommended for EKS): use serviceAccount.annotations instead of secretName.
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/IAM_ROLE_NAME
  tagkey: kubernetes_cluster
  tagvalue: staging
  days: 5
  aws:
    # If using access keys instead of secretName/IRSA, set accessKeyId and secretAccessKey (chart creates the secret).
    accessKeyId: ''
    secretAccessKey: ''
    tagprefix: 'resource_tags_user_'  # for AWS-managed tags (rather than user-defined tags), use 'resource_tags_'
    region: us-east-1
    database: athena_cur_database
    table: fairwinds_insights_cur_report
    catalog: AwsDataCatalog
    workgroup: cur_athena_workgroup
  • database: the Athena database created in the AWS Glue Data Catalog
  • table: the AWS CUR report name (the Athena table to query)
  • tagkey: the tag used on EC2 to indicate that an instance is a cluster node, e.g. KubernetesCluster (in the case of Kops). AWS splits the tag name on case boundaries and lowercases it, so the Athena column in this example is resource_tags_user_kubernetes_cluster. Special characters in tag names are replaced with underscores, e.g. aws:eks:cluster-name becomes aws_eks_cluster_name.
  • tagprefix: the prefix AWS adds to your tag when creating the Athena column. For user-defined tags it is resource_tags_user_. For tags provided by AWS it is just resource_tags_, e.g. if you are using the standard AWS EKS tag aws:eks:cluster-name you need to set:
tagprefix = resource_tags_
tagkey    = aws_eks_cluster_name

The Athena column in this case is resource_tags_aws_eks_cluster_name

  • tagvalue: the tag value to filter on, e.g. production or staging
  • catalog: the AWS Glue catalog; the default is AwsDataCatalog
  • workgroup: the Athena workgroup used for querying
  • days: number of days of cost data to query (default: 5)
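The tag-to-column naming rules described above can be sketched as a small helper. This is a hypothetical illustration of the documented rules, not part of the Insights Agent:

```python
import re

def athena_tag_column(tag_key, aws_managed=False):
    """Hypothetical helper illustrating the CUR naming rules: user tags
    get the prefix resource_tags_user_ and AWS-managed tags resource_tags_;
    CamelCase names are split and lowercased; special characters become
    underscores."""
    prefix = "resource_tags_" if aws_managed else "resource_tags_user_"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", tag_key)  # split CamelCase
    name = re.sub(r"[^A-Za-z0-9]+", "_", name).lower()      # special chars -> _
    return prefix + name

print(athena_tag_column("KubernetesCluster"))
# resource_tags_user_kubernetes_cluster
print(athena_tag_column("aws:eks:cluster-name", aws_managed=True))
# resource_tags_aws_eks_cluster_name
```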

# Terraform

Note that you may have to apply the files below twice for all resources to sync fully.

# provider.tf

provider "aws" {
  region  = "us-east-1"
  profile = "default"
}

# variables.tf

variable "s3_bucket_name" {
  type    = string
  default = "fairwinds-insights-cur-report"
}
variable "s3_region" {
  type    = string
  default = "us-east-1"
}
variable "time_unit" {
  type    = string
  default = "HOURLY"
}
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

# iam.tf

resource "aws_iam_role" "crawler-service-role" {
  name               = "crawler-service-role"
  assume_role_policy = data.aws_iam_policy_document.crawler-assume-policy.json
}
data "aws_iam_policy_document" "crawler-assume-policy" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["glue.amazonaws.com"]
    }
  }
}
resource "aws_iam_role_policy_attachment" "AWSGlueServiceRole-attachment" {
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
  role       = aws_iam_role.crawler-service-role.name
}
resource "aws_iam_policy" "cur-report-s3-access" {
  name   = "cur-report-s3-access"
  path   = "/"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": ["arn:aws:s3:::${var.s3_bucket_name}"],
      "Condition": {}
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::${var.s3_bucket_name}/*"
      ]
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "cur-report-s3-access" {
  role       = aws_iam_role.crawler-service-role.name
  policy_arn = aws_iam_policy.cur-report-s3-access.arn
}

resource "aws_s3_bucket_policy" "s3-bucket-cur-report-policy" {
  bucket = aws_s3_bucket.cur_bucket.id
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "billingreports.amazonaws.com"
      },
      "Action": [
        "s3:GetBucketAcl",
        "s3:GetBucketPolicy"
      ],
      "Resource":"arn:aws:s3:::${var.s3_bucket_name}"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "billingreports.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${var.s3_bucket_name}/*"
    }
  ]
}
EOF
}

resource "aws_iam_user" "aws-costs-service-account" {
  name = "aws-costs-service-account"
  path = "/"
  tags = {
    tag-key = "service-account"
  }
}
resource "aws_iam_user_policy" "aws-costs-service-policy" {
  name = "aws-costs-service-policy"
  user = aws_iam_user.aws-costs-service-account.name

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:GetCrawler",
        "glue:GetTags"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
          "s3:GetBucketLocation",
          "s3:GetObject",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:ListMultipartUploadParts",
          "s3:PutObject"
      ],
      "Resource": [
          "arn:aws:s3:::${var.s3_bucket_name}",
          "arn:aws:s3:::${var.s3_bucket_name}/*"
      ]
    }    
  ]
}
EOF
}

# main.tf

resource "aws_s3_bucket" "cur_bucket" {
  bucket = var.s3_bucket_name
  acl    = "private"
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}
resource "aws_glue_crawler" "cur_report_crawler" {
  database_name = "athena_cur_database"
  schedule      = "cron(0/15 * * * ? *)"
  name          = "cur_report_crawler"
  role          = aws_iam_role.crawler-service-role.name
  configuration = jsonencode(
    {
      Grouping = {
        TableGroupingPolicy = "CombineCompatibleSchemas"
      }
      CrawlerOutput = {
        Partitions = { AddOrUpdateBehavior = "InheritFromTable" }
      }
      Version = 1
    }
  )
  s3_target {
    path = format("s3://%s/fairwinds-insights-cur/fairwinds-insights-cur-report/", var.s3_bucket_name)
  }
}
resource "aws_athena_database" "athena_cur_database" {
  name   = "athena_cur_database"
  bucket = var.s3_bucket_name
  force_destroy = true
}
resource "aws_cur_report_definition" "fairwinds_insights_cur_report" {
  report_name                = "fairwinds-insights-cur-report"
  time_unit                  = var.time_unit
  format                     = "Parquet"
  compression                = "Parquet"
  additional_schema_elements = ["RESOURCES"]
  s3_bucket                  = var.s3_bucket_name
  s3_region                  = var.s3_region
  s3_prefix                  = "fairwinds-insights-cur"
  additional_artifacts       = ["ATHENA"]
  report_versioning          = "OVERWRITE_REPORT"
  depends_on                 = [aws_s3_bucket.cur_bucket]
}
resource "aws_athena_workgroup" "cur_athena_workgroup" {
  name = "cur_athena_workgroup"
  configuration {
    enforce_workgroup_configuration    = true
    publish_cloudwatch_metrics_enabled = true
    result_configuration {
      output_location = format("s3://%s/fairwinds-insights-cur/fairwinds-insights-cur-report/output", var.s3_bucket_name)
    }
  }
}

# Google Cloud Provider (GCP) Billing Integration

The GCP Report is built on Google Cloud Billing (opens new window).

The first step is setting up Google Cloud Billing to export to BigQuery. To do this, follow these steps:

  • Make sure Billing is enabled
  • Enable BigQuery for data transfer
  • Create a BigQuery dataset
  • Enable Cloud Billing export to the BigQuery dataset

All steps are described in detail at the link below: Set up Cloud Billing data export to BigQuery (opens new window)

Fairwinds will use this table, which is created once you execute the above steps: <projectname>.<datasetName>.gcp_billing_export_resource_v1_<BILLING_ACCOUNT_ID> (dashes in the billing account ID are replaced with underscores in the table name)

NOTE: It may take a few days for Google to ingest all the billing data into the BigQuery table.

# Create service account to run BigQuery

In GCP:

  1. Go to IAM & Admin > Service Accounts
  2. Click Create Service Account
  3. Give the service account a name, then click "Create and Continue"
  4. Grant the roles "BigQuery Data Viewer" and "BigQuery Job User", then click Done
  5. Make sure Workload Identity is enabled; you can enable it on the cluster overview page. Autopilot clusters have it enabled by default. Follow the instructions here: Use Workload Identity (opens new window)

For GKE Standard:

gcloud container clusters update your-cluster \
    --region=your-region \
    --workload-pool=your-project.svc.id.goog
gcloud container node-pools update your-pool \
    --cluster=your-cluster \
    --region=your-region \
    --workload-metadata=GKE_METADATA
  6. Bind your GCP service account to the Kubernetes cloud-costs service account: Use Workload Identity (opens new window) Example:
gcloud iam service-accounts add-iam-policy-binding {service-account-name}@{your-project}.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:{your-project}.svc.id.goog[insights-agent/insights-agent-cloudcosts]"
  7. Annotate the insights-agent-cloudcosts service account: set the annotation under cloudcosts.serviceAccount.annotations in the insights-agent values.yaml (opens new window) (see the cloudcosts section). Example:
cloudcosts:
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: {service-account-name}@{your-project}.iam.gserviceaccount.com
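The IAM member string used in the binding above follows a fixed shape. This hypothetical helper shows how it is assembled from the project, namespace, and Kubernetes service account name:

```python
def workload_identity_member(project, namespace, ksa):
    """Hypothetical helper: build the IAM member string that binds a
    Kubernetes service account (ksa) to a GCP service account via
    Workload Identity."""
    return f"serviceAccount:{project}.svc.id.goog[{namespace}/{ksa}]"

print(workload_identity_member("your-project", "insights-agent",
                               "insights-agent-cloudcosts"))
# serviceAccount:your-project.svc.id.goog[insights-agent/insights-agent-cloudcosts]
```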

# Configuring the Insights cloudcosts report for GCP

Once the GCP resources are in place, you'll need to configure the cloudcosts report in the Insights Agent to start uploading your cost data. Your values.yaml should include the section below, replacing any values with your own.

cloudcosts:
  enabled: true
  provider: gcp
  tagvalue: "my-gcp-cluster"
  days: 5
  gcp:
    projectname: "my-project"
    dataset: "insightscosts"
    billingaccount: "123456-777AAA-123456"
    table: ""         # optional; auto-derived from projectname/dataset/billingaccount if not set
  • provider: must be gcp
  • tagkey: optional; the label name used on GCP to indicate the cluster. Default is goog-k8s-cluster-name.
  • tagvalue: the value of the cluster-name label to filter on, e.g. production or staging
  • projectname: the GCP project name; required if table is not provided
  • dataset: the dataset name you provided when you set up BigQuery for Billing; required if table is not provided
  • billingaccount: your Google Billing Account ID from the Billing console, used to derive the BigQuery table name, e.g. 1A2B3C-4D5E6F-7G8H9I; required if table is not provided
  • table: optional; the full BigQuery table path, which you can provide instead of projectname/dataset/billingaccount
  • days: number of days of cost data to query (default: 5)
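When table is not set, the BigQuery table name is derived from projectname, dataset, and billingaccount. This hypothetical helper shows the derivation (BigQuery table names cannot contain dashes, so dashes in the billing account ID become underscores):

```python
def billing_export_table(project, dataset, billing_account):
    """Hypothetical helper: derive the Cloud Billing export table name
    from the project, dataset, and billing account ID."""
    suffix = billing_account.replace("-", "_")
    return f"{project}.{dataset}.gcp_billing_export_resource_v1_{suffix}"

print(billing_export_table("my-project", "insightscosts", "123456-777AAA-123456"))
# my-project.insightscosts.gcp_billing_export_resource_v1_123456_777AAA_123456
```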

# Azure Billing Integration (Alpha)

The Azure Cloud Costs report uses Azure Cost Management (opens new window) and Azure AD Workload Identity (opens new window) so the Insights Agent can read cost data without storing credentials. Your AKS cluster must have workload identity (OIDC) enabled.

Azure applies a 2-day lag to cost data (today and yesterday are excluded) so that usage is fully finalized. Filtering is done server-side via the Cost Management API; if you omit tagkey, the default is kubernetes-cluster.

# Create Azure AD Workload Identity for cloud-costs

When using cloudcosts with provider: azure, the Insights Agent uses a federated credential. Create an App registration (or user-assigned managed identity) in Microsoft Entra ID and add a federated identity credential that matches your AKS cluster and the cloud-costs service account.

  1. Create an App registration (or use an existing one) in Microsoft Entra ID (opens new window). Note the Application (client) ID and Directory (tenant) ID; you will set these as cloudcosts.azure.workloadIdentity.clientId and cloudcosts.azure.workloadIdentity.tenantId.

  2. Get your AKS cluster OIDC issuer URL (use the exact value, including a trailing slash if present):

    az aks show --name <cluster-name> --resource-group <resource-group> \
      --query "oidcIssuerProfile.issuerUrl" -o tsv
    
  3. Create a federated identity credential on the App registration. In Azure Portal: App registrations → your app → Certificates & secrets → Federated credentials → Add credential. Or with Azure CLI (replace <app-object-id>, <issuer-url>, and <credential-name>):

    az ad app federated-credential create \
      --id <app-object-id> \
      --parameters '{
        "name": "<credential-name>",
        "issuer": "<issuer-url>",
        "subject": "system:serviceaccount:insights-agent:insights-agent-cloudcosts",
        "description": "AKS workload identity for insights-agent cloud-costs",
        "audiences": ["api://AzureADTokenExchange"]
      }'
    
    • Subject must be exactly: system:serviceaccount:insights-agent:insights-agent-cloudcosts (namespace insights-agent, service account insights-agent-cloudcosts). If you install the Insights Agent in a different namespace, use system:serviceaccount:<namespace>:insights-agent-cloudcosts.
    • Issuer must match the URL from step 2 exactly (including or excluding a trailing slash as returned).
    • Audiences: api://AzureADTokenExchange.

    To get the app object ID for --id: az ad app show --id <client-id> --query id -o tsv.

  4. Assign RBAC so the identity can read cost data. Grant the app's service principal Reader and Cost Management Reader on the subscription:

    az role assignment create --role "Reader" \
      --assignee <client-id> --scope /subscriptions/<subscription-id>
    az role assignment create --role "Cost Management Reader" \
      --assignee <client-id> --scope /subscriptions/<subscription-id>
    

    Use the same client ID as cloudcosts.azure.workloadIdentity.clientId and your subscription ID as cloudcosts.azure.subscription.
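The federated-credential subject in step 3 always has the shape system:serviceaccount:&lt;namespace&gt;:&lt;service-account&gt;. This hypothetical helper makes the shape explicit:

```python
def federated_subject(namespace, service_account):
    """Hypothetical helper: build the federated credential subject for a
    Kubernetes service account under AKS workload identity."""
    return f"system:serviceaccount:{namespace}:{service_account}"

print(federated_subject("insights-agent", "insights-agent-cloudcosts"))
# system:serviceaccount:insights-agent:insights-agent-cloudcosts
```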

# Configuring the Insights cloudcosts report for Azure

After the App registration and federated credential are in place, configure the Insights Agent. Your values.yaml should include:

cloudcosts:
  enabled: true
  provider: azure
  azure:
    subscription: "<your-subscription-id>"
    workloadIdentity:
      clientId: "<application-client-id>"
      tenantId: "<directory-tenant-id>"
  # optional: tagkey/tagvalue for filtering
  # tagkey: ""
  # tagvalue: ""
  # days: 5
  • provider: must be azure
  • subscription: Azure subscription ID (required)
  • workloadIdentity.clientId: Application (client) ID of the App registration (required)
  • workloadIdentity.tenantId: Directory (tenant) ID (required)
  • tagkey: optional; tag name to filter resources (default: kubernetes-cluster). Resources must be tagged in Azure for the filter to apply.
  • tagvalue: optional; tag value to filter to a specific cluster. If provided with tagkey, the API returns only costs for resources with that tag.
  • days: number of days of cost data to query (default: 5). Note the 2-day lag: Azure excludes the most recent two days.

No Kubernetes Secret is needed for Azure; authentication uses workload identity only.
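Putting the days setting and the 2-day lag together, the queried date window can be sketched as follows. This is an assumption about how the window is computed, for illustration only; the agent's actual windowing may differ:

```python
from datetime import date, timedelta

def cost_window(days=5, today=None):
    """Sketch (assumption): with Azure's 2-day lag, the newest queryable
    day is two days before today, and the window spans `days` days ending
    there. The agent's actual windowing may differ."""
    today = today or date.today()
    end = today - timedelta(days=2)
    start = end - timedelta(days=days - 1)
    return start, end

print(cost_window(5, date(2024, 1, 10)))  # window: 2024-01-04 through 2024-01-08
```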

# Terraform

Terraform for Google Cloud Provider (GCP) Billing Integration

# versions.tf

terraform {
  required_version = ">= 0.13"
  required_providers {
    google = {
      source = "hashicorp/google"
    }
  }
}

# variables.tf

variable "project_name" {
  type = string
}

# gcp-cloud-costs.auto.tfvars

project_name = "my-gcp-project"

# main.tf

resource "google_service_account" "bigqueryaccess" {
  account_id   = "bigqueryaccess"
  display_name = "BigQuery Access"
}

resource "google_project_iam_member" "bigquery_iam_member_dataViewer" {
  role    = "roles/bigquery.dataViewer"
  member  = "serviceAccount:${google_service_account.bigqueryaccess.email}"
  project = var.project_name
}

resource "google_project_iam_member" "bigquery_iam_member_jobUser" {
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${google_service_account.bigqueryaccess.email}"
  project = var.project_name
}

resource "google_service_account_iam_binding" "bigqueryaccess_workload_identity" {
  service_account_id = google_service_account.bigqueryaccess.name
  role               = "roles/iam.workloadIdentityUser"
  members = [
    "serviceAccount:${var.project_name}.svc.id.goog[insights-agent/insights-agent-cloudcosts]",
  ]
}

##########################################################
# Standard GKE only ADDITIONAL STEPS, ignore if Autopilot
# For Standard GKE, add these to your google_container_cluster resource.
# To enable Workload Identity:
#
#workload_identity_config {
#    workload_pool = "${var.project_name}.svc.id.goog"
#}
# To enable Workload Identity on node pools (in the node pool's node config):
#workload_metadata_config {
#    mode = "GKE_METADATA"
#}
##########################################################