Kubernetes FinOps for Growing Companies: From Chaos to Cost Control

January 28, 2025

The Cost Attribution Problem

Without proper labeling and tracking, Kubernetes costs become a black box that's impossible to optimize or allocate to teams. The principle is visibility before optimization. You cannot optimize what you cannot measure.


TL;DR

Kubernetes costs without proper labeling and tracking become a black box—impossible to optimize or allocate to teams. FinOps brings cost control through standardized labeling conventions, team-level budgets with namespace quotas, and automated cleanup of idle resources. Results: 40% cost reduction, 60% improvement in attribution, 2 hours/week saved.

The Strategy: Establish consistent metadata (team, project, environment, cost-center, owner), enforce labels via OPA/Gatekeeper admission controllers, implement ResourceQuotas per namespace translating to monthly budget caps, automate cleanup with TTL policies.

Key Takeaways:

  1. Standardized labeling enforced at admission time - Every resource must have team, project, environment, cost-center, owner labels. OPA/Gatekeeper policies reject deployments missing labels. Make unlabeled deployment impossible—retroactive labeling is painful.
  2. Team-level budgets create accountability - ResourceQuotas per namespace define CPU/memory/storage limits correlating to monthly costs. Teams hitting limits must optimize before requesting more. Alerts at 70%, 90%, 100%.
  3. Automated cleanup recovers 40% of spend - TTL policies delete ephemeral environments after 7 days, flag PersistentVolumes left unattached for 30+ days, and detect idle workloads using <10% of their requested resources. Scaling dev environments to zero on nights and weekends saves up to 65% of dev environment costs.
  4. Cost visibility at commit time prevents surprises - Parse Kubernetes manifests in PRs, calculate monthly cost estimates, post as PR comments, block if exceeds budget. Shift cost awareness left to developers.

Real-World Results: 40% cost reduction via automated cleanup and TTL policies. 60% improvement in cost attribution via standardized labeling. 2 hours/week saved with automated reports and self-service dashboards.

Core Principle: Transparency drives behavior—when teams see their costs and have ownership, they make better optimization decisions. FinOps is culture change through tooling, not just tooling.


Standardized Labeling Strategy

The Principle: Consistent Metadata

Labels are the foundation of Kubernetes cost attribution. Every resource must have consistent, enforced labels that answer: Who owns this? What project is it for? What environment? Who pays?

Essential Labels

Every workload should include:

  • team: Engineering team responsible (e.g., "platform", "frontend")
  • project: Application or service name (e.g., "api-backend", "checkout")
  • environment: Deployment environment (e.g., "production", "staging", "dev")
  • cost-center: Financial allocation code for chargeback
  • owner: Contact person or email for escalations

Enforcement Patterns

Labels are useless if not enforced. Apply the shift-left enforcement principle:

  1. Admission Controllers: Reject deployments missing required labels at the API server level
  2. Policy as Code: Use OPA/Gatekeeper to define and enforce labeling policies
  3. GitOps Validation: Validate labels in CI/CD before deployment reaches the cluster

The goal is to make it impossible to deploy unlabeled resources. Retroactive labeling is painful and often incomplete.
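The GitOps validation step can be sketched in a few lines of Python. This is a minimal illustration, assuming manifests have already been parsed into dictionaries; the function name `missing_labels` is hypothetical, and in-cluster enforcement would still be handled by the admission controller.

```python
# Required label keys, taken from the "Essential Labels" list above.
REQUIRED_LABELS = ("team", "project", "environment", "cost-center", "owner")


def missing_labels(manifest: dict) -> list[str]:
    """Return the required labels that are absent or empty on a manifest."""
    labels = (manifest.get("metadata") or {}).get("labels") or {}
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


# Example manifest (hypothetical) that lacks two of the required labels.
deployment = {
    "kind": "Deployment",
    "metadata": {
        "name": "checkout",
        "labels": {"team": "frontend", "project": "checkout", "environment": "dev"},
    },
}

missing = missing_labels(deployment)
if missing:
    print(f"{deployment['metadata']['name']}: missing labels {missing}")
```

A CI job could run a check like this over every manifest in a change and fail when any list is non-empty.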

Team-Level Cost Budgets

The Principle: Accountability Through Ownership

When teams have visibility into their costs and budgets, they make better decisions. FinOps is about culture change, not just tooling.

Implementation Strategy

ResourceQuotas per Namespace

Each team gets a namespace with defined limits:

  • CPU requests (correlates to compute cost)
  • Memory requests (correlates to compute cost)
  • PersistentVolumeClaims (storage cost)
  • Total storage capacity

Because cost tracks requested capacity, each quota multiplied by a unit price approximates a monthly budget cap. When a team hits its limit, it must optimize before requesting more.
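As a rough worked example of how a quota maps to a budget, the sketch below multiplies quota values by unit prices. All three prices are hypothetical placeholders, not real cloud rates; substitute the blended rates from your own bill.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)

# Hypothetical unit prices (assumptions for illustration only).
PRICE_PER_CPU_HOUR = 0.03            # USD per requested vCPU-hour
PRICE_PER_GIB_HOUR = 0.004           # USD per requested GiB-hour of memory
PRICE_PER_GIB_MONTH_STORAGE = 0.10   # USD per GiB-month of storage


def monthly_budget_cap(cpu_cores: float, memory_gib: float, storage_gib: float) -> float:
    """Approximate the monthly cost of a namespace that fully uses its quota."""
    compute = (cpu_cores * PRICE_PER_CPU_HOUR + memory_gib * PRICE_PER_GIB_HOUR) * HOURS_PER_MONTH
    storage = storage_gib * PRICE_PER_GIB_MONTH_STORAGE
    return round(compute + storage, 2)


# A quota of 20 CPU cores, 64 GiB of memory, and 500 GiB of storage.
print(monthly_budget_cap(cpu_cores=20, memory_gib=64, storage_gib=500))
```

With these assumed prices, the example quota works out to roughly 675 USD per month.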

LimitRanges for Guardrails

Set default resource requests and limits so developers don't have to specify them for every container, and set maximums so nobody accidentally requests 100 CPU cores when 1 is needed.

Budget Alerts

Configure alerts at 70%, 90%, and 100% of budget:

  • 70%: Informational, check for optimization opportunities
  • 90%: Warning, immediate attention needed
  • 100%: Critical, scale-down or optimization required
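The three thresholds above can be expressed as a small function. This is a sketch only; the level names and the function name `budget_alert_level` are assumptions, and a real setup would usually encode the same thresholds in the alerting tool's own rules.

```python
from typing import Optional


def budget_alert_level(spend: float, budget: float) -> Optional[str]:
    """Map month-to-date spend to an alert level, or None below 70%."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    if ratio >= 1.0:
        return "critical"  # scale-down or optimization required
    if ratio >= 0.9:
        return "warning"   # immediate attention needed
    if ratio >= 0.7:
        return "info"      # check for optimization opportunities
    return None


print(budget_alert_level(spend=950, budget=1000))
```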

Monitoring Tools

  • Kubecost: Cost monitoring and allocation for Kubernetes, built on the open-source OpenCost project
  • AWS Cost Explorer: EKS cost breakdown by tag
  • Custom Prometheus metrics: Build dashboards for specific cost KPIs

Automated Resource Cleanup

The Principle: Ephemeral by Default

Resources should have defined lifetimes. Development environments, preview deployments, and test namespaces should auto-delete after a set period.

TTL (Time-to-Live) Policies

Define cleanup rules based on resource age and usage:

  • Ephemeral environments: Delete after 7 days or when PR is merged
  • Unused PersistentVolumes: Flag volumes not attached to any pod for 30+ days
  • Abandoned namespaces: Identify namespaces with no active deployments
  • Idle workloads: Detect pods using <10% of requested resources
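The rules above might be evaluated along these lines. The `Resource` record and its `kind` categories are hypothetical; a real job would populate them from the Kubernetes API and a metrics source, and this sketch looks only at CPU for the idle check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

EPHEMERAL_TTL = timedelta(days=7)
UNATTACHED_VOLUME_TTL = timedelta(days=30)
IDLE_UTILIZATION = 0.10


@dataclass
class Resource:
    name: str
    kind: str                                  # "ephemeral-env", "volume", or "workload" (assumed categories)
    created_at: datetime
    pr_merged: bool = False                    # only meaningful for ephemeral environments
    detached_since: Optional[datetime] = None  # only meaningful for volumes
    used_cpu: float = 0.0                      # average cores actually used
    requested_cpu: float = 0.0                 # cores requested


def cleanup_reason(res: Resource, now: datetime) -> Optional[str]:
    """Return why a resource matches a TTL rule, or None if it should be kept."""
    if res.kind == "ephemeral-env":
        if res.pr_merged:
            return "PR merged"
        if now - res.created_at >= EPHEMERAL_TTL:
            return "older than 7 days"
    elif res.kind == "volume":
        if res.detached_since and now - res.detached_since >= UNATTACHED_VOLUME_TTL:
            return "unattached for 30+ days"
    elif res.kind == "workload":
        if res.requested_cpu > 0 and res.used_cpu / res.requested_cpu < IDLE_UTILIZATION:
            return "using <10% of requested CPU"
    return None


now = datetime(2025, 1, 28, tzinfo=timezone.utc)
preview = Resource("preview-env", "ephemeral-env", created_at=now - timedelta(days=8))
print(cleanup_reason(preview, now))
```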

Cleanup Strategies

Automated Deletion with Safety

Run weekly cleanup jobs that:

  1. Identify resources matching TTL criteria
  2. Send warning notification 24 hours before deletion
  3. Delete resources if no objection
  4. Log all deletions for audit
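The warn-then-delete flow could be modeled as a small decision function plus an audit log. This is a sketch under stated assumptions: the candidate dictionaries and their `warned_at` and `objected` fields are hypothetical, and the actual notification and deletion calls are left to the caller.

```python
from datetime import datetime, timedelta
from typing import Optional

WARNING_PERIOD = timedelta(hours=24)


def next_action(warned_at: Optional[datetime], objected: bool, now: datetime) -> str:
    """Decide what the cleanup job should do with one flagged resource."""
    if objected:
        return "skip"    # an owner asked to keep the resource
    if warned_at is None:
        return "warn"    # first sighting: notify the owner and start the clock
    if now - warned_at >= WARNING_PERIOD:
        return "delete"  # warning period elapsed with no objection
    return "wait"        # warned, but fewer than 24 hours ago


def run_cleanup(candidates: list[dict], now: datetime) -> list[str]:
    """Return one audit log line per candidate describing the chosen action."""
    audit_log = []
    for item in candidates:
        action = next_action(item.get("warned_at"), item.get("objected", False), now)
        audit_log.append(f"{now.isoformat()} {action} {item['name']}")
    return audit_log


now = datetime(2025, 1, 28, 12, 0)
for line in run_cleanup([{"name": "preview-env"}], now):
    print(line)
```

Keeping the decision separate from the side effects makes the policy easy to test before it is allowed to delete anything.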

Development Environment Scheduling

Scale dev/staging environments to zero during nights and weekends:

  • 5 PM Friday: Scale all dev deployments to 0 replicas
  • 8 AM Monday: Scale back up
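  • Savings: The weekend shutdown alone removes 63 of 168 weekly hours (about 37%); reaching roughly 65% of dev environment costs also requires scaling down on weeknights, and assumes the cluster autoscaler removes the emptied nodes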
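The savings from a schedule follow from simple arithmetic on weekly hours. The sketch below compares the weekend-only shutdown listed above with a hypothetical nights-and-weekends schedule (up 8 AM to 8 PM on weekdays, an assumed example); it counts compute hours only and assumes nodes are actually removed when workloads scale to zero.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def savings_fraction(uptime_hours_per_week: float) -> float:
    """Fraction of always-on compute hours avoided by a schedule."""
    return 1 - uptime_hours_per_week / HOURS_PER_WEEK


# Weekend-only shutdown: down from Friday 5 PM to Monday 8 AM (63 hours).
weekend_only = savings_fraction(HOURS_PER_WEEK - 63)

# Hypothetical nights-and-weekends schedule: up 12 hours a day, Monday to Friday.
nights_and_weekends = savings_fraction(5 * 12)

print(f"weekend only: {weekend_only:.1%}")
print(f"nights and weekends: {nights_and_weekends:.1%}")
```

Under these assumptions the weekend-only schedule avoids 37.5% of compute hours and the nights-and-weekends schedule about 64%.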

The principle is pay for what you use. Idle resources are wasted money.

Integration with DevOps Workflows

GitOps Integration

The principle is cost visibility at commit time. Developers should know the cost impact of their changes before merging.

Cost Estimates in Pull Requests

Integrate cost estimation into your pull request workflow:

  1. Parse Kubernetes manifests in the PR
  2. Calculate estimated monthly cost based on resource requests
  3. Post cost estimate as a comment on the PR
  4. Fail the check if the cost exceeds the budget threshold

This shifts cost awareness left: developers see the cost impact before merging instead of being surprised after deployment.
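The estimation steps could look roughly like this. It is a sketch, assuming the Deployment manifest is already parsed into a dictionary and using hypothetical unit prices; the quantity parsers handle only millicore CPU values, binary memory suffixes, and plain numbers, not the full Kubernetes quantity syntax.

```python
HOURS_PER_MONTH = 730
PRICE_PER_CPU_HOUR = 0.03   # assumed USD per requested vCPU-hour
PRICE_PER_GIB_HOUR = 0.004  # assumed USD per requested GiB-hour

_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory_gib(quantity: str) -> float:
    """Convert a memory quantity such as '512Mi' or '1Gi' to GiB."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor / 2**30
    return float(quantity) / 2**30  # plain byte count


def estimate_monthly_cost(deployment: dict) -> float:
    """Estimate monthly cost from replicas and per-container resource requests."""
    spec = deployment["spec"]
    replicas = spec.get("replicas", 1)
    cpu = memory = 0.0
    for container in spec["template"]["spec"]["containers"]:
        requests = container.get("resources", {}).get("requests", {})
        cpu += parse_cpu(str(requests.get("cpu", "0")))
        memory += parse_memory_gib(str(requests.get("memory", "0")))
    hourly = replicas * (cpu * PRICE_PER_CPU_HOUR + memory * PRICE_PER_GIB_HOUR)
    return round(hourly * HOURS_PER_MONTH, 2)


def pr_comment(deployment: dict, budget_threshold: float) -> tuple[str, bool]:
    """Build the PR comment text and a pass/fail flag for the check."""
    cost = estimate_monthly_cost(deployment)
    ok = cost <= budget_threshold
    status = "within" if ok else "over"
    return f"Estimated monthly cost: ${cost:.2f} ({status} the ${budget_threshold:.2f} threshold)", ok
```

A CI step would post the returned text as a comment and fail the job when the flag is false.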

Budget Validation Before Deployment

Add a CI stage that validates namespace budget before deployment:

  1. Get current namespace resource usage
  2. Calculate projected usage with new deployment
  3. Block if projection exceeds namespace quota
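The three validation steps reduce to adding the new requests to current usage and comparing against the quota. In this sketch the resource keys follow ResourceQuota naming (`requests.cpu`, `requests.memory`), values are assumed to be already normalized to cores and GiB, and the function name is hypothetical. In practice the current figures might come from the ResourceQuota object's `status.used` field.

```python
def quota_violations(current: dict, requested: dict, quota: dict) -> list[str]:
    """List every quota that the projected usage would exceed."""
    violations = []
    for resource, limit in quota.items():
        projected = current.get(resource, 0) + requested.get(resource, 0)
        if projected > limit:
            violations.append(f"{resource}: projected {projected:g} exceeds quota {limit:g}")
    return violations


quota = {"requests.cpu": 20, "requests.memory": 64}      # cores, GiB
current = {"requests.cpu": 18, "requests.memory": 40}    # usage before the deploy
requested = {"requests.cpu": 3, "requests.memory": 8}    # added by the new deployment

problems = quota_violations(current, requested, quota)
for problem in problems:
    print(problem)
```

A CI stage would block the deployment when the returned list is non-empty.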

Developer Experience

Self-Service Cost Dashboards

Give developers visibility into their team's costs:

  • Current month spending vs. budget
  • Top 10 most expensive workloads
  • Resource utilization (actual vs. requested)
  • Cost trend over time

Showback Reports

Monthly reports by team showing:

  • Total Kubernetes cost
  • Cost per application
  • Efficiency metrics (utilization percentage)
  • Recommendations for optimization
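A showback report of this kind is essentially an aggregation over labeled workloads. The sketch below assumes each workload record already carries a `team` label, a monthly cost, and CPU usage and request figures; the field names are hypothetical and the numbers are made-up sample data.

```python
from collections import defaultdict


def showback(workloads: list[dict]) -> dict:
    """Aggregate monthly cost and CPU utilization per team."""
    totals = defaultdict(lambda: {"cost": 0.0, "used": 0.0, "requested": 0.0})
    for w in workloads:
        row = totals[w["team"]]
        row["cost"] += w["monthly_cost"]
        row["used"] += w["cpu_used"]
        row["requested"] += w["cpu_requested"]
    report = {}
    for team, row in totals.items():
        utilization = row["used"] / row["requested"] if row["requested"] else 0.0
        report[team] = {
            "cost": round(row["cost"], 2),
            "utilization_pct": round(utilization * 100, 1),
        }
    return report


def top_workloads(workloads: list[dict], n: int = 10) -> list[str]:
    """Names of the n most expensive workloads, most expensive first."""
    ranked = sorted(workloads, key=lambda w: w["monthly_cost"], reverse=True)
    return [w["name"] for w in ranked[:n]]


sample = [
    {"name": "checkout", "team": "frontend", "monthly_cost": 100.0, "cpu_used": 1.0, "cpu_requested": 4.0},
    {"name": "web", "team": "frontend", "monthly_cost": 50.0, "cpu_used": 1.0, "cpu_requested": 4.0},
    {"name": "api-backend", "team": "platform", "monthly_cost": 200.0, "cpu_used": 3.0, "cpu_requested": 4.0},
]
print(showback(sample))
print(top_workloads(sample, n=2))
```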

The principle is transparency drives behavior. When teams see their costs, they optimize.

Real-World Results

Case studies showing the impact of FinOps implementation:

40% cost reduction through automated cleanup

  • Eliminated abandoned dev environments
  • Removed unused PVCs and orphaned resources
  • Implemented TTL policies

60% improvement in cost attribution

  • Standardized labeling across all workloads
  • Enabled accurate showback to teams
  • Identified previously hidden cost drivers

2 hours/week saved in manual cost analysis

  • Automated weekly cost reports
  • Self-service dashboards for team leads
  • Eliminated ad-hoc cost investigations

Implementation Checklist

Phase 1: Foundation (Weeks 1-2)

  • Define standard label schema
  • Document labeling requirements
  • Install Kubecost or similar tool
  • Create baseline cost report

Phase 2: Enforcement (Weeks 3-4)

  • Deploy admission controller for label validation
  • Create OPA/Gatekeeper policies
  • Add label validation to CI/CD
  • Remediate existing unlabeled resources

Phase 3: Budgets (Weeks 5-6)

  • Create namespace ResourceQuotas
  • Configure LimitRanges
  • Set up budget alerts
  • Create team cost dashboards

Phase 4: Automation (Weeks 7-8)

  • Implement TTL policies
  • Deploy cleanup automation
  • Add cost estimation to PRs
  • Schedule dev environment shutdown

Best Practices

  1. Start with visibility: You cannot optimize what you cannot see
  2. Enforce labels at admission: Retroactive labeling is painful
  3. Set budgets, not just alerts: Budgets create accountability
  4. Automate cleanup: Manual cleanup never happens consistently
  5. Make costs visible to developers: Transparency drives behavior
  6. Celebrate wins: Recognize teams that optimize successfully

Get Expert Help

Implementing Kubernetes FinOps requires both financial and technical expertise. At ZSoftly, we help growing companies take control of their Kubernetes costs.

Our FinOps Services:

  • Cost attribution strategy and implementation
  • Labeling standards and enforcement
  • Budget and quota configuration
  • Automated cleanup policies
  • Team training and culture change

Ready to take control of your Kubernetes costs?