ZSoftly Technologies

CI/CD Pipelines for Infrastructure as Code

Scale Revenue, Reduce Costs, Minimize Risk

A Reference Architecture by Ditah Kumbong, ZSoftly Technologies Inc.

December 2025

Notices

This document is provided for informational purposes only. It represents ZSoftly Technologies Inc.'s current product offerings and practices as of the date of publication, which are subject to change without notice.

Reference Implementation

All patterns, scripts, and configurations referenced in this whitepaper are available in our open-source reference repository:

github.com/zsoftly/iac-cicd-reference

This repository is designed to be AI-friendly: provide it to your preferred AI assistant, along with your organization's context, to generate customized CI/CD pipelines.

Table of Contents

  1. Executive Summary
  2. The Business Case
  3. Architecture Overview
  4. Zero-Secrets Authentication
  5. Multi-Account Strategy
  6. Pipeline Design Patterns
  7. Foundation Deployment
  8. State Management
  9. Artifact Storage & Caching
  10. Ansible Configuration Management
  11. Platform Support
  12. Operational Excellence
  13. Getting Started
  14. Why ZSoftly
  15. Contact Us

Executive Summary

Infrastructure as Code (IaC) enables organizations to manage cloud resources with the same rigor as application code. However, implementing production-grade CI/CD pipelines for IaC presents unique challenges: credential management, multi-account deployments, state consistency, and approval workflows.

This whitepaper presents a comprehensive reference architecture that addresses these challenges while delivering measurable business outcomes.

Business Outcomes

| Outcome | How We Deliver It |
|---|---|
| Scale Revenue | Ship infrastructure changes faster with automated pipelines |
| Reduce Costs | Eliminate wasted pipeline runs, optimize credential management |
| Minimize Risk | Zero stored secrets, mandatory approvals, state locking |

Key Patterns

| Pattern | Benefit |
|---|---|
| OU-Based Account Model | PLT OU (runner) + WKL OU (environments) |
| Two-StackSet Deployment | Automatic role provisioning based on OU membership |
| Role Chaining | Runner assumes deploy roles cross-account |
| Skip Feature Branches | Eliminate 60-80% of wasteful pipeline runs |
| Shared Modules | No boilerplate, faster reviews, consistency |
| Terraform State Backend | Versioned, locked, encrypted state management |

Reference Repository

All implementation details are available at:

github.com/zsoftly/iac-cicd-reference

| Directory | Contents |
|---|---|
| docs/ | Authentication, pipeline rules, OU conventions |
| scripts/ | OIDC setup, role assumption, rollback utilities |
| cloudformation/stacksets/plt/ | StackSet templates for PLT OU (runner role) |
| cloudformation/stacksets/wkl/ | StackSet templates for WKL OU (deploy roles) |
| terraform/ | File naming conventions and patterns |
| ansible/ | Roles, playbooks, inventory structure |
| .github/, gitlab-ci/, jenkins/ | Platform-specific guidance |

The Business Case

Scale Revenue: Ship Faster

Manual infrastructure changes create bottlenecks. Teams wait for approvals, operators make mistakes, and deployments queue up.

Before: 2-week infrastructure change cycle

After: Same-day deployments with automated validation

Impact: More features reach customers faster, driving revenue growth.


Reduce Costs: Eliminate Waste

Traditional CI/CD pipelines run on every commit, including work-in-progress feature branches that will never be deployed.

The Problem:

| Event | Traditional | Our Approach |
|---|---|---|
| Feature branch push | Runs pipeline | Skipped |
| PR/MR opened | Runs pipeline | Runs pipeline |
| Push to main | Runs pipeline | Runs pipeline |

Savings: Skip 60-80% of pipeline runs by only building when code is ready for review.

Reference: See docs/pipeline-rules.md for trigger configuration.
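
The trigger table above can be read as a small decision function. The following is a minimal sketch, not the repository's configuration: the event names `push` and `pull_request` and the `PipelineEvent` type are assumptions made for illustration, and each CI/CD platform expresses the same rule in its own syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineEvent:
    kind: str    # "push" or "pull_request" (hypothetical event names)
    branch: str  # branch that was pushed, or the PR/MR source branch


def should_run(event: PipelineEvent, default_branch: str = "main") -> bool:
    """Return True when the event warrants a pipeline run."""
    if event.kind == "pull_request":
        return True  # code is ready for review
    if event.kind == "push":
        # Pushes to feature branches are skipped; pushes to main still run.
        return event.branch == default_branch
    return False


events = [
    PipelineEvent("push", "feature/add-vpc"),
    PipelineEvent("pull_request", "feature/add-vpc"),
    PipelineEvent("push", "main"),
]
decisions = [should_run(e) for e in events]
print(decisions)
```

The three events correspond to the three table rows, in order.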


Minimize Risk: Security by Design

Stored credentials are a leading source of cloud security breaches. Our architecture removes them from the CI/CD platform entirely.

Risk Reduction:

| Risk | Traditional | Our Approach |
|---|---|---|
| Credential exposure | Static keys in CI/CD | Zero stored secrets (OIDC) |
| Over-privileged access | Single admin key | Role chaining (minimal → full) |
| Unauthorized changes | Anyone can deploy | Mandatory approvals for prod |
| State corruption | No locking | DynamoDB locking + versioning |
| Disaster recovery | Manual rebuild | State rollback scripts |

Reference: See docs/authentication.md and scripts/95-rollback.sh.


Shared Modules: Eliminate Boilerplate

Every team writing infrastructure from scratch creates inconsistency, duplicates effort, and multiplies review burden. Shared modules solve this.

The Problem with Copy-Paste Infrastructure:

| Issue | Impact |
|---|---|
| Duplicate code | Every team writes the same VPC, IAM, S3 patterns |
| Inconsistent configs | Different teams use different defaults |
| Review overhead | Reviewers check the same patterns repeatedly |
| Drift over time | No single source of truth |

The Shared Module Strategy:

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#3B82F6', 'lineColor': '#64748B', 'background': '#fff'}}}%%
flowchart LR
    TEAM1[Team A] --> MOD[(Shared<br/>Modules<br/>v1.2.0)]
    TEAM2[Team B] --> MOD
    TEAM3[Team C] --> MOD
    MOD --> AWS[Consistent<br/>Infrastructure]

    style TEAM1 fill:#64748B,stroke:#475569,color:#fff
    style TEAM2 fill:#64748B,stroke:#475569,color:#fff
    style TEAM3 fill:#64748B,stroke:#475569,color:#fff
    style MOD fill:#3B82F6,stroke:#2563EB,color:#fff
    style AWS fill:#059669,stroke:#047857,color:#fff

Benefits of Versioned Modules:

| Benefit | How |
|---|---|
| No boilerplate | Teams consume modules, don't write from scratch |
| Faster reviews | Reviewers trust pinned module versions |
| Guaranteed consistency | Same module = same configuration |
| Safe upgrades | Pin to v1.2.0, upgrade when ready |
| No breaking changes | Teams control when they adopt new versions |

Example Module Versioning:

# Team A - stable, pinned
module "vpc" {
  source  = "git::https://github.com/org/tf-modules.git//vpc?ref=v1.2.0"
}

# Team B - same module, same consistency
module "vpc" {
  source  = "git::https://github.com/org/tf-modules.git//vpc?ref=v1.2.0"
}
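
A pipeline can check that module sources are actually pinned before a plan runs. The following is a minimal sketch under stated assumptions: the helper name `pinned_version` and the `vMAJOR.MINOR.PATCH` tag convention are illustrative, inferred from the `v1.2.0` example above rather than taken from the reference repository.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumed tag convention: vMAJOR.MINOR.PATCH, as in the v1.2.0 example.
SEMVER_TAG = re.compile(r"^v\d+\.\d+\.\d+$")


def pinned_version(source: str) -> Optional[str]:
    """Return the ?ref= value of a git-sourced module if it is a version tag."""
    url = source[len("git::"):] if source.startswith("git::") else source
    refs = parse_qs(urlsplit(url).query).get("ref", [])
    if len(refs) == 1 and SEMVER_TAG.match(refs[0]):
        return refs[0]
    return None  # unpinned, or pinned to a branch or other non-version ref


pinned = pinned_version("git::https://github.com/org/tf-modules.git//vpc?ref=v1.2.0")
floating = pinned_version("git::https://github.com/org/tf-modules.git//vpc?ref=main")
missing = pinned_version("git::https://github.com/org/tf-modules.git//vpc")
print(pinned, floating, missing)
```

A validate stage could fail the run whenever this returns `None`, so that a branch reference such as `main` never reaches review.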

Applies to All IaC Tools:

| Tool | Reusable Pattern |
|---|---|
| Terraform | Git-sourced modules with version tags |
| Ansible | Roles in collections with version constraints |
| CloudFormation | Nested stacks, Service Catalog products |

Reference: See terraform/ for module conventions and ansible/roles/ for reusable roles.


Architecture Overview

Core Components

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#3B82F6', 'lineColor': '#64748B', 'background': '#fff'}}}%%
flowchart LR
    CICD[CI/CD Platform<br/>GitHub / GitLab / Jenkins] --> OIDC[OIDC<br/>Provider]
    OIDC --> RUNNER[PLT OU<br/>Runner Account]
    RUNNER --> NP[WKL-NPD OU<br/>SBX / DEV / QAT]
    RUNNER --> PROD[WKL-PRD OU<br/>STG / PRD / DR]
    NP --> S3[(S3 State<br/>+ Versioning)]
    PROD --> S3
    S3 --> DDB[(DynamoDB<br/>Lock)]

    style CICD fill:#64748B,stroke:#475569,color:#fff
    style OIDC fill:#3B82F6,stroke:#2563EB,color:#fff
    style RUNNER fill:#7C3AED,stroke:#5B21B6,color:#fff
    style NP fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style PROD fill:#059669,stroke:#047857,color:#fff
    style S3 fill:#D97706,stroke:#B45309,color:#fff
    style DDB fill:#D97706,stroke:#B45309,color:#fff

Design Principles

| Principle | Implementation |
|---|---|
| Zero Secrets | OIDC federation, no stored credentials |
| OU-Based Isolation | PLT OU for runners, WKL OU for deploys |
| Automatic Onboarding | StackSets auto-deploy roles to new OU accounts |
| Least Privilege | Runner assumes scoped deploy roles cross-account |
| Consistent Ordering | Numbered prefixes for predictable sorting |
| Platform Agnostic | Patterns work across GitHub, GitLab, Jenkins |

Reference: See README.md for complete architecture overview.


Zero-Secrets Authentication

The Cross-Account Role Chaining Pattern

Traditional CI/CD uses static AWS credentials stored in the platform. This creates risk: credentials can leak, don't expire, and provide excessive access.

Our approach uses cross-account role chaining: a runner role in PLT OU assumes deploy roles in WKL accounts.

Role Separation

| Role | Location | Purpose | Permissions |
|---|---|---|---|
| cicd-runner-role | PLT OU | Authenticate via OIDC | sts:AssumeRole to deploy roles |
| cicd-deploy-role | Each WKL account | Deploy infrastructure | Full infrastructure access |
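
The runner role's only permission in this pattern is `sts:AssumeRole` on the deploy roles. The sketch below builds such a policy document for a list of workload accounts. It is an illustration under stated assumptions: the function name and the placeholder account IDs are hypothetical, and the repository's StackSet templates remain the authoritative definition.

```python
import json


def runner_assume_policy(account_ids, deploy_role="cicd-deploy-role"):
    """Build an IAM policy letting the runner role assume the deploy role
    in each listed workload account, and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": [
                    f"arn:aws:iam::{account_id}:role/{deploy_role}"
                    for account_id in account_ids
                ],
            }
        ],
    }


# Placeholder account IDs, not real accounts.
policy = runner_assume_policy(["111111111111", "222222222222"])
print(json.dumps(policy, indent=2))
```

Listing explicit role ARNs, rather than a wildcard resource, keeps the runner from assuming any other role that happens to trust its account.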

Why This Matters:

Authentication Methods

| Method | Best For | Reference |
|---|---|---|
| OIDC + Role Chain | GitHub Actions, GitLab Premium | scripts/00-setup-oidc-github.sh |
| IMDv2 + Role Chain | Self-hosted EC2 runners | docs/authentication.md |

Reference: See docs/authentication.md for complete pattern documentation.


Multi-Account Strategy

AWS Organizations OU Hierarchy

A well-structured Organizational Unit (OU) hierarchy enables automated IAM role deployment via CloudFormation StackSets. This approach separates the CI/CD runner account from workload accounts, providing clear security boundaries and automatic onboarding for new accounts.

OU Code Reference:

| Code | Full Name | Purpose |
|---|---|---|
| PLT | Platform | CI/CD runners and shared tooling |
| WKL | Workloads | Application environments (parent) |
| WKL-NPD | Workloads-NonProd | Development and testing |
| WKL-PRD | Workloads-Prod | Production and DR |

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#3B82F6', 'lineColor': '#64748B', 'background': '#fff'}}}%%
flowchart TB
    ROOT[Root] --> PLT[PLT OU]
    ROOT --> WKL[WKL OU]

    PLT --> RUNNER[PLT-Runner<br/>Account]

    WKL --> NPD[WKL-NPD OU]
    WKL --> PRD[WKL-PRD OU]

    NPD --> SBX[SBX Account]
    NPD --> DEV[DEV Account]
    NPD --> QAT[QAT Account]

    PRD --> STG[STG Account]
    PRD --> PROD[PRD Account]
    PRD --> DR[DR Account]

    style ROOT fill:#64748B,stroke:#475569,color:#fff
    style PLT fill:#7C3AED,stroke:#5B21B6,color:#fff
    style WKL fill:#3B82F6,stroke:#2563EB,color:#fff
    style NPD fill:#3B82F6,stroke:#2563EB,color:#fff
    style PRD fill:#059669,stroke:#047857,color:#fff
    style RUNNER fill:#A78BFA,stroke:#7C3AED,color:#1E1B4B
    style SBX fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style DEV fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style QAT fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style STG fill:#6EE7B7,stroke:#059669,color:#064E3B
    style PROD fill:#6EE7B7,stroke:#059669,color:#064E3B
    style DR fill:#FCD34D,stroke:#D97706,color:#78350F

OU Structure Explained

| OU | Purpose | Accounts |
|---|---|---|
| PLT | CI/CD infrastructure, runner execution | PLT-Runner |
| WKL | Application environments (parent OU) | - |
| WKL-NPD | Development and testing environments | SBX, DEV, QAT |
| WKL-PRD | Production and disaster recovery | STG, PRD, DR |

Cross-Account Role Assumption

The runner role in the PLT OU account assumes cicd-deploy-role in each WKL account:

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#232F3E', 'lineColor': '#232F3E', 'background': '#fff'}}}%%
flowchart LR
    RUNNER[PLT-Runner<br/>cicd-runner-role] --> DEV[DEV Account<br/>cicd-deploy-role]
    RUNNER --> QAT[QAT Account<br/>cicd-deploy-role]
    RUNNER --> STG[STG Account<br/>cicd-deploy-role]
    RUNNER --> PRD[PRD Account<br/>cicd-deploy-role]

    style RUNNER fill:#7C3AED,stroke:#5B21B6,color:#fff
    style DEV fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style QAT fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style STG fill:#6EE7B7,stroke:#059669,color:#064E3B
    style PRD fill:#6EE7B7,stroke:#059669,color:#064E3B

Security Benefits:

Environment Configuration

| Prefix | Environment | OU | Account | Region | Change Request |
|---|---|---|---|---|---|
| 00 | runner | PLT | PLT | ca-central-1 | No |
| 05 | sbx | WKL-NPD | SBX | ca-central-1 | No |
| 10 | dev | WKL-NPD | DEV | ca-central-1 | No |
| 20 | qat | WKL-NPD | QAT | ca-central-1 | No |
| 40 | stg | WKL-PRD | STG | ca-central-1 | No |
| 70 | prod | WKL-PRD | PRD | ca-central-1 | Yes |
| 90 | dr | WKL-PRD | DR | ca-west-1 | No |

Why Numbered Prefixes?

Alphabetical sorting produces incorrect order: dev, dr, prod, qat, sbx, stg

Numbered prefixes ensure correct order across all tools: 00-runner, 05-sbx, 10-dev, 20-qat, 40-stg, 70-prod, 90-dr
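
The effect of the prefixes is easy to reproduce with a plain lexicographic sort, which is what most file listings and CI job lists use. This short sketch uses only the environment names and prefixes from the table above.

```python
prefixes = {
    "runner": "00", "sbx": "05", "dev": "10", "qat": "20",
    "stg": "40", "prod": "70", "dr": "90",
}

# Plain names sort alphabetically, which does not match promotion order.
alphabetical = sorted(prefixes)

# Two-digit prefixes make the lexicographic order equal the promotion order.
numbered = sorted(f"{prefix}-{name}" for name, prefix in prefixes.items())

print(alphabetical)
print(numbered)
```

The gaps between prefixes (05, 10, 20, 40, ...) leave room to add an environment later without renaming the existing ones.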

Reference: See docs/conventions.md for complete naming standards.


Pipeline Design Patterns

Skip Feature Branches

The most impactful optimization: only run pipelines when code is ready for review.

Four-Stage Pipeline

| Stage | Trigger | Purpose |
|---|---|---|
| Validate | Automatic | Format check, linting, syntax validation |
| Plan | Automatic | Generate execution plan, show diff |
| Deploy | Manual | Apply changes to environment |
| Destroy | Manual + Admin | Teardown resources (protected) |

Environment Deployment Rules

| Environment | OU | On PR/MR | On Main | Requires CR |
|---|---|---|---|---|
| 05-sbx | WKL-NPD | Manual | Manual | No |
| 10-dev | WKL-NPD | Manual | Manual | No |
| 20-qat | WKL-NPD | Blocked | Manual | No |
| 40-stg | WKL-PRD | Blocked | Manual | No |
| 70-prod | WKL-PRD | Blocked | Manual | Yes |
| 90-dr | WKL-PRD | Blocked | Manual | No |

Reference: See docs/pipeline-rules.md for complete trigger configuration.
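
The deployment rules table can be encoded as data and queried by a pipeline before it offers a deploy job. The following is a minimal sketch, assuming a hypothetical `can_deploy` helper; "manual" here means the job is offered but still needs a person to start it.

```python
# env: (rule on PR/MR, rule on main, requires change request)
RULES = {
    "05-sbx": ("manual", "manual", False),
    "10-dev": ("manual", "manual", False),
    "20-qat": ("blocked", "manual", False),
    "40-stg": ("blocked", "manual", False),
    "70-prod": ("blocked", "manual", True),
    "90-dr": ("blocked", "manual", False),
}


def can_deploy(env: str, on_main: bool, has_change_request: bool = False) -> bool:
    """Return True if a manual deploy job may be offered for this environment."""
    on_pr_rule, on_main_rule, needs_cr = RULES[env]
    rule = on_main_rule if on_main else on_pr_rule
    if rule == "blocked":
        return False
    if needs_cr and not has_change_request:
        return False
    return True


print(can_deploy("10-dev", on_main=False))   # dev is deployable from a PR/MR
print(can_deploy("20-qat", on_main=False))   # qat is blocked until merged
print(can_deploy("70-prod", on_main=True))   # prod also needs a change request
```

Keeping the rules in one table-shaped structure makes it straightforward to review a change to the promotion policy as a one-line diff.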


Foundation Deployment

OU-Targeted CloudFormation StackSets

Foundation resources are deployed via CloudFormation StackSets targeting specific OUs. This architecture uses two StackSets, one for the PLT OU (runner account) and one for the WKL OU (deployment targets), so that new accounts receive their roles automatically.

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#232F3E', 'lineColor': '#232F3E', 'background': '#fff'}}}%%
flowchart TB
    MGMT[Management Account] --> SS1[StackSet 1:<br/>PLT Runner Role]
    MGMT --> SS2[StackSet 2:<br/>WKL Deploy Roles]

    SS1 --> PLT_OU[PLT OU]
    PLT_OU --> RUNNER[PLT-Runner Account<br/>cicd-runner-role]

    SS2 --> WKL_OU[WKL OU]
    WKL_OU --> NPD[WKL-NPD OU]
    WKL_OU --> PRD[WKL-PRD OU]

    NPD --> DEV[DEV: cicd-deploy-role]
    NPD --> QAT[QAT: cicd-deploy-role]
    PRD --> STG[STG: cicd-deploy-role]
    PRD --> PROD[PRD: cicd-deploy-role]

    style MGMT fill:#64748B,stroke:#475569,color:#fff
    style SS1 fill:#7C3AED,stroke:#5B21B6,color:#fff
    style SS2 fill:#3B82F6,stroke:#2563EB,color:#fff
    style PLT_OU fill:#A78BFA,stroke:#7C3AED,color:#1E1B4B
    style WKL_OU fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style NPD fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style PRD fill:#6EE7B7,stroke:#059669,color:#064E3B
    style RUNNER fill:#A78BFA,stroke:#7C3AED,color:#1E1B4B
    style DEV fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style QAT fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style STG fill:#6EE7B7,stroke:#059669,color:#064E3B
    style PROD fill:#6EE7B7,stroke:#059669,color:#064E3B

Two-StackSet Architecture

| StackSet | Target OU | Purpose | Role Created |
|---|---|---|---|
| PLT Runner StackSet | PLT OU | Creates runner execution role with OIDC trust | cicd-runner-role |
| WKL Deploy StackSet | WKL OU | Creates deploy roles trusting runner role ARN | cicd-deploy-role |

How It Works:

  1. StackSet 1 deploys to PLT OU, creating cicd-runner-role in the runner account
  2. StackSet 2 deploys to WKL OU (including nested WKL-NPD and WKL-PRD OUs)
  3. Deploy roles include trust policy referencing the runner role ARN
  4. Runner can assume deploy roles across all WKL accounts

Automatic Account Onboarding:

When a new account is added to the WKL-NPD or WKL-PRD OU, the WKL StackSets deploy cicd-deploy-role to it automatically, with no manual role setup.

StackSet Templates

| Template | Target OU | Purpose | Resources Created |
|---|---|---|---|
| plt/00-oidc-provider-github.yaml | PLT | GitHub OIDC federation | IAM OIDC Provider |
| plt/10-iam-runner-role.yaml | PLT | Runner execution role | cicd-runner-role |
| wkl/15-iam-deploy-role.yaml | WKL | Deployment target roles | cicd-deploy-role (per account) |
| wkl/20-terraform-state-backend.yaml | WKL | State management | S3 bucket, DynamoDB, KMS key |

Trust Policy Configuration

Runner Role (PLT OU):

AssumeRolePolicyDocument:
  Version: '2012-10-17'
  Statement:
    - Effect: Allow
      Principal:
        Federated: !Sub arn:aws:iam::${AWS::AccountId}:oidc-provider/token.actions.githubusercontent.com
      Action: sts:AssumeRoleWithWebIdentity
      Condition:
        # Accept only tokens issued for AWS STS
        StringEquals:
          token.actions.githubusercontent.com:aud: sts.amazonaws.com
        # Accept only workflows from one repository (replace org/repo)
        StringLike:
          token.actions.githubusercontent.com:sub: repo:org/repo:*
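
To see which GitHub workflows the `sub` condition admits, the wildcard match can be approximated locally. This is a sketch, not IAM's evaluator: it uses Python's `fnmatchcase`, which treats `*` and `?` like IAM's `StringLike` but also gives `[` special meaning, so it is only a fair stand-in for patterns without brackets. The sample claims follow GitHub's documented `repo:<org>/<repo>:...` subject format.

```python
from fnmatch import fnmatchcase


def sub_matches(pattern: str, sub: str) -> bool:
    """Approximate IAM StringLike for bracket-free patterns:
    '*' matches any run of characters, '?' matches exactly one."""
    return fnmatchcase(sub, pattern)


pattern = "repo:org/repo:*"
claims = {
    "repo:org/repo:ref:refs/heads/main": None,
    "repo:org/repo:pull_request": None,
    "repo:org/other-repo:ref:refs/heads/main": None,
}
for sub in claims:
    claims[sub] = sub_matches(pattern, sub)
    print(sub, "->", claims[sub])

# A tighter pattern limits the role to workflows running on main only.
main_only = sub_matches("repo:org/repo:ref:refs/heads/main", "repo:org/repo:pull_request")
```

The broad `repo:org/repo:*` pattern admits every branch and pull request in the repository; narrowing it to a branch or environment is a common hardening step for production roles.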

Deploy Role (WKL OU):

AssumeRolePolicyDocument:
  Version: '2012-10-17'
  Statement:
    - Effect: Allow
      Principal:
        # RunnerAccountId is a template parameter: the PLT-Runner account ID
        AWS: !Sub arn:aws:iam::${RunnerAccountId}:role/cicd-runner-role
      Action: sts:AssumeRole

Deployment Sequence

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#232F3E', 'lineColor': '#232F3E', 'background': '#fff'}}}%%
flowchart LR
    A[1. Deploy OIDC Provider<br/>to PLT OU] --> B[2. Deploy Runner Role<br/>to PLT OU]
    B --> C[3. Deploy Deploy Roles<br/>to WKL OU]
    C --> D[4. Deploy State Backend<br/>to WKL OU]

    style A fill:#3B82F6,stroke:#1E40AF,color:#fff
    style B fill:#7C3AED,stroke:#5B21B6,color:#fff
    style C fill:#F59E0B,stroke:#B45309,color:#fff
    style D fill:#10B981,stroke:#047857,color:#fff

Deployment Mode

| Mode | Use Case |
|---|---|
| SERVICE_MANAGED | Automatic deployment to all accounts in target OU (recommended) |
| SELF_MANAGED | Manual deployment to specific account list |

Reference: See cloudformation/stacksets/ for templates and scripts/deploy-foundation.md for commands.


State Management

Terraform Backend Architecture

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#232F3E', 'lineColor': '#232F3E', 'background': '#fff'}}}%%
flowchart LR
    TF[Terraform] --> S3[(S3 State)]
    TF --> DDB[(DynamoDB Lock)]
    S3 --> VER[Versioning]
    S3 --> ENC[Encryption]

    style TF fill:#7B42BC,stroke:#5C32A3,color:#fff
    style S3 fill:#232F3E,stroke:#232F3E,color:#fff
    style DDB fill:#3B82F6,stroke:#1E40AF,color:#fff
    style VER fill:#10B981,stroke:#047857,color:#fff
    style ENC fill:#10B981,stroke:#047857,color:#fff

State Backend Features

| Feature | Implementation | Benefit |
|---|---|---|
| Locking | DynamoDB table | Prevent concurrent modifications |
| Versioning | S3 versioning | Rollback to previous state |
| Encryption | S3 SSE or KMS | Data protection at rest |
| Cross-Region | S3 replication | Disaster recovery (ca-central-1 ↔ ca-west-1) |
| TTL Cleanup | DynamoDB TTL | Auto-expire stale locks |

State Isolation

Each WKL account has its own state bucket, deployed via StackSet:

# Account codes are lowercased in bucket names: S3 bucket names cannot contain uppercase letters.

# WKL-NPD OU accounts
s3://org-tfstate-sbx-ca-central-1/
└── org/repo/05-sbx/terraform.tfstate

s3://org-tfstate-dev-ca-central-1/
└── org/repo/10-dev/terraform.tfstate

s3://org-tfstate-qat-ca-central-1/
└── org/repo/20-qat/terraform.tfstate

# WKL-PRD OU accounts
s3://org-tfstate-stg-ca-central-1/
└── org/repo/40-stg/terraform.tfstate

s3://org-tfstate-prd-ca-central-1/
└── org/repo/70-prod/terraform.tfstate

s3://org-tfstate-dr-ca-west-1/
└── org/repo/90-dr/terraform.tfstate
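
The bucket and key layout follows a fixed pattern, so a pipeline can derive the state location from the environment instead of hard-coding it. The sketch below is illustrative: `state_location` is a hypothetical helper, and it lowercases the bucket name because S3 does not accept uppercase letters in bucket names.

```python
def state_location(org: str, repo: str, env: str, account: str, region: str):
    """Return (bucket, key) for an environment's Terraform state."""
    # S3 bucket names must be lowercase, so the account code is lowercased.
    bucket = f"{org}-tfstate-{account}-{region}".lower()
    key = f"{org}/{repo}/{env}/terraform.tfstate"
    return bucket, key


sbx = state_location("org", "repo", "05-sbx", "SBX", "ca-central-1")
dr = state_location("org", "repo", "90-dr", "DR", "ca-west-1")
print(sbx)
print(dr)
```

Because each workload account owns its bucket, a mistake in one environment's backend configuration cannot overwrite another environment's state.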

Reference: See cloudformation/stacksets/wkl/20-terraform-state-backend.yaml for backend template.


Artifact Storage & Caching

The Problem: Slow Pipelines

Every CI/CD run that downloads dependencies from scratch wastes time and resources:

| Task | Without Cache | With Cache |
|---|---|---|
| Download Terraform providers | 30-60s | 2-5s |
| Install Ansible collections | 20-40s | 1-3s |
| Fetch Python/Node packages | 45-90s | 3-8s |
| Pull container base images | 60-120s | 5-10s |

Impact: A 5-minute pipeline becomes 1-2 minutes with proper caching.
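
Summing the ranges in the table shows where the estimate comes from. This is plain arithmetic on the figures above, assuming a pipeline that performs all four tasks in sequence.

```python
# (without cache, with cache) as (low, high) seconds, copied from the table
timings = {
    "Download Terraform providers": ((30, 60), (2, 5)),
    "Install Ansible collections": ((20, 40), (1, 3)),
    "Fetch Python/Node packages": ((45, 90), (3, 8)),
    "Pull container base images": ((60, 120), (5, 10)),
}

cold = tuple(sum(t[0][i] for t in timings.values()) for i in (0, 1))
warm = tuple(sum(t[1][i] for t in timings.values()) for i in (0, 1))
saved = (cold[0] - warm[0], cold[1] - warm[1])

print(f"without cache: {cold[0]}-{cold[1]}s")
print(f"with cache:    {warm[0]}-{warm[1]}s")
print(f"saved per run: {saved[0]}-{saved[1]}s")
```

Dependency setup drops from roughly 2.5 to 5 minutes down to under half a minute, which accounts for most of the reduction quoted above.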

Caching Architecture

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#3B82F6', 'lineColor': '#64748B', 'background': '#fff'}}}%%
flowchart LR
    RUNNER[CI/CD Runner] --> CACHE[(Artifact Cache<br/>S3 / R2)]
    CACHE --> TF[Terraform<br/>Providers]
    CACHE --> ANS[Ansible<br/>Collections]
    CACHE --> PKG[Package<br/>Dependencies]
    CACHE --> IMG[Container<br/>Layers]

    style RUNNER fill:#7C3AED,stroke:#5B21B6,color:#fff
    style CACHE fill:#F59E0B,stroke:#B45309,color:#fff
    style TF fill:#7B42BC,stroke:#5C32A3,color:#fff
    style ANS fill:#EE0000,stroke:#CC0000,color:#fff
    style PKG fill:#3B82F6,stroke:#2563EB,color:#fff
    style IMG fill:#0DB7ED,stroke:#0B93BD,color:#fff

Artifact Bucket in PLT OU

Deploy a shared artifact bucket in the PLT account for all pipelines:

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#232F3E', 'lineColor': '#232F3E', 'background': '#fff'}}}%%
flowchart TB
    PLT[PLT Account] --> BUCKET[(cicd-artifacts<br/>S3 Bucket)]

    BUCKET --> PROV[terraform-providers/]
    BUCKET --> COLL[ansible-collections/]
    BUCKET --> DEPS[dependencies/]
    BUCKET --> PLANS[terraform-plans/]

    style PLT fill:#7C3AED,stroke:#5B21B6,color:#fff
    style BUCKET fill:#F59E0B,stroke:#B45309,color:#fff
    style PROV fill:#7B42BC,stroke:#5C32A3,color:#fff
    style COLL fill:#EE0000,stroke:#CC0000,color:#fff
    style DEPS fill:#3B82F6,stroke:#2563EB,color:#fff
    style PLANS fill:#10B981,stroke:#047857,color:#fff

What to Cache

| Artifact Type | Path / Key Pattern | TTL | Why Cache |
|---|---|---|---|
| Terraform providers | terraform-providers/{hash}/ | 7 days | Large binaries, rarely change |
| Terraform plugin cache | terraform-plugins/{os}-{arch}/ | 7 days | Provider binaries per platform |
| Ansible collections | ansible-collections/{hash}/ | 3 days | Galaxy downloads are slow |
| Python packages | pip-cache/{requirements-hash}/ | 3 days | pip install overhead |
| Node modules | node-modules/{lockfile-hash}/ | 3 days | npm/yarn install overhead |
| Go modules | go-mod/{go.sum-hash}/ | 7 days | go mod download overhead |
| Container layers | docker-layers/{image-hash}/ | 1 day | Base image pull overhead |
| Terraform plans | terraform-plans/{run-id}/ | 24 hours | Share plan between jobs |
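
The `{hash}` placeholders in the key patterns are typically a digest of the lock or requirements file, so the cache is reused until the dependencies change. A minimal sketch, assuming SHA-256 and a hypothetical `cache_prefix` helper:

```python
import hashlib


def cache_prefix(artifact_type: str, lockfile_bytes: bytes) -> str:
    """Derive a cache key prefix from the content of a lock file."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    return f"{artifact_type}/{digest}/"


# Sample lock file contents, invented for the example.
lock_v1 = b'provider "registry.terraform.io/hashicorp/aws" { version = "5.0.0" }\n'
lock_v2 = b'provider "registry.terraform.io/hashicorp/aws" { version = "5.1.0" }\n'

key_a = cache_prefix("terraform-providers", lock_v1)
key_b = cache_prefix("terraform-providers", lock_v1)
key_c = cache_prefix("terraform-providers", lock_v2)
print(key_a == key_b, key_a == key_c)
```

Identical lock files map to the same prefix and share a cache entry, while any provider version bump produces a new prefix and a clean download.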

Cache Expiration Policy

Critical: Caches must expire to receive security updates.

# S3 Lifecycle Rules for artifact bucket
LifecycleConfiguration:
  Rules:
    # Expire provider cache after 7 days
    - Id: ExpireProviderCache
      Status: Enabled
      Prefix: terraform-providers/
      ExpirationInDays: 7

    # Expire Ansible collection cache after 3 days
    - Id: ExpireAnsibleCollections
      Status: Enabled
      Prefix: ansible-collections/
      ExpirationInDays: 3

    # Expire dependency caches after 3 days
    - Id: ExpireDependencyCache
      Status: Enabled
      Prefix: dependencies/
      ExpirationInDays: 3

    # Expire plan artifacts after 1 day
    - Id: ExpirePlanArtifacts
      Status: Enabled
      Prefix: terraform-plans/
      ExpirationInDays: 1

    # Clean up incomplete uploads
    - Id: AbortIncompleteUploads
      Status: Enabled
      AbortIncompleteMultipartUpload:
        DaysAfterInitiation: 1

Why Short TTLs:

| Concern | Solution |
|---|---|
| Security patches | 3-7 day TTL ensures updates within a week |
| Vulnerability fixes | Short TTL forces fresh downloads regularly |
| Cache poisoning | Hash-based keys + expiration limits blast radius |
| Storage costs | Automatic cleanup prevents unbounded growth |
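
The prefix-based expiration policy can be modelled in a few lines, which is useful for checking that a new key pattern falls under one of the rules. This is an approximation for illustration: S3 evaluates lifecycle rules itself and removes objects asynchronously after they become eligible, so real deletion can lag the times computed here. The sample keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Prefix -> TTL in days, mirroring the lifecycle rules described above.
TTL_DAYS = {"terraform-providers/": 7, "dependencies/": 3, "terraform-plans/": 1}


def is_expired(key: str, created: datetime, now: datetime) -> bool:
    """Return True if the object is old enough for its prefix rule to apply."""
    for prefix, days in TTL_DAYS.items():
        if key.startswith(prefix):
            return now - created >= timedelta(days=days)
    return False  # no rule covers this key, so it never expires


created = datetime(2025, 12, 1, tzinfo=timezone.utc)
plan_gone = is_expired("terraform-plans/run-1/plan.bin", created, created + timedelta(days=1))
provider_kept = is_expired("terraform-providers/abc/cache.tar.gz", created, created + timedelta(days=6))
uncovered = is_expired("misc/notes.txt", created, created + timedelta(days=365))
print(plan_gone, provider_kept, uncovered)
```

The last case is the one to watch for: a key written outside every configured prefix is never cleaned up, which defeats both the security and the cost goals in the table.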

Storage Options

| Provider | Best For | Pros | Cons |
|---|---|---|---|
| AWS S3 | AWS-native pipelines | Native IAM, same region as infra | Egress costs |
| Cloudflare R2 | Multi-cloud, cost-sensitive | Zero egress fees, global edge | Separate auth needed |
| GitHub Cache | GitHub Actions only | Built-in, no setup | 10GB limit, repo-scoped |
| GitLab Cache | GitLab CI only | Built-in, no setup | Runner-scoped |

Platform-Specific Caching

GitHub Actions:

- name: Cache Terraform providers
  uses: actions/cache@v4
  with:
    path: ~/.terraform.d/plugin-cache
    key: terraform-${{ runner.os }}-${{ hashFiles('**/.terraform.lock.hcl') }}
    restore-keys: |
      terraform-${{ runner.os }}-

- name: Cache Ansible collections
  uses: actions/cache@v4
  with:
    path: ~/.ansible/collections
    key: ansible-${{ hashFiles('**/requirements.yml') }}
    restore-keys: |
      ansible-

GitLab CI:

.cache-terraform: &cache-terraform
  cache:
    key: terraform-${CI_COMMIT_REF_SLUG}
    paths:
      - .terraform/providers/
    policy: pull-push
    when: on_success

.cache-ansible: &cache-ansible
  cache:
    key: ansible-${CI_COMMIT_REF_SLUG}
    paths:
      - .ansible/collections/
    policy: pull-push

S3 Backend Cache (All Platforms):

# Key the cache on the lock file hash, under the terraform-providers/ prefix,
# so the bucket's lifecycle rule for that prefix applies to it
LOCK_HASH="$(sha256sum .terraform.lock.hcl | cut -d' ' -f1)"
CACHE_URI="s3://${ARTIFACT_BUCKET}/terraform-providers/${LOCK_HASH}/plugin-cache.tar.gz"
mkdir -p ~/.terraform.d/plugin-cache

# Download and unpack the cache from S3; a cache miss is not an error
if aws s3 cp "${CACHE_URI}" /tmp/cache.tar.gz; then
  tar -xzf /tmp/cache.tar.gz -C ~/.terraform.d/
fi

# After terraform init, upload the cache
tar -czf /tmp/cache.tar.gz -C ~/.terraform.d/ plugin-cache/
aws s3 cp /tmp/cache.tar.gz "${CACHE_URI}"

Terraform Provider Caching

Configure Terraform to use a plugin cache directory:

# ~/.terraformrc (the directory must already exist; Terraform does not create it)
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"

# Or set the equivalent environment variable
# TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"

Provider Mirror for Air-Gapped:

provider_installation {
  filesystem_mirror {
    path    = "/opt/terraform/providers"
    include = ["registry.terraform.io/*/*"]
  }
  direct {
    exclude = ["registry.terraform.io/*/*"]
  }
}

Ansible Collection Caching

# Install to specific path for caching
ansible-galaxy collection install -r requirements.yml -p ~/.ansible/collections

# Or use environment variable
export ANSIBLE_COLLECTIONS_PATH=~/.ansible/collections

Cache Invalidation

Force cache refresh when needed:

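# GitHub Actions - add the date to the key for a daily refresh
- name: Get current date
  id: date
  run: echo "date=$(date +%F)" >> "$GITHUB_OUTPUT"

- name: Cache Terraform providers
  uses: actions/cache@v4
  with:
    path: ~/.terraform.d/plugin-cache
    key: terraform-${{ runner.os }}-${{ hashFiles('**/.terraform.lock.hcl') }}-${{ steps.date.outputs.date }}

# Or use a workflow dispatch input
on:
  workflow_dispatch:
    inputs:
      refresh_cache:
        description: 'Force cache refresh'
        type: boolean
        default: false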

Artifact Bucket Template

Deploy via StackSet to PLT OU:

| Template | Target OU | Purpose |
|---|---|---|
| plt/25-cicd-artifacts.yaml | PLT | Shared artifact/cache bucket |

# Key bucket features
Resources:
  ArtifactBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub '${Org}-cicd-artifacts-${AWS::AccountId}'
      LifecycleConfiguration:
        Rules:
          - Id: ExpireCache
            Status: Enabled
            ExpirationInDays: 7
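      # Intelligent-Tiering archive configuration. It only affects objects stored
      # in the INTELLIGENT_TIERING class that live at least 90 days, so it has no
      # effect on objects removed by the 7-day expiration rule above.
      IntelligentTieringConfigurations:
        - Id: CacheOptimization
          Status: Enabled
          Tierings:
            - AccessTier: ARCHIVE_ACCESS
              Days: 90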

Reference: See cloudformation/stacksets/plt/25-cicd-artifacts.yaml and docs/caching.md for implementation details.


Ansible Configuration Management

Directory Structure

The repository includes a complete Ansible structure following the same numbered naming conventions:

ansible/
├── inventories/
│   ├── 05-sbx/         # Sandbox (WKL-NPD)
│   ├── 10-dev/         # Development (WKL-NPD)
│   ├── 20-qat/         # QA Testing (WKL-NPD)
│   ├── 40-stg/         # Staging (WKL-PRD)
│   ├── 70-prod/        # Production (WKL-PRD)
│   └── 90-dr/          # Disaster Recovery (WKL-PRD)
├── roles/
│   ├── 10-common/      # Base OS configuration
│   └── 20-security/    # Security hardening
└── playbooks/
    └── site.yml

Included Roles

| Role | Purpose | Key Tasks |
|---|---|---|
| 10-common | Base configuration | Packages, timezone, NTP, system limits |
| 20-security | Security hardening | SSH hardening, fail2ban, auto-updates |

Pipeline Integration

| Stage | Command | Purpose |
|---|---|---|
| Validate | ansible-lint | Style and syntax checking |
| Test | ansible-playbook --check --diff | Dry run with change preview |
| Deploy | ansible-playbook | Apply configuration |
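
The three stages differ only in the command line they run. The sketch below builds the argument list for each stage. It is an illustration under stated assumptions: the `ansible_command` helper is hypothetical, and the inventory and playbook paths are inferred from the directory structure shown earlier, relative to the ansible/ directory.

```python
def ansible_command(stage: str, env: str, playbook: str = "playbooks/site.yml"):
    """Return the argv list for a pipeline stage, run from the ansible/ directory."""
    if stage == "validate":
        return ["ansible-lint", playbook]
    base = ["ansible-playbook", "-i", f"inventories/{env}", playbook]
    if stage == "test":
        # Dry run: report what would change without applying it.
        return base + ["--check", "--diff"]
    if stage == "deploy":
        return base
    raise ValueError(f"unknown stage: {stage}")


test_cmd = ansible_command("test", "10-dev")
deploy_cmd = ansible_command("deploy", "70-prod")
print(" ".join(test_cmd))
print(" ".join(deploy_cmd))
```

Building the test and deploy commands from the same base keeps the dry run faithful to what the deploy stage will actually execute.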

Reference: See ansible/ for complete implementation.


Platform Support

Supported Platforms

The reference architecture supports all major CI/CD platforms with platform-specific guidance.

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#3B82F6', 'lineColor': '#64748B', 'background': '#fff'}}}%%
flowchart LR
    PAT[Reference<br/>Patterns] --> GH[GitHub<br/>Actions]
    PAT --> GL[GitLab<br/>CI]
    PAT --> JK[Jenkins]

    style PAT fill:#3B82F6,stroke:#2563EB,color:#fff
    style GH fill:#64748B,stroke:#475569,color:#fff
    style GL fill:#D97706,stroke:#B45309,color:#fff
    style JK fill:#059669,stroke:#047857,color:#fff

Platform Comparison

| Feature | GitHub Actions | GitLab CI | Jenkins |
|---|---|---|---|
| OIDC Support | Native | Manual | Plugin |
| Trigger Rules | Workflow on: | workflow: rules: | Multibranch filter |
| Manual Approval | Environments | when: manual | input step |
| Concurrency | concurrency: | resource_group: | Lock plugin |
| Reference | .github/workflows/ | gitlab-ci/ | jenkins/ |

Implementation Guidance

Each platform directory contains:

Reference: See .github/, gitlab-ci/, and jenkins/ directories.


Operational Excellence

Troubleshooting Guide

The reference repository includes comprehensive troubleshooting documentation:

| Category | Common Issues |
|---|---|
| Authentication | IMDv2 timeout, token expiry, cross-account denied |
| Terraform | State lock deadlock, plan artifacts, init slow |
| Ansible | Vault decrypt, dynamic inventory, SSH timeout |
| Pipeline | Concurrent conflicts, artifact mismatch, timeouts |

Reference: See docs/troubleshooting.md for solutions.

Emergency Recovery

State Rollback Script (scripts/95-rollback.sh):

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#EF4444', 'lineColor': '#EF4444', 'background': '#fff'}}}%%
flowchart LR
    A[Acquire Lock] --> B[Backup Current]
    B --> C[Restore Previous]
    C --> D[Release Lock]

    style A fill:#3B82F6,stroke:#1E40AF,color:#fff
    style B fill:#F59E0B,stroke:#B45309,color:#fff
    style C fill:#EF4444,stroke:#B91C1C,color:#fff
    style D fill:#10B981,stroke:#047857,color:#fff
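
The four steps in the diagram can be sketched against an in-memory stand-in for a versioned state object. This is not the logic of scripts/95-rollback.sh, which is not reproduced in this document; the `VersionedState` class and `rollback` function are hypothetical and only illustrate the ordering and the guarantee that the lock is always released.

```python
import threading


class VersionedState:
    """In-memory stand-in for a versioned state object plus its lock."""

    def __init__(self, versions):
        self.versions = list(versions)  # oldest first; the last item is current
        self.backups = []
        self.lock = threading.Lock()


def rollback(state: VersionedState) -> str:
    if not state.lock.acquire(blocking=False):   # 1. acquire lock
        raise RuntimeError("state is locked by another operation")
    try:
        if len(state.versions) < 2:
            raise ValueError("no previous version to restore")
        state.backups.append(state.versions[-1])  # 2. back up current
        previous = state.versions[-2]
        state.versions.append(previous)           # 3. restore previous as a new version
        return previous
    finally:
        state.lock.release()                      # 4. release lock, even on failure


state = VersionedState(["v1", "v2"])
restored = rollback(state)
print(restored, state.versions, state.backups)
```

Restoring by appending a new version, rather than deleting the bad one, keeps the full history available if the rollback itself has to be undone.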

Features:

Utility Scripts

| Script | Purpose |
|---|---|
| _utils.sh | Shared logging, validation, formatting |
| 00-setup-oidc-github.sh | Manual OIDC provider setup |
| 05-assume-role.sh | Cross-account role assumption |
| 95-rollback.sh | Emergency state rollback |

Reference: See scripts/ for all utilities.


Getting Started

Prerequisites

Implementation Steps

Step 1: Clone the Reference Repository

git clone https://github.com/zsoftly/iac-cicd-reference.git

Step 2: Set Up OU Structure

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#3B82F6', 'lineColor': '#3B82F6', 'background': '#fff'}}}%%
flowchart LR
    A[Create PLT OU] --> B[Create WKL OU]
    B --> C[Create WKL-NPD OU]
    B --> D[Create WKL-PRD OU]
    C --> E[Move accounts to OUs]
    D --> E

    style A fill:#7C3AED,stroke:#5B21B6,color:#fff
    style B fill:#3B82F6,stroke:#2563EB,color:#fff
    style C fill:#93C5FD,stroke:#3B82F6,color:#1E3A8A
    style D fill:#6EE7B7,stroke:#059669,color:#064E3B
    style E fill:#10B981,stroke:#047857,color:#fff

Step 3: Deploy Foundation StackSets

%%{init: {'theme':'base', 'themeVariables': {'primaryColor': '#3B82F6', 'lineColor': '#3B82F6', 'background': '#fff'}}}%%
flowchart LR
    A[1. OIDC Provider<br/>→ PLT OU] --> B[2. Runner Role<br/>→ PLT OU]
    B --> C[3. Deploy Roles<br/>→ WKL OU]
    C --> D[4. State Backend<br/>→ WKL OU]

    style A fill:#3B82F6,stroke:#1E40AF,color:#fff
    style B fill:#7C3AED,stroke:#5B21B6,color:#fff
    style C fill:#F59E0B,stroke:#B45309,color:#fff
    style D fill:#10B981,stroke:#047857,color:#fff

Follow commands in scripts/deploy-foundation.md. New accounts added to WKL OU automatically receive deploy roles.

Step 4: Configure Your CI/CD Platform

Copy patterns from the directory that matches your platform: .github/ (GitHub Actions), gitlab-ci/ (GitLab CI), or jenkins/ (Jenkins).

Step 5: Customize for Your Organization

Provide the repository and docs/conventions.md to your AI assistant with your organization's context to generate customized pipelines.


Why ZSoftly

Our Expertise

CI/CD Pipeline Specialists

Infrastructure as Code Experts

Canadian-Based Team

Our Services

| Service | Description |
|---|---|
| Assessment | Review current pipelines, identify improvements |
| Implementation | Deploy reference architecture for your organization |
| Migration | Move from legacy CI/CD to modern patterns |
| Training | Enable your team on IaC CI/CD best practices |
| Support | Ongoing assistance and troubleshooting |

Business Outcomes We Deliver

| Outcome | How |
|---|---|
| Scale Revenue | Faster deployments, more features to market |
| Reduce Costs | 60-80% fewer pipeline runs, no credential rotation overhead |
| Minimize Risk | Zero stored secrets, mandatory approvals, state protection |

Ready to Modernize Your CI/CD Pipelines?

Start with our open-source reference:

github.com/zsoftly/iac-cicd-reference

Or get expert help:

zsoftly.com/contact


Contact Us

ZSoftly Technologies Inc.

| Contact | Link |
|---|---|
| Website | zsoftly.com |
| Services | zsoftly.com/services/devops |
| Contact | zsoftly.com/contact |
| Get in Touch | zsoftly.com/contact |
| Reference Repo | github.com/zsoftly/iac-cicd-reference |


Document Version: 1.0 - December 2025

Copyright (c) 2025 ZSoftly Technologies Inc. All rights reserved.
