Daniel Ciaglia
Why easy, when we can make it complicated? – the unknown platform engineer
Stack: a Terraform root module², tracked with one state file
Related: I highly recommend the talk “Terraform: from zero to madness” by @Timur Bublik
.
├── databases.tf
├── vpc.tf
├── main.tf
├── outputs.tf
└── terraform.tf
Environments are separated via .tfvars files:
.
├── production.tfvars
├── staging.tfvars
├── databases.tf
├── vpc.tf
├── variables.tf
├── main.tf
└── terraform.tf
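An environment is then selected at plan time by passing the matching file (a sketch; the flag values are assumed):

terraform plan -out plan -var-file=production.tfvars
terraform apply plan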
# select a specific tag
module "rds" {
  source = "github.com/example/rds?ref=v1.2.0"
}
The services module receives outputs of base as input, e.g. vpc_id or subnets (see the wiring sketch after the tree below). terraform apply plan is still run manually.
.
├── environments
│   ├── production
│   │   ├── main.tf
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── staging
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
└── modules
    ├── base
    │   ├── main.tf
    │   ├── outputs.tf
    │   ├── variables.tf
    │   └── vpc.tf
    └── services
        ├── databases.tf
        ├── main.tf
        ├── outputs.tf
        └── variables.tf
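A sketch of that wiring in environments/production/main.tf (the output and variable names are illustrative assumptions, not the project's real interface):

# environments/production/main.tf - sketch; output/variable names are assumed
module "base" {
  source = "../../modules/base"
}

module "services" {
  source = "../../modules/services"

  # services consumes outputs of base
  vpc_id     = module.base.vpc_id
  subnet_ids = module.base.subnet_ids
}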
👉 At this point in time I joined the project 👈
The situation
- terraform plan -out plan takes more and more time
- the blast radius of every terraform step is enormous⁴
- module versions are not pinned

Possible solutions
./boring-registry upload --type s3 (some more flags) ./your-module
module "rds" {
source = "registry.example.com/acme/rds/aws"
version = "~> 0.1"
}
Don’t: the terraform_remote_state data source!
Do: jsondecode() and jsonencode()
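A minimal sketch of the Do pattern, here with an SSM parameter as the store (resource and attribute names are illustrative, not the modules' real interface):

# writer stack: publish outputs as one JSON document
resource "aws_ssm_parameter" "base" {
  name  = "/configuration/base"
  type  = "String"
  value = jsonencode({
    vpc_id     = module.base.vpc_id
    subnet_ids = module.base.subnet_ids
  })
}

# reader stack: decode the document instead of using terraform_remote_state
data "aws_ssm_parameter" "base" {
  name = "/configuration/base"
}

locals {
  base = jsondecode(data.aws_ssm_parameter.base.value)
  # use local.base.vpc_id, local.base.subnet_ids
}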
Code for 3 Terraform modules will be provided:
- s3_json_store – CRUD JSON data on S3
- ssm_json_store – CRUD JSON data on SSM Parameter Store
- ssm_json_regex – read SSM parameters with a regex

module "ssm_service_data" {
  source  = "registry.example.com/foo/ssm_json_store/aws"
  version = "~> 1.0.2"

  path = "/configuration"
  name = "base"

  data = {
    domain           = local.domain_name
    environment      = local.environment
    environmentClass = local.environmentClass
    backup_plan      = local.backup_plan

    networking = {
      vpc_id              = module.base.vpc_default_id
      subnet_database_ids = module.base.subnet_private_database_ids
      subnet_k8s_ids      = module.base.subnet_private_k8s_ids
    }

    cluster = {
      name              = module.eks.cluster_name
      oidc_issuer_url   = module.eks.cluster_oidc_issuer_url
      oidc_provider_arn = module.eks.cluster_oidc_provider_arn
    }
  }
}
module "ssm_service_data" {
source = "registry.example.com/foo/ssm_json_store/aws"
version = "~> 1.0.2"
path = "/configuration"
name = "upstream"
data = {
installed = true
private = {}
public = {
sns = {
"foo" = {
"arn" = module.sns_foo.arn
"name" = module.sns_foo.name
}
sqs = {
"bar" = {
"arn" = module.bar_queue.arn
"name" = module.bar_queue.name
}
}
}
}
}
}
module "ssm_data" {
source = "registry.example.com/foo/ssm_json_store/aws"
version = "~> 0.1.0"
path = "/configuration"
include_filter_regex = "(base|upstream)"
}
module "sns_sqs_subscription_foo" {
count = try(module.ssm_data.values["upstream"]["installed"], false) ? 1 : 0
source = "registry.example.com/foo/sns_sqs_subscription/aws"
version = "~> 0.1"
sns_arn = nonsensitive(module.ssm_data.values["upstream"]["public"]["sns"]["foo"]["arn"])
message_retention_seconds = 1209600
redrive_policy = jsonencode({
deadLetterTargetArn = module.dead_foo[0].arn
maxReceiveCount = 5
})
}
Ownership is handled via CODEOWNERS. With x stacks and y environments (at least 1) per tenant, the z tenants add a third dimension:

total stacks = stacks * environments * tenants
To give some numbers: my client LYNQTECH runs ~100 microservices in at least 2 environments per tenant for 5+ tenants - north of 1000 stacks 😉
GitOps is an operational framework that takes DevOps best practices used for application development such as version control, collaboration, compliance, and CI/CD, and applies them to infrastructure automation. – https://about.gitlab.com/topics/gitops/
flux push artifact¹⁰ and post-build variable substitution¹¹ are your friends.
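Pushing a stack as an OCI artifact looks roughly like this (registry URL, tag and path are assumptions):

flux push artifact oci://xxx.dkr.ecr.eu-central-1.amazonaws.com/iac/foo:v0.1.0 \
  --path=./stacks/foo \
  --source="$(git config --get remote.origin.url)" \
  --revision="$(git rev-parse HEAD)"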
.
├── environments
│   ├── client-a
│   │   ├── prod
│   │   │   ├── services
│   │   │   │   ├── _versions.yaml
│   │   │   │   ├── foo.yaml
│   │   │   │   └── bar.yaml
│   │   │   └── system
│   │   └── stage
│   │       ├── services
│   │       └── system
│   ├── client-b
│   [...]
│
├── flux-apps
│   ├── service-stacks
│   │   ├── foo
│   │   ├── bar
│   │   [...]
│   │   └── baz
│   └── system
│       [...]
│       ├── vertical-pod-autoscaler
│       └── weaveworks-gitops
From the perspective of an individual FluxCD installation:
- _versions.yaml becomes the service-versions ConfigMap
- a base ConfigMap provides client, environment and other data

apiVersion: v1
kind: ConfigMap
metadata:
  name: service-versions
data:
  version_foo: "2.5.0"
  version_foo_tf: "~ 0.1.0-0"
  version_vertical_pod_autoscaler: "~> 9.0.0"
  version_vertical_pod_autoscaler_tf: "~ 0.1.0"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: init
data:
  clientId: "tenant-a"
  domain: "stage.tenant-a.tld"
  environment: "stage"
  environmentClass: "non-prod"
  region: "eu-central-1"
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: foo-iac
spec:
  interval: 5m
  provider: aws
  ref:
    semver: "${version_foo_tf}"
  url: oci://xxx.dkr.ecr.eu-central-1.amazonaws.com/iac/foo
apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: foo
spec:
  backendConfig:
    customConfiguration: |
      backend "s3" {
        region         = "${region}"
        bucket         = "terraform-states"
        key            = "${clientId}/${environment}/stacks/foo.tfstate"
        role_arn       = "arn:aws:iam::xxx:role/tf-${clientId}-${environment}"
        dynamodb_table = "terraform-states-locks"
        encrypt        = true
      }
  sourceRef:
    kind: OCIRepository
    name: foo-iac
  vars: []
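The ${...} placeholders above are filled via Flux's post-build variable substitution¹¹. A sketch of the Kustomization wiring in both ConfigMaps (path and sourceRef are assumptions):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: foo
spec:
  interval: 10m
  path: ./environments/client-a/stage/services
  sourceRef:
    kind: GitRepository
    name: flux-system
  postBuild:
    substituteFrom:
      - kind: ConfigMap
        name: service-versions
      - kind: ConfigMap
        name: init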
Build time: ConfigMap base; runtime: ConfigMap init.
configuration.tf takes the role of overlay directories / values.yaml:
locals {
  service = "foo"
  squad   = "bar"

  domain_name  = module.ssm_data.values["base"]["domain"]
  cluster_name = module.ssm_data.values["base"]["cluster"]["name"]
  client       = nonsensitive(module.ssm_data.values["base"]["clientId"])
  environment  = nonsensitive(module.ssm_data.values["base"]["environment"])
  env_class    = nonsensitive(module.ssm_data.values["base"]["environmentClass"])

  configuration = {
    default = {
      k8s_namespace      = local.service
      k8s_sa_name        = local.service
      rds_instance_class = "db.t4g.medium"
    }

    client_a = {
      stage = {}
    }

    environment_classes = {
      "non-prod" = {}
      "prod" = {
        rds_instance_class = "db.r6g.medium"
      }
    }
  }

  # choose the right configuration based on
  # client/environment/environment class or simply defaults
  selected_configuration = merge(
    local.configuration["default"],
    try(local.configuration["environment_classes"][local.env_class], {}),
    try(local.configuration[local.client][local.environment], {})
  )
}
# get the central SSM config parameters
module "ssm_data" {
  source  = "registry.example.com/foo/ssm_full_json_store/aws"
  version = "0.3.1"

  path                 = var.config_map_base_path
  include_filter_regex = "(base|foo|bar)"
}
module "database" {
source = "registry.example.com/foo/RDS/aws"
version = "3.5.0"
identifier = local.service
squad = local.squad
rds_engine_version = local.selected_configuration["rds_engine_version"]
rds_instance_class = local.selected_configuration["rds_instance_class"]
client_id = local.client
environment = local.environment
vpc_id = module.ssm_data.values["base"]["aws"]["vpc_id"]
subnet_ids = module.ssm_data.values["base"]["aws"]["subnet_public_ids"]
# [...]
}
module "ssm_service_data" {
source = "registry.example.com/foo/ssm_json_store/aws"
version = "1.0.2"
path = "/configuration"
name = "foo"
data = {
installed = true
private = {
database = {
database_name = module.database.databas
database_username = module.database.database_username
endpoint = module.database.endpoint
reader_endpoint = module.database.reader_endpoint
port = module.database.cluster_port
}
}
public = {}
}
}
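The resulting SSM parameter /configuration/foo then holds a JSON document roughly like the following (values invented), which is what the dotted property paths in the ExternalSecret below refer to:

{
  "installed": true,
  "private": {
    "database": {
      "database_name": "foo",
      "database_username": "foo_app",
      "endpoint": "foo.cluster-xxx.eu-central-1.rds.amazonaws.com",
      "reader_endpoint": "foo.cluster-ro-xxx.eu-central-1.rds.amazonaws.com",
      "port": 5432
    }
  },
  "public": {}
}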
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: foo-secrets-ssm
spec:
  target:
    name: foo-secrets-ssm
  data:
    # [...]
    - remoteRef:
        key: /configuration/foo
        property: private.database.database_username
      secretKey: DATABASE_USER
    - remoteRef:
        key: /configuration/foo
        property: private.database.endpoint
      secretKey: DATABASE_HOST
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    reloader.stakater.com/auto: "true"
Every tf-runner pod runs terraform init for each execution, and a single provider like terraform-provider-aws_5.31.0_darwin_arm64.zip is already 84 MB. A .terraformrc with a network mirror helps:
credentials "my.terraform-registry.foo.bar" {
  token = "7H151553CUr3!" # we are 1337
}

provider_installation {
  network_mirror {
    url     = "https://my.terraform-registry.foo.bar/v1/mirror/"
    include = ["*/*"]
  }
}
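One way to get this .terraformrc into the runners is a custom runner image plus Terraform's TF_CLI_CONFIG_FILE environment variable, set via the Terraform CR's runnerPodTemplate; a sketch (the image URL and file path are assumptions, and field support varies by TF-Controller version):

apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: foo
spec:
  # [...]
  runnerPodTemplate:
    spec:
      # custom runner image with /etc/terraform/terraformrc baked in (assumption)
      image: xxx.dkr.ecr.eu-central-1.amazonaws.com/tf-runner:v0.16.0
      env:
        - name: TF_CLI_CONFIG_FILE
          value: /etc/terraform/terraformrc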
One tf-runner pod runs per stack, and every tf-runner pod consumes resources. A priorityClass combined with a ResourceQuota caps the number of concurrent runners:
apiVersion: scheduling.k8s.io/v1
description: used to limit the number of terraform runners
kind: PriorityClass
metadata:
  name: terraform
value: 0 # same priority as everybody else
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: terraform-runners
spec:
  hard:
    pods: "10"
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - terraform
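With this quota in place, at most 10 runner pods in the terraform priority class exist at any time; further runners stay Pending until a slot frees up.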
Be honest, where are you in the project?
https://www.linkedin.com/feed/update/urn:li:activity:7160295096825860096/
General FluxCD
TF-Controller
1. Your experience might be different 😄
2. https://developer.hashicorp.com/terraform/language/files#the-root-module
3. https://en.wikipedia.org/wiki/Don%27t_repeat_yourself
4. HashiTalks DACH 2020 - Opinionated terraform modules and a registry
5. How TIER switched paradigms - from team- to service-centric
6. TF-CIX as an approach to share information between terraform stacks
7. https://fluxcd.io/flux/components/
8. https://github.com/weaveworks/tf-controller
9. Please note: as the tf-runner ServiceAccount is usually very powerful, do not run it in an accessible namespace!
10. https://fluxcd.io/flux/cmd/flux_push_artifact/
11. https://fluxcd.io/flux/components/kustomize/kustomizations/#post-build-variable-substitution/
12. Weave Policy Engine, Integrate TF Controller with Flux Receivers and Alerts, Open Policy Agent
13. https://github.com/weaveworks/weave-gitops and https://docs.gitops.weave.works/
14. https://www.opentofu.org/
15. https://github.com/weaveworks/tf-controller/releases/tag/v0.16.0-rc.3
16. Introducing the BACK Stack! - https://www.youtube.com/watch?v=SMlR12uwMLs