
Terraform: why data sources and filters are preferable over remote state

Why Terraform data sources are preferable to remote state, with use cases that apply multiple tag-based filters to select resources dynamically

Quite often you need to share data or resource outputs between your Terraform modules. Fundamental modules that build the infrastructure have no dependencies, but as your infrastructure grows, dependencies become inevitable. A frequently shared module is the VPC.

Many resources (subnets, instances, security groups, load balancers, and so on) need a VPC to be placed in. When you look up how to share resources between your modules, you're led to articles and examples of terraform_remote_state. This solves the problem, but in my opinion there is a better alternative. By better, I mean more stable.

Whenever you need to share the state between modules, your first choice should be terraform data sources. terraform_remote_state should be the alternative when the first is not achievable.
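Here is a minimal sketch of both approaches for looking up a VPC. The state bucket, key, region, and tag values are hypothetical placeholders. The remote-state variant assumes that the VPC configuration stores its state in S3 and exposes an output named vpc_id.

```hcl
# Option A: terraform_remote_state
# Couples this configuration to where the VPC state lives and to its output names.
data "terraform_remote_state" "vpc" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"            # hypothetical state bucket
    key    = "network/vpc/terraform.tfstate" # hypothetical state key
    region = "us-east-1"                     # hypothetical region
  }
}

# Option B: data source
# Asks the AWS API for the live VPC that carries these tags.
data "aws_vpc" "main" {
  tags = {
    Name        = "main"    # hypothetical tag value
    Environment = "staging" # hypothetical tag value
  }
}

locals {
  # Only works if the VPC configuration declares an output named "vpc_id"
  vpc_id_from_state = data.terraform_remote_state.vpc.outputs.vpc_id

  # Only depends on the VPC's tags
  vpc_id_from_api = data.aws_vpc.main.id
}
```

The aws_vpc data source fails at plan time if its arguments do not match exactly one VPC. Tagging mistakes therefore surface immediately instead of silently picking up a stale value.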

Use the data "aws_*" resource

Data sources allow Terraform to use information defined outside of Terraform, defined by another separate Terraform configuration, or modified by functions.

Using data sources with Terraform is a good design choice. They query the AWS API to look up resources by name, by filters (such as tags), and so on, instead of hard-coding values in the module or reading them from a remote state.

Data sources are more accurate because they are always up to date. Every plan checks the live resource, so the code does not depend on the module that created it.
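The sketch below looks up subnets by VPC ID and tag and passes them to a resource. It assumes AWS provider v4 or later, which provides the aws_subnets data source. The Tier tag and the subnet group name are hypothetical. The lookup runs on every plan, so a matching subnet added later is picked up on the next apply without changing this configuration.

```hcl
variable "vpc_id" {
  type = string
}

# Resolved against AWS at plan time: returns the subnets that exist right now
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }

  tags = {
    Tier = "private" # hypothetical tag
  }
}

resource "aws_db_subnet_group" "app" {
  name       = "app-db" # hypothetical name
  subnet_ids = data.aws_subnets.private.ids
}
```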

Let’s look at a couple of use cases.

S3 bucket

If your module requires an S3 bucket name or ID as input, providing it through the relevant data source is as simple as this:

data "aws_s3_bucket" "this" {
  bucket = "my-shiny-bucket"
}

module "app" {
  source = "../modules/app"

  # ... removed for brevity ...

  s3_bucket = data.aws_s3_bucket.this.id
}
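On the receiving side, the module only needs a plain input variable. It does not need to know whether the value came from a data source, a remote state, or a literal. Below is a minimal sketch of what ../modules/app might contain. The file layout is hypothetical, and the ARN assumes the standard aws partition.

```hcl
# modules/app/variables.tf (hypothetical layout)
variable "s3_bucket" {
  description = "Name of an existing S3 bucket used by the app"
  type        = string
}

# modules/app/main.tf (hypothetical usage)
locals {
  # aws_s3_bucket's id is the bucket name, so an object ARN pattern can be built from it
  s3_objects_arn = "arn:aws:s3:::${var.s3_bucket}/*"
}
```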

AWS account ID

How many times have you needed to provide your account ID to configure a policy? Instead of copy-pasting this value between your modules, fetch it with a data source:

data "aws_caller_identity" "this" {}

module "app" {
  source = "../modules/app"

  # ... removed for brevity ...

  aws_account = data.aws_caller_identity.this.account_id
}
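As an example, a role trust policy that trusts the current account can build the account root ARN from the data source instead of a hard-coded ID. The role name is a hypothetical placeholder, and the ARN assumes the standard aws partition.

```hcl
data "aws_caller_identity" "this" {}

# Trust policy: principals in this same account may call sts:AssumeRole
data "aws_iam_policy_document" "assume_from_account" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.this.account_id}:root"]
    }
  }
}

resource "aws_iam_role" "app" {
  name               = "app-role" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.assume_from_account.json
}
```

The same configuration works unchanged in every account it is applied to, because the ID is resolved from the credentials in use.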

Time to look at the more advanced features.

Filters

Many data sources support a filter block for fetching specific resources. A filter narrows down the results returned from a data source call. The use case I'm using as an example is updating route tables.

You have a VPC that contains multiple subnets, each associated with a route table. Your task is to add a route to these route tables in the staging environment.

How would you get the list of route tables? What if you only want to update the ones associated with private subnets? Does your solution still work?

Let’s look at some code.

# main.tf
# -------
data "aws_route_tables" "rts" {
  vpc_id = var.vpc_id

  # Generates one filter block per element of var.rts_filters
  dynamic "filter" {
    for_each = var.rts_filters
    content {
      name   = filter.value.name
      values = filter.value.values
    }
  }
}

# Adds the same route to every matching route table
resource "aws_route" "r" {
  count = length(data.aws_route_tables.rts.ids)

  route_table_id            = tolist(data.aws_route_tables.rts.ids)[count.index]
  destination_cidr_block    = "10.0.0.0/22"
  vpc_peering_connection_id = "pcx-45ff3dc1"
}

# variables.tf
# ------------
variable "vpc_id" {
  type = string
}

variable "rts_filters" {
  type = list(object({
    name   = string
    values = list(string)
  }))
}

# terraform.tfvars
# ----------------
rts_filters = [
  {
    name   = "tag:Environment"
    values = ["staging"]
  },
  {
    name   = "tag:Name"
    values = ["*private*"]
  },
]
  • The example assumes your route tables are tagged with Environment = staging and contain the string private in their Name tag (e.g., us-east-private-1a, us-east-private-1b, ...)
  • A dynamic block acts much like a for expression, but it produces nested blocks instead of a complex typed value. It iterates over a given complex value and generates a nested block for each element of that value.
    • Using this block, you can support multiple filters: each element is translated into its own filter block
    • I'm using the tag:... key, but other filters are available depending on the resource type
  • Through filter.value you have access to the value of the current element
  • In the example, I'm adding a route to 10.0.0.0/22 through VPC peering. For additional configuration options, check the aws_route docs

This syntax can feel strange at first, but once you get used to it, it is very powerful.
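For comparison, with the terraform.tfvars values above, the dynamic block is equivalent to writing the filter blocks out by hand. This static form is shown only to illustrate the expansion. It hard-codes the filters that the dynamic version reads from a variable.

```hcl
# Static equivalent of data.aws_route_tables.rts for the tfvars shown above
data "aws_route_tables" "rts" {
  vpc_id = var.vpc_id

  filter {
    name   = "tag:Environment"
    values = ["staging"]
  }

  filter {
    name   = "tag:Name"
    values = ["*private*"]
  }
}
```

Separate filter blocks are combined with AND, and multiple entries in a single values list are combined with OR. This follows how the EC2 Describe* APIs treat filters, so a route table must match both the environment and the name pattern to be selected.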

Summary

In the world of programming and IaC, there is more than one way to solve a problem. In this post I explained why, after adopting Terraform data sources, I prefer them to terraform_remote_state. The latter has its uses, but I find it more complex and more fragile when things change.

The takeaway is the statement from the top of the article:

Whenever you need to share the state between modules, your first choice should be terraform data sources. terraform_remote_state should be the alternative when the first is not achievable.

Any thoughts or comments are welcome.