blob: c86aa380a611eb1eec689bad52757f6f1b96de61 [file] [log] [blame]
---
page_title: Module Composition
description: |-
Module composition allows infrastructure to be described from modular
building blocks.
---
# Module Composition
In a simple Terraform configuration with only one root module, we create a
flat set of resources and use Terraform's expression syntax to describe the
relationships between these resources:
```hcl
resource "aws_vpc" "example" {
cidr_block = "10.1.0.0/16"
}
resource "aws_subnet" "example" {
vpc_id = aws_vpc.example.id
availability_zone = "us-west-2b"
cidr_block = cidrsubnet(aws_vpc.example.cidr_block, 4, 1)
}
```
When we introduce `module` blocks, our configuration becomes hierarchical
rather than flat: each module contains its own set of resources, and possibly
its own child modules, which can potentially create a deep, complex tree of
resource configurations.
However, in most cases we strongly recommend keeping the module tree flat,
with only one level of child modules, and use a technique similar to the
above of using expressions to describe the relationships between the modules:
```hcl
module "network" {
source = "./modules/aws-network"
base_cidr_block = "10.0.0.0/8"
}
module "consul_cluster" {
source = "./modules/aws-consul-cluster"
vpc_id = module.network.vpc_id
subnet_ids = module.network.subnet_ids
}
```
We call this flat style of module usage _module composition_, because it
takes multiple [composable](https://en.wikipedia.org/wiki/Composability)
building-block modules and assembles them together to produce a larger system.
Instead of a module _embedding_ its dependencies, creating and managing its
own copy, the module _receives_ its dependencies from the root module, which
can therefore connect the same modules in different ways to produce different
results.
The rest of this page discusses some more specific composition patterns that
may be useful when describing larger systems with Terraform.
## Dependency Inversion
In the example above, we saw a `consul_cluster` module that presumably describes
a cluster of [HashiCorp Consul](https://www.consul.io/) servers running in
an AWS VPC network, and thus it requires as arguments the identifiers of both
the VPC itself and of the subnets within that VPC.
An alternative design would be to have the `consul_cluster` module describe
its _own_ network resources, but if we did that then it would be hard for
the Consul cluster to coexist with other infrastructure in the same network,
and so where possible we prefer to keep modules relatively small and pass in
their dependencies.
This [dependency inversion](https://en.wikipedia.org/wiki/Dependency_inversion_principle)
approach also improves flexibility for future
refactoring, because the `consul_cluster` module doesn't know or care how
those identifiers are obtained by the calling module. A future refactor may
separate the network creation into its own configuration, and thus we may
pass those values into the module from data sources instead:
```hcl
data "aws_vpc" "main" {
tags = {
Environment = "production"
}
}
data "aws_subnet_ids" "main" {
vpc_id = data.aws_vpc.main.id
}
module "consul_cluster" {
source = "./modules/aws-consul-cluster"
vpc_id = data.aws_vpc.main.id
subnet_ids = data.aws_subnet_ids.main.ids
}
```
### Conditional Creation of Objects
In situations where the same module is used across multiple environments,
it's common to see that some necessary object already exists in some
environments but needs to be created in other environments.
For example, this can arise in development environment scenarios: for cost
reasons, certain infrastructure may be shared across multiple development
environments, while in production the infrastructure is unique and managed
directly by the production configuration.
Rather than trying to write a module that itself tries to detect whether something
exists and create it if not, we recommend applying the dependency inversion
approach: making the module accept the object it needs as an argument, via
an input variable.
For example, consider a situation where a Terraform module deploys compute
instances based on a disk image, and in some environments there is a
specialized disk image available while other environments share a common
base disk image. Rather than having the module itself handle both of these
scenarios, we can instead declare an input variable for an object representing
the disk image. Using AWS EC2 as an example, we might declare a common subtype
of the `aws_ami` resource type and data source schemas:
```hcl
variable "ami" {
type = object({
# Declare an object using only the subset of attributes the module
# needs. Terraform will allow any object that has at least these
# attributes.
id = string
architecture = string
})
}
```
The caller of this module can now itself directly represent whether this is
an AMI to be created inline or an AMI to be retrieved from elsewhere:
```hcl
# In situations where the AMI will be directly managed:
resource "aws_ami_copy" "example" {
name = "local-copy-of-ami"
source_ami_id = "ami-abc123"
source_ami_region = "eu-west-1"
}
module "example" {
source = "./modules/example"
ami = aws_ami_copy.example
}
```
```hcl
# Or, in situations where the AMI already exists:
data "aws_ami" "example" {
owner = "9999933333"
tags = {
application = "example-app"
environment = "dev"
}
}
module "example" {
source = "./modules/example"
ami = data.aws_ami.example
}
```
This is consistent with Terraform's declarative style: rather than creating
modules with complex conditional branches, we directly describe what
should already exist and what we want Terraform to manage itself.
By following this pattern, we can be explicit about in which situations we
expect the AMI to already be present and which we don't. A future reader
of the configuration can then directly understand what it is intending to do
without first needing to inspect the state of the remote system.
In the above example, the object to be created or read is simple enough to
be given inline as a single resource, but we can also compose together multiple
modules as described elsewhere on this page in situations where the
dependencies themselves are complicated enough to benefit from abstractions.
## Assumptions and Guarantees
Every module has implicit assumptions and guarantees that define what data it expects and what data it produces for consumers.
- **Assumption:** A condition that must be true in order for the configuration of a particular resource to be usable. For example, an `aws_instance` configuration can have the assumption that the given AMI will always be configured for the `x86_64` CPU architecture.
- **Guarantee:** A characteristic or behavior of an object that the rest of the configuration should be able to rely on. For example, an `aws_instance` configuration can have the guarantee that an EC2 instance will be running in a network that assigns it a private DNS record.
We recommend using [custom conditions](/language/expressions/custom-conditions) help capture and test for assumptions and guarantees. This helps future maintainers understand the configuration design and intent. Custom conditions also return useful information about errors earlier and in context, helping consumers more easily diagnose issues in their configurations.
The following examples creates a precondition that checks whether the EC2 instance has an encrypted root volume.
```hcl
output "api_base_url" {
value = "https://${aws_instance.example.private_dns}:8433/"
# The EC2 instance must have an encrypted root volume.
precondition {
condition = data.aws_ebs_volume.example.encrypted
error_message = "The server's root volume is not encrypted."
}
}
```
## Multi-cloud Abstractions
Terraform itself intentionally does not attempt to abstract over similar
services offered by different vendors, because we want to expose the full
functionality in each offering and yet unifying multiple offerings behind a
single interface will tend to require a "lowest common denominator" approach.
However, through composition of Terraform modules it is possible to create
your own lightweight multi-cloud abstractions by making your own tradeoffs
about which platform features are important to you.
Opportunities for such abstractions arise in any situation where multiple
vendors implement the same concept, protocol, or open standard. For example,
the basic capabilities of the domain name system are common across all vendors,
and although some vendors differentiate themselves with unique features such
as geolocation and smart load balancing, you may conclude that in your use-case
you are willing to eschew those features in return for creating modules that
abstract the common DNS concepts across multiple vendors:
```hcl
module "webserver" {
source = "./modules/webserver"
}
locals {
fixed_recordsets = [
{
name = "www"
type = "CNAME"
ttl = 3600
records = [
"webserver01",
"webserver02",
"webserver03",
]
},
]
server_recordsets = [
for i, addr in module.webserver.public_ip_addrs : {
name = format("webserver%02d", i)
type = "A"
records = [addr]
}
]
}
module "dns_records" {
source = "./modules/route53-dns-records"
route53_zone_id = var.route53_zone_id
recordsets = concat(local.fixed_recordsets, local.server_recordsets)
}
```
In the above example, we've created a lightweight abstraction in the form of
a "recordset" object. This contains the attributes that describe the general
idea of a DNS recordset that should be mappable onto any DNS provider.
We then instantiate one specific _implementation_ of that abstraction as a
module, in this case deploying our recordsets to Amazon Route53.
If we later wanted to switch to a different DNS provider, we'd need only to
replace the `dns_records` module with a new implementation targeting that
provider, and all of the configuration that _produces_ the recordset
definitions can remain unchanged.
We can create lightweight abstractions like these by defining Terraform object
types representing the concepts involved and then using these object types
for module input variables. In this case, all of our "DNS records"
implementations would have the following variable declared:
```hcl
variable "recordsets" {
type = list(object({
name = string
type = string
ttl = number
records = list(string)
}))
}
```
While DNS serves as a simple example, there are many more opportunities to
exploit common elements across vendors. A more complex example is Kubernetes,
where there are now many different vendors offering hosted Kubernetes clusters
and even more ways to run Kubernetes yourself.
If the common functionality across all of these implementations is sufficient
for your needs, you may choose to implement a set of different modules that
describe a particular Kubernetes cluster implementation and all have the common
trait of exporting the hostname of the cluster as an output value:
```hcl
output "hostname" {
value = azurerm_kubernetes_cluster.main.fqdn
}
```
You can then write _other_ modules that expect only a Kubernetes cluster
hostname as input and use them interchangeably with any of your Kubernetes
cluster modules:
```hcl
module "k8s_cluster" {
source = "modules/azurerm-k8s-cluster"
# (Azure-specific configuration arguments)
}
module "monitoring_tools" {
source = "modules/monitoring_tools"
cluster_hostname = module.k8s_cluster.hostname
}
```
## Data-only Modules
Most modules contain `resource` blocks and thus describe infrastructure to be
created and managed. It may sometimes be useful to write modules that do not
describe any new infrastructure at all, but merely retrieve information about
existing infrastructure that was created elsewhere using
[data sources](/language/data-sources).
As with conventional modules, we suggest using this technique only when the
module raises the level of abstraction in some way, in this case by
encapsulating exactly how the data is retrieved.
A common use of this technique is when a system has been decomposed into several
subsystem configurations but there is certain infrastructure that is shared
across all of the subsystems, such as a common IP network. In this situation,
we might write a shared module called `join-network-aws` which can be called
by any configuration that needs information about the shared network when
deployed in AWS:
```hcl
module "network" {
source = "./modules/join-network-aws"
environment = "production"
}
module "k8s_cluster" {
source = "./modules/aws-k8s-cluster"
subnet_ids = module.network.aws_subnet_ids
}
```
The `network` module itself could retrieve this data in a number of different
ways: it could query the AWS API directly using
[`aws_vpc`](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/vpc)
and
[`aws_subnet_ids`](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/subnet_ids)
data sources, or it could read saved information from a Consul cluster using
[`consul_keys`](https://registry.terraform.io/providers/hashicorp/consul/latest/docs/data-sources/keys),
or it might read the outputs directly from the state of the configuration that
manages the network using
[`terraform_remote_state`](/language/state/remote-state-data).
The key benefit of this approach is that the source of this information can
change over time without updating every configuration that depends on it.
Furthermore, if you design your data-only module with a similar set of outputs
as a corresponding management module, you can swap between the two relatively
easily when refactoring.