v1.14.6/website/content/docs/enterprise/consistency.mdx - hashicorp/vault - Git at Google

 ---
 layout: docs
 page_title: Vault Enterprise Eventual Consistency
 description: Vault Enterprise Consistency Model
 ---

 # Vault eventual consistency

 @include 'alerts/enterprise-and-hcp.mdx'

 When running in a cluster, Vault has an eventual consistency model.
 Only one node (the leader) can write to Vault's storage.
 Users generally expect read-after-write consistency: in other
 words, after writing foo=1, a subsequent read of foo should return 1. Depending
 on the Vault configuration this isn't always the case. When using performance
 standbys with Integrated Storage, or when using performance replication,
 there are some sequences of operations that don't always yield read-after-write
 consistency.

 ## Performance standby nodes

 When using the Integrated Storage backend without performance standbys, only
 a single Vault node (the active node) handles requests. Requests sent to
 regular standbys are handled by forwarding them to the active node. This Vault configuration
 gives Vault the same behavior as the default Consul consistency model.

 When using the Integrated Storage backend with performance standbys, both the
 active node and performance standbys can handle requests. If a performance standby
 handles a login request, or a request that generates a dynamic secret, the
 performance standby will issue a remote procedure call (RPC) to the active node to store the token
 and/or lease. If the performance standby handles any other request that
 results in a storage write, it will forward that request to the active node
 in the same way a regular standby forwards all requests.

 With Integrated Storage, all writes occur on the active node, which then issues
 RPCs to update the local storage on every other node. Between when the active
 node writes the data to its local disk, and when those RPCs are handled on the
 other nodes to write the data to their local disks, those nodes present a stale
 view of the data.

 As a result, even if you're always talking to the same performance standby,
 you may not get read-after-write semantics. The write gets sent to the active
 node, and if the subsequent read request occurs before the new data gets sent
 to the node handling the read request, the read request won't be able to take
 the write into account because the new data isn't present on that node yet.

 ## Performance replication

 A similar phenomenon occurs when using performance replication. One example
 of how this manifests is when using shared mounts. If a KV secrets engine
 is mounted on the primary with `local=false`, it will exist on the secondary
 cluster as well. The secondary cluster can handle requests to that mount,
 though as with performance standbys, write requests must be forwarded - in
 this case to the primary active node. Once data is written to the primary cluster,
 it won't be visible on the secondary cluster until the data has been replicated
 from the primary. Therefore, on the secondary cluster, it initially appears as if
 the data write hasn't happened.

 If the secondary cluster is using Integrated Storage, and the read request is
 being handled on one of its performance standbys, the problem is exacerbated because it
 has to be sent first from the primary active node to the secondary active node,
 and then from there to the secondary performance standby, each of which can
 introduce their own form of lag.

 Even without shared secret engines, stale reads can still happen with performance
 replication. The Identity subsystem aims to provide a view on entities and
 groups which span across clusters. As such, when logging in to a secondary cluster
 using a shared mount, Vault tries to generate an entity and alias if they don't
 already exist, and these must be stored on the primary using an RPC. Something
 similar happens with groups.

 ## Mitigations

 There has long been a partial mitigation for the above problems. When writing
 data via RPC, e.g. when a performance standby registers tokens and leases on the
 active node after a login or generating a dynamic secret, part of the response
 includes a number known as the "WAL index", aka Write-Ahead Log index.

 A full explanation of this is outside the scope of this document, but the short
 version is that both performance replication and performance standbys use log
 shipping to stay in sync with the upstream source of writes. The mitigation
 historically used by nodes doing writes via RPC is to look at the WAL index in
 the response and wait up to 2 seconds to see if that WAL index appear in the
 logs being shipped from upstream. Once the WAL index is seen, the Vault node
 handling the request that resulted in RPCs can return its own response to the
 client: it knows that any subsequent reads will be able to see the value that
 was just written. If the WAL index isn't seen within those 2 seconds, the Vault
 node completes the request anyway, returning a warning in the response.

 This mitigation option still exists in Vault 1.7, though now there is a
 configuration option to adjust the wait time:
 [best_effort_wal_wait_duration](/vault/docs/configuration/replication).

 ## Vault 1.7 mitigations

 There are now a variety of other mitigations available:

 - per-request option to always forward the request to the active node
 - per-request option to conditionally forward the request to the active node
   if it would otherwise result in a stale read
 - per-request option to fail requests if they might result in a stale read
 - Vault Proxy configuration to do the above for proxied requests

 The remainder of this document describes the tradeoffs of these mitigations and
 how to use them.

 Note that any headers requesting forwarding are disabled by default, and must
 be enabled using [allow_forwarding_via_header](/vault/docs/configuration/replication).

 ### Unconditional forwarding (Performance standbys only)

 The simplest solution to never experience stale reads from a performance standby
 is to provide the following HTTP header in the request:

 ```
 X-Vault-Forward: active-node
 ```

 The drawback here is that if all your requests are forwarded to the active node,
 you might as well not be using performance standbys. So this mitigation only
 makes sense to use selectively.

 This mitigation will not help with stale reads relating to performance replication.

 ### Conditional forwarding (Performance standbys only)

 As of Vault Enterprise 1.7, all requests that modify storage now return a new
 HTTP response header:

 ```
 X-Vault-Index: <base64 value>
 ```

 To ensure that the state resulting from that write request is visible to a
 subsequent request, add these headers to that second request:

 ```
 X-Vault-Index: <base64 value taken from previous response>
 X-Vault-Inconsistent: forward-active-node
 ```

 The effect will be that the node handling the request will look at the state
 it has locally, and if it doesn't contain the state described by the X-Vault-Index
 header, the node will forward the request to the active node.

 The drawback here is that when requests are forwarded to the active node,
 performance standbys provide less value. If this happens often enough
 the active node can become a bottleneck, limiting the horizontal read scalability
 performance standbys are intended to provide.

 ### Retry stale requests

 As of Vault Enterprise 1.7, all requests that modify storage now return a new
 HTTP response header:

 ```
 X-Vault-Index: <base64 value>
 ```

 To ensure that the state resulting from that write request is visible to a
 subsequent request, add this headers to that second request:

 ```
 X-Vault-Index: <base64 value taken from previous response>
 ```

 When the desired state isn't present, Vault will return a failure response with
 HTTP status code 412. This tells the client that it should retry the request.
 The advantage over the Conditional Forwarding solution above is twofold:
 first, there's no additional load on the active node. Second, this solution
 is applicable to performance replication as well as performance standbys.

 The Vault Go API will now automatically retry 412s, and provides convenience
 methods for propagating the X-Vault-Index response header into the request
 header of subsequent requests. Those not using the Vault Go API will want
 to build equivalent functionality into their client library.

 ### Vault proxy and consistency headers

 When configured, the [Vault API Proxy](/vault/docs/agent-and-proxy/proxy/apiproxy) will proxy incoming requests to Vault. There is
 Proxy configuration available in the `api_proxy` stanza that allows making use
 of some of the above mitigations without modifying clients.

 By setting `enforce_consistency="always"`, Proxy will always provide
 the `X-Vault-Index` consistency header. The value it uses for the header
 will be based on the responses that have passed through the Proxy previously.

 The option `when_inconsistent` controls how stale reads are prevented:

 - `"fail"` means that when a `412` response is seen, it is returned to the client
 - `"retry"` means that `412` responses will be retried automatically by Proxy,
   so the client doesn't have to deal with them
 - `"forward"` makes Proxy provide the
   `X-Vault-Inconsistent: forward-active-node` header as described above under
   Conditional Forwarding

 ## Vault 1.10 mitigations

 In Vault 1.10, the token format has changed, where service tokens now employ server side consistency.
 This means that by default, requests made
 to nodes which cannot support read-after-write consistency due to
 not having the necessary WAL index to check Vault tokens locally will output
 a 412 status code. The Vault Go API automatically retries when receiving 412s, so
 unless there is a considerable replication delay, users will experience
 read-after-write consistency.

 The replication option [allow_forwarding_via_token](/vault/docs/configuration/replication)
 can be used to enforce requests that would have returned 412s in the
 aforementioned way will be forwarded instead to the active node.

 Refer to the [Server Side Consistent Token FAQ](/vault/docs/faq/ssct) for details.

 ## Client API helpers

 There are some new helpers in the `api` package to work with the new headers.
 `WithRequestCallbacks` and `WithResponseCallbacks` create a shallow clone of
 the client and populate it with the given callbacks. `RecordState` and
 `RequireState` are used to store the response header from one request and
 provide it in a subsequent request. For example:

 ```go
 client := api.NewClient(api.DefaultConfig)
 var state string
 _, err := client.WithResponseCallbacks(api.RecordState(&state)).Write(path, data)
 secret, err := client.WithRequestCallbacks(api.RequireState(state)).Read(path)
 ```

 This will retry the `Read` until the data stored by the `Write` is present.
 There are also callbacks to use forwarding: `ForwardInconsistent` and
 `ForwardAlways`.
	---
	layout: docs
	page_title: Vault Enterprise Eventual Consistency
	description: Vault Enterprise Consistency Model
	---

	# Vault eventual consistency

	@include 'alerts/enterprise-and-hcp.mdx'

	When running in a cluster, Vault has an eventual consistency model.
	Only one node (the leader) can write to Vault's storage.
	Users generally expect read-after-write consistency: in other
	words, after writing foo=1, a subsequent read of foo should return 1. Depending
	on the Vault configuration this isn't always the case. When using performance
	standbys with Integrated Storage, or when using performance replication,
	there are some sequences of operations that don't always yield read-after-write
	consistency.

	## Performance standby nodes

	When using the Integrated Storage backend without performance standbys, only
	a single Vault node (the active node) handles requests. Requests sent to
	regular standbys are handled by forwarding them to the active node. This Vault configuration
	gives Vault the same behavior as the default Consul consistency model.

	When using the Integrated Storage backend with performance standbys, both the
	active node and performance standbys can handle requests. If a performance standby
	handles a login request, or a request that generates a dynamic secret, the
	performance standby will issue a remote procedure call (RPC) to the active node to store the token
	and/or lease. If the performance standby handles any other request that
	results in a storage write, it will forward that request to the active node
	in the same way a regular standby forwards all requests.

	With Integrated Storage, all writes occur on the active node, which then issues
	RPCs to update the local storage on every other node. Between when the active
	node writes the data to its local disk, and when those RPCs are handled on the
	other nodes to write the data to their local disks, those nodes present a stale
	view of the data.

	As a result, even if you're always talking to the same performance standby,
	you may not get read-after-write semantics. The write gets sent to the active
	node, and if the subsequent read request occurs before the new data gets sent
	to the node handling the read request, the read request won't be able to take
	the write into account because the new data isn't present on that node yet.

	## Performance replication

	A similar phenomenon occurs when using performance replication. One example
	of how this manifests is when using shared mounts. If a KV secrets engine
	is mounted on the primary with `local=false`, it will exist on the secondary
	cluster as well. The secondary cluster can handle requests to that mount,
	though as with performance standbys, write requests must be forwarded - in
	this case to the primary active node. Once data is written to the primary cluster,
	it won't be visible on the secondary cluster until the data has been replicated
	from the primary. Therefore, on the secondary cluster, it initially appears as if
	the data write hasn't happened.

	If the secondary cluster is using Integrated Storage, and the read request is
	being handled on one of its performance standbys, the problem is exacerbated because it
	has to be sent first from the primary active node to the secondary active node,
	and then from there to the secondary performance standby, each of which can
	introduce their own form of lag.

	Even without shared secret engines, stale reads can still happen with performance
	replication. The Identity subsystem aims to provide a view on entities and
	groups which span across clusters. As such, when logging in to a secondary cluster
	using a shared mount, Vault tries to generate an entity and alias if they don't
	already exist, and these must be stored on the primary using an RPC. Something
	similar happens with groups.

	## Mitigations

	There has long been a partial mitigation for the above problems. When writing
	data via RPC, e.g. when a performance standby registers tokens and leases on the
	active node after a login or generating a dynamic secret, part of the response
	includes a number known as the "WAL index", aka Write-Ahead Log index.

	A full explanation of this is outside the scope of this document, but the short
	version is that both performance replication and performance standbys use log
	shipping to stay in sync with the upstream source of writes. The mitigation
	historically used by nodes doing writes via RPC is to look at the WAL index in
	the response and wait up to 2 seconds to see if that WAL index appear in the
	logs being shipped from upstream. Once the WAL index is seen, the Vault node
	handling the request that resulted in RPCs can return its own response to the
	client: it knows that any subsequent reads will be able to see the value that
	was just written. If the WAL index isn't seen within those 2 seconds, the Vault
	node completes the request anyway, returning a warning in the response.

	This mitigation option still exists in Vault 1.7, though now there is a
	configuration option to adjust the wait time:
	[best_effort_wal_wait_duration](/vault/docs/configuration/replication).

	## Vault 1.7 mitigations

	There are now a variety of other mitigations available:

	- per-request option to always forward the request to the active node
	- per-request option to conditionally forward the request to the active node
	if it would otherwise result in a stale read
	- per-request option to fail requests if they might result in a stale read
	- Vault Proxy configuration to do the above for proxied requests

	The remainder of this document describes the tradeoffs of these mitigations and
	how to use them.

	Note that any headers requesting forwarding are disabled by default, and must
	be enabled using [allow_forwarding_via_header](/vault/docs/configuration/replication).

	### Unconditional forwarding (Performance standbys only)

	The simplest solution to never experience stale reads from a performance standby
	is to provide the following HTTP header in the request:

	```
	X-Vault-Forward: active-node
	```

	The drawback here is that if all your requests are forwarded to the active node,
	you might as well not be using performance standbys. So this mitigation only
	makes sense to use selectively.

	This mitigation will not help with stale reads relating to performance replication.

	### Conditional forwarding (Performance standbys only)

	As of Vault Enterprise 1.7, all requests that modify storage now return a new
	HTTP response header:

	```
	X-Vault-Index: <base64 value>
	```

	To ensure that the state resulting from that write request is visible to a
	subsequent request, add these headers to that second request:

	```
	X-Vault-Index: <base64 value taken from previous response>
	X-Vault-Inconsistent: forward-active-node
	```

	The effect will be that the node handling the request will look at the state
	it has locally, and if it doesn't contain the state described by the X-Vault-Index
	header, the node will forward the request to the active node.

	The drawback here is that when requests are forwarded to the active node,
	performance standbys provide less value. If this happens often enough
	the active node can become a bottleneck, limiting the horizontal read scalability
	performance standbys are intended to provide.

	### Retry stale requests

	As of Vault Enterprise 1.7, all requests that modify storage now return a new
	HTTP response header:

	```
	X-Vault-Index: <base64 value>
	```

	To ensure that the state resulting from that write request is visible to a
	subsequent request, add this headers to that second request:

	```
	X-Vault-Index: <base64 value taken from previous response>
	```

	When the desired state isn't present, Vault will return a failure response with
	HTTP status code 412. This tells the client that it should retry the request.
	The advantage over the Conditional Forwarding solution above is twofold:
	first, there's no additional load on the active node. Second, this solution
	is applicable to performance replication as well as performance standbys.

	The Vault Go API will now automatically retry 412s, and provides convenience
	methods for propagating the X-Vault-Index response header into the request
	header of subsequent requests. Those not using the Vault Go API will want
	to build equivalent functionality into their client library.

	### Vault proxy and consistency headers

	When configured, the [Vault API Proxy](/vault/docs/agent-and-proxy/proxy/apiproxy) will proxy incoming requests to Vault. There is
	Proxy configuration available in the `api_proxy` stanza that allows making use
	of some of the above mitigations without modifying clients.

	By setting `enforce_consistency="always"`, Proxy will always provide
	the `X-Vault-Index` consistency header. The value it uses for the header
	will be based on the responses that have passed through the Proxy previously.

	The option `when_inconsistent` controls how stale reads are prevented:

	- `"fail"` means that when a `412` response is seen, it is returned to the client
	- `"retry"` means that `412` responses will be retried automatically by Proxy,
	so the client doesn't have to deal with them
	- `"forward"` makes Proxy provide the
	`X-Vault-Inconsistent: forward-active-node` header as described above under
	Conditional Forwarding

	## Vault 1.10 mitigations

	In Vault 1.10, the token format has changed, where service tokens now employ server side consistency.
	This means that by default, requests made
	to nodes which cannot support read-after-write consistency due to
	not having the necessary WAL index to check Vault tokens locally will output
	a 412 status code. The Vault Go API automatically retries when receiving 412s, so
	unless there is a considerable replication delay, users will experience
	read-after-write consistency.

	The replication option [allow_forwarding_via_token](/vault/docs/configuration/replication)
	can be used to enforce requests that would have returned 412s in the
	aforementioned way will be forwarded instead to the active node.

	Refer to the [Server Side Consistent Token FAQ](/vault/docs/faq/ssct) for details.

	## Client API helpers

	There are some new helpers in the `api` package to work with the new headers.
	`WithRequestCallbacks` and `WithResponseCallbacks` create a shallow clone of
	the client and populate it with the given callbacks. `RecordState` and
	`RequireState` are used to store the response header from one request and
	provide it in a subsequent request. For example:

	```go
	client := api.NewClient(api.DefaultConfig)
	var state string
	_, err := client.WithResponseCallbacks(api.RecordState(&state)).Write(path, data)
	secret, err := client.WithRequestCallbacks(api.RequireState(state)).Read(path)
	```

	This will retry the `Read` until the data stored by the `Write` is present.
	There are also callbacks to use forwarding: `ForwardInconsistent` and
	`ForwardAlways`.