Under The Hood

What I Learned About DynamoDB DAX

Recently I've been dipping my toes into DynamoDB, and came across DAX, a managed write-through cache service for DynamoDB intended to speed up read operations from single-digit milliseconds to sub-millisecond.

Initially the benefits were very appealing:

  • DynamoDB's single-digit millisecond read latency can be too slow for some applications -- fronting it with a write-through cache speeds up reads and expands its use cases.

  • Reads throttle after 3000 RCU per partition (Read Capacity Units; with provisioned capacity, 3000 RCU approximates 6000 eventually consistent reads per second). Although adaptive capacity repartitions automatically, it still takes time to kick in (AWS claims 5-30 minutes). Fronting with a cache reduces the likelihood of hitting this throttling threshold.

  • DAX supports high availability without the management overhead. By setting up multiple DAX nodes in different AZs, it handles replication, leader election, and automatic failover of the cache, so that if one node goes down you have a warm cache waiting for you in another AZ (hooray for a good night's sleep!).
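To make the second point concrete, here is the per-partition throttling arithmetic worked out in a few lines (assuming the standard DynamoDB accounting: 1 RCU buys one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads):

```python
# Worked example of DynamoDB's per-partition read throttling math.
# Assumptions: 1 RCU = 1 strongly consistent read/sec (items up to 4 KB),
# or 2 eventually consistent reads/sec; each partition caps out at 3000 RCU.

PARTITION_RCU_CAP = 3000        # hard per-partition read limit
EVENTUAL_READS_PER_RCU = 2      # eventually consistent reads per RCU

def max_eventual_reads_per_sec(rcu_cap: int = PARTITION_RCU_CAP) -> int:
    """Eventually consistent reads/sec a single partition can serve."""
    return rcu_cap * EVENTUAL_READS_PER_RCU

print(max_eventual_reads_per_sec())  # 6000 -- the ceiling a hot key can hit
```

A single hot partition key hitting that 6000 reads/sec ceiling is exactly the scenario a cache in front is meant to absorb.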

As I dug deeper into the documentation, it became pretty clear that there is a lot left to be desired. Here is the fine print:

  • Each DAX cluster has an item cache and a query cache, and they are completely separate entities. When you write to a DynamoDB table through DAX, the item is written to the item cache. When you read, DAX checks the item cache -- but only for point lookups (GetItem and BatchGetItem operations). Query and Scan operations check the query cache instead, which is only updated on a cache miss. You can configure each cache's TTL, but there is no mechanism for on-demand invalidation of individual keys. The default TTL is 5 minutes, which means the data you query could be 5 minutes stale. Sure, you can configure a much shorter duration, but that offsets the benefits of the cache in the first place! IMO data being 5 minutes stale is just too slow, even for a service that can tolerate eventual consistency.

  • DAX ignores all transactional reads/writes to DynamoDB; it acts as a proxy that simply passes those requests through to the underlying table. Transactional reads are much slower than eventually consistent reads in DynamoDB, making them the more appealing candidates for performance optimization, yet they are not supported by DAX.

  • We all know that DynamoDB can throttle, but guess what? So can DAX! It throttles your operations when memory utilization is high on any node in the cluster, even if the underlying table's read/write capacity has not been reached. This means DAX becomes a central point of failure unless your application code catches those exceptions and retries directly with the DynamoDB client.

  • Have global tables enabled in DynamoDB to achieve multi-region availability? It's worth noting that global tables only replicate the underlying table's data to other regions; DAX caches are not replicated (a cluster only replicates within its own region). So if you want to achieve multi-region with DAX, you have to roll your own cross-region replication and application-layer failover mechanism.

  • DAX offers both horizontal and vertical scaling. However, unlike Redis Cluster, which partitions automatically, DAX's horizontal scaling only adds read replicas and does not scale writes. If you need to scale writes, you have to upgrade your node type to a set of beefier machines (vertical scaling) or implement your own application-layer partitioning logic. Wait, there's more! If you go with the easier option of vertical scaling, keep in mind it cannot be done automatically: you have to spin up an entirely separate cluster and manually warm it up yourself.

  • Single-node cluster? Don't even think about it. DAX has a weekly maintenance window that can take the node down if you don't have another one to fail over to. Think of DAX as a proxy: if it's down, writes through it to DynamoDB will fail (as mentioned above).
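The item-cache/query-cache split is easier to see in code. Below is a minimal in-memory sketch (not the real DAX internals -- just the caching behavior the docs describe) showing why a Query can keep returning stale results after a write, even though a GetItem on the same key is fresh:

```python
import time

# Minimal sketch of DAX's two-cache behavior: writes go through the item
# cache, but the query cache is only refreshed on a miss or TTL expiry.
# `TwoCacheSketch` and its methods are illustrative stand-ins, not real APIs.

class TwoCacheSketch:
    def __init__(self, table, query_ttl=300.0):   # 300s = the 5-minute default TTL
        self.table = table          # stands in for the DynamoDB table
        self.item_cache = {}        # updated on every write-through
        self.query_cache = {}       # cache key -> (result, expires_at)
        self.query_ttl = query_ttl

    def put_item(self, key, value):
        self.table[key] = value
        self.item_cache[key] = value
        # NOTE: no invalidation of query_cache entries containing `key`.

    def get_item(self, key):
        if key in self.item_cache:              # point lookup hits item cache
            return self.item_cache[key]
        value = self.item_cache[key] = self.table[key]
        return value

    def query_all(self):
        entry = self.query_cache.get("all")
        if entry and entry[1] > time.monotonic():
            return entry[0]                     # possibly stale until TTL lapses
        result = dict(self.table)               # cache miss: hit the table
        self.query_cache["all"] = (result, time.monotonic() + self.query_ttl)
        return result

dax = TwoCacheSketch(table={})
dax.put_item("a", 1)
dax.query_all()              # populates the query cache with {"a": 1}
dax.put_item("a", 2)         # item cache now sees 2; query cache untouched
print(dax.get_item("a"))     # 2 -- the point lookup is fresh
print(dax.query_all())       # {'a': 1} -- stale until the TTL expires
```

This is the mechanism behind the "5-minutes stale" complaint above: the two caches never talk to each other.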

So if you decide to put DAX in front of DynamoDB, please don't do so blindly! Make sure to read the fine print and evaluate it for your specific use cases.



Joy Gao