DynamoDB for Dummies
Table of Contents
I’ve been dealing with DynamoDB lately, and it took me a while to wrap my head around it, so I thought I would make this guide for others.
To understand DynamoDB, it might first be best to understand traditional relational databases (If you’re reading this, I assume you have some familiarity with SQL, so this will be quick). Relational databases such as MySQL, Oracle DB, and Microsoft SQL Server store highly-structured data in tables with rows and columns. These tables can relate to each other allowing for very powerful data access patterns.
For example, let’s say you have a database with a table for customers and orders.
|Order ID||Customer ID||Item||Amount|
Now, let’s say you want to figure out which items were purchased by Bob Jones over $30. This can be done with SQL like so:
1SELECT orders.item FROM customers 2INNER JOIN orders ON customers.id=orders.customer 3WHERE customers.id == 1 AND orders.amount > 20;
This would output:
As you can see, traditional SQL-based databases provide a lot of flexibility with how data can be accessed. However, this comes at a cost. These types of databases don’t scale horizontally, but rather vertically. This can become a challenge for high-throughput applications.
Most of the effort going into designing a database like this is figuring out how to most effectively represent the data in the database schema. How this data is queried and accessed is generally less important.
DynamoDB is a proprietary NoSQL database offered by Amazon AWS that allows for near-infinite horizontal scalability. In contrast to relational databases, a lot of thought needs to be put into how the data is accessed rather than its representation.
Here are the basics parts of DynamoDB:
- Tables - A table holds items, much like relational databases. However, tables are entirely independent of one another and are not part of any “database”. Tables do not have a rigid column structure.
- Items - Items are what actually contain data with various attributes. This is like a table row. Items are uniquely identified by their primary key.
- Attributes - These are the data elements of items. These do not have to be consistent with other items in the same table. They are like table columns. An item in a “Albums” table would have attributes like “Year”, “Artist”, etc.
In terms of pricing, DynamoDB charges for both data storage, and read and write operations. Read and write operations use “read units” and “write units” which are pretty much just arbitrary units to quantify how much bandwidth/processing/etc. you used to perform an operation. You can either choose an on-demand pricing model, or a provisioned capacity model. More info is on the pricing page: https://aws.amazon.com/dynamodb/pricing/
In DynamoDB, there are two ways of accessing items by their primary key.
Partition Key Only
Items can have a unique partition key (also sometimes called “hash key”). No two items can have the same partition key. In order to query an item, one must know the partition key.
|Album ID (PK)||Album Name||Artist||Year|
|1||Dark Side of the Moon||Pink Floyd||1973|
|2||Led Zeppelin IV||Led Zeppelin||1971|
In this example, the
Album ID field is acting as the primary key.
Partition Key and Sort Key
Items can have a unique combination of the partition key and sort key (also sometimes called “range key”). This combination is called a “composite primary key”. No two items can have the same combination of the partition key and sort key. However, the partition key does not need to be unique. The sort key, must still be unique. In order to query a single item, one must know both the partition key and sort key.
|Artist (PK)||Album Name (SK)||Year|
|Pink Floyd||Dark Side of the Moon||1973|
|Led Zeppelin||Led Zeppelin IV||1971|
In this example, the
Artist field is acting as the partition key and the
as the sort key. This example would work, assuming no two artists release an album with
the same name.
In a partition key only scenario, only single items can be queried. This is because each item needs a single unique identifier, and that is the only query-able attribute.
On the other hand, composite key scenarios are exceptionally powerful, as sort keys can be queried with comparisons like equals, less than, and starts with. For example, lets say you have a table of different buildings.
|Type (PK)||Location (SK)||Name|
With a table structure like this, you could query for all warehouses in Colorado
with a query partition key of
warehouse and a query sort key of
You can see how this can be very powerful, especially if you use a timestamp as the
sort key. You could easily filter for certain time ranges.
Sometimes, you just need to get all the data in a table. With DynamoDB, this is called scanning. This really should be avoided if at all possible as it will consume a LOT of read units. You can optionally filter the results by any attribute, but it will not reduce the number of read units used.
Everything above really covers all the basics of DynamoDB. The core functionality is pretty straight-forward.
However, DynamoDB has more functionality. Another major feature is secondary indexes. Sometimes, you just need to look up values based on another attribute. Maybe it’s a username instead of the normal user ID. Secondary indexes let you query by attributes other than what you normally can.
Global Secondary Index
The first option is a Global Secondary Index. This is the most flexible option. It does not need to be defined at table creation and there is no limit on data size. It behaves just like an additional partition key. However, it uses additional read and write capacity units, so will cost you more. Additionally, a major caveat is that it is not read-consistent. What this means, is that if you write data to a global secondary index, and then immediately try to read that data, you may not receive that data back for a while. It will eventually propagate, but it is not guaranteed to be immediate.
Local Secondary Index
The other option is a Local Secondary Index. This is a lot more restrictive, but can make sense for the right use-case. It has to be defined at table creation (can’t update the schema later), each hash key can’t contain more than 10GB of data, and it shares the read and write capacity units of the underlying table. It also can only be used on tables with a partition key and sort key, as it behaves like an additional sort key. You still need to use the original partition key. It also optionally allows you enable read and write consistency if desired.
That’s about it! DynamoDB can be very inexpensive and flexible as a database. However, you really need to think through your database design and access patterns before you get started. It’s really best suited for large amounts of low-complexity data (such as logging events). Use the tool that suits your application best.