What is AEM?
Active Events per Month (AEM) is the metric DataLakeHouse uses to measure how much data you consume under your subscription plan.
First, let's define what makes an event "active" in a given month. AEM has two main components:
- Rows at Rest: The total number of primary keys in the data source.
- Update Rate: The percentage of primary keys in the source that are added or updated at least once in a single month. This is typically 5-15%.
A row becomes active when it is added to or updated in a data destination, such as a data warehouse. We count a row as active only once per month, not each time it’s updated. This means you’re not charged for multiple updates to the same row within a single month.
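The once-per-month rule can be illustrated with a short sketch. It assumes a hypothetical change log of (primary key, date) pairs; the function name `count_active_events` and the log format are illustrative assumptions, not part of the DataLakeHouse.io product.

```python
from collections import defaultdict
from datetime import date


def count_active_events(change_log):
    """Count distinct primary keys added or updated in each calendar month.

    change_log: iterable of (primary_key, change_date) tuples (hypothetical format).
    Returns a dict mapping (year, month) to the number of active rows.
    """
    keys_by_month = defaultdict(set)
    for primary_key, change_date in change_log:
        # A set stores each key once, so repeat updates in a month add nothing.
        keys_by_month[(change_date.year, change_date.month)].add(primary_key)
    return {month: len(keys) for month, keys in keys_by_month.items()}


change_log = [
    (101, date(2024, 1, 3)),   # row 101 added
    (101, date(2024, 1, 17)),  # row 101 updated again in the same month
    (102, date(2024, 1, 20)),  # row 102 added
    (101, date(2024, 2, 2)),   # a new month, so row 101 counts again
]
active = count_active_events(change_log)
print(active)
```

In this sample, January has three change events but only two active rows, because row 101 is counted once despite being touched twice.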
We’re able to price based on active events per month because DataLakeHouse.io connectors are designed to efficiently capture changes in the data source and perform incremental upserts wherever possible. As a result, AEM ends up being 20 to 200 times smaller than the total synced rows you’ll see from typical pipelines. This can also reduce the cost of managing a cloud destination, since only the data that actually needs to be replicated is brought over.
Total Synced Rows and Update Waste
We learned some time ago that active events per month are not the same as the total synced rows our customers see in their pipelines. This is because a typical pipeline experiences waste, where rows are synced more often than necessary, in two ways:
- Multiple Row Updates: A single row, identified by a unique primary key, can be updated several times over the course of a month, and each update counts as a synced row. On average, this happens about 5 times per month.
- Snapshot Waste: This happens when a primary key that wasn't actually updated is synced anyway (e.g., when you replicate a table using snapshots). Capturing updates is hard, so many pipelines resort to a snapshot approach and sync all rows every time. On average, this happens 10 to 20 times per month.
Over the course of months, or even years, this waste adds up, because a typical data pipeline was never built to handle incremental changes effectively.
Now that we know the difference between AEM and total synced rows, we can finally calculate just how different they can be with the same amount of data.
To calculate AEM:
Total Rows at Rest * Update Rate % = AEM
To estimate your total synced rows:
(Average # of Row Updates * AEM) + (# of Monthly Snapshots * Total Rows at Rest) = Total Synced Rows
What if we were to take a database of 10 million rows at rest and compare the different counts?
Active Events per Month (AEM):
10,000,000 * 10% = 1 million
Total Synced Rows:
(5 * 1,000,000) + (10 * 10,000,000) = 105 million
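The comparison above can be reproduced with a minimal sketch of the two formulas. The function and parameter names are illustrative assumptions, and the inputs (a 10% update rate, 5 updates per active row, 10 snapshots per month) are the values consistent with the averages quoted earlier and the 105 million total.

```python
def active_events_per_month(rows_at_rest, update_rate):
    """AEM = Total Rows at Rest * Update Rate (rate as a fraction, e.g. 0.10)."""
    return rows_at_rest * update_rate


def total_synced_rows(rows_at_rest, update_rate, avg_row_updates, monthly_snapshots):
    """Estimate the rows a typical pipeline syncs in a month.

    (avg_row_updates * AEM) + (monthly_snapshots * rows_at_rest)
    """
    aem = active_events_per_month(rows_at_rest, update_rate)
    return avg_row_updates * aem + monthly_snapshots * rows_at_rest


rows_at_rest = 10_000_000
update_rate = 0.10  # assumed: midpoint of the typical 5-15% range

aem = active_events_per_month(rows_at_rest, update_rate)
synced = total_synced_rows(
    rows_at_rest,
    update_rate,
    avg_row_updates=5,     # assumed average from the text
    monthly_snapshots=10,  # assumed low end of the 10-20 range
)

print(f"AEM: {aem:,.0f}")
print(f"Total synced rows: {synced:,.0f}")
print(f"Synced rows per active event: {synced / aem:.0f}x")
```

With these inputs the pipeline syncs about 105 rows for every active event, which sits inside the 20 to 200 times range mentioned earlier. Changing `monthly_snapshots` to 20 shows how quickly snapshot waste dominates the total.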