
What is AEM?

Active Events per Month (AEM) is the metric DataLakeHouse.io uses to measure how much data counts toward your subscription plan. It is driven by your data consumption.

Active rows and pipeline efficiency

First, let's define an "active event per month." The two main components of AEM are:

  • Rows at Rest: the total number of primary keys in the data source.
  • Update Rate: the percentage of primary keys in the source that are added or updated at least once in a single month. This is typically 5-15%.

A row becomes active when it is added to or updated in a data destination such as a data warehouse. A row is counted as active only once per month, not each time it's updated, so you're not charged for multiple updates to the same row within a single month.
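The once-per-month rule amounts to counting distinct primary keys per calendar month. The sketch below assumes a hypothetical change log of `(primary_key, timestamp)` pairs; it only illustrates the counting rule and is not DataLakeHouse.io's actual billing code.

```python
from collections import defaultdict
from datetime import datetime


def active_rows_per_month(changes):
    """Count distinct primary keys added or updated in each calendar month.

    `changes` is a hypothetical iterable of (primary_key, datetime) pairs,
    one pair per insert or update delivered to the destination.
    """
    keys_by_month = defaultdict(set)
    for primary_key, changed_at in changes:
        month = (changed_at.year, changed_at.month)
        # A set stores each key once, so repeat updates in the same month add nothing.
        keys_by_month[month].add(primary_key)
    return {month: len(keys) for month, keys in keys_by_month.items()}


changes = [
    (101, datetime(2023, 3, 1)),
    (101, datetime(2023, 3, 9)),   # second update to row 101 in March
    (101, datetime(2023, 3, 20)),  # third update, still one active row
    (202, datetime(2023, 3, 15)),
    (101, datetime(2023, 4, 2)),   # row 101 becomes active again in April
]
print(active_rows_per_month(changes))  # {(2023, 3): 2, (2023, 4): 1}
```

Row 101 is updated three times in March but counts as one active row that month, and it counts again in April because the rule resets each month.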

We're able to price based on active events per month because DataLakeHouse.io connectors are designed to capture changes in the data source efficiently and to perform incremental upserts wherever possible. As a result, AEM is typically 20 to 200 times smaller than the total synced rows you'll see from typical pipelines. This can ultimately reduce the cost of managing a cloud destination, since only the data that actually needs to be replicated is brought over.

Total Synced Rows and Update Waste

We learned some time ago that active events per month differ from the total synced rows our customers see in their pipelines. The difference is waste: a typical pipeline syncs the same row many times in a month, often even when the row hasn't changed. This happens in two ways:

  • Multiple Row Updates: A single row, identified by a unique primary key, can be updated several times in a single month, and each update counts as a synced row. On average, this happens about 5 times per month.
  • Snapshot Waste: This happens when a primary key that wasn't actually updated is synced anyway (e.g., when you replicate a table using snapshots). Capturing updates is hard, so many pipelines resort to a snapshot approach that syncs all rows every time. Snapshots typically run 10 to 20 times per month on average.

Over the course of a month, and even more so over years, a typical data pipeline generates a large amount of waste because it was never built to handle incremental changes efficiently.
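To see where the two kinds of waste come from, the toy simulation below compares an incremental pipeline, which syncs only the rows that changed since the previous sync, with a snapshot pipeline, which syncs every row on every run. The table size, sync schedule, and update pattern are made-up assumptions chosen for illustration, and `simulate_month` is a hypothetical helper.

```python
def simulate_month(rows_at_rest, updates_per_sync, syncs_per_month):
    """Return (active_rows, incremental_synced, snapshot_synced) for one month.

    updates_per_sync: hypothetical list of sets of primary keys changed
    before each sync; its length must equal syncs_per_month.
    """
    if len(updates_per_sync) != syncs_per_month:
        raise ValueError("need one set of changed keys per sync")
    active = set()
    incremental_synced = 0
    snapshot_synced = 0
    for changed_keys in updates_per_sync:
        # Active rows: each primary key counts once for the whole month.
        active |= changed_keys
        # Incremental: a row is synced again on every run after it changes
        # (the "multiple row updates" case).
        incremental_synced += len(changed_keys)
        # Snapshot: every row at rest is synced, changed or not ("snapshot waste").
        snapshot_synced += rows_at_rest
    return len(active), incremental_synced, snapshot_synced


# Hypothetical month: a 1,000-row table synced 10 times; the same 20 rows
# change before every other sync, so each of them is updated 5 times.
updates = [set(range(20)) if i % 2 == 0 else set() for i in range(10)]
print(simulate_month(1_000, updates, 10))  # (20, 100, 10000)
```

Only 20 rows were active, yet the incremental pipeline synced 100 rows and the snapshot pipeline synced 10,000.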

Calculating AEM vs. Total Synced Rows

Now that we know the difference between AEM and total synced rows, we can calculate how much they differ for the same amount of data.

To calculate AEM:

Total Rows at Rest * Update Rate % = AEM

To estimate your total synced rows:

(Average # of Row Updates * AEM) + (# of Monthly Snapshots * Total Rows at Rest) = Total Synced Rows
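Both formulas translate directly into code. In this sketch the function names are our own, and the update rate is passed as a fraction (0.10 for 10%) rather than as a percentage. The inputs in the usage lines are hypothetical.

```python
def active_events_per_month(rows_at_rest, update_rate):
    """AEM = Total Rows at Rest * Update Rate (update_rate as a fraction, e.g. 0.10)."""
    # Round to a whole number of rows to absorb floating-point noise.
    return round(rows_at_rest * update_rate)


def total_synced_rows(avg_row_updates, aem, monthly_snapshots, rows_at_rest):
    """(Average # of Row Updates * AEM) + (# of Monthly Snapshots * Total Rows at Rest)."""
    return avg_row_updates * aem + monthly_snapshots * rows_at_rest


# Hypothetical table: 2,000,000 rows at rest, 8% update rate,
# 5 updates per active row, 15 snapshots per month.
aem = active_events_per_month(2_000_000, 0.08)
print(aem)                                          # 160000
print(total_synced_rows(5, aem, 15, 2_000_000))     # 30800000
```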

Example

What if we were to take a database of 10 million rows at rest and compare the different counts?

Active Events per Month (AEM):

10,000,000 * 10% = 1,000,000 (1 million)

Total Synced Rows:

(5 * 1,000,000) + (10 * 10,000,000) = 105,000,000 (105 million)
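Plugging the example's numbers into both formulas confirms the arithmetic and gives the ratio between the two counts. The inputs of 5 average row updates and 10 monthly snapshots are the values implied by the 105 million total; the variable names are illustrative. Integer arithmetic is used so the results are exact.

```python
rows_at_rest = 10_000_000
update_rate_pct = 10      # 10% of primary keys change during the month
avg_row_updates = 5       # average updates per active row
monthly_snapshots = 10    # full-table snapshots per month

aem = rows_at_rest * update_rate_pct // 100
total_synced = avg_row_updates * aem + monthly_snapshots * rows_at_rest
ratio = total_synced // aem

print(f"AEM: {aem:,}")                          # AEM: 1,000,000
print(f"Total synced rows: {total_synced:,}")   # Total synced rows: 105,000,000
print(f"Ratio: {ratio}x")                       # Ratio: 105x
```

A ratio of 105 falls within the 20 to 200 times range quoted earlier for typical pipelines.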



Updated 03 Mar 2023