What is AEM?

Active Events per Month (AEM) is the metric DataLakeHouse.io uses to measure how much data counts toward your subscription plan. It is driven by your data consumption.

Active rows and pipeline efficiency

First, let's define "monthly active event." The two main components of AEM are:

  • Rows at Rest: The total number of primary keys in the data source.
  • Update Rate: The percentage of primary keys in the source that are added or updated at least once in a single month. This is typically 5-15%.

A row becomes active when it is added to or updated in a data destination such as a data warehouse. We recognize a row as active only once per month, not each time it is updated. This means you are not charged for multiple updates to the same row in a single month.
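The once-per-month rule can be sketched as a distinct count over (month, primary key) pairs. This is only an illustration: the function name, the event tuples, and the sample keys are hypothetical and are not part of DataLakeHouse.io.

```python
from datetime import date


def count_active_events(change_events):
    """Count each (month, primary key) pair once, however often the row changes.

    change_events is a hypothetical list of (primary_key, changed_on) tuples.
    """
    active = set()
    for primary_key, changed_on in change_events:
        active.add((changed_on.year, changed_on.month, primary_key))
    return len(active)


events = [
    ("order-1", date(2023, 3, 1)),
    ("order-1", date(2023, 3, 9)),   # same row, same month: not counted again
    ("order-2", date(2023, 3, 15)),
    ("order-1", date(2023, 4, 2)),   # new month: the row counts as active again
]
print(count_active_events(events))  # 3
```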

We can price based on active events per month because DataLakeHouse.io connectors are designed to capture changes in the data source efficiently and to perform incremental upserts wherever possible. The resulting AEM count is typically 20 to 200 times smaller than the total synced rows you would see from typical pipelines. This can also reduce the cost of managing a cloud destination, since only the data that actually needs to be replicated is brought over.
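To make the idea of an incremental upsert concrete, here is a minimal sketch that applies only changed rows to a destination keyed by primary key. It is not the DataLakeHouse.io connector implementation; the in-memory dictionary, the `upsert` function, and the `id` key are assumptions made for illustration.

```python
def upsert(destination, changed_rows, key="id"):
    """Insert new rows and overwrite existing ones, touching only changed rows.

    destination is a hypothetical in-memory stand-in for a warehouse table,
    stored as a dict keyed by primary key.
    """
    inserted = 0
    updated = 0
    for row in changed_rows:
        if row[key] in destination:
            updated += 1
        else:
            inserted += 1
        destination[row[key]] = row
    return inserted, updated


table = {1: {"id": 1, "status": "open"}, 2: {"id": 2, "status": "open"}}
changes = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
print(upsert(table, changes))  # (1, 1)
```

Row 1 is never re-sent because it did not change, which is the difference from a snapshot-style sync.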

Total Synced Rows and Update Waste

We learned some time ago that active events per month are not the same as the total synced rows our customers see in their pipelines. This is because a typical pipeline generates waste by syncing rows more often than necessary, which happens in two main ways:

  • Multiple Row Updates: A single row, defined by a unique primary key, can be updated several times over the course of a month. Each update counts as a synced row. This generally occurs 5 times per month on average.
  • Snapshot Waste: This happens when a primary key that wasn't actually updated is synced (e.g., when you replicate a table using snapshots). Capturing updates is hard, so many pipelines resort to a snapshot approach and sync all rows every time. This generally occurs 10 to 20 times per month on average.

Over the course of a month, or even years, you can see how much waste a typical data pipeline generates because it was never built to handle incremental changes effectively.
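The two kinds of waste described above can be separated with a small classification pass over one month of sync activity. This is a sketch under stated assumptions: the `sync_log` format of (primary key, was_changed) pairs and the function name are hypothetical, chosen only to illustrate the distinction.

```python
def waste_breakdown(sync_log):
    """Split one month of synced rows into active rows and the two waste types.

    sync_log is a hypothetical list of (primary_key, was_changed) tuples,
    one entry per row sent to the destination.
    """
    active_keys = set()
    repeat_updates = 0   # a changed row that was already counted this month
    snapshot_waste = 0   # a row that was synced although it did not change
    for primary_key, was_changed in sync_log:
        if not was_changed:
            snapshot_waste += 1
        elif primary_key in active_keys:
            repeat_updates += 1
        else:
            active_keys.add(primary_key)
    return {
        "total_synced": len(sync_log),
        "active": len(active_keys),
        "repeat_updates": repeat_updates,
        "snapshot_waste": snapshot_waste,
    }


log = [("a", True), ("a", True), ("b", False), ("a", False), ("b", True), ("c", False)]
print(waste_breakdown(log))
# {'total_synced': 6, 'active': 2, 'repeat_updates': 1, 'snapshot_waste': 3}
```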

Calculating AEM vs. Total Synced Rows

Now that we know the difference between AEM and total synced rows, we can finally calculate just how different they can be with the same amount of data.

To calculate AEM:

Total Rows at Rest * Update Rate % = AEM

To estimate your total synced rows:

(Average # of Row Updates * AEM) + (# of Monthly Snapshots * Total Rows at Rest) = Total Synced Rows

Example

What if we were to take a database of 10 million rows at rest and compare the different counts?

Active Events per Month (AEM), at a 10% update rate:

10,000,000 * 10% = 1 million

Total Synced Rows:

(5 * 1,000,000) + (10 * 10,000,000) = 105 million
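The same arithmetic can be checked with a short script. It is a sketch, not an official calculator: the function names are hypothetical, and the inputs of 5 updates per row and 10 snapshots per month are assumptions taken from the averages quoted earlier on this page. Integer percentages are used to avoid floating-point rounding.

```python
def active_events_per_month(rows_at_rest, update_rate_percent):
    """Total Rows at Rest * Update Rate % = AEM."""
    return rows_at_rest * update_rate_percent // 100


def total_synced_rows(rows_at_rest, update_rate_percent, avg_row_updates, monthly_snapshots):
    """(Average # of Row Updates * AEM) + (# of Monthly Snapshots * Total Rows at Rest)."""
    aem = active_events_per_month(rows_at_rest, update_rate_percent)
    return avg_row_updates * aem + monthly_snapshots * rows_at_rest


rows_at_rest = 10_000_000
aem = active_events_per_month(rows_at_rest, update_rate_percent=10)
synced = total_synced_rows(
    rows_at_rest,
    update_rate_percent=10,
    avg_row_updates=5,      # assumed: the average quoted for multiple row updates
    monthly_snapshots=10,   # assumed: the low end of the 10 to 20 snapshot range
)
print(aem)            # 1000000
print(synced)         # 105000000
print(synced // aem)  # 105
```

With these inputs, total synced rows come out 105 times larger than AEM, which falls inside the 20 to 200 times range mentioned above.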



Updated 03 Mar 2023