Security

DataLakeHouse.io is committed to security and focused on keeping you and your data safe. DataLakeHouse.io adheres to industry-leading standards while connecting, replicating, and loading data from all of your data sources.

Contact security@datalakehouse.io if you have any questions or comments.



Web portal connectivity

  • All connections to DataLakeHouse.io's web portal are encrypted by default using industry-standard cryptographic protocols (TLS 1.2+).
  • Any attempt to connect over an unencrypted channel (HTTP) is redirected to an encrypted channel (HTTPS).
  • To take advantage of HTTPS, your browser must support TLS encryption, as all modern versions of Google Chrome, Firefox, and Safari do.
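
If you want to verify this behavior yourself, a small Python script along the lines of the sketch below checks both the HTTP-to-HTTPS redirect and the negotiated TLS version. (Using the public datalakehouse.io hostname here is an assumption; substitute the portal host you actually use.)

```python
import http.client
import socket
import ssl

HOST = "datalakehouse.io"  # assumption: substitute your portal hostname

# 1. A plain-HTTP request should come back as a redirect to HTTPS.
conn = http.client.HTTPConnection(HOST, 80, timeout=10)
conn.request("GET", "/")
resp = conn.getresponse()
location = resp.getheader("Location", "")
print(f"HTTP status: {resp.status}, redirects to: {location}")
assert resp.status in (301, 302, 307, 308) and location.startswith("https://")
conn.close()

# 2. The TLS handshake should negotiate TLS 1.2 or newer.
ctx = ssl.create_default_context()
with socket.create_connection((HOST, 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        print("Negotiated protocol:", tls.version())  # e.g. 'TLSv1.3'
```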

Communication & Encryption 

  • All connections to DataLakeHouse.io are encrypted by default in both directions, using modern ciphers and cryptographic systems. Data in transit is encrypted with TLS 1.2 or later.
  • Any attempt to connect over HTTP is redirected to HTTPS.
  • We use HSTS to ensure browsers interact with DataLakeHouse.io only over HTTPS.
  • We use AES-256 to encrypt all data at rest.
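
DataLakeHouse.io's at-rest encryption is managed internally by the platform and its cloud providers, but as a minimal sketch of what AES-256 encryption at rest involves, the example below uses the widely available `cryptography` package (AES-256 in GCM mode, which also authenticates the ciphertext). The key handling is deliberately simplified; in practice keys live in a key management system, as described under Exceptions below.

```python
# pip install cryptography  -- illustrative sketch only
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustration only: real deployments keep this key in a KMS/HSM,
# never alongside the data it protects.
key = AESGCM.generate_key(bit_length=256)  # AES-256
aesgcm = AESGCM(key)

plaintext = b"customer row payload"
nonce = os.urandom(12)  # must be unique per encryption under a given key
ciphertext = aesgcm.encrypt(nonce, plaintext, None)

# Decryption fails loudly if the ciphertext was tampered with.
assert aesgcm.decrypt(nonce, ciphertext, None) == plaintext
```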

Penetration Testing 

  • DataLakeHouse.io undergoes annual penetration testing by an outside provider, and regularly installs the latest secure versions of all underlying software.

Compliance 

  • SOC 2 Type II: A SOC 2 examination, performed by an independent certified public accounting (CPA) firm, assesses a service provider's security control environment against the trust services principles and criteria set forth by the American Institute of Certified Public Accountants (AICPA). The result of the examination is a report containing the service auditor's opinion, a description of the system that was examined, management's assertion regarding the description, and the testing procedures performed by the auditor. DataLakeHouse.io is in the midst of a SOC 2 Type II examination, which means our controls are assessed on their operating effectiveness over the reporting period. Proof of our active SOC 2 Type II examination is available for review under mutual NDA (MNDA) upon request.

Connectors

  • Connections to customers' database sources and destinations are SSL encrypted by default.
  • DataLakeHouse.io can support multiple connectivity channels.
  • Connections to customers' software-as-a-service (SaaS) tool sources are encrypted through HTTPS.
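
For example, a PostgreSQL source connection with SSL required and the server certificate verified might look like the sketch below. (All hostnames, credentials, and file paths are placeholders; the exact SSL options depend on your database and driver.)

```python
# pip install psycopg2-binary  -- placeholder names throughout
import psycopg2

conn = psycopg2.connect(
    host="source-db.example.com",   # placeholder host
    port=5432,
    dbname="analytics",
    user="dlh_reader",
    password="********",
    sslmode="verify-full",          # require SSL and verify the server cert
    sslrootcert="/path/to/ca.pem",  # CA bundle used for verification
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone())
conn.close()
```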


Permissions

  • Databases and API cloud applications - DataLakeHouse.io requires only READ permissions. For data sources that grant permissions beyond read-only by default, DataLakeHouse.io never makes use of those permissions.
  • Destinations - DataLakeHouse.io requires the CREATE permission. This permission allows DataLakeHouse.io to CREATE a schema within your destination, CREATE tables within that schema, and WRITE to those tables. DataLakeHouse.io can then READ only the data it has written (see the sketch after this list).
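
As a rough illustration of these two permission profiles, the sketch below applies read-only grants on a source and a CREATE grant on a destination. (All role, database, and host names are placeholders; PostgreSQL syntax is assumed, and exact GRANT syntax varies by database.)

```python
# pip install psycopg2-binary  -- placeholder names throughout
import psycopg2

def apply_grants(dsn: str, statements: list[str]) -> None:
    """Run a list of GRANT statements against one database."""
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    with conn.cursor() as cur:
        for stmt in statements:
            cur.execute(stmt)
    conn.close()

# Source side: the DataLakeHouse.io user only ever needs to read.
apply_grants("postgresql://admin:****@source-db.example.com/app", [
    "GRANT USAGE ON SCHEMA app TO dlh_reader",
    "GRANT SELECT ON ALL TABLES IN SCHEMA app TO dlh_reader",
])

# Destination side: CREATE lets the loader make its own schema and
# tables, then write to (and read back) only the tables it created.
apply_grants("postgresql://admin:****@warehouse.example.com/dw", [
    "GRANT CREATE ON DATABASE dw TO dlh_loader",
])
```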

Retention of customer data

All customer data, aside from what is listed in the Exceptions section, is purged from DataLakeHouse.io's system as soon as it is successfully written to the destination. For normal syncs, this means data exists in our system for no more than eight hours. In the following two cases the retention period may be longer; customer data is then automatically purged after 30 days using object lifecycle management (a sketch of such a rule follows the list).

  • Destination outage - DataLakeHouse.io maintains data that has been read from your source if the destination is down, so we can resume the sync without losing progress once the issue is resolved.
  • Retrieving schema information for column blocking or hashing purposes - For newly created connectors, if you choose to review your connector schema before syncing in order to use column blocking or hashing, we queue your data while we read the full schema and only write it to the destination once you approve.
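
For illustration, a 30-day expiration rule of this kind could be configured on an S3-style staging bucket as sketched below. (The bucket name and prefix are placeholders; DataLakeHouse.io manages this configuration internally.)

```python
# pip install boto3  -- illustrative sketch; placeholder names throughout
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-staging-bucket",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "purge-staged-sync-data",
                "Filter": {"Prefix": "staging/"},  # placeholder prefix
                "Status": "Enabled",
                # Objects are deleted automatically 30 days after creation.
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```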

Exceptions

DataLakeHouse.io retains subsets of a customer's data that are required to provide and maintain DataLakeHouse.io's solution. This includes only the following data:

  • Customer access keys - DataLakeHouse.io retains customer database credentials and SaaS OAuth tokens in order to securely and continuously extract data and troubleshoot customer issues. These credentials are stored in a key management system backed by a hardware security module managed by our cloud provider (see the sketch after this list).
  • Customer metadata - DataLakeHouse.io retains configuration details and data points (such as table and column names) for each connector so that this information can be shown to your organization in your DataLakeHouse.io dashboard.
  • Temporary data - Some data integration or replication processes use ephemeral data specific to a data source, such as binary logs for MySQL or SQL Server. This stream of data is essential to the integration process and is deleted as soon as possible, though in rare instances retention may briefly exceed 24 hours.
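
As a hedged illustration of KMS-backed credential storage (a generic AWS KMS example, not DataLakeHouse.io's internal implementation; the key alias and token value are placeholders), a credential can be encrypted so that only principals authorized to use the HSM-backed key can ever recover it:

```python
# pip install boto3  -- illustrative sketch; placeholder identifiers throughout
import boto3

kms = boto3.client("kms")

# Encrypt a credential under an HSM-backed KMS key.
encrypted = kms.encrypt(
    KeyId="alias/example-credentials-key",   # placeholder key alias
    Plaintext=b"oauth-refresh-token-value",  # placeholder secret
)["CiphertextBlob"]

# Later, an authorized service decrypts it just-in-time for a sync.
plaintext = kms.decrypt(CiphertextBlob=encrypted)["Plaintext"]
```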


Solution infrastructure

Access to DataLakeHouse.io production infrastructure is allowed only via hardened bastion hosts, which require an active account protected by MFA (multi-factor authentication) to authenticate. Further access to the environment, and enforcement of least privilege, is controlled by IAM (identity and access management) policies. Privileged actions taken from bastion hosts are captured in audit logs for review and anomalous-behavior detection.
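
As an illustration of what least-privilege IAM policies look like (a generic AWS-style example with placeholder names, not DataLakeHouse.io's actual policy), access can be scoped to exactly the actions and resources a role needs:

```python
# pip install boto3  -- generic illustration; names are placeholders
import json

import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Allow reading objects from one staging prefix, nothing else.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-staging-bucket/staging/*",
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="example-least-privilege",
    PolicyDocument=json.dumps(policy),
)
```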



Physical and environmental safeguards

Physical and environmental security is handled entirely by our cloud service providers. Each provider maintains an extensive list of compliance and regulatory assurances, including SOC 1/2/3, PCI DSS, and ISO 27001.

Google

See the Google Cloud Platform compliance, security, and data center security documentation for more detailed information.

Amazon

See the Amazon Web Services compliance, security, and data center security documentation for more detailed information.

Your organization permissions

  • Users can sign in using Single Sign-On (SSO) with SAML 2.0.
    • Officially supported identity providers:
      • Azure
      • Google Workspace
      • Okta
  • Only users of your organization registered within DataLakeHouse.io and DataLakeHouse.io operations staff have access to your organization's DataLakeHouse.io dashboard.
  • Your organization's DataLakeHouse.io dashboard provides visibility into the status of each integration, the aforementioned metadata for each integration, and the ability to pause or delete the integration connection - it does not expose your organization's data.
  • Organization administrators can request that DataLakeHouse.io revoke an organization member's access at any point; these requests are honored within 24 hours.


Company policies

  • DataLakeHouse.io requires all employees to comply with security policies designed to keep all customer information safe and to address multiple security compliance standards, rules, and regulations.
  • Two-factor authentication and strong password controls are required for administrative access to systems.
  • Security policies and procedures are documented and reviewed on a regular basis.
  • Current and future development follows industry-standard secure coding guidelines, such as those recommended by OWASP.
  • Networks are strictly segregated according to security level. Modern, restrictive firewalls protect all connections between networks.

HIPAA

Under the HIPAA Security Rule, DataLakeHouse.io complies with HIPAA requirements for Protected Health Information (PHI) and will sign a Business Associate Agreement (BAA) with customers who are subject to HIPAA mandates (typically, HIPAA covered entities). DataLakeHouse.io is not a covered entity under HIPAA rules and therefore cannot itself be "HIPAA compliant", since HIPAA applies to covered entities (that is, entities subject to regulation by the HHS). DataLakeHouse.io serves as a data pipeline, which means that PHI traversing the DataLakeHouse.io environment is never permanently stored. All transmissions are encrypted using industry best practices (at present, TLS 1.2+). Temporary storage may occur when the amount of data transmitted exceeds the capacity for real-time processing and requires short-term caching; such temporary storage is encrypted. All customer data, including PHI, is purged from DataLakeHouse.io's system as soon as it is successfully written to the destination.



In the event of a data breach

To date, DataLakeHouse.io has not experienced a security breach of any kind. In the event of such an occurrence, DataLakeHouse.io's protocol is to make customers aware as soon as the compromise is confirmed.



Responsible disclosure policy

At DataLakeHouse.io, we are committed to keeping our systems, data, and products secure. Despite the measures we take, security vulnerabilities will always be possible.

If you believe you've found a security vulnerability, please report it by emailing security@datalakehouse.io. Please include the following details with your report:

  • Description of the location and potential impact of the vulnerability
  • A detailed description of the steps required to reproduce the vulnerability (POC scripts, screenshots, and compressed screen captures are all helpful to us)

Please make a good faith effort to avoid privacy violations as well as destruction, interruption, or degradation of services and/or data.

We will respond to your report within 5 business days of receipt. If you have followed the above instructions, we will not take any legal action against you regarding the report.

Diagnostic data access

IMPORTANT: DataLakeHouse.io cannot access your data without your approval.

When working on a support ticket, we may need to access your data to troubleshoot or fix your broken connector or destination. In that case, we will ask you to grant DataLakeHouse.io access to your data for a limited time. You can allow or deny this access, and if you grant it, you can revoke it at any time.
