DataLakeHouse.io is committed to security and focused on keeping you and your data safe. DataLakeHouse.io adheres to industry-leading standards while connecting, replicating, and loading data from all of your data sources.
Contact email@example.com if you have any questions or comments.
- All connections to DataLakeHouse.io's web portal are encrypted by default using industry-standard cryptographic protocols (TLS 1.2+).
- Any attempt to connect over an unencrypted channel (HTTP) is redirected to an encrypted channel (HTTPS).
- To take advantage of HTTPS, your browser must support TLS encryption (all modern versions of Google Chrome, Firefox, and Safari do).
- All connections to DataLakeHouse.io are encrypted by default, in both directions, using modern ciphers and cryptographic systems. Data in transit is encrypted with TLS 1.2+.
- Any attempt to connect over HTTP is redirected to HTTPS.
- We use HSTS to ensure browsers interact with DataLakeHouse.io only over HTTPS.
- We use AES-256 encryption for all data at rest.
- DataLakeHouse.io undergoes annual penetration testing by an outside provider, and regularly installs the latest, secure versions of all underlying software.
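The in-transit policy above — TLS 1.2 as the floor, with certificate verification on — can be sketched from the client side with Python's standard `ssl` module. This is an illustrative configuration, not DataLakeHouse.io's actual setup:

```python
import ssl

# Build a client-side TLS context mirroring the policy described above:
# certificate verification on, hostname checking on, and nothing older
# than TLS 1.2 accepted on the wire.
ctx = ssl.create_default_context()            # CERT_REQUIRED + hostname checks
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0 and 1.1

# A server enforcing the same floor would pair this with an HTTP->HTTPS
# redirect and an HSTS response header such as:
#   Strict-Transport-Security: max-age=31536000; includeSubDomains
```

`create_default_context()` already enables certificate and hostname verification; setting `minimum_version` is what pins the protocol floor.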
- SOC 2 Type II: A SOC 2 examination, performed by an independent certified public accounting (CPA) firm, is an assessment of a service provider's security control environment against the trust services principles and criteria set forth by the American Institute of Certified Public Accountants (AICPA). The result of the examination is a report containing the service auditor's opinion, a description of the system that was examined, management's assertion regarding the description, and the testing procedures performed by the auditor. DataLakeHouse.io is in the midst of a SOC 2 Type II examination, which means our controls are assessed on their operating effectiveness over the reporting period. Proof of our active SOC 2 Type II examination is available for review under MNDA to existing customers upon request, and to others by special request.
- GDPR: DataLakeHouse.io is fully GDPR compliant. DataLakeHouse.io's Terms of Service includes a Data Processing Addendum that incorporates the standard contractual clauses set forth by the European Commission to establish a legal basis for cross-border data transfers from the EU.
- PCI: Before granting DataLakeHouse.io access to data subject to PCI requirements, please contact support at firstname.lastname@example.org
- HIPAA: The Health Insurance Portability and Accountability Act of 1996 (HIPAA) is a federal law that required the creation of national standards to protect sensitive patient health information from being disclosed without the patient's consent or knowledge. DataLakeHouse.io is being assessed against relevant HIPAA Security criteria as part of our SOC 2 Type II report. Our SOC 2 Type II report is available for review under MNDA upon request.
- Connections to customers' database sources and destinations are SSL encrypted by default.
- DataLakeHouse.io supports multiple connectivity channels.
- Connections to customers' software-as-a-service (SaaS) tool sources are encrypted through HTTPS.
- Databases and API cloud applications - DataLakeHouse.io only requires READ permissions. For data sources that by default grant permissions beyond read-only, DataLakeHouse.io will never make use of those permissions.
- Destinations - DataLakeHouse.io requires the CREATE permission. This permission allows DataLakeHouse.io to CREATE a schema within your destination, CREATE tables within that schema, and WRITE to those tables. DataLakeHouse.io is then able to READ only the data it has written.
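The destination workflow described above — create a schema, create tables within it, write rows, then read back only what was written — can be sketched with Python's built-in `sqlite3`. The schema and table names here are hypothetical, and SQLite's `ATTACH` stands in for a warehouse `CREATE SCHEMA`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for "CREATE SCHEMA datalakehouse" in a real warehouse.
conn.execute("ATTACH DATABASE ':memory:' AS datalakehouse")

# CREATE a table within that schema, then WRITE to it.
conn.execute("CREATE TABLE datalakehouse.orders (id INTEGER, amount REAL)")
conn.execute("INSERT INTO datalakehouse.orders VALUES (1, 9.99)")
conn.commit()

# READ back only the data the pipeline itself has written.
rows = conn.execute("SELECT id, amount FROM datalakehouse.orders").fetchall()
```

In a real destination, the same least-privilege boundary is enforced by granting the pipeline's role only `CREATE` on the database and ownership of the schema it creates.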
All customer data, besides what is listed in the Exceptions section, is purged from DataLakeHouse.io's system as soon as it is successfully written to the destination. For normal syncs, this means data exists in our system for no more than eight hours. There are some cases where the retention period may be longer, as described below. In the following two cases, customer data is automatically purged after 30 days using object lifecycle management.
- Destination outage - DataLakeHouse.io maintains data that has been read from your source if the destination is down, so we can resume the sync without losing progress once the issue is resolved.
- Retrieving schema information for column blocking or hashing purposes - For newly created connectors, if you choose to review your connector schema before syncing in order to use column blocking or hashing, we queue your data while we read the full schema and only write it to the destination once you approve.
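The 30-day automatic purge described above is the kind of policy typically expressed as an object-storage lifecycle rule. The JSON below is an illustrative S3-style lifecycle configuration — the prefix and rule ID are hypothetical, not DataLakeHouse.io's actual settings:

```json
{
  "Rules": [
    {
      "ID": "purge-staged-customer-data",
      "Filter": { "Prefix": "staged/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    }
  ]
}
```

A rule like this makes the 30-day limit an enforced property of the storage layer rather than a scheduled cleanup job.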
DataLakeHouse.io retains subsets of a customer's data that are required to provide and maintain DataLakeHouse.io's solution. This only includes the following data:
- Customer access keys - DataLakeHouse.io retains customer database credentials and SaaS OAuth tokens in order to securely and continuously extract data and troubleshoot customer issues. These credentials are securely stored in a key management system. The key management system is backed by a hardware security module that is managed by our cloud provider.
- Customer metadata - DataLakeHouse.io retains configuration details and data points (such as table and column names) for each connector so that this information can be shown to your organization in your DataLakeHouse.io dashboard.
- Temporary data - Some data integration or replication processes rely on ephemeral data specific to a data source. This stream of data is essential to the integration process and is deleted as soon as possible, though in rare instances it may be retained slightly longer than 24 hours. Examples of this temporary data include binary logs for MySQL or SQL Server.
Access to DataLakeHouse.io production infrastructure is only allowed via hardened bastion hosts, which require an active account protected by MFA (multi-factor authentication) to authenticate. Further access to the environment and enforcement of least privilege is controlled by IAM (identity and access management) policies. Privileged actions taken from bastion hosts are captured in audit logs for review and anomalous behavior detection.
Physical and environmental security is handled entirely by our cloud service providers. Each of our cloud service providers provides an extensive list of compliance and regulatory assurances, including SOC 1/2/3, PCI-DSS, and ISO 27001.
- Users can use Single Sign-On (SSO) with SAML 2.0.
- Only users of your organization registered within DataLakeHouse.io and DataLakeHouse.io operations staff have access to your organization's DataLakeHouse.io dashboard.
- Your organization's DataLakeHouse.io dashboard provides visibility into the status of each integration, the aforementioned metadata for each integration, and the ability to pause or delete the integration connection; it does not expose your organization's underlying data.
- Organization administrators can request that DataLakeHouse.io revoke an organization member's access at any point; these requests will be honored within 24 hours.
- DataLakeHouse.io requires that all employees comply with security policies designed to keep any and all customer information safe and to address multiple security compliance standards, rules, and regulations.
- Two-factor authentication and strong password controls are required for administrative access to systems.
- Security policies and procedures are documented and reviewed on a regular basis.
- Current and future development follows industry-standard secure coding guidelines, such as those recommended by OWASP.
- Networks are strictly segregated according to security level. Modern, restrictive firewalls protect all connections between networks.
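As a concrete illustration of the two-factor requirement above, time-based one-time passwords (TOTP, RFC 6238) can be generated with nothing but the Python standard library. This is a minimal sketch of the algorithm, not DataLakeHouse.io's actual implementation:

```python
import base64
import hashlib
import hmac
import struct
import time

def totp(secret_b32, timestep=30, digits=6, now=None):
    """Generate an RFC 6238 time-based one-time password (HMAC-SHA1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = int((time.time() if now is None else now) // timestep)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: the ASCII secret "12345678901234567890"
# (base32 "GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"), 8 digits, at Unix
# time 59, yields "94287082".
```

Because the code depends only on a shared secret and the current time window, a stolen password alone is not enough to authenticate.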
Under the HIPAA Security Rule, DataLakeHouse.io complies with HIPAA requirements for Protected Health Information (PHI) and will sign a Business Associate Agreement (BAA) with customers who are subject to HIPAA mandates (typically, HIPAA covered entities). DataLakeHouse.io is not a covered entity under HIPAA rules and therefore cannot itself be "HIPAA compliant," since HIPAA applies to covered entities (that is, those entities subject to regulation by the HHS). DataLakeHouse.io serves as a data pipeline, which means that PHI traversing the DataLakeHouse.io environment is never permanently stored. All transmissions are encrypted using industry best practices (at present, TLS 1.2+). Temporary storage may occur when the amount of data transmitted exceeds the capacity for real-time processing and requires short-term caching; such temporary storage is encrypted. All customer data, including PHI, is purged from DataLakeHouse.io's system as soon as it is successfully written to the destination.
To date, DataLakeHouse.io has not experienced a security breach of any kind. In the event of such an occurrence, our protocol is to notify affected customers as soon as the compromise is confirmed.
At DataLakeHouse.io, we are committed to keeping our systems, data and product(s) secure. Despite the measures we take, security vulnerabilities will always be possible.
If you believe you’ve found a security vulnerability, please send it to us by emailing email@example.com. Please include the following details with your report:
- Description of the location and potential impact of the vulnerability
- A detailed description of the steps required to reproduce the vulnerability (POC scripts, screenshots, and compressed screen captures are all helpful to us)
Please make a good faith effort to avoid privacy violations as well as destruction, interruption, or degradation of services and/or data.
We will respond to your report within 5 business days of receipt. If you have followed the above instructions, we will not take any legal action against you regarding the report.
IMPORTANT: DataLakeHouse.io cannot access your data without your approval.
When working on a support ticket, we may need to access your data to troubleshoot or fix your broken connector or destination. In that case, we will ask you to grant DataLakeHouse.io access to your data for a limited time. You can allow or deny data access. If you grant us data access, you can revoke it at any moment.