What is a Data Vault?
DataLakeHouse.io believes a Data LakeHouse is a framework and a way of thinking in terms of structure and scale. We support the initiatives and guidance of the Data Vault 2.0 framework as part of the DataLakeHouse concept, and we continue to implement Data Vault concepts and supporting features in DataLakeHouse.io on an ongoing basis as part of our roadmap.
Data Vault 2.0 is a hybrid data modeling methodology, architecture, and framework that allows for working with data of all types (structured, semi-structured, and unstructured) and is designed to be resilient to environmental changes. At its core it is a modern, agile way of designing and building efficient, effective Data Warehouses. Dan Linstedt, the creator of the Data Vault methodology, once described the Data Vault as "A detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3NF and Star Schemas. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise."
The newest iteration of the methodology, Data Vault 2.0, improves massively on its predecessor to align with leaps in technology and compute power. One such improvement is removing the need for lookup-based surrogate keys in favor of a Hash Key concept, which we at DataLakeHouse.io use in all of our data modeling designs.
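To illustrate why hash keys remove the need for surrogate-key lookups, here is a minimal sketch in Python. The `||` delimiter, trim/upper-case normalization, and MD5 digest are common Data Vault 2.0 conventions, but the function name and specifics are illustrative assumptions, not DataLakeHouse.io's implementation:

```python
import hashlib

def hash_key(*business_keys: str) -> str:
    """Derive a deterministic hash key from one or more business key parts.

    Parts are trimmed, upper-cased, and joined with a delimiter before
    hashing -- a common Data Vault 2.0 normalization convention. Because
    the result depends only on the business key itself, every load process
    (even running in parallel on different systems) computes the same key,
    so no central sequence or lookup table is required.
    """
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same business key always yields the same hash key,
# regardless of casing or stray whitespace in the source feed.
key_a = hash_key("cust-1001")
key_b = hash_key(" Cust-1001 ")
```

Here `key_a` and `key_b` are identical, which is exactly the property that lets independent loading jobs assign keys without coordinating through a lookup table.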
The Data Vault Hub, Link, and Satellite concepts allow integration of multiple data sources into single or multiple structures that help define and democratize data across the enterprise. In so doing, the Data Warehouse becomes a near-idempotent system of data in a Raw Vault, with the ability to layer on more business-centric, transformed data in a Business or Information Vault. Ultimately, the ability to build a traditional dimensional-model Data Warehouse or Data Mart remains available, with the consistency of a reliable and massively scalable DV 2.0 Raw Vault at its core.
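The three table types above can be sketched as simple record shapes. This is a minimal illustration using hypothetical Customer and Order entities (not from the text): a Hub holds one row per unique business key, a Link records a relationship between Hubs, and a Satellite historizes descriptive attributes off its parent:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class HubCustomer:
    """Hub: one row per unique business key."""
    customer_hash_key: str   # hash of the business key
    customer_id: str         # the business key itself
    load_date: datetime
    record_source: str

@dataclass
class LinkCustomerOrder:
    """Link: a relationship (intersection) between two or more Hubs."""
    link_hash_key: str       # hash of the combined parent business keys
    customer_hash_key: str
    order_hash_key: str
    load_date: datetime
    record_source: str

@dataclass
class SatCustomerDetails:
    """Satellite: descriptive attributes, historized by load date."""
    customer_hash_key: str   # points at the parent Hub
    load_date: datetime      # part of the key, so history accumulates
    record_source: str
    name: str
    email: str

now = datetime.now(timezone.utc)
hub = HubCustomer("a1b2", "cust-1001", now, "CRM")
sat = SatCustomerDetails("a1b2", now, "CRM", "Ada Lovelace", "ada@example.com")
```

Because Satellites key on the Hub's hash key plus a load date, new attribute versions are simply inserted rather than updated in place, which is what makes repeated loads of the Raw Vault close to idempotent.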