Traditional database theory dictates that you design the data set before entering any data. A data lake, also called an enterprise data lake or enterprise data hub, turns that model on its head: you take all of your data sources and dump them into a big Hadoop repository, without trying to design a data model beforehand. Instead, the lake provides tools for people to analyze the data, along with a high-level definition of what data exists in it. People build views into the data as they go along. It’s a very incremental, organic model for building a large-scale database.
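This "schema-on-read" approach can be sketched in a few lines. The snippet below is a minimal illustration, not any particular product's API: heterogeneous records (the names, fields, and directory are all hypothetical) are dumped into a lake as-is, and a "view" is just a function that projects the records it understands at query time.

```python
import json
from pathlib import Path
from tempfile import mkdtemp

# A stand-in "lake": raw records are dumped as-is, with no upfront schema.
lake = Path(mkdtemp())

raw_records = [
    {"user": "alice", "page": "/home", "ms": 120},  # clickstream event
    {"customer_id": 7, "plan": "pro"},              # enterprise CRM record
    {"user": "bob", "page": "/pricing"},            # clickstream, "ms" missing
]
(lake / "events.jsonl").write_text(
    "\n".join(json.dumps(r) for r in raw_records)
)

# Schema-on-read: the view is defined at query time, not at load time.
def page_views(lake_dir):
    """Project only the clickstream-shaped records into a tidy view."""
    for line in (lake_dir / "events.jsonl").read_text().splitlines():
        rec = json.loads(line)
        if "page" in rec:  # skip records that don't fit this view
            yield {"user": rec["user"], "page": rec["page"],
                   "ms": rec.get("ms")}  # tolerate missing fields

print(list(page_views(lake)))
```

Note that the CRM record sits in the same file untouched; a different team could later write its own view over the same raw data, which is the incremental, organic growth the model describes.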
As part of its Intuit Analytics Cloud, Intuit has a data lake that includes clickstream user data along with enterprise and third-party data, but its focus is on “democratizing” the tools surrounding the lake so that business people can use it effectively. A common criticism of Hadoop is that the platform isn’t really enterprise-ready: it still needs capabilities that traditional enterprise databases have had for decades, such as monitoring, access control, encryption, and tracing the lineage of data from source to destination.