I think the ultimate question is: Can all the benefits of a traditional relational data warehouse be implemented inside of a Hadoop data lake with interactive querying via Hive LLAP or Spark SQL, or should I use both a data lake and a relational data warehouse in my big data solution? The short answer is you should use both. The rest of this post will dig into the reasons why.
The love affair with NoSQL (big data) databases seems to be over. Many projects built on Hadoop and other non-relational databases have fallen by the wayside. Some workloads, such as structured data, are still handled better by old-school relational database servers and queried with SQL or SQL-based tools. As the amount of unstructured data increases, however, so will the use of NoSQL databases.