Summary

In this chapter

  • The Hadoop data lake concept was presented as a new model for data processing.

  • Various methods for making data available to several Hadoop tools were outlined. The examples included copying files directly to HDFS, importing CSV files into Apache Hive and Spark, and importing JSON files into Hive using Spark (a short PySpark sketch follows this list).

  • Apache Sqoop was presented as a tool for moving relational data into and out of HDFS (an example import command appears after this list).

  • Apache Flume was presented as a tool for capturing and transporting continuous data, such as web logs, into HDFS (a minimal agent configuration is sketched below).

  • The Apache Oozie workflow manager was described as a tool for creating and scheduling Hadoop workflows (a skeleton workflow definition is shown below).

  • The Apache Falcon tool was described as providing a high-level framework for data governance (end-to-end data management) by keeping Hadoop data and tasks organized and defined as pipelines.

  • Newer tools such as Apache NiFi and Apache Atlas were mentioned as options for data flow and governance on a Hadoop cluster.
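
For quick reference, the sketch below recaps the Spark import pattern summarized above. It uses the Spark 2.x SparkSession API (older Spark 1.x releases use HiveContext instead), and the HDFS paths and table names are placeholders, not the chapter's exact examples.

    # Minimal PySpark sketch of the CSV and JSON import steps.
    # Paths and table names are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv-json-import")
             .enableHiveSupport()        # lets saveAsTable() create Hive tables
             .getOrCreate())

    # Read a CSV file from HDFS and save it as a Hive table.
    csv_df = spark.read.csv("hdfs:///user/hadoop/names.csv",
                            header=True, inferSchema=True)
    csv_df.write.saveAsTable("names")

    # Read a JSON file (one record per line) and query it with Spark SQL.
    json_df = spark.read.json("hdfs:///user/hadoop/events.json")
    json_df.createOrReplaceTempView("events")
    spark.sql("SELECT COUNT(*) FROM events").show()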
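
The Sqoop item corresponds to an import command of roughly the following form; the JDBC URL, credentials, table, and target directory are hypothetical. A sqoop export command reverses the direction, reading from HDFS with --export-dir and writing back to the database.

    sqoop import \
      --connect jdbc:mysql://dbhost/retail_db \
      --username sqoopuser -P \
      --table customers \
      --target-dir /user/hadoop/customers \
      --num-mappers 1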
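
A minimal Flume agent definition in the same spirit tails a web server log into HDFS; the agent name (a1), log path, and HDFS path are placeholders. The agent is started with flume-ng agent -n a1 -c conf -f flume.conf.

    # flume.conf: exec source -> memory channel -> HDFS sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/httpd/access_log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory

    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /user/hadoop/weblogs
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.channel = c1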
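
Finally, an Oozie workflow is an XML definition of actions wired together with control nodes. The skeleton below runs a single Pig script as a placeholder action (the workflow name, script, and node names are assumptions for illustration); it would be submitted with oozie job -oozie http://localhost:11000/oozie -config job.properties -run.

    <!-- workflow.xml: start -> one action -> end (or kill on error) -->
    <workflow-app xmlns="uri:oozie:workflow:0.4" name="example-wf">
      <start to="pig-node"/>
      <action name="pig-node">
        <pig>
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <script>script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
      </kill>
      <end name="end"/>
    </workflow-app>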
