- Hadoop as a Data Lake
- The Hadoop Distributed File System (HDFS)
- Direct File Transfer to Hadoop HDFS
- Importing Data from Files into Hive Tables
- Importing Data into Hive Tables Using Spark
- Using Apache Sqoop to Acquire Relational Data
- Using Apache Flume to Acquire Data Streams
- Manage Hadoop Work and Data Flows with Apache Oozie
- Apache Falcon
- What's Next in Data Ingestion?
- Summary
What’s Next in Data Ingestion?
As the Hadoop platform continues to evolve, so does innovation in ingestion tooling. Two important new tools, now available to ingestion teams, deserve mention:
Apache NiFi is a recent addition to the data ingestion toolset. Originally created at the NSA, and recently open-sourced and donated to the Apache Software Foundation, NiFi provides a scalable way to define data routing, transformation, and system mediation logic. An excellent web UI makes building data flows in NiFi fast and easy. NiFi also provides lineage tracking, along with the security and monitoring capabilities that make it a great tool for data ingestion, especially for sensor data.
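NiFi flows are assembled in its UI rather than written as code, but the routing-and-transformation pattern such a flow expresses can be sketched in plain Python. The sketch below is conceptual only, not NiFi's actual API; the sensor record schema, the `transform` and `route` functions, and the temperature threshold are all hypothetical:

```python
# Conceptual sketch of a NiFi-style flow: each record passes through a
# transformation step, then is routed to a destination based on its content.
# This is plain Python for illustration, not NiFi code.

def transform(record):
    """Normalize a raw sensor reading (hypothetical schema)."""
    return {
        "sensor_id": record["id"],
        "celsius": (record["temp_f"] - 32) * 5.0 / 9.0,
    }

def route(record):
    """Content-based routing: choose a destination queue per record."""
    return "alerts" if record["celsius"] > 90.0 else "archive"

def run_flow(records):
    """Drive records through transform -> route, like a two-processor flow."""
    queues = {"alerts": [], "archive": []}
    for raw in records:
        rec = transform(raw)
        queues[route(rec)].append(rec)
    return queues

readings = [{"id": "s1", "temp_f": 212.0}, {"id": "s2", "temp_f": 68.0}]
queues = run_flow(readings)
# s1 (100 C) is routed to "alerts"; s2 (20 C) is routed to "archive"
```

In NiFi, each of these steps would be a processor on the canvas (for example, a transformation processor feeding a routing processor), with the queues corresponding to connections between processors.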
Apache Atlas provides a set of core data governance services that enable enterprises to effectively meet compliance requirements on Hadoop.