- Hadoop as a Data Lake
- The Hadoop Distributed File System (HDFS)
- Direct File Transfer to Hadoop HDFS
- Importing Data from Files into Hive Tables
- Importing Data into Hive Tables Using Spark
- Using Apache Sqoop to Acquire Relational Data
- Using Apache Flume to Acquire Data Streams
- Manage Hadoop Work and Data Flows with Apache Oozie
- Apache Falcon
- What's Next in Data Ingestion?
- Summary
Direct File Transfer to Hadoop HDFS
The easiest way to move data into and out of HDFS is to use the native HDFS commands. These commands are wrappers that interact with the HDFS file system; local commands, such as cp, ls, or mv, work only on local files. To copy a file (test) from your local file system to HDFS, the following put command can be used:
$ hdfs dfs -put test
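By default, put places the file in your HDFS home directory (typically /user/<username>). To copy the file into a specific HDFS directory instead, supply a destination path. The directory used here (/user/username/data) is only an example; it can be created first with mkdir:

$ hdfs dfs -mkdir -p /user/username/data
$ hdfs dfs -put test /user/username/data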
To view files in HDFS, use the following command. The result is a full listing similar to that of a locally executed ls -l command:
$ hdfs dfs -ls
-rw-r--r--   2 username hdfs        497 2016-05-11 14:32 test
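The ls command also accepts a path, so a specific directory or file can be listed. For example, assuming the example data directory created above:

$ hdfs dfs -ls /user/username/data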
To copy a file (another-test) from HDFS to your local file system, use the following get command:
$ hdfs dfs -get another-test
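To inspect a file without copying it back, or to place the copy in a particular local directory, the cat command or an explicit local destination can be used. Assuming another-test is a plain-text file:

$ hdfs dfs -cat another-test
$ hdfs dfs -get another-test /tmp/another-test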
Other HDFS commands will be introduced in the examples. Appendix B, “HDFS Quick Start,” provides basic command examples, including listing, copying, and removing files in HDFS.
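As a brief preview of the removal commands covered in the appendix, a single file or an entire directory can be deleted from HDFS with rm; the directory shown here is the example path used earlier:

$ hdfs dfs -rm test
$ hdfs dfs -rm -r /user/username/data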