Run Sample MapReduce Examples
To test your installation, run the sample “pi” program that calculates the value of pi using a quasi-Monte Carlo method and MapReduce. Change to user hdfs and run the following:
# su - hdfs
$ cd /opt/yarn/hadoop-2.2.0/bin
$ export YARN_EXAMPLES=/opt/yarn/hadoop-2.2.0/share/hadoop/mapreduce
$ ./yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.2.0.jar pi 16 1000
If the program worked correctly, the following should be displayed at the end of the program output stream:
Estimated value of Pi is 3.14250000000000000000
This example submits a MapReduce job to YARN using one of the samples included in the share/hadoop/mapreduce directory. The examples JAR file contains several sample applications for testing your YARN installation. After you submit the job, you can follow its progress by refreshing the ResourceManager web page shown in Figure 2.2.
You can get a full list of examples by entering the following:
./yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.2.0.jar
To see a list of options for each example, add the example name to this command. The following is a list of the included jobs in the examples JAR file.
- aggregatewordcount: An Aggregate-based map/reduce program that counts the words in the input files.
- aggregatewordhist: An Aggregate-based map/reduce program that computes the histogram of the words in the input files.
- bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute the exact digits of pi.
- dbcount: An example job that computes pageview counts from a database.
- distbbp: A map/reduce program that uses a BBP-type formula to compute the exact bits of pi.
- grep: A map/reduce program that counts the matches to a regex in the input.
- join: A job that effects a join over sorted, equally partitioned data sets.
- multifilewc: A job that counts words from several files.
- pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
- pi: A map/reduce program that estimates pi using a quasi-Monte Carlo method.
- randomtextwriter: A map/reduce program that writes 10 GB of random textual data per node.
- randomwriter: A map/reduce program that writes 10 GB of random data per node.
- secondarysort: An example defining a secondary sort to the reduce.
- sort: A map/reduce program that sorts the data written by the random writer.
- sudoku: A Sudoku solver.
- teragen: Generate data for the terasort.
- terasort: Run the terasort.
- teravalidate: Check the results of the terasort.
- wordcount: A map/reduce program that counts the words in the input files.
- wordmean: A map/reduce program that computes the average length of the words in the input files.
- wordmedian: A map/reduce program that computes the median length of the words in the input files.
- wordstandarddeviation: A map/reduce program that computes the standard deviation of the length of the words in the input files.
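As mentioned above, invoking an example by name without its required arguments prints a brief usage message. For instance, running wordcount with no arguments produces output similar to the following (the exact wording may vary by Hadoop version; this assumes YARN_EXAMPLES is set as shown earlier):

```shell
# Print the usage message for a single example by giving its name with no arguments.
$ ./yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.2.0.jar wordcount
# Typical output: Usage: wordcount <in> <out>
```

This is a quick way to discover the input and output arguments each example expects before running it.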
Some of the examples require files to be copied to or from HDFS. For those unfamiliar with basic HDFS operation, an HDFS quick start is provided in Appendix F. If you were able to complete the preceding steps, you should now have a fully functioning Apache Hadoop YARN system running in pseudo-distributed mode.
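As a sketch of what such an HDFS-based run might look like, the following copies a local file into HDFS, runs the wordcount example on it, and prints the results. The paths and input file here are illustrative, not required by the examples; see Appendix F for the underlying HDFS commands:

```shell
# Create an input directory in HDFS and copy in a local file (paths are illustrative).
$ hdfs dfs -mkdir -p /user/hdfs/input
$ hdfs dfs -put /etc/hosts /user/hdfs/input

# Run the wordcount example. The output directory must not already exist;
# the job will fail if it does.
$ ./yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.2.0.jar wordcount input output

# Inspect the results written by the single reducer.
$ hdfs dfs -cat output/part-r-00000
```

Relative paths such as input and output resolve against the user's HDFS home directory (here, /user/hdfs), which is why the job is run as the hdfs user.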