Rough Cuts are manuscripts that are developed but not yet published, available through Safari. Rough Cuts provide you access to the very latest information on a given topic and offer you the opportunity to interact with the author to influence the final publication.
This is the Rough Cut version of the printed book.
Stop searching the web for out-of-date, fragmentary, and unreliable information about running Hadoop! Now, there's a single source for all the authoritative knowledge and trustworthy procedures you need: Expert Hadoop® Administration: Managing Spark, YARN, and HDFS.
Pioneering Hadoop/Big Data administrator Sam R. Alapati shares step-by-step procedures for confidently performing every important task involved in creating, configuring, securing, managing, and optimizing production Hadoop clusters. The only Hadoop administration guide written by a working Hadoop administrator, Expert Hadoop® Administration covers an unmatched range of topics and offers an unparalleled collection of realistic examples. Alapati shares proven answers to the complex configuration, management, and performance-tuning problems that Hadoop administrators constantly encounter, along with expert guidance for customizing Hadoop 2's intensely complex environment. Throughout, he integrates action-oriented advice with carefully researched explanations of both problems and solutions. Coverage includes:
Part I: Introduction to Hadoop 2—Architecture and Hadoop Clusters
Chapter 1: Introduction to Hadoop 2 and Its Environment
Chapter 2: An Introduction to the Architecture of Hadoop 2
Chapter 3: Creating and Configuring a Simple Hadoop 2 Cluster
Chapter 4: Planning for and Creating a Fully Distributed Cluster
Part II: Hadoop Application Frameworks
Chapter 5: Running Applications in a Cluster—The MapReduce Framework (and Pig, Hive)
Chapter 6: Running Applications in a Cluster—The Spark Framework
Chapter 7: Running Applications in a Cluster—The Spark Framework (Second Part)
Part III: Managing and Protecting Hadoop Data and High Availability
Chapter 8: The Role of the NameNode and How HDFS Works
Chapter 9: HDFS Commands, File Permissions, and HDFS Storage Management
Chapter 10: Data Protection, Compression, and Hadoop Data Formats
Chapter 11: NameNode Operations and High Availability
Part IV: Moving Data, Allocating Resources, Scheduling Jobs, and Security
Chapter 12: Moving Data Into and Out of Hadoop
Chapter 13: YARN, and Resource Allocation in a Hadoop Cluster
Chapter 14: Working with Oozie and Hue to Manage Workflows
Chapter 15: Securing Hadoop
Part V: Monitoring, Optimization, and Troubleshooting
Chapter 16: Managing Jobs, Using Hue, and Performing Routine Tasks
Chapter 17: Monitoring, Metrics, and Hadoop Logging
Chapter 18: Benchmarking, Optimization, and Performance Tuning
Chapter 19: Configuring and Tuning Apache Spark on YARN
Chapter 20: Optimizing Spark Applications
Chapter 21: Troubleshooting Hadoop—A Sampler