Hadoop Operations


[...]... "Network Design" on page 66.

Command-Line Tools

Hadoop comes with a number of command-line tools that enable basic filesystem operations. Like all Hadoop tools, HDFS commands are subcommands of the hadoop command-line utility. Running hadoop fs will display basic usage information, as shown in Example 2-1.

Example 2-1. hadoop fs help information

[esammer@hadoop01 ~]$ hadoop fs
Usage: java FsShell
  [-ls <path>]
  ...

[...] supergroup       2216 2012-01-25 21:07 /user/esammer/passwd
[esammer@hadoop01 ~]$ ls -al passwd
ls: passwd: No such file or directory
[esammer@hadoop01 ~]$ hadoop fs -get /user/esammer/passwd ./
[esammer@hadoop01 ~]$ ls -al passwd
-rw-rw-r--+ 1 esammer esammer 2216 Jan 25 21:17 passwd
[esammer@hadoop01 ~]$ hadoop fs -rm /user/esammer/passwd
Deleted hdfs://hadoop01.sf.cloudera.com/user/esammer/passwd

Also unique... (see Example 2-4).

Example 2-4. Copying files to and from HDFS

[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer/
Found 2 items
drwx------   - esammer supergroup          0 2012-01-11 15:06 /user/esammer/.staging
-rw-r--r--   3 esammer supergroup   27888890 2012-01-10 13:41 /user/esammer/data.txt
[esammer@hadoop01 ~]$ hadoop fs -put /etc/passwd /user/esammer/
[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer/
Found 3 items
drwx------   - esammer ...

[...] recursively (see Example 2-5).

Example 2-5. Changing the replication factor on files in HDFS

[esammer@hadoop01 ~]$ hadoop fs -setrep 5 -R /user/esammer/tmp/
Replication 5 set: hdfs://hadoop01.sf.cloudera.com/user/esammer/tmp/a
Replication 5 set: hdfs://hadoop01.sf.cloudera.com/user/esammer/tmp/b
[esammer@hadoop01 ~]$ hadoop fsck /user/esammer/tmp -files -blocks -locations
FSCK started by esammer from /10.1.1.160...
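Example 2-5 above raises the replication factor from the default of 3 to 5. The storage cost of doing so is simple arithmetic, sketched below in Python (this is an illustrative back-of-the-envelope helper, not part of Hadoop or its tooling): raw cluster storage consumed by a file is its size multiplied by its replication factor.

```python
# Illustrative sketch (not Hadoop code): raw disk consumed across the
# cluster is the file's size times the number of replicas of each block.
def raw_storage_used(file_size, replication=3):
    """Total bytes consumed across all datanodes for one file."""
    return file_size * replication

# The 2216-byte passwd file from the examples above, at default replication 3:
print(raw_storage_used(2216))      # 6648
# After hadoop fs -setrep 5, each block gains two more replicas:
print(raw_storage_used(2216, 5))   # 11080
```

For a tiny file like passwd the difference is negligible, but the same multiplier applies to multi-terabyte datasets, which is why administrators weigh replication factor against available capacity.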
systems such as Hadoop, including some real-world war stories. In an attempt to minimize those rainy days, Chapter 10 is all about how to effectively monitor your Hadoop cluster. Finally, Chapter 11 provides some basic tools and techniques for backing up Hadoop and dealing with catastrophic failure.

CHAPTER 2
HDFS

Goals and Motivation

The first half of Apache Hadoop is a filesystem... reduces I/O operations to sequential, append-only operations (in the context of the namenode, since it serves directly from RAM), which avoids costly seek operations and yields better overall performance. Upon namenode startup, the fsimage file is loaded into RAM and any changes in the edits file are replayed, bringing the in-memory view of the filesystem up to date. In more recent versions of Hadoop (specifically,...

[...] friends with it. Exchanging data with relational databases is one of the most popular integration points with Apache Hadoop. Sqoop, short for "SQL to Hadoop," performs bidirectional data transfer between Hadoop and almost any database with a JDBC driver. Using MapReduce, Sqoop performs these operations in parallel with no need to write code. For even greater performance, Sqoop supports database-specific plug-ins...

[...] why they exist, and at a high level, how they work. Chapter 4 walks you through the process of planning a Hadoop deployment, including hardware selection, basic resource planning, operating system selection and configuration, Hadoop distribution and version selection, and network concerns for Hadoop clusters. If you are looking for the meat and potatoes, Chapter 5 is where it's at, with configuration...
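The fsimage-plus-edits mechanism described above is a classic checkpoint-and-log design: a periodic snapshot of metadata, plus an append-only log of changes replayed on startup. A minimal sketch in Python (all names here are hypothetical; this is not Hadoop's actual code or file format):

```python
# Illustrative sketch of checkpoint + edit-log replay (not Hadoop's code).
# "fsimage" is the last metadata snapshot; "edits" is an append-only list
# of operations recorded since that snapshot was written.
def replay(fsimage, edits):
    """Rebuild the in-memory metadata view: load the snapshot, then
    re-apply each logged operation in order."""
    metadata = dict(fsimage)  # start from the checkpoint
    for op, path, value in edits:
        if op == "create":
            metadata[path] = value
        elif op == "delete":
            metadata.pop(path, None)
    return metadata

checkpoint = {"/user/esammer/data.txt": {"replication": 3}}
edit_log = [
    ("create", "/user/esammer/passwd", {"replication": 3}),
    ("delete", "/user/esammer/passwd", None),
]
# After replay, the view reflects both the snapshot and later changes:
print(replay(checkpoint, edit_log))
```

The key property the text calls out falls out of this shape: writes to the log are sequential appends, so the namenode never pays random-seek costs while recording metadata changes; the expensive reconciliation happens only once, at startup.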
whatever the desired result might be. Apache Hadoop is a platform that provides pragmatic, cost-effective, scalable infrastructure for building many of the types of applications described earlier. Made up of a distributed filesystem called the Hadoop Distributed Filesystem (HDFS) and a computation layer that implements a processing paradigm called MapReduce, Hadoop is an open source, batch data processing...

[...] size for data. Hadoop, on the other hand, uses the significantly larger block size of 64 MB by default. In fact, cluster administrators usually raise this to 128 MB, 256 MB, or even as high as 1 GB. Increasing the block size means data will be written in larger contiguous chunks on disk, which in turn means data can be written and read in larger sequential operations. This minimizes drive seek operations, one...

Hadoop Operations, by Eric Sammer. Copyright...

[...] Picking a Distribution and Version of Hadoop
  Apache Hadoop
  Cloudera's Distribution Including Apache Hadoop
  Versions and Features
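The block-size argument above is easy to make concrete: the larger the block, the fewer contiguous chunks a file occupies, and the fewer seeks a sequential scan needs. A small sketch (illustrative arithmetic only, not Hadoop code; the 8 KB figure stands in for a typical local-filesystem block size):

```python
# Illustrative sketch: chunk counts for a 1 GB file at different block sizes.
# Fewer chunks means fewer drive seeks during a sequential read.
MB = 1024 * 1024

def blocks_needed(file_size, block_size):
    """Number of blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)

one_gb_file = 1024 * MB
print(blocks_needed(one_gb_file, 8 * 1024))   # 8 KB blocks: 131072 chunks
print(blocks_needed(one_gb_file, 64 * MB))    # HDFS default: 16 chunks
print(blocks_needed(one_gb_file, 128 * MB))   # common override: 8 chunks
```

Going from an 8 KB local-filesystem block to the 64 MB HDFS default cuts the chunk count for a 1 GB file from 131,072 to 16, which is the whole point of trading fine-grained allocation for sequential throughput.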


Table of Contents

  • Preface

    • Conventions Used in This Book

    • Using Code Examples

    • Safari® Books Online

    • How to Contact Us

    • Acknowledgments

    • Chapter 1. Introduction

    • Chapter 2. HDFS

      • Goals and Motivation

      • Design

      • Daemons

      • Reading and Writing Data

        • The Read Path

        • The Write Path

      • Managing Filesystem Metadata

      • Namenode High Availability

      • Namenode Federation

      • Access and Integration

        • Command-Line Tools

        • FUSE

        • REST Support

    • Chapter 3. MapReduce

      • The Stages of MapReduce

      • Introducing Hadoop MapReduce

        • Daemons

          • Jobtracker
