Hadoop Operations


[...]... "Network Design" on page 66.

Command-Line Tools

Hadoop comes with a number of command-line tools that enable basic filesystem operations. Like all Hadoop tools, HDFS commands are subcommands of the hadoop command-line utility. Running hadoop fs will display basic usage information, as shown in Example 2-1.

Example 2-1. hadoop fs help information

[esammer@hadoop01 ~]$ hadoop fs
Usage: java FsShell
  [-ls <path>]
  ...

[...] supergroup       2216 2012-01-25 21:07 /user/esammer/passwd
[esammer@hadoop01 ~]$ ls -al passwd
ls: passwd: No such file or directory
[esammer@hadoop01 ~]$ hadoop fs -get /user/esammer/passwd ./
[esammer@hadoop01 ~]$ ls -al passwd
-rw-rw-r--+ 1 esammer esammer 2216 Jan 25 21:17 passwd
[esammer@hadoop01 ~]$ hadoop fs -rm /user/esammer/passwd
Deleted hdfs://hadoop01.sf.cloudera.com/user/esammer/passwd

Also unique... (see Example 2-4).

Example 2-4. Copying files to and from HDFS

[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer/
Found 2 items
drwx------   - esammer supergroup          0 2012-01-11 15:06 /user/esammer/.staging
-rw-r--r--   3 esammer supergroup   27888890 2012-01-10 13:41 /user/esammer/data.txt
[esammer@hadoop01 ~]$ hadoop fs -put /etc/passwd /user/esammer/
[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer/
Found 3 items
drwx------   - esammer ...

[...] recursively (see Example 2-5).

Example 2-5. Changing the replication factor on files in HDFS

[esammer@hadoop01 ~]$ hadoop fs -setrep 5 -R /user/esammer/tmp/
Replication 5 set: hdfs://hadoop01.sf.cloudera.com/user/esammer/tmp/a
Replication 5 set: hdfs://hadoop01.sf.cloudera.com/user/esammer/tmp/b
[esammer@hadoop01 ~]$ hadoop fsck /user/esammer/tmp -files -blocks -locations
FSCK started by esammer from /10.1.1.160...
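Example 2-5 above raises the replication factor from the default of 3 to 5. The storage cost of doing so is simple arithmetic, sketched below in Python (this is an illustrative back-of-the-envelope helper, not part of Hadoop or its tooling): raw cluster storage consumed by a file is its size multiplied by its replication factor.

```python
# Illustrative sketch (not Hadoop code): raw disk consumed across the
# cluster is the file's size times the number of replicas of each block.
def raw_storage_used(file_size, replication=3):
    """Total bytes consumed across all datanodes for one file."""
    return file_size * replication

# The 2216-byte passwd file from the examples above, at default replication 3:
print(raw_storage_used(2216))      # 6648
# After hadoop fs -setrep 5, each block gains two more replicas:
print(raw_storage_used(2216, 5))   # 11080
```

For a tiny file like passwd the difference is negligible, but the same multiplier applies to multi-terabyte datasets, which is why administrators weigh replication factor against available capacity.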
systems such as Hadoop, including some real-world war stories. In an attempt to minimize those rainy days, Chapter 10 is all about how to effectively monitor your Hadoop cluster. Finally, Chapter 11 provides some basic tools and techniques for backing up Hadoop and dealing with catastrophic failure.

CHAPTER 2
HDFS

Goals and Motivation

The first half of Apache Hadoop is a filesystem... reduces I/O operations to sequential, append-only operations (in the context of the namenode, since it serves directly from RAM), which avoids costly seek operations and yields better overall performance. Upon namenode startup, the fsimage file is loaded into RAM and any changes in the edits file are replayed, bringing the in-memory view of the filesystem up to date. In more recent versions of Hadoop (specifically,...

[...] friends with it. Exchanging data with relational databases is one of the most popular integration points with Apache Hadoop. Sqoop, short for "SQL to Hadoop," performs bidirectional data transfer between Hadoop and almost any database with a JDBC driver. Using MapReduce, Sqoop performs these operations in parallel with no need to write code. For even greater performance, Sqoop supports database-specific plug-ins...

[...] why they exist, and at a high level, how they work. Chapter 4 walks you through the process of planning a Hadoop deployment, including hardware selection, basic resource planning, operating system selection and configuration, Hadoop distribution and version selection, and network concerns for Hadoop clusters. If you are looking for the meat and potatoes, Chapter 5 is where it's at, with configuration...
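The fsimage-plus-edits mechanism described above is a classic checkpoint-and-log design: a periodic snapshot of metadata, plus an append-only log of changes replayed on startup. A minimal sketch in Python (all names here are hypothetical; this is not Hadoop's actual code or file format):

```python
# Illustrative sketch of checkpoint + edit-log replay (not Hadoop's code).
# "fsimage" is the last metadata snapshot; "edits" is an append-only list
# of operations recorded since that snapshot was written.
def replay(fsimage, edits):
    """Rebuild the in-memory metadata view: load the snapshot, then
    re-apply each logged operation in order."""
    metadata = dict(fsimage)  # start from the checkpoint
    for op, path, value in edits:
        if op == "create":
            metadata[path] = value
        elif op == "delete":
            metadata.pop(path, None)
    return metadata

checkpoint = {"/user/esammer/data.txt": {"replication": 3}}
edit_log = [
    ("create", "/user/esammer/passwd", {"replication": 3}),
    ("delete", "/user/esammer/passwd", None),
]
# After replay, the view reflects both the snapshot and later changes:
print(replay(checkpoint, edit_log))
```

The key property the text calls out falls out of this shape: writes to the log are sequential appends, so the namenode never pays random-seek costs while recording metadata changes; the expensive reconciliation happens only once, at startup.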
whatever the desired result might be. Apache Hadoop is a platform that provides pragmatic, cost-effective, scalable infrastructure for building many of the types of applications described earlier. Made up of a distributed filesystem called the Hadoop Distributed Filesystem (HDFS) and a computation layer that implements a processing paradigm called MapReduce, Hadoop is an open source, batch data processing...

[...] size for data. Hadoop, on the other hand, uses the significantly larger block size of 64 MB by default. In fact, cluster administrators usually raise this to 128 MB, 256 MB, or even as high as 1 GB. Increasing the block size means data will be written in larger contiguous chunks on disk, which in turn means data can be written and read in larger sequential operations. This minimizes drive seek operations, one...

Hadoop Operations, by Eric Sammer. Copyright...

[...] Picking a Distribution and Version of Hadoop
  Apache Hadoop
  Cloudera's Distribution Including Apache Hadoop
  Versions and Features
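The block-size argument above is easy to make concrete: the larger the block, the fewer contiguous chunks a file occupies, and the fewer seeks a sequential scan needs. A small sketch (illustrative arithmetic only, not Hadoop code; the 8 KB figure stands in for a typical local-filesystem block size):

```python
# Illustrative sketch: chunk counts for a 1 GB file at different block sizes.
# Fewer chunks means fewer drive seeks during a sequential read.
MB = 1024 * 1024

def blocks_needed(file_size, block_size):
    """Number of blocks required to hold file_size bytes (ceiling division)."""
    return -(-file_size // block_size)

one_gb_file = 1024 * MB
print(blocks_needed(one_gb_file, 8 * 1024))   # 8 KB blocks: 131072 chunks
print(blocks_needed(one_gb_file, 64 * MB))    # HDFS default: 16 chunks
print(blocks_needed(one_gb_file, 128 * MB))   # common override: 8 chunks
```

Going from an 8 KB local-filesystem block to the 64 MB HDFS default cuts the chunk count for a 1 GB file from 131,072 to 16, which is the whole point of trading fine-grained allocation for sequential throughput.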


Table of Contents

  • Preface

    • Conventions Used in This Book

    • Using Code Examples

    • Safari® Books Online

    • How to Contact Us

    • Acknowledgments

    • Chapter 1. Introduction

    • Chapter 2. HDFS

      • Goals and Motivation

      • Design

      • Daemons

      • Reading and Writing Data

        • The Read Path

        • The Write Path

      • Managing Filesystem Metadata

      • Namenode High Availability

      • Namenode Federation

      • Access and Integration

        • Command-Line Tools

        • FUSE

        • REST Support

    • Chapter 3. MapReduce

      • The Stages of MapReduce

      • Introducing Hadoop MapReduce

        • Daemons

          • Jobtracker
