DOCUMENT INFORMATION
Pages: 624
Size: 2.82 MB
Hadoop Real-World Solutions Cookbook, Second Edition

Table of Contents

(Unless noted otherwise, each recipe contains Getting ready, How to do it, and How it works subsections.)

Hadoop Real-World Solutions Cookbook Second Edition
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
    eBooks, discount offers, and more
    Why Subscribe?
Preface
    What this book covers
    What you need for this book
    Who this book is for
    Conventions
    Reader feedback
    Customer support
        Downloading the example code
        Downloading the color images of this book
        Errata
        Piracy
        Questions

1. Getting Started with Hadoop 2.X
    Introduction
    Installing a single-node Hadoop Cluster (How it works: Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN); There's more)
    Installing a multi-node Hadoop cluster
    Adding new nodes to existing Hadoop clusters
    Executing the balancer command for uniform data distribution (There's more)
    Entering and exiting from the safe mode in a Hadoop cluster
    Decommissioning DataNodes
    Performing benchmarking on a Hadoop cluster (How to do it: TestDFSIO, NNBench, MRBench)

2. Exploring HDFS
    Introduction
    Loading data from a local machine to HDFS
    Exporting HDFS data to a local machine
    Changing the replication factor of an existing file in HDFS
    Setting the HDFS block size for all the files in a cluster
    Setting the HDFS block size for a specific file in a cluster
    Enabling transparent encryption for HDFS
    Importing data from another Hadoop cluster
    Recycling deleted data from trash to HDFS
    Saving compressed data in HDFS

3. Mastering Map Reduce Programs
    Introduction
    Writing the Map Reduce program in Java to analyze web log data
    Executing the Map Reduce program in a Hadoop cluster
    Adding support for a new writable data type in Hadoop
    Implementing a user-defined counter in a Map Reduce program
    Map Reduce program to find the top X
    Map Reduce program to find distinct values
    Map Reduce program to partition data using a custom partitioner
    Writing Map Reduce results to multiple output files
    Performing Reduce side Joins using Map Reduce
    Unit testing the Map Reduce code using MRUnit

4. Data Analysis Using Hive, Pig, and Hbase
    Introduction
    Storing and processing Hive data in a sequential file format
    Storing and processing Hive data in the RC file format
    Storing and processing Hive data in the ORC file format
    Storing and processing Hive data in the Parquet file format
    Performing FILTER By queries in Pig
    Performing Group By queries in Pig
    Performing Order By queries in Pig
    Performing JOINS in Pig (How it works: Replicated Joins, Skewed Joins, Merge Joins)
    Writing a user-defined function in Pig (There's more)
    Analyzing web log data using Pig
    Performing the Hbase operation in CLI
    Performing Hbase operations in Java
    Executing the MapReduce programming with an Hbase Table

5. Advanced Data Analysis Using Hive
    Introduction
    Processing JSON data in Hive using JSON SerDe
    Processing XML data in Hive using XML SerDe
    Processing Hive data in the Avro format
    Writing a user-defined function in Hive
    Performing table joins in Hive (How to do it: Left outer join, Right outer join, Full outer join, Left semi join)
    Executing map side joins in Hive
    Performing context Ngram in Hive
    Call Data Record Analytics using Hive
    Twitter sentiment analysis using Hive
    Implementing Change Data Capture using Hive
    Multiple table inserting using Hive

6. Data Import/Export Using Sqoop and Flume
    Introduction
    Importing data from RDBMS to HDFS using Sqoop
    Exporting data from HDFS to RDBMS
    Using query operator in Sqoop import
    Importing data using Sqoop in compressed format
    Performing Atomic export using Sqoop
    Importing data into Hive tables using Sqoop
    Importing data into HDFS from Mainframes
    Incremental import using Sqoop
    Creating and executing Sqoop job
    Importing data from RDBMS to Hbase using Sqoop

Index (excerpt)

M
    MySQL connector
        URL / Getting ready

N
    Naive Bayes algorithm
        reference / How it works
    Ngrams
        reference / How it works
    NNBench
        benchmarking / NNBench
    nodes
        adding, to existing Hadoop clusters / Adding new nodes to existing Hadoop clusters, How it works

O
    Olympics Athletes Data Analytics
        defining, Spark Shell used / Performing Olympics Athletes analytics using the Spark Shell, How to do it
        URL / How to do it
    Oozie
        used, for implementing Sqoop action job / Implementing a Sqoop action job using Oozie, How to do it
        used, for implementing Map Reduce action job / Implementing a Map Reduce action job using Oozie, How to do it
        used, for implementing Java action job / Implementing a Java action job using Oozie, How to do it
        used, for implementing Hive action job / Implementing a Hive action job using Oozie, How to do it, How it works
        used, for implementing Pig action job / Implementing a Pig action job using Oozie, How to do it, How it works
        used, for implementing e-mail action job / Implementing an e-mail action job using Oozie, How to do it, How it works
        used, for executing parallel jobs / Executing parallel jobs using Oozie (fork), How to do it
        job, scheduling / Scheduling a job in Oozie, How to do it
    ORC file format
        Hive data, storing in / Storing and processing Hive data in the ORC file format, How it works
        Hive data, processing in / Storing and processing Hive data in the ORC file format, How it works
    Order By queries
        performing, in Pig / Performing Order By queries in Pig, How it works

P
    parallel jobs
        executing, Oozie used / Executing parallel jobs using Oozie (fork), How to do it
    Parquet
        about / Analyzing Parquet files using Spark
        URL / How to do it
    Parquet file format
        Hive data, processing in / Storing and processing Hive data in the Parquet file format, How to do it
        Hive data, storing in / Storing and processing Hive data in the Parquet file format, How it works
        reference / How it works
    Parquet files
        analyzing, Spark used / Getting ready, How to do it, How it works
    Pearson product-moment correlation coefficient
        reference / How it works
    people.json sample
        URL / How to do it
    Pig
        reference / Introduction
        FILTER By queries, performing in / Performing FILTER By queries in Pig, How to do it
        Group By queries, performing in / Performing Group By queries in Pig, How to do it
        Order By queries, performing in / Performing Order By queries in Pig, How it works
        JOINS, performing in / Performing JOINS in Pig, How to do it
        user-defined function, writing in / Writing a user-defined function in Pig, How to do it
        used, for analyzing web log data / Analyzing web log data using Pig, How to do it
    Pig 0.15
        reference / Getting ready
    Pig action job
        implementing, Oozie used / Implementing a Pig action job using Oozie, How to do it, How it works
    population data analytics
        performing, R used / Population Data Analytics using R, How to do it
    predictive analytics
        performing, R used / Performing Predictive Analytics using R, How to do it
        conducting, Spark MLib used / Conducting predictive analytics using Spark MLib, How to do it, How it works

Q
    query operator
        used, in Sqoop import / Using query operator in Sqoop import, How it works

R
    R
        about / Introduction
        used, for performing population data analytics / Population Data Analytics using R, How to do it
        used, for performing Twitter sentiment analytics / Twitter Sentiment Analytics using R, How to do it, How it works
        used, for performing predictive analytics / Performing Predictive Analytics using R, How to do it
    RC file format
        Hive data, storing in / Storing and processing Hive data in the RC file format, How it works
        Hive data, processing in / Storing and processing Hive data in the RC file format, How it works
    Reduce side Joins
        performing, Map Reduce used / Performing Reduce side Joins using Map Reduce, How to do it, How it works
    Remote Procedure Calls (RPC) / Processing Hive data in the Avro format
    replicated joins
        about / Replicated Joins
        reference / Replicated Joins
    replication factor
        modifying, of existing file in HDFS / Changing the replication factor of an existing file in HDFS, How it works
    right outer join / Right outer join

S
    safe mode
        entering / Entering and exiting from the safe mode in a Hadoop cluster
        exiting from / Entering and exiting from the safe mode in a Hadoop cluster
    sequential file format
        Hive data, storing in / Storing and processing Hive data in a sequential file format, How to do it
        Hive data, processing in / Storing and processing Hive data in a sequential file format, How to do it
    SGD for logistic regression
        reference / How to do it
    Single Node Hadoop Cluster
        installing / Getting ready, How to do it, How it works
        HDFS file operations, performing on / There's more
    skewed joins
        about / Skewed Joins
        reference / Skewed Joins
    Spark
        running, on YARN / Running Spark on YARN, How to do it
        used, for analyzing Parquet files / Getting ready, How to do it, How it works
        used, for analyzing JSON data / Analyzing JSON data using Spark, How to do it, How it works
    Spark Shell
        used, for Olympics Athletes Data Analytics / Performing Olympics Athletes analytics using the Spark Shell, How to do it
    Spark standalone
        running / Running Spark standalone, How to do it, How it works
    Spark streaming
        used, for Twitter trending topics / Twitter trending topics using Spark streaming, How to do it
    Spark Streaming
        used, for creating Twitter trending topics / Creating Twitter trending topics using Spark Streaming, How to do it, How it works
        URL / How it works
    Sqoop
        used, for importing data from RDBMS to HDFS / Importing data from RDBMS to HDFS using Sqoop, How to do it
        used, for performing Atomic export / Performing Atomic export using Sqoop, How it works
        used, for importing data into Hive table / Importing data into Hive tables using Sqoop, How to do it, How it works
        used, for incremental import / Incremental import using Sqoop, How to do it
        used, for importing data from RDBMS to Hbase / Importing data from RDBMS to Hbase using Sqoop, How to do it, How it works
    Sqoop, in compressed format
        used, for importing data / Importing data using Sqoop in compressed format, How to do it, How it works
    Sqoop action job
        implementing, Oozie used / Implementing a Sqoop action job using Oozie, How to do it
    Sqoop import
        query operator, using / Using query operator in Sqoop import, How to do it
    Sqoop job
        creating / Creating and executing Sqoop job, How it works
        executing / Creating and executing Sqoop job, How it works
    Stochastic Gradient Descent (SGD) / How to do it

T
    table joins
        performing, in Hive / Performing table joins in Hive, How to do it
        left outer join / Left outer join
        right outer join / Right outer join
        full outer join / Full outer join
        left semi join / Left semi join
    TestDFSIO
        benchmarking / TestDFSIO
    text data
        clustering, K-Means Mahout used / Text data clustering using K-Means using Mahout, How to do it
    top X
        finding, Map Reduce program used / Map Reduce program to find the top X, How to do it
    transparent encryption
        enabling, for HDFS / Enabling transparent encryption for HDFS, How to do it, How it works
        reference / How it works
    TreeMap
        reference link / How to do it
    Twitter apps
        URL / How to do it
    Twitter authorization tokens
        generating / How to do it
    Twitter data
        importing Twitter data, Flume used / Importing Twitter data into HDFS using Flume, How to do it
    Twitter sentiment analysis
        performing, Hive used / Twitter sentiment analysis using Hive, How to do it, How it works
    Twitter sentiment analytics
        performing, R used / Twitter Sentiment Analytics using R, How to do it, How it works
    Twitter trending topics
        creating, Spark Streaming used / Creating Twitter trending topics using Spark Streaming, How to do it, How it works
        defining, Spark streaming used / Twitter trending topics using Spark streaming, How to do it

U
    uniform data distribution
        balancer command, executing for / Executing the balancer command for uniform data distribution, How to do it
    user-defined counter
        implementing, in Map Reduce program / Implementing a user-defined counter in a Map Reduce program, How to do it, How it works
    user-defined function
        writing, in Pig / Writing a user-defined function in Pig, How to do it
    User-Defined Functions (UDFs)
        about / Introduction
    user based recommendation engine
        setting up, Mahout used / Creating a user-based recommendation engine using Mahout, How to do it, How it works
    User Defined functions
        writing, in Hive / Writing a user-defined function in Hive, How to do it

W
    Web log analytics
        defining / Web log analytics, Solution
        references / Getting ready
        problem statement / Problem statement
        solution / Solution
    web log data
        analyzing, Pig used / Analyzing web log data using Pig, How to do it
    web logs data into HDFS
        importing, Flume used / Importing web logs data into HDFS using Flume, How to do it, How it works

X
    XML data
        processing, Hive XML SerDe used / Processing XML data in Hive using XML SerDe, How to do it, How it works
    XML SerDe
        references / Getting ready

Y
    YARN
        Spark, running on / Running Spark on YARN, How to do it
    Yet Another Resource Negotiator (YARN)
        about / How to do it, Yet Another Resource Negotiator (YARN)

First published: February 2013
Second edition: March 2016
Production reference: 1220316
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street, Birmingham B3 2PB, UK
ISBN 978-1-78439-550-6
www.packtpub.com