DOCUMENT INFORMATION
Basic information
Format
Page count
563
File size
11.79 MB
Contents
Big Data Analytics with Java

Table of Contents

Big Data Analytics with Java
Credits
About the Author
About the Reviewers
www.PacktPub.com
  eBooks, discount offers, and more
  Why subscribe?
Customer Feedback
Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
  Downloading the example code
  Downloading the color images of this book
  Errata
  Piracy
  Questions

1 Big Data Analytics with Java
  Why data analytics on big data?
  Big data for analytics
  Big data – a bigger pay package for Java developers
  Basics of Hadoop – a Java sub-project
  Distributed computing on Hadoop
  HDFS concepts
  Design and architecture of HDFS
  Main components of HDFS
  HDFS simple commands
  Apache Spark
  Concepts
  Transformations
  Actions
  Spark Java API
  Spark samples using Java
  Loading data
  Data operations – cleansing and munging
  Analyzing data – count, projection, grouping, aggregation, and max/min
  Actions on RDDs
  Paired RDDs
  Transformations on paired RDDs
  Saving data
  Collecting and printing results
  Executing Spark programs on Hadoop
  Apache Spark sub-projects
  Spark machine learning modules
  MLlib Java API
  Other machine learning libraries
  Mahout – a popular Java ML library
  Deeplearning4j – a deep learning library
  Compressing data
  Avro and Parquet
  Summary

2 First Steps in Data Analysis
  Datasets
  Data cleaning and munging
  Basic analysis of data with Spark SQL
  Building SparkConf and context
  Dataframe and datasets
  Load and parse data
  Analyzing data – the Spark-SQL way
  Spark SQL for data exploration and analytics
  Market basket analysis – Apriori algorithm
  Full Apriori algorithm
  Implementation of the Apriori algorithm in Apache Spark
  Efficient market basket analysis using FP-Growth algorithm
  Running FP-Growth on Apache Spark
  Summary

3 Data Visualization
  Data visualization with Java JFreeChart
  Using charts in big data analytics
  Time Series chart
  All India seasonal and annual average temperature series dataset
  Simple single Time Series chart
  Multiple Time Series on a single chart window
  Bar charts
  Histograms
  When would you use a histogram?
  How to make histograms using JFreeChart?
  Line charts
  Scatter plots
  Box plots
  Advanced visualization technique
  Prefuse
  IVTK Graph toolkit
  Other libraries
  Summary

4 Basics of Machine Learning
  What is machine learning?
  Real-life examples of machine learning
  Type of machine learning
  A small sample case study of supervised and unsupervised learning
  Steps for machine learning problems
  Choosing the machine learning model
  What are the feature types that can be extracted from the datasets?
  How you select the best features to train your models?
  How you run machine learning analytics on big data?
  Getting and preparing data in Hadoop
  Preparing the data
  Formatting the data
  Storing the data
  Training and storing models on big data
  Apache Spark machine learning API
  The new Spark ML API
  Summary

5 Regression on Big Data
  Linear regression
  What is simple linear regression?
  Where is linear regression used?
  Predicting house prices using linear regression
  Dataset
  Data cleaning and munging
  Exploring the dataset
  Running and testing the linear regression model
  Logistic regression
  Which mathematical functions does logistic regression use?
  Where is logistic regression used?
  Predicting heart disease using logistic regression
  Dataset
  Data cleaning and munging
  Data exploration
  Running and testing the logistic regression model
  Summary

6 Naive Bayes and Sentiment Analysis
  Conditional probability
  Bayes theorem
  Naive Bayes algorithm
  Advantages of Naive Bayes
  Disadvantages of Naive Bayes
  Sentimental analysis
  Concepts for sentimental analysis
  Tokenization
  Stop words removal
  Stemming
  N-grams
  Term presence and Term Frequency
  TF-IDF
  Bag of words
  Dataset
  Data exploration of text data
  Sentimental analysis on this dataset
  SVM or Support Vector Machine
  Summary

7 Decision Trees
  What is a decision tree?
  Building a decision tree
  Choosing the best features for splitting the datasets
  Advantages of using decision trees
  Disadvantages of using decision trees
  Dataset
  Data exploration
  Cleaning and munging the data
  Training and testing the model
  Summary

8 Ensembling on Big Data
  Ensembling
  Types of ensembling
  Bagging
  Boosting
  Advantages and disadvantages of ensembling
  Random forests
  Gradient boosted trees (GBTs)
  Classification problem and dataset used
  Data exploration
  Training and testing our random forest model
  Training and testing our gradient boosted tree model
  Summary

9 Recommendation Systems
  Recommendation systems and their types
  Content-based recommendation systems
  Dataset
  Content-based recommender on MovieLens dataset
  Collaborative recommendation systems
  Advantages
  Disadvantages
  Alternating least square – collaborative filtering
  Summary

10 Clustering and Customer Segmentation on Big Data
  Clustering
  Types of clustering
  Hierarchical clustering
  K-means clustering
  Bisecting k-means clustering
  Customer segmentation
  Dataset
  Data exploration
  Clustering for customer segmentation
  Changing the clustering algorithm
  Summary

11 Massive Graphs on Big Data
  Refresher on graphs
  Representing graphs
  Common terminology on graphs
  Common algorithms on graphs
  Plotting graphs
  Massive graphs on big data
  Graph analytics
  GraphFrames
  Building a graph using GraphFrames
  Graph analytics on airports and their flights
  Datasets
  Graph analytics on flights data
  Summary

12 Real-Time Analytics on Big Data
  Real-time analytics
  Big data stack for real-time analytics
  Real-time SQL queries on big data
  Real-time data ingestion and storage
  Real-time data processing
  Real-time SQL queries using Impala
  Flight delay analysis using Impala
  Apache Kafka
  Spark Streaming
  Typical uses of Spark Streaming
  Base project setup
  Trending videos
  Sentiment analysis in real time
  Summary

13 Deep Learning Using Big Data
  Introduction to neural networks
  Perceptron
  Problems with perceptrons
  Sigmoid neuron
  Multi-layer perceptrons
  Accuracy of multi-layer perceptrons
  Deep learning
  Advantages and use cases of deep learning
  Flower species classification using multi-layer perceptrons
  Deeplearning4j
  Hand written digit recognition using CNN
  Diving into the code:
  More information on deep learning
  Summary

Index