Foreword
Preface
Acknowledgements
About This Book
Contents
About the Authors
1 Big Data Analytics
1.1 Introduction
1.2 What Is Big Data?
1.3 Disruptive Change and Paradigm Shift in the Business Meaning of Big Data
1.4 Hadoop
1.5 Silos
1.5.1 Big Bang of Big Data
1.5.2 Possibilities
1.5.3 Future
1.5.4 Parallel Processing for Problem Solving
1.5.5 Why Hadoop?
1.5.6 Hadoop and HDFS
1.5.7 Hadoop Versions 1.0 and 2.0
1.5.8 Hadoop 2.0
1.6 HDFS Overview
1.7 Hadoop Ecosystem
1.8 Decision Making and Data Analysis in the Context of Big Data Environment
1.9 Machine Learning Algorithms
1.10 Evolutionary Computing (EC)
1.11 Conclusion
1.12 Review Questions
References and Bibliography
2 Intelligent Systems
2.1 Introduction
2.1.1 Open-Source Data Science
2.1.2 Machine Intelligence and Computational Intelligence
2.1.3 Data Engineering and Data Sciences
2.2 Big Data Computing
2.2.1 Distributed Systems and Database Systems
2.2.2 Data Stream Systems and Stream Mining
2.2.3 Ubiquitous Computing Infrastructures
2.3 Conclusion
2.4 Review Questions
References
3 Analytics Models for Data Science
3.1 Introduction
3.2 Data Models
3.3 Computing Models
3.3.1 Data Structures for Big Data
3.3.2 Feature Engineering for Structured Data
3.3.3 Computational Algorithm
3.3.4 Programming Models
3.3.5 Parallel Programming
3.3.6 Functional Programming
3.3.7 Distributed Programming
3.4 Conclusion
3.5 Review Questions
References
4 Big Data Tools—Hadoop Ecosystem, Spark and NoSQL Databases
4.1 Introduction
4.1.1 Hadoop Ecosystem
4.1.2 HDFS Commands [1]
4.2 MapReduce
4.3 Pig
4.4 Flume
4.5 Sqoop
4.6 Mahout, The Machine Learning Platform from Apache
4.7 GANGLIA, The Monitoring Tool
4.8 Kafka, The Stream Processing Platform (http://kafka.apache.org)
4.9 Spark
4.10 NoSQL Databases
4.11 Conclusion
References
5 Predictive Modeling for Unstructured Data
6 Machine Learning Algorithms for Big Data
6.1 Introduction
6.2 Generative Versus Discriminative Algorithms
6.3 Supervised Learning for Big Data
6.3.1 Decision Trees
6.3.2 Logistic Regression
6.3.3 Regression and Forecasting
6.3.4 Supervised Neural Networks
6.3.5 Support Vector Machines
6.4 Unsupervised Learning for Big Data
6.4.1 Spectral Clustering
6.4.2 Principal Component Analysis (PCA)
6.4.3 Latent Dirichlet Allocation (LDA)
6.4.4 Matrix Factorization
6.4.5 Manifold Learning
6.5 Semi-supervised Learning for Big Data
6.5.1 Co-training
6.5.2 Label Propagation
6.5.3 Multiview Learning
6.6 Reinforcement Learning Basics for Big Data
6.7 Online Learning for Big Data
6.8 Conclusion
6.9 Review Questions
References
7 Social Semantic Web Mining and Big Data Analytics
7.1 Introduction
7.2 What Is Semantic Web?
7.3 Knowledge Representation Techniques and Platforms in Semantic Web
7.4 Web Ontology Language (OWL)
7.5 Object Knowledge Model (OKM) [7]
7.6 Architecture of Semantic Web and the Semantic Web Road Map
7.7 Social Semantic Web Mining
7.8 Conceptual Networks and Folksonomies or Folk Taxonomies of Concepts/Subconcepts
7.9 SNA and ABM
7.10 e-Social Science
7.11 Opinion Mining and Sentiment Analysis
7.12 Semantic Wikis
7.13 Research Issues and Challenges for Future
7.14 Review Questions
References
8 Internet of Things (IOT) and Big Data Analytics
8.1 Introduction
8.2 Smart Cities and IOT
8.3 Stages of IOT and Stakeholders
8.4 Analytics
8.5 Access
8.6 Cost Reduction
8.7 Opportunities and Business Model
8.8 Content and Semantics
8.9 Data-Based Business Models Coming Out of IOT
8.10 Future of IOT
8.10.1 Technology Drivers
8.10.2 Future Possibilities
8.10.3 Challenges and Concerns
8.11 Big Data Analytics and IOT
8.12 Fog Computing
8.13 Research Trends
8.14 Conclusion
8.15 Review Questions
References
9 Big Data Analytics for Financial Services and Banking
9.1 Introduction
9.2 Customer Insights and Marketing Analysis
9.3 Sentiment Analysis for Consolidating Customer Feedback
9.4 Predictive Analytics for Capitalizing on Customer Insights
9.5 Model Building
9.6 Fraud Detection and Risk Management
9.7 Integration of Big Data Analytics into Operations
9.8 How Banks Can Benefit from Big Data Analytics?
9.9 Best Practices of Data Analytics in Banking for Crises Redressal and Management
9.10 Bottlenecks
9.11 Conclusion
9.12 Review Questions
References
10 Big Data Analytics Techniques in Capital Market Use Cases
10.1 Introduction
10.2 Capital Market Use Cases of Big Data Technologies [2, 3]
10.3 Prediction Algorithms
10.3.1 Stock Market Prediction [3–5, 7]
10.3.2 Efficient Market Hypothesis (EMH)
10.3.3 Random Walk Theory (RWT)
10.3.4 Trading Philosophies
10.3.5 Simulation Techniques
10.4 Research Experiments to Determine Threshold Time for Determining Predictability
10.5 Experimental Analysis Using Bag of Words and Support Vector Machine (SVM) Application to News Articles
10.6 Textual Representation and Analysis of News Articles
10.7 Named Entities
10.8 Object Knowledge Model (OKM) [8]
10.9 Application of Machine Learning Algorithms [7]
10.10 Sources of Data
10.11 Summary and Future Work
10.12 Conclusion
10.13 Review Questions
References
11 Big Data Analytics for Insurance
11.1 Introduction
11.2 The Insurance Business Scenario
11.3 Big Data Deployment in Insurance [4]
11.4 Insurance Use Cases [5]
11.5 Customer Needs Analysis
11.6 Other Applications
11.7 Conclusion
11.8 Review Questions
References
12 Big Data Analytics in Advertising
12.1 Introduction
12.2 What Role Can Big Data Analytics Play in Advertising?
12.3 BOTs
12.4 Predictive Analytics in Advertising
12.5 Big Data for Big Ideas
12.6 Innovation in Big Data—Netflix
12.7 Future Outlook
12.8 Conclusion
12.9 Review Questions
References
13 Big Data Analytics in Bio-informatics
13.1 Introduction
13.2 Characteristics of Problems in Bio-informatics
13.3 Cloud Computing in Bio-informatics
13.4 Types of Data in Bio-informatics
13.5 Big Data Analytics and Bio-informatics
13.6 Open Problems in Big Data Analytics in Bio-informatics [14]
13.7 Big Data Tools for Bio-informatics
13.8 Analysis on the Readiness of Machine Learning Techniques for Bio-informatics Application [14]
13.9 Conclusion
13.10 Questions and Answers
References
14 Big Data Analytics and Recommender Systems
15 Security in Big Data
15.1 Introduction
15.2 Ills of Social Networking—Identity Theft
15.3 Organizational Big Data Security
15.4 Security in Hadoop
15.5 Issues and Challenges in Big Data Security
15.6 Encryption for Security
15.7 Secure MapReduce and Log Management
15.8 Access Control, Differential Privacy and Third-Party Authentication
15.9 Real-Time Access Control
15.10 Security Best Practices for Non-relational or NoSQL Databases
15.11 Challenges, Issues and New Approaches: Endpoint Input Validation and Filtering
15.12 Research Overview and New Approaches for Security Issues in Big Data
15.13 Conclusion
15.14 Review Questions
References
16 Privacy and Big Data Analytics
16.1 Introduction
16.2 Privacy Protection [4]
16.3 Enterprise Big Data Privacy Policy and COBIT 5 [1]
16.4 Assurance and Governance
16.5 Conclusion
16.6 Review Questions
References
17 Emerging Research Trends and New Horizons
17.1 Introduction
17.2 Data Mining
17.3 Data Streams, Dynamic Network Analysis and Adversarial Learning
17.4 Algorithms for Big Data
17.5 Dynamic Data Streams
17.6 Dynamic Network Analysis
17.7 Outlier Detection in Time-Evolving Networks
17.8 Research Challenges
17.9 Literature Review of Research in Dynamic Networks
17.10 Dynamic Network Analysis
17.11 Sampling [8]
17.12 Validation Metrics [9]
17.13 Change Detection [10]
17.14 Labeled Graphs [11]
17.15 Event Mining [12]
17.16 Evolutionary Clustering
17.17 Block Modeling [14]
17.18 Surveys on Dynamic Networks
17.19 Adversarial Learning—Secure Machine Learning [4–7, 15, 16]
17.20 Conclusion and Future Emerging Direction
17.21 Review Questions
References
Case Studies
Case Study 1: Google
PageRank
Case Study 2: General Electric (GE)
Case Study 3: Microsoft
Case Study 4: Nokia
Case Study 5: Facebook
Case Study 6: Opower
Case Study 7: Kaggle
Case Study 8: Deutsche Bank
Case Study 9: Health Sector Analytics
Case Study 10: Online Insurance
Case Study 11: Delta Airlines
Case Study 12: LinkedIn
Case Study 13: Traffic Management
Solutions Provided
Technical Features
Business Value Outcomes
Case Study 14: Cisco
Case Study 15: JPMorgan Chase
Appendices
Appendix A: Statistics
Population
Measures of Central Tendency
Arithmetic Mean
Median
Mode
Geometric Mean
Harmonic Mean
Measures of Dispersion
Range
Coefficient of Range
The Interquartile Range or the Quartile Deviation
The Mean Deviation or Average Deviation
Standard Deviation
Deviation Taken from Actual Mean
Deviation Taken from Assumed Mean
Variance
Coefficient of Variation
Correlation
Types of Correlation
Positive or Negative Correlation
Simple, Partial and Multiple Correlations
Linear and Nonlinear Correlation
Methods of Studying Correlation
Scatter Diagram Method
Graphic Method
Karl Pearson Coefficient of Correlation
Rank Correlation
Regression
Types of Variables
Categorical Variables
Numerical Variables
Linear Regression Model (LRM)
χ² Test (Chi-Square Test)
Procedure to Determine χ²
Chi-Square Distribution Curve
Alternative Method of Applying the Value of χ²
Conditions for Applying χ² Test
Use of χ² Test
Estimations
Types of Estimations
Point Estimator
Interval Estimates
Statistical Inference
Hypothesis Testing
Estimations
Bayesian Estimation
The Gaussian or Normal Distribution
Appendix B: Probability
Appendix C: R Language
Appendix D: R Scripts