Handbook of big data analytics applications in ICT

IET COMPUTING SERIES 37

Handbook of Big Data Analytics

IET Book Series on Big Data – Call for Authors

Editor-in-Chief: Professor Albert Y. Zomaya, University of Sydney, Australia

The topic of Big Data has emerged as a revolutionary theme that cuts across many technologies and application domains. This new book series brings together topics within the myriad research activities in many areas that analyse, compute, store, manage and transport massive amounts of data, such as algorithm design, data mining and search, processor architectures, databases, infrastructure development, service and data discovery, networking and mobile computing, cloud computing, high-performance computing, privacy and security, storage and visualization.

Topics considered include (but are not restricted to) IoT and Internet computing; cloud computing; peer-to-peer computing; autonomic computing; data centre computing; multi-core and many-core computing; parallel, distributed and high-performance computing; scalable databases; mobile computing and sensor networking; Green computing; service computing; networking infrastructures; cyberinfrastructures; e-Science; smart cities; analytics and data mining; Big Data applications and more.

Proposals for coherently integrated international co-edited or co-authored handbooks and research monographs will be considered for this book series. Each proposal will be reviewed by the Editor-in-Chief and some board members, with additional external reviews from independent reviewers. Please email your book proposal for the IET Book Series on Big Data to Professor Albert Y. Zomaya at albert.zomaya@sydney.edu.au or to the IET at author_support@theiet.org.

Handbook of Big Data Analytics
Volume 2: Applications in ICT, security and business analytics

Edited by Vadlamani Ravi and Aswani Kumar Cherukuri

The Institution of Engineering and Technology

Published by The Institution of Engineering and Technology, London, United Kingdom

The Institution of Engineering and Technology is
registered as a Charity in England & Wales (no. 211014) and Scotland (no. SC038698).

© The Institution of Engineering and Technology 2021

First published 2021

This publication is copyright under the Berne Convention and the Universal Copyright Convention. All rights reserved. Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may be reproduced, stored or transmitted, in any form or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publisher at the undermentioned address:

The Institution of Engineering and Technology
Michael Faraday House
Six Hills Way, Stevenage
Herts, SG1 2AY, United Kingdom

www.theiet.org

While the authors and publisher believe that the information and guidance given in this work are correct, all parties must rely upon their own skill and judgement when making use of them. Neither the authors nor publisher assumes any liability to anyone for any loss or damage caused by any error or omission in the work, whether such an error or omission is the result of negligence or any other cause. Any and all such liability is disclaimed.

The moral rights of the authors to be identified as authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

British Library Cataloguing in Publication Data
A catalogue record for this product is available from the British Library

ISBN 978-1-83953-059-3 (hardback Volume 2)
ISBN 978-1-83953-060-9 (PDF Volume 2)
ISBN 978-1-83953-064-7 (hardback Volume 1)
ISBN 978-1-83953-058-6 (PDF Volume 1)
ISBN 978-1-83953-061-6 (2 volume set)

Typeset in India by MPS Limited
Printed in the UK by CPI Group (UK) Ltd, Croydon

Contents

About the editors
About the contributors
Foreword
Foreword
Preface
Acknowledgements
Introduction

1 Big data analytics for security intelligence
Sumaiya Thaseen Ikram, Aswani Kumar Cherukuri, Gang Li and Xiao Liu
1.1 Introduction to big data analytics
1.2 Big data: huge potentials for information security
1.3 Big data challenges for cybersecurity
1.4 Related work on decision engine techniques
1.5 Big network anomaly detection
1.6 Big data for large-scale security monitoring
1.7 Mechanisms to prevent attacks
1.8 Big data analytics for intrusion detection system
1.8.1 Challenges of ADS
1.8.2 Components of ADS
1.9 Conclusion
Acknowledgment
Abbreviations
References

2 Zero attraction data selective adaptive filtering algorithm for big data applications
Sivashanmugam Radhika and Arumugam Chandrasekar
2.1 Introduction
2.2 System model
2.3 Proposed data preprocessing framework
2.3.1 Proposed update rule
2.3.2 Selection of thresholds
2.3.3 Sparsity model
2.4 Simulations
2.5 Conclusions
References

3 Secure routing in software defined networking and Internet of Things for big data
Jayashree Pougajendy, Arun Raj Kumar Parthiban and Sarath Babu
3.1 Introduction
3.2 Architecture of IoT
3.3 Intersection of big data and IoT
3.4 Big data analytics
3.4.1 Taxonomy of big data analytics
3.4.2 Architecture of IoT big data
3.5 Security and privacy challenges of big data
3.6 Routing protocols in IoT
3.7 Security challenges and existing solutions in IoT routing
3.7.1 Selective forwarding attacks
3.7.2 Sinkhole attacks
3.7.3 HELLO flood and acknowledgment spoofing attacks
3.7.4 Replay attacks
3.7.5 Wormhole attacks
3.7.6 Sybil attack
3.7.7 Denial-of-service (DoS) attacks
3.8 The arrival of SDN into big data and IoT
3.9 Architecture of SDN
3.10 Routing in SDN
3.11 Attacks on SDN and existing solutions
3.11.1 Conflicting flow rules
3.11.2 TCAM exhaustion
3.11.3 ARP poisoning
3.11.4 Information disclosure
3.11.5 Denial-of-service (DoS) attacks
3.11.6 Exploiting vulnerabilities in OpenFlow switches
3.11.7 Exploiting vulnerabilities in SDN controllers
3.12 Can SDN be applied to IoT?
3.13 Summary
References

4 Efficient ciphertext-policy attribute-based signcryption for secure big data storage in cloud
Praveen Kumar Premkamal, Syam Kumar Pasupuleti and Alphonse PJA
4.1 Introduction
4.1.1 Related work
4.1.2 Contributions
4.2 Preliminaries
4.2.1 Security model
4.3 System model
4.3.1 System architecture of ECP-ABSC
4.3.2 Formal definition of ECP-ABSC
4.3.3 Security goals
4.4 Construction of ECP-ABSC scheme
4.4.1 Setup
4.4.2 Key generation
4.4.3 Signcrypt
4.4.4 Designcrypt
4.5 Security analysis
4.6 Performance evaluation
4.7 Conclusion
References

5 Privacy-preserving techniques in big data
Remya Krishnan Pacheeri and Arun Raj Kumar Parthiban
5.1 Introduction
5.2 Big data privacy in data generation phase
5.2.1 Access restriction
5.2.2 Data falsification
5.3 Big data privacy in data storage phase
5.3.1 Attribute-based encryption
5.3.2 Identity-based encryption
5.3.3 Homomorphic encryption
5.3.4 Storage path encryption
5.3.5 Usage of hybrid clouds
5.4 Big data privacy in data processing phase
5.4.1 Protect data from unauthorized disclosure
5.4.2 Extract significant information without trampling privacy
5.5 Traditional privacy-preserving techniques and its scalability in big data
5.5.1 Data anonymization
5.5.2 Notice and consent
5.6 Recent privacy preserving techniques in big data
5.6.1 HybrEx
5.6.2 Differential privacy
5.6.3 Hiding a needle in a haystack: privacy-preserving a priori algorithm in MapReduce framework
5.7 Privacy-preserving solutions in resource constrained devices
5.8 Conclusion
References

6 Big data and behaviour analytics
Amit Kumar Tyagi, Keesara Sravanthi and Gillala Rekha
6.1 Introduction about big data and behaviour analytics
6.2 Related work
6.3 Motivation
6.4 Importance and benefits of big data and behaviour analytics
6.4.1 Importance of big data analytics
6.5 Existing algorithms, tools available for data analytics and behaviour analytics
6.5.1 Apache Hadoop
6.5.2 Cloudera
6.5.3 Cassandra
6.5.4 Konstanz Information Miner
6.5.5 Data wrapper
6.5.6 MongoDB
6.5.7 HPCC
6.6 Open issues and challenges with big data analytics and behaviour analytics
6.6.1 Challenges with big data analytics
6.6.2 Issues with big data analytics (BDA)
6.7 Opportunities for future researchers
6.8 A taxonomy for analytics and its related terms
6.9 Summary
Appendix A
References

7 Analyzing events for traffic prediction on IoT data streams in a smart city scenario
Chittaranjan Hota and Sanket Mishra
7.1 Introduction
7.2 Related works
7.3 Research preliminaries
7.3.1 Dataset description
7.3.2 Data ingestion
7.3.3 Complex event processing
7.3.4 Clustering approaches
7.3.5 OpenWhisk
7.3.6 Evaluation metrics
7.4 Proposed methodology
7.4.1 Statistical approach to optimize the number of retrainings
7.5 Experimental results and discussion
7.6 Conclusion
Acknowledgment
References

8 Gender-based classification on e-commerce big data
Chaitanya Kanchibhotla, Venkata Lakshmi Narayana Somayajulu Durvasula and Radha Krishna Pisipati
8.1 Introduction
8.1.1 e-Commerce and big data
8.2 Gender prediction methodology
8.2.1 Gender prediction based on gender value
8.2.2 Classification using random forest
8.2.3 Classification using gradient-boosted trees (GBTs)
8.2.4 Experimental results with state-of-the-art classifiers
8.3 Summary
References

9 On recommender systems with big data
Lakshmikanth Paleti, P Radha Krishna and J.V.R Murthy
9.1 Introduction
9.1.1 Big data and recommender systems
9.2 Recommender systems challenges
9.2.1 Big-data-specific challenges in RS
9.3 Techniques and approaches for recommender systems
9.3.1 Early recommender systems
9.3.2 Big-data recommender systems
9.3.3 X-aware recommender systems
9.4 Leveraging big data analytics on recommender systems
9.4.1 Healthcare
9.4.2 Education
9.4.3 Banking and finance
9.4.4 Manufacturing
9.5 Evaluation metrics
9.6 Popular datasets for recommender systems
9.7 Conclusion
References

10 Analytics in e-commerce at scale
Vaidyanathan Subramanian and Arya Ketan
10.1 Background
10.2 Analytics use cases
10.2.1 Business and system metrics
10.2.2 Data sciences
10.3 Data landscape
10.3.1 Data producers
10.3.2 Data consumers
10.3.3 Data freshness
10.3.4 Data governance

Overall conclusions
Vadlamani Ravi and Aswani Kumar Cherukuri

This volume covers applications and parallel architectures of machine learning. Cyber security, e-commerce and finance are considered as representative domains in which big data algorithms were proposed. Parallel and distributed versions of a few neural network architectures and clustering algorithms are proposed. The list of application domains covered in this volume is by no means exhaustive. Huge potential for developing novel, parallel and distributed machine learning algorithms still exists in fields such as healthcare, bioinformatics, medicine, cyber fraud detection, cyber security, financial services fraud detection, supply chain management, physics, chemistry and agriculture. One future direction is parallelizing hybrids of second-, third- and fourth-generation neural networks to solve complex problems. For instance, a multi-layer perceptron, a spiking neural network and a long short-term memory network could
be hybridized and parallelized.

Index

AB algorithm 288 accuracy 279, 296 ACTUS 346–7 mathematics of contract terms, contract algorithms and cash flow streams 348–50 description of cash flow streams 350–1 standard analytics as linear operators 351–4 proof of concept with a bond portfolio 354–9 AdaBoost (AB) 286 adaptive clustering 156 adaptive thresholding technique 146–7 advertisement blockers 106 agglomerative clustering 318 aggregation over contracts 354 AI/ML use-cases 232 algorithmic decision-making 137 Amazon 197, 199 Amazon movie review (AMR) dataset 246–7 analytical customer relationship management (ACRM) 251 anomaly detection system (ADS) challenges of 12 components of 12–14 anti-tracking extensions 105–6 Apache Hadoop 135, 219 Apache Jena 239 Apache Kafka 151, 155 Apache Spark 346 big data regression via PRBFNN in 241–9 ARP poisoning 62 artificial intelligence (AI) 286 and machine learning 138 artificial neural networks (ANNs) 273, 286 ASHE 11 association rules 131 attribute-based encryption (ABE) 74, 107 attribute-based signature (ABS) 74 attribute-based signcryption (ABSC) 74 AUC score 296 authentication of IoT devices 121–2 autoregressive (AR) methods 287 autoregressive–moving average (ARMA) models 287 awareness 138 banking sector, recommendation systems in 219 Basel Committee on Banking Supervision 343 batch data processing 237–8 Big Billion Day 234 big data, properties of value 104 variety 104 velocity 104 veracity 104 volume 103 big data analytics (BDA) architecture of 43–5 big network anomaly detection challenges for cybersecurity challenges with 136–7 features of huge potentials for information security 2–5 for intrusion detection system 12 challenges of ADS 12 components of ADS 12–14 issues with 137–8 for large-scale security monitoring 7–10 mechanisms to prevent attacks 10–12 related work on decision engine (DE) techniques 5–7 for security intelligence taxonomy of 42–3 Big data and behaviour analytics 127
existing algorithms, tools available for data analytics and behaviour analytics 134 Apache Hadoop 135 Cassandra 135 Cloudera 135 data wrapper 135–6 HPCC (high performance computing cluster) 136 Konstanz Information Miner (KNIME) 135 MongoDB 136 importance of big data analytics 133–4 motivation 133 open issues and challenges with 136 challenges with big data analytics 136–7 issues with big data analytics (BDA) 137–8 opportunities for future researchers 138–9 taxonomy for analytics and its related terms 139 Big Data architecture designs 360–1 materialized architecture 360 mixed on-the-fly architecture 360 UDF-only on-the-fly architecture 360 big data privacy in data generation phase 105 access restriction 105 advertisement and script blockers 106 anti-tracking extensions 105–6 encryption tools 106 data falsification 106 fake identity 106 identity mask 106–7 sockpuppets 106 big data privacy in data processing phase 109 extract significant information without trampling privacy 111 protect data from unauthorized disclosure 109–10 big data privacy in data storage phase 107 attribute-based encryption (ABE) 107 homomorphic encryption 108 identity-based encryption 107–8 storage path encryption 108–9 usage of hybrid clouds 109 big-data recommender systems 212–17 big data regression via parallelized radial basis function neural network 241 contribution 242 dataset description 246–7 experimental setup 246 future directions 248–9 literature review 242–3 motivation 242 proposed methodology 243 K-means++ 243 K-means|| 243 parallel bisecting K-means 244 PRBFNN 244–6 results and discussion 248 bilinear pairing 77 bisecting K-means 244 Book-Crossing dataset 221 botnets 45 boundary detection 157 Breast Cancer Wisconsin dataset 258 business intelligence (BI) analytics 43 business metrics 230–2 Calinski–Harabasz index 154–5 Cassandra 135 CCFraud dataset 280–1 CF recommenders 204 ciphertext-policy attribute-based signcryption (CP-ABSC) 73 construction of 82 designcrypt 85–7 key
generation 83–4 setup 82–3 signcrypt 84–5 performance evaluation 94–9 for secure big data storage in cloud 73 security analysis 87–94 security model 78 Game I 78–9 Game II 79 Game III 79–80 system model 80 formal definition of ECP-ABSC 81–2 security goals 82 system architecture of ECP-ABSC 80–1 CLARANS algorithm 325 class balancing 191–2 Cloudera 135 cloud server (CS) 81 clustering 6, 313 taxonomy 316 Clustering Large Applications based upon Randomized Search (CLARANS) 320 clustering using representatives (CURE) 325 CNX Nifty 288 CoAP (Constrained Application Protocol) 47 cold-start problem 202–3 collaborative filtering (CF) based recommender systems 198, 205–9 community-driven RSs 216–17 complete cold (CC) start 207 complex event processing (CEP) 145–7, 151 Compute Unified Device Architecture (CUDA) 255–6, 260, 313 CUDA-SI-eSNN model 285 conflicting flow rules 58–61 conformal matrix factorization 206 content-based filtering RSs 198, 205, 209–10 context-aware recommendation systems 217 context-based recommendations 204 contract-driven financial reporting 343 ACTUS methodology analytical output 347 financial contracts 346 input elements 346 mathematics of ACTUS 348–54 proof of concept with a bond portfolio 354–9 raw results 347 risk factors 346 future automated reporting 364–7 scalable financial analytics 359–64 control data-plane interface (CDPI) driver(s) 52 Cosine similarity 208, 258 Credit Card Fraud Detection dataset 281 cryptography 111 CUDASOM (Compute Unified Device Architecture (CUDA)-based self-organizing function Map) algorithm 251, 257–8 performance of 265–8 segmentation of customer complaints using 260–5 cultural challenge 137 customer complaints, segmentation of 258 customer feedback analysis 252 customer relationship management (CRM) 251 cyberattacks on big data 45 botnets 45 denial-of-service (DoS) 45 malware 46 phishing 46 search poisoning 45 spamming 45 cybersecurity, challenges for D3.js 132 Data
Analytics Supercomputer 136 data anonymization 122 data authentication 82 data confidentiality 82 data consumers 233 data drift 152 data forgetting 122 data freshness 234 data governance 234, 239 data growth 137 data ingestion 235–6 data landscape 232–4 data owner (DD) 81 data preprocessing 22, 236–7 data producers 233 data profiling 131 data quality 137 data sciences 232 datasets for recommender systems 221–3 data siloes 137 data sparsity 200, 205 data summarization 122 data trading 137 data users (DU) 81 data wrapper 135–6 DBMS (Database Management System) 130 Decisional Bilinear Diffie–Hellman (DBDH) assumption 77 decision engine (DE) approaches decision tree 131 deep dynamic transaction document (DTD) 365–6 deep learning 215 deep learning networks 288 Default Credit Card dataset 281 denial-of-service (DoS) 45 attacks 49, 63 density-based clustering algorithms 321 differential privacy 117–19 disease–diagnosis (D–D) and treatment recommendation system (DDTRS) 218–19 driver program 173 Dunn index 244–5 EAGLE EAP-TLS (Extensible Authentication Protocol-Transport Layer Security) 49 eBay 197 e-commerce and big data 171 e-commerce scale, analytics at architecture 234 batch data processing 237–8 data governance 239 data ingestion 235–6 data preprocessing 236–7 query platform 239 report visualization 238–9 streaming processing 238 background 229–30 business and system metrics 230–2 data landscape 232 data consumers 233 data freshness 234 data governance 234 data producers 233 data sciences 232 edge computing and plug-in architectures 122 educational sector, recommendation systems in 219 efficient ciphertext-policy attribute-based signcryption (ECP-ABSC) 73 construction of 82 designcrypt 85–6 key generation 83–4 setup 82–3 signcrypt 84–5 formal definition of 81–2 security goals 82 system architecture of 80–1 e-marketing sites 197 encoding 192–4 encryption tools 106 ethical governance 137 Euclidean distance 206 evolutionary clustering 322
evolutionary optimized neural networks (NNs) 287 evolving spiking neural networks (eSNNs) stock market movement prediction using: see stock market movement prediction using eSNNs Facebook 199 Facebook datasets 215 Feng’s algorithm 320 5Vs 199, 202, 273 Flake’s algorithm 326 Flash sales 234 Flipkart 229–30 Flipkart Data Platform (FDP) 234–9 Flot 132 ForCES 53–4 Forecasting 131 FortNOX 61 Gaussian receptive fields 305 gender-based classification on e-commerce big data 169 e-commerce and big data 171 gender prediction methodology 174 classification using gradient-boosted trees 188–90 classification using random forest 185–6 experimental results with state-of-the-art classifiers 190 gender prediction based on gender value 174 gender prediction methodology 174 classification using gradient-boosted trees 188–90 classification using random forest 185–6 advanced features 187 basic features 186–7 classification 187 experimental results with state-of-the-art classifiers 190 class balancing 191–2 encoding 192–4 gender prediction based on gender value 174 data preprocessing 174–5 experimental results 177–85 feature-based on product ID 176 feature extraction 175 feature extraction based on category level 175–6 feature extraction based on product category 175 gender prediction 176 operations using product category feature 176–7 genetic algorithm (GA) 288 getDistances kernel 257–8 gradient-boosted trees (GBTs) 169, 188–90 graph-based clustering (GBC) algorithms 318–19 graphics processing units (GPUs) 255, 274, 276 gray sheep 202–3 grid-based clustering 322 Hadoop 197, 213, 215, 313, 346 harm-aware recommendation systems 218 health-care systems 218–19 HELLO flood and acknowledgment spoofing attacks 48 hierarchical document clustering 323 hierarchical latent Dirichlet allocation (HLDA) 334 hierarchical MMMF 206 HPCC (high performance computing cluster) 136 HybrEx 115–17 hybrid recommendation systems 211–12 ICICI bank dataset 254 IETF
(Internet Engineering Task Force) 47 IMSs (Information Management Systems) 130 INC-eSNN 308 income 354 incremental SVD-based matrix factorization 206 information disclosure 62 intelligent transportation systems (ITS) 146, 148 Internet of Things (IoT), for big data 37 application of SDN to IoT 64–5 architecture of 40–1 big data analytics architecture of 43–5 taxonomy of 42–3 intersection of big data and 41–2 routing protocols in 46–7 security and privacy challenges of big data 45–6 security challenges and existing solutions in IoT routing 47 denial-of-service (DoS) attacks 49 HELLO flood and acknowledgment spoofing attacks 48 replay attacks 48 selective forwarding attacks 47 sinkhole attacks 47–8 sybil attack 48 wormhole attacks 48–9 software defined networking (SDN) 49 architecture of 50–4 attacks on SDN and existing solutions 58 routing in 54–8 intrusion detection big data approaches for 8–9 using decision management techniques intrusion detection system (IDS) IoT data encryption 122 item-based CF 208–9 Jarvis–Patrick algorithm 324 Jester dataset 223 Kernel Factory (KF) 286 key-policy ABE (KP-ABE) 74 key-policy ABSC (KP-ABSC) 74 K-means 152, 242, 319–20 K-means++ 243 K-means|| 243 kNN algorithm 206–7 knowledge-based recommendation systems 212 knowledge management system (KMS) 216 Konstanz Information Miner (KNIME) 135 Korea composite stock price index (KOSPI) 287 latent Dirichlet allocation (LDA) 334 latent factor model (LFM) 205 latent model-based clustering 323 LEACH (low-energy adaptive clustering hierarchy) 47 link-discovery-based recommendation systems 217 Link Layer Discovery Protocol (LLDP) 54 liquidity 352–3 Lloyd algorithm 152 local area network (LAN) 323 local clustering 317 local linear WNN (LLWNN) 276 long short-term memory (LSTM) network 285 Louvain algorithm 336 machine learning techniques 273 Mahout 215, 219 malware 46 manufacturing industry, recommendation systems in 220 MapHybrid 116 mapping 131 MapReduce (MR) model of programming 323 Market
segmentation 138 massive intelligence 43 master–slave-based parallel algorithm 325 matrix factorization 205–6 matrix factorization model for recommendation 206–7 maximum margin matrix factorization (MMMF) 206 mean absolute error (MAE) 220 mean reciprocal rank (MRR) 220 memory-based CF approaches 205, 207–9 memory-level analytics 43 Message Passing Interface (MPI) 313 Million Song Dataset 213, 221 minimum spanning tree (MST) 317–18 model-based CF approaches 205–7 modularity 332–3 MongoDB 136 monotone span program 78 MovieLens 215, 221 moving average (MA) models 287 National Association of Securities Dealers Automated Quotations (NASDAQ) 288 Nerstrand 319 Netflix 199, 221 net present value (NPV) 353 Network of Workstations (NoWs) 317 NeuCube 286, 288–9, 305 neural encoding 291 NLTK 257 nonnegative matrix factorization (NMF) 206 nonsensitive attribute 112 NVIDIA Tesla K-40 309 Object-Oriented Database Management Systems (OODBMSs) 130 Object-Relational Database Management Systems (ORDBMSs) 130 offline analytics 43 OpenFlow 53–4 OpenFlow switches, exploiting vulnerabilities in 63 OpenMP 318 Open Multi-Processing (OpenMP) 313 open source 138 OpenWhisk 152–3 opinion summarization 252 order-management system 233 overspecialization 202 parallel bisecting K-means 244 parallel clustering on SIMD/MIMD machines 320–1 parallel document clustering algorithms 323–5 parallel evolving clustering method (PECM) 249 parallel hierarchical clustering of big text corpora 313 agglomerative clustering 318 density-based clustering algorithms 321 evolutionary clustering 322 graph-based clustering (GBC) algorithms 318–19 grid-based clustering 322 latent model-based clustering 323 open research challenges 337–8 parallel clustering on SIMD/MIMD machines 320–1 parallel document clustering algorithms 323–5 parallel hierarchical algorithms for big text clustering 325 parallel hierarchical cut clustering 326–8 parallel hierarchical latent Dirichlet
allocation (PHLDA) 334–5 parallel hierarchical latent semantic analysis (PHLSA) algorithm 328–31 parallel hierarchical modularity-based spectral clustering 331–4 PHCUT vs PHLSA vs PHMS vs PHLDA 335–7 research challenges addressed 337 partitional clustering algorithms 319–20 spectral clustering 322 transform-based clustering 321–2 parallel hierarchical cut clustering (PHCUT) 326 parallelized radial basis function neural network (PRBFNN): see big data regression via parallelized radial basis function neural network parallel virtual machine (PVM) 320 particle swarm optimization (PSO) 275 partitional clustering algorithms 319–20 Pearson correlation coefficient (PCC) 208 personalized recommendations 204 PHCUT algorithm 327 phishing 46 Polish Bank dataset 281 Polymaps 132 postsynaptic potential (PSP) 292 power iteration clustering (PIC) algorithm 320 precision (prec) 220 prescriptive analytics 138–9 privacy 138 privacy-preserving a priori algorithm in MapReduce framework 119–20 privacy-preserving association rule mining 111 privacy-preserving clustering 111 privacy-preserving solutions in resource constrained devices 121–2 privacy-preserving techniques in big data 103 big data privacy in data generation phase 105 access restriction 105–6 data falsification 106–7 big data privacy in data processing phase 109 extract significant information without trampling privacy 111 protect data from unauthorized disclosure 109–10 big data privacy in data storage phase 107 attribute-based encryption (ABE) 107 homomorphic encryption 108 identity-based encryption 107–8 storage path encryption 108–9 usage of hybrid clouds 109 privacy-preserving solutions in resource constrained devices 121–2 recent privacy preserving techniques in big data 115 differential privacy 117–19 HybrEx 115–17 privacy-preserving a priori algorithm in MapReduce framework 119–20 traditional privacy-preserving techniques and its scalability 111–12 data anonymization 112 notice and consent 115 probabilistic
neural networks (PNNs) 287 probabilistic SVM (PSVM) 288 probability matrix factorization (PMF) 206 profiling 121 proximal MMMF 206 q-Computational Diffie–Hellman (q-CDH) problem 77 quality-aware big-data-based movie recommendation system 212 quasi-identifier (QID) 112 query platform 239 RACHET algorithm 324 radial basis function network (RBFN) 274 random forest (RF) 286 rating matrix 205 ratings, predicting 208 raw data 24 RDF 239 real-time analytics 43 Real-Time Bidding dataset 281 recall (rec) 220 received signal strength indicator (RSSI) 48 recommender systems (RSs) 197 big data and 199–200 challenges cold start 202 data sparsity 200 gray sheep 202 independent and identically distributed 202 overspecialization 202 scalability 202 serendipity 202 shilling attacks 202 components 198–9 evaluation metrics 220–1 influence of big data characteristics on 203 leveraging big data analytics on 218 banking and finance 219 education 219 healthcare 218–19 manufacturing 220 methodologies and advantages of 201 popular datasets for 221–3 taxonomy of 204 techniques and approaches for 204 big-data recommender systems 212–17 early recommender systems 205–12 X-aware recommender systems 217–18 recurrent neural networks (RNNs) 287 reduceMin kernel 258 regression 241 RegTech reporting 365 regularized matrix factorization 206 reliability-based trust-aware collaborative filtering (RTCF) 206 replay attacks 48 report visualization 238–9 reputation 137 risk-aware recommendation systems 217–18 root-mean-squared error (RMSE) 220 routing in SDN 54–8 routing protocols in IoT 46–7 RPL (IPv6 routing protocols) 47 S&P Bombay Stock Exchange (BSE) Sensex 288 Salton factor 211 SAS Visual Analytics 132 scalable recommender systems 213–15 script blockers 106 Seabed search poisoning 45 security analysis data confidentiality 87–90 signcryptor privacy 90–4 security and privacy challenges of big data 45–6 security challenges and existing solutions in IoT
routing 47 denial-of-service (DoS) attacks 49 HELLO flood and acknowledgment spoofing attacks 48 replay attacks 48 selective forwarding attacks 47 sinkhole attacks 47–8 sybil attack 48 wormhole attacks 48–9 selective forwarding attacks 47 self-organizing feature maps (SOM) 252, 254–5, 258 sensitive attribute 112 sensitivity 296 sentiments classification 252 serendipity 202 service-level agreement (SLA) 49 service-oriented architecture (SOA) 230 Seshadri’s algorithm 326 Shenzhen Stock Exchange (SZSE) 288 shilling attacks 202–3 signature-policy ABS (SP-ABS) 74 signcryptor privacy 82 silhouette coefficient 154 singular value decomposition (SVD) 206, 323 sinkhole attacks 47–8 sliding window (SW)-eSNN for incremental learning and stock movement prediction 297, 304–5 sliding window (SW)-eSNN 285 social media platforms 216 social rating networks (SocialMF) 207 social recommendation systems (SRSs) 200, 215–17 software defined networking (SDN) 37, 49–50 architecture of 50–4 routing in 54–8 SDN and existing solutions, attacks on 58 ARP poisoning 62 conflicting flow rules 58–61 denial-of-service (DoS) attacks 63 exploiting vulnerabilities in OpenFlow switches 63 exploiting vulnerabilities in SDN controllers 63 information disclosure 62 TCAM exhaustion 61–2 spamming 45 Spark 197 Spark MLlib library 213 Spark-SQL 360 Spark-UDF 363 SPARQL 239 sparse data 203 sparsity 28 specificity 295 spectral clustering 322 SPHINX 61 SPIN (sensor protocols for information via negotiation) 47 Splayed Additively Symmetric Homomorphic Encryption (SPLASHE) 11 state-of-the-art classifiers, experimental results with 190 class balancing 191–2 encoding 192–4 stock indicator (SI)-eSNN 285 stock indicators (SIs) 285 stock market movement prediction using eSNNs 285 dataset description and experiments with the SI-eSNN and the CUDA-eSNN models 295–304 future directions 308–9 Gaussian receptive fields influence 305–7 literature review 287–8 motivation 288–9 proposed CUDA-eSNN model 294–5 proposed
SI-eSNN model for stock trend prediction algorithm for eSNN training 293–4 learning in the output neurons 292–3 neural encoding 291–2 neural model 292 overall architecture 289–91 testing (recall of the model on new data) 294 sliding window (SW)-eSNN for incremental learning and stock movement prediction 297, 304–5 streaming metrics 234 streaming processing 238 support vector machine (SVM) 6, 286 Suricata network intrusion detection system (NIDS) 10 sybil attack 48 system metrics 230–2 TCAM exhaustion 61–2 Telegraf 150 term frequency (TF)-based document-term matrix 257 text-clustering algorithms 314 Theano 275 third-party auditor (TPA) 81 threshold autoregressive models 287 traditional privacy-preserving techniques and its scalability 111–12 data anonymization 112 k-anonymity 112–13 L-diversity 113–14 T-closeness 115 notice and consent 115 traffic prediction on IoT data streams in smart city scenario experimental results and discussion 160–4 proposed methodology 155 statistical approach to optimize number of retrainings 159–60 research preliminaries 148 clustering approaches 151–2 complex event processing (CEP) 151 data ingestion 150–1 dataset description 148–50 evaluation metrics 153–4 OpenWhisk 152–3 in smart city scenario 145 trained wavelet neural network (TAWNN) 276 transform-based clustering 321–2 trust 138 trust-based recommendation systems 216 trusted authority (TA) 80 Twitter 200 unforgeability 82 unstructured data 137 updateWeights kernel 258 user-based CF 207–8 user-defined functions (UDFs) 360 user identification 121 user tracking 121 utility monitoring 121 value 129, 199, 353–4 variability 129 variety 129, 199 velocity 129, 199 Venn diagram 131 veracity 129, 199 virtual drift 152 visual sentiment analysis of bank customer complaints 251 Compute Unified Device Architecture 255–6 contribution 254 experimental setup 258 CUDA setup 260 dataset details 259 preprocessing steps 259–60 future directions 268
literature survey 254–5 motivation 253–4 proposed approach 256 implementation of CUDASOM 257–8 segmentation of customer complaints using SOM 258 text preprocessing 257 results and discussion performance of CUDASOM 265–8 segmentation of customer complaints using CUDASOM 260–5 self-organizing feature maps 255 volume 129, 199 wavelet neural network (WNN) for big data analytics in banking 273 experimental setup 278 datasets description 278–9 experimental procedure 279 future work 282 literature review 274–6 proposed methodology 277–8 results and discussion 279–82 techniques employed 277 WaveNet 277 web-scale datasets 314 weighted low-rank approximation 206 WordNet 325 wormhole attacks 48–9 X-aware recommender systems 217–18 YOW dataset 223 zero attraction data selective adaptive filtering algorithm for big data applications 21 proposed data preprocessing framework 24–6 proposed update rule 26–7 selection of thresholds 27–8 sparsity model 28–9 simulations 29–33 system model 23–4 zero attraction data selective least mean square (ZA-DS-LMS) algorithm 21

Title	Handbook of Big Data Analytics Volume 2: Applications in ICT, Security and Business Analytics
Author	Vadlamani Ravi, Aswani Kumar Cherukuri
Advisor	Professor Albert Y. Zomaya
Institution	University of Sydney
Type	book
Year of publication	2021
City	London

Format
Number of pages	419
File size	14.24 MB