Table of Contents

Cover
Additional praise for Taming the Big Data Tidal Wave
Wiley & SAS Business Series
Title page
Copyright page
Dedication
Foreword
Preface
Acknowledgments

PART ONE: The Rise of Big Data

CHAPTER 1: What Is Big Data and Why Does It Matter?
    WHAT IS BIG DATA?
    IS THE “BIG” PART OR THE “DATA” PART MORE IMPORTANT?
    HOW IS BIG DATA DIFFERENT?
    HOW IS BIG DATA MORE OF THE SAME?
    RISKS OF BIG DATA
    WHY YOU NEED TO TAME BIG DATA
    THE STRUCTURE OF BIG DATA
    EXPLORING BIG DATA
    MOST BIG DATA DOESN’T MATTER
    FILTERING BIG DATA EFFECTIVELY
    MIXING BIG DATA WITH TRADITIONAL DATA
    THE NEED FOR STANDARDS
    TODAY’S BIG DATA IS NOT TOMORROW’S BIG DATA
    WRAP-UP

CHAPTER 2: Web Data: The Original Big Data
    WEB DATA OVERVIEW
    WHAT WEB DATA REVEALS
    WEB DATA IN ACTION
    WRAP-UP

CHAPTER 3: A Cross-Section of Big Data Sources and the Value They Hold
    AUTO INSURANCE: THE VALUE OF TELEMATICS DATA
    MULTIPLE INDUSTRIES: THE VALUE OF TEXT DATA
    MULTIPLE INDUSTRIES: THE VALUE OF TIME AND LOCATION DATA
    RETAIL AND MANUFACTURING: THE VALUE OF RADIO FREQUENCY IDENTIFICATION DATA
    UTILITIES: THE VALUE OF SMART-GRID DATA
    GAMING: THE VALUE OF CASINO CHIP TRACKING DATA
    INDUSTRIAL ENGINES AND EQUIPMENT: THE VALUE OF SENSOR DATA
    VIDEO GAMES: THE VALUE OF TELEMETRY DATA
    TELECOMMUNICATIONS AND OTHER INDUSTRIES: THE VALUE OF SOCIAL NETWORK DATA
    WRAP-UP

PART TWO: Taming Big Data: The Technologies, Processes, and Methods

CHAPTER 4: The Evolution of Analytic Scalability
    A HISTORY OF SCALABILITY
    THE CONVERGENCE OF THE ANALYTIC AND DATA ENVIRONMENTS
    MASSIVELY PARALLEL PROCESSING SYSTEMS
    CLOUD COMPUTING
    GRID COMPUTING
    MAPREDUCE
    IT ISN’T AN EITHER/OR CHOICE!
    WRAP-UP

CHAPTER 5: The Evolution of Analytic Processes
    THE ANALYTIC SANDBOX
    WHAT IS AN ANALYTIC DATA SET?
    ENTERPRISE ANALYTIC DATA SETS
    EMBEDDED SCORING
    WRAP-UP

CHAPTER 6: The Evolution of Analytic Tools and Methods
    THE EVOLUTION OF ANALYTIC METHODS
    THE EVOLUTION OF ANALYTIC TOOLS
    WRAP-UP

PART THREE: Taming Big Data: The People and Approaches

CHAPTER 7: What Makes a Great Analysis?
    ANALYSIS VERSUS REPORTING
    ANALYSIS: MAKE IT G.R.E.A.T.!
    CORE ANALYTICS VERSUS ADVANCED ANALYTICS
    LISTEN TO YOUR ANALYSIS
    FRAMING THE PROBLEM CORRECTLY
    STATISTICAL SIGNIFICANCE VERSUS BUSINESS IMPORTANCE
    SAMPLES VERSUS POPULATIONS
    MAKING INFERENCES VERSUS COMPUTING STATISTICS
    WRAP-UP

CHAPTER 8: What Makes a Great Analytic Professional?
    WHO IS THE ANALYTIC PROFESSIONAL?
    THE COMMON MISCONCEPTIONS ABOUT ANALYTIC PROFESSIONALS
    EVERY GREAT ANALYTIC PROFESSIONAL IS AN EXCEPTION
    THE OFTEN UNDERRATED TRAITS OF A GREAT ANALYTIC PROFESSIONAL
    IS ANALYTICS CERTIFICATION NEEDED, OR IS IT NOISE?
    WRAP-UP

CHAPTER 9: What Makes a Great Analytics Team?
    ALL INDUSTRIES ARE NOT CREATED EQUAL
    JUST GET STARTED!
    THERE’S A TALENT CRUNCH OUT THERE
    TEAM STRUCTURES
    KEEPING A GREAT TEAM’S SKILLS UP
    WHO SHOULD BE DOING ADVANCED ANALYTICS?
    WHY CAN’T IT AND ANALYTIC PROFESSIONALS GET ALONG?
    WRAP-UP

PART FOUR: Bringing It Together: The Analytics Culture

CHAPTER 10: Enabling Analytic Innovation
    BUSINESSES NEED MORE INNOVATION
    TRADITIONAL APPROACHES HAMPER INNOVATION
    DEFINING ANALYTIC INNOVATION
    ITERATIVE APPROACHES TO ANALYTIC INNOVATION
    CONSIDER A CHANGE IN PERSPECTIVE
    ARE YOU READY FOR AN ANALYTIC INNOVATION CENTER?
    WRAP-UP

CHAPTER 11: Creating a Culture of Innovation and Discovery
    SETTING THE STAGE
    OVERVIEW OF THE KEY PRINCIPLES
    WRAP-UP

Conclusion: Think Bigger!
About the Author
Index

Additional praise for Taming the Big Data Tidal Wave

This book is targeted for the business managers who wish to leverage the opportunities that big data can bring to their business. It is written in an easy flowing manner that motivates and mentors the nontechnical person about the complex issues surrounding big data. Bill Franks continually focuses on the key success factor … How can companies improve their business through analytics that probe this big data? If the tidal wave of big data is about to crash upon your business, then I would recommend this book.
    Richard Hackathorn, President, Bolder Technology, Inc.

Most big data initiatives have grown both organically and rapidly. Under such conditions, it is easy to miss the big picture. This book takes a step back to show how all the pieces fit together, addressing varying facets from technology to analysis to organization. Bill approaches big data with a wonderful sense of practicality: “just get started” and “deliver value as you go” are phrases that characterize the ethos of successful big data organizations.
    Eric Colson, Vice President of Data Science and Engineering, Netflix

Bill Franks is a straight-talking industry insider who has written an invaluable guide for those who would first understand and then master the opportunities of big data.
    Thornton May, Futurist and Executive Director, The IT Leadership Academy

... identification of; impact of; qualification of; regulation of; risks of; standards for; structure of; traditional data and; use of; value of; volume versus velocity and complexity of
Big data sources: casino chip tracking; radio frequency identification data (RFID); sensor data; smart grid data; social network data; telematics data; telemetry data; text data; time and location data; web data
Black box, telematics data from
Business: analytic professional understanding of; analytic teams used in; data analysis, importance of; innovation, need for; value of analytic professionals

C
Capacity planning, sandbox used for
Casino chip tracking
Central processing units (CPU), MPP systems and
Centralized structures
“Cherry picking” of analysis findings
Clean data, analytic professional and
Clickstream data
Cloud computing: criteria for environment; National Institute of Standards and Technology (NIST) characteristics; private clouds; public clouds; sandbox environment compared to; scalability and analysis using
Collaborative filtering
Commodity models
Communication skills: advertising and; analytic professionals use of; delivery, importance of; presentation skills and; results, success of analysis and
Core analytics, advanced analytics compared to
Customer behavior: behavior types; faceless customer data; feedback behavior; knowledge, use of; privacy and; purchase paths and preferences; research behavior; shopping behavior; transaction types (location flags); web data
Customer segmentation

D
Data preparation and scoring: embedded processes; massively parallel processing (MPP) and; predictive modeling markup language (PMML) and; structured query language (SQL) and; user-defined functions
Data scientists. See also Analytic professionals
Data size, measurement of
Data storage, MPP systems and
Data visualization
Decentralized/functional structures
Development analytic data set (ADS)
Discovery, see Innovation
Diversification, analytic innovation and

E
Embedded scoring: access of; analytic data set (ADS) inputs; batch updates; integration of; massively parallel processing (MPP) systems and; model and score management; model information; model scoring output; model validation and reporting; predictive modeling markup language (PMML) and; real-time scoring; routines; structured query language (SQL) and
Ensemble models
Enterprise analytic data set (EADS): characteristics of; creation of; data in; logical versus physical structure; process of; table-based versus views; updating; use of
Enterprise Data Warehouses (EDWs)
External sandbox
Extract, transform, and load (ETL) process

F
Faceless customer data
Feedback behavior

G
G.R.E.A.T. criteria, data analysis
Graphical user interfaces (GUI)
Grid computing

H
Hybrid sandbox
Hybrid structures

I
Industrial engines and equipment use of sensor data
Inferences versus computing statistics, data analysis and
Information technology (IT) compared to analytic professionals
Innovation (See also Analytic innovation center): analytic applications of principles; “break out of the box”; business need for; center for; combination of concepts; common vision for; defined; discovery and, creation of; diversification and; focus on the target; iterative approach to; key principles; perspective changes and; priorities and; ripple effects from; risk and; setting the stage for; traditional approaches hampering
Internal sandbox
Internet transactions
Intuition of analytic professionals
Iterative approach to analytic innovation

M
MapReduce: parallel programming framework of; scalability and analysis using; strengths and weaknesses of; two-step process; unstructured text analysis
Massively parallel processing (MPP): central processing units (CPU) and; data preparation and scoring using; data storage and; database systems; embedded processes; predictive modeling markup language (PMML) and; scalability for analysis using; structured query language (SQL) and; user-defined functions
Models: commodity; embedded scoring, management using; ensemble; scoring output; validation and reporting

N
National Institute of Standards and Technology (NIST) cloud characteristics
Next best offer

O
Open source software

P
Page rank
Parallel programming frameworks, see MapReduce; Massively parallel processing (MPP)
Passive RFID tags
Point solutions
Predictive modeling markup language (PMML): embedded scoring and; massively parallel processing (MPP) systems
Presentation skills. See also Communication skills
Privacy of data: big data and; web sources
Private clouds
Problem statement, framing for data analysis
Production analytic data set (ADS)
Public clouds
Purchase paths and preferences

R
R Project for Statistical Computing
Radio frequency identification data (RFID): asset tracking; automated toll tags; big data value and; casino chip tracking; data combined with; fraud reduction from; passive tags; serial numbers; tags, retail and manufacturing use of; use of
Real-time scoring
Recency, frequency, and monetary (RFM) value
Relational database management systems (RDBMS)
Reporting, analysis compared to
Research behavior
Response modeling
Retail and manufacturing use of RFID tags
Risk, analytic innovation and

S
Samples versus population, data analysis and
Sandbox environments: analytic benefits of; capacity planning using; cloud environment compared to; data analysis using; external; hybrid; identification of new sources using; internal; workload management using
Scalability: centralization of data; cloud computing; combined analytical technologies; data size, measurement of; Enterprise Data Warehouse (EDW); extract, transform, and load (ETL) process; grid computing; history of; MapReduce; massively parallel processing (MPP) database systems; merging analytic and data environments; relational database management systems (RDBMS); structured query language (SQL) and
Semi-structured data
Sensor data: external effects of structure of; industrial engines and equipment monitoring; output; smart grid data and; use of
Serial numbers, RFID tags and
Shopping behavior
Smart grid data: big data used for; mixing data types; sensors and; smart meter readings; use of; utilities (power) and
Social network data: complications with; ripple effects from innovation of; telecommunication industries and; total value of customer; use of; user interaction and
Statistics: data analysis and; inferences compared to; significance of
Structured query language (SQL): embedded processes; massively parallel processing (MPP) systems; push down; user-defined functions

T
Table-based EADS, views compared to
Team structures: centralized; decentralized/functional; hybrid
Telecommunication industries and social network data
Telematics data: automotive insurance collection of; black box use of
Telemetry data, video games and
Telephone to social media, ripple effects from innovation of
Text data: analysis of; interpretation of; meaning, emphasis changes of; text mining tools; unstructured data; use of
360-degree view
Time and location data: global positioning systems (GPS); interpretation of; marketing and; use of
Traditional data: combined with traditional data; differences from big data; structure of
Transactional data

U
Unstructured data: MapReduce and; sources; text analysis
User-defined functions, MPP systems
User interfaces, analytic tools and
Utilities (power) use of smart grid data

V
Video games, telemetry data and
Vision: common long-term perspective; compensation and; innovation principle of; priorities and
Visualization: analytic tools using; data; graphics and tables; immersive intelligence

W
Web browsing history: mixing data types; privacy and
Web data: abandoned basket statistics; applications of; assessing advertising results; attribution modeling; behavior types; customer behavior; customer segmentation; faceless customer data; feedback behavior; knowledge, use of; next best offer; overview of; privacy and; purchase paths and preferences; recency, frequency, and monetary (RFM) value; research behavior; response modeling; shopping behavior; 360-degree view; transaction types (location flags)
Web logs: semi-structure of; value of data in
Workload management, sandbox used for

... Data: Franks, Bill. Taming the big data tidal wave: finding opportunities in huge data streams with advanced analytics / Bill Franks. pages cm (Wiley & SAS business series) Includes bibliographical...