Business Analytics: A Management Approach, by Vidgen


‘Yes, this is a book on business analytics – but it’s really much broader. It encompasses up-to-the-minute topics like analysis of social media data, automated machine learning, visual analytics, open source tools, agile methods, and ethical issues like algorithmic bias. It’s a complete and accurate guide to how analytics are currently practised in leading organizations.’
– Thomas H. Davenport, Distinguished Professor, Babson College, USA; Research Fellow, MIT, USA; Senior Advisor, Deloitte, Author of Competing on Analytics and The AI Advantage

‘Without a doubt, we are witnessing an unprecedented avalanche of digital data. Harvesting value from this deluge is easier said than done, but there are now many tools and methods available. For a manager, this book is vital and provides an impressive overview of business analytics approaches to support a data-driven organisation, and runs the full gamut from understanding and analysing big data, to designing the right business model for strategising and organising a leadership that drives analytics value creation.’
– Professor Leroy White, Warwick Business School, UK

‘Business Analytics provides great coverage of business analytics – terminology, concepts, frameworks, methodologies, data preparation, modelling alternatives, model evaluation, implementation, organizational issues, and many real-world examples. The integration of findings from research and surveys of current practices add both academic rigour and practicality. Videos, readings, and software (e.g. SAS VA, DataRobot) help to create a rich learning experience.’
– Hugh J. Watson, C. Herman and Mary Virginia Chair of Business Administration, Terry College of Business, University of Georgia, USA

‘An excellent work delivering the latest topics in business analytics written in a very approachable way. Suitable for a wide readership from undergraduate students to managers. Highly recommended!’
– Dr Mashiho Mihalache, Assistant Professor, Amsterdam Business School

‘Are you looking for a book that offers a management approach to business analytics and retains both technical depth and a practical application of tools and techniques? Then you should try this remarkable new book! Its strategic and organizational perspective makes it ideal for teaching analytics in business schools. It also uses the cutting-edge tools such as automated machine learning, which ensures its relevance to organizations.’
– Asil Oztekin, Professor of Operations & Information Systems, Manning School of Business, University of Massachusetts Lowell, USA and 2017–2018 Former Chair of INFORMS Data Mining Section

‘Business Analytics covers very well the knowledge and skills one needs to run a data-intensive, analytics project end-to-end. It is well-written and illustrated with real cases and comprehensively referenced. It exposes students to contemporary platforms and software, making them business-ready. I commend especially the coverage of ethical considerations in analytics, which is long overdue.’
– Dr Max Chipulu, Programme Lead, BSc Business Analytics Programmes, Southampton Business School, University of Southampton

‘In an age where information matters for business survival, companies need to understand and appreciate the value derived from data. Business Analytics provides a toolkit for the practitioner to unlock said value. I would highly recommend it to executives and students who are thinking about or already on the analytics journey.’
– Dr Yudhvir Seetharam, Senior Lecturer, University of the Witwatersrand, Johannesburg, South Africa

‘The book is lucidly written with several case studies illustrating key concepts in data visualization and business analytics. The treatment, though rigorous, should be accessible to undergraduate students. Excellent coverage of recent developments in the field.’
– Sanjay Paul, Associate Professor of Economics, Elizabethtown College, USA

‘An up-to-date piece of work, with a wide range of illustrative features that facilitate an understanding of complex concepts in business analytics.’
– Dr Joseph Jie Yu, Associate Professor in Information Systems, Nottingham University Business School, China

‘This is a great book for introducing MBA students into the area of business analytics with up-to-date information, a comprehensive coverage of topics and clear explanations for the non-specialist. It reflects a careful approach to use analytics for creating value in business.’
– Martin Kunc, Professor of Business Analytics/Management Science, Southampton Business School, UK

‘I would like to applaud the authors’ inclusion of the latest developments and challenges associated with Big Data analytics in this book. This book will not only introduce readers to the latest analytics technologies, but also guide readers through how to deal with the unique challenges of Big Data analytics.’
– Dr Nobuyuki Fukawa, Associate Professor of Marketing, Missouri University of Science & Technology, USA

BUSINESS ANALYTICS
A Management Approach
RICHARD VIDGEN, SAM KIRSHNER AND FELIX TAN

© Richard Vidgen, Sam Kirshner and Felix Tan, under exclusive licence to Springer Nature Limited 2019

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission. No portion of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, Saffron House, 6–10 Kirby Street, London EC1N 8TS. Any person who does any unauthorized act in relation to this publication may be liable to criminal prosecution and civil claims for damages. The authors have asserted their rights to be identified as the authors of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2019 by RED GLOBE PRESS

Red Globe Press in the UK is an imprint of Springer Nature Limited, registered in
England, company number 785998, of Crinan Street, London, N1 9XW. Red Globe Press® is a registered trademark in the United States, the United Kingdom, Europe and other countries.

ISBN 978–1–352–00725–1 paperback

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources. Logging, pulping and manufacturing processes are expected to conform to the environmental regulations of the country of origin.

A catalogue record for this book is available from the British Library.
A catalog record for this book is available from the Library of Congress.

TABLE OF CONTENTS

List of Figures and Tables
Preface

Part I Business Analytics in Context
1 Introduction
2 Business Analytics Development
3 Data and Information

Part II Tools and Techniques
4 Data Exploration
5 Clustering and Segmentation
6 Predictive Modelling with Regression
7 Predictive Modelling with Logistic Regression
8 Predictive Modelling with Classification and Regression Trees
9 Visualization and Communication
10 Automated Machine Learning
11 R
12 Working with Unstructured Data
13 Social Networks

Part III Organizational aspects
14 Business Analytics Development Methodology
15 Design and Agile Thinking
16 Ethical Aspects of Business Analytics

Appendices
Appendix A – Dataset Descriptions
Appendix B – GoGet Case Study
Appendix C – Business Analytics Capability Assessment (BACA) Survey
Index

LIST OF FIGURES AND TABLES

Figures
1.1 Business analytics in context (Vidgen 2014)
1.2 Open data available from the London Datastore (LDS) for ‘Crime and Community Safety’
1.3 The Internet of Things
1.4 Google Glass (https://www.varifocals.net/google-glass/)
1.5 A taxonomy of disciplines related to analytics (Mortenson et al. 2015)
1.6 Business analytics function
2.1 Core elements of a business analytics development function
2.2 Steps in the analytics process
2.3 Phases of the CRISP-DM reference model (Chapman et al. 2000, p. 13)
2.4 An A/B test
2.5 An A/B test in the UK courts service (Haynes et al. 2012, p. 10, fig. 5)
2.6 Artificial intelligence (AI), machine learning, and deep learning (reprinted from Chollet 2018, p. 4, Copyright (2018) with permission from Manning Publications)
2.7 Data scientist attributes (Data Science Radar™, Reprinted with permission from Mango Solutions 2019)
2.8 The DataRobot approach to automated machine learning (https://blog.datarobot.com/ai-simplified-what-is-automated-machine-learning)
2.9 Aligning the analytics development function
3.1 From data to wisdom
3.2 Farr’s analysis of mortality data (Farr 1885)
3.3 Farr’s analysis of cholera mortality data (Farr 1852)
3.4 Two movies compared
3.5 Data quality in context
3.6 Data quality in six dimensions
3.7 Normal distribution (mean = 0, sd = 1)
3.8 Exponential distribution
4.1 Anscombe’s quartet
4.2 Scatter plot showing the relationship between television, earnings and age for a small sample of the dataset
4.3 Heat map showing the relationship between television, earnings, and age for the entire dataset
4.4 The top of the SAS VA homepage window
4.5 Data Explorer window
4.6 Data options
4.7 Automatic chart
4.8 Properties of the automatic chart
4.9 Role tab options
4.10 Bar chart aggregated by the sum of each employee’s age
4.11 Change the aggregation on a bar chart
4.12 Bar chart aggregated by the average age of each employee
4.13 Bar chart of average age across job roles and gender
4.14 How to change properties of a graph so gender is grouped
4.15 Better bar chart of average age across job roles and gender
4.16 Data pane for the dataset country
4.17 Creating a hierarchy for the dataset country
4.18 Creating a custom category for the dataset country
4.19 Creating a new variable for the dataset country
4.20 Viewing the properties of measure data
4.21 Bar chart in SAS VA
4.22 Bar chart with grouping in SAS VA
4.23 Histogram in SAS VA
4.24 Line chart in SAS VA
4.25 Scatter chart in SAS VA
4.26 Bubble charts in SAS VA
4.27 Pie charts
4.28 Bar charts displaying the same information as the pie charts in Figure 4.6
4.29 Box plot showing outliers
4.30 Tree map
4.31 Heatmap
4.32 Geo map
4.33 Correlation matrix
4.34 Bar chart displaying the proportion of customers who are smokers
4.35 Histogram of the age variable
4.36 Setting a filter
4.37 Creating a new variable, age 2
4.38 Histogram of BMI
4.39 Bar chart visualization showing charges by region and sex
4.40 Bar chart visualization showing average charges by region and smoker
4.41 Bar chart visualization showing average charges by region, whether the charge is from a smoker and whether BMI is over or under 30
4.42 Line chart visualization showing average charges by age, whether the charge was made by a smoker, and whether BMI is over or under 30
4.43 Nested if statements
4.44 BMI and smoker grouped by age
4.45 Bubble chart of BMI and smoker
4.46 Bubble chart grouped by male and female
5.1 Clustering Mario Kart characters
5.2 Example of a dendrogram for hierarchical clustering
5.3 Example of k-means clustering
5.4 Individuality of countries in the dataset (higher scores represent greater individualism and lower scores represent more collectivist societies)
5.5 Default clustering of the Hofstede dataset
5.6 Cluster matrix for all six dimensions
5.7 Parallel coordinate plot for three clusters
5.8 Geo map of cultural clusters (based on three cluster groups)
5.9 Geo map cultural clusters (based on ten cluster groups)
6.1 Graph of exam marks – actual versus predicted (mean)
6.2 Scatter plot of hours of revision against exam mark with a fitted regression line
6.3 Scatter plot of hours of revision against exam mark with a fitted regression line and error terms
6.4 Creating a simple linear regression model in SAS VA
6.5 Linear regression model results in SAS VA
6.6 Multiple regression visualization produced in SAS VA
6.7 Residuals (scatter plot)
6.8 Residuals (histogram)
6.9 Residual plot – identifying outliers
6.10 Influence plot
6.11 Kitchen quality as a single, categorical predictor of sale price
6.12 Creating an interaction effect
6.13 Setting the variable selection parameter
6.14 House sale price model (variable selection = 0.01)
6.15 House sale price model – variables included
7.1 Online calculator of a natural logarithm for value (http://www.1728.org/logrithm.htm)
7.2 Online calculator of a natural anti-logarithm (http://www.1728.org/logrithm.htm)
7.3 The logistic function
7.4 Expressing logit as a probability
7.5 Setting the response variable for logistic regression
7.6 Setting the response event
7.7 Setting properties of the analysis
7.8 SAS VA logistic regression results
7.9 SAS VA logistic regression fit summary
7.10 SAS VA logistic regression assessment – misclassification
7.11 SAS VA logistic regression assessment – lift
7.12 SAS VA logistic regression assessment – ROC
7.13 SAS VA logistic regression assessment – inspection of residuals
7.14 SAS VA logistic regression assessment – residuals
7.15 SAS VA generalized linear model (GLM) applied to logistic regression
7.16 SAS VA GLM model results
8.1 An illustration of a decision tree
8.2 Creating a SAS VA decision tree with Sex as predictor
8.3 Setting the event level to ‘Survived’
8.4 SAS VA decision tree model with Sex as a single predictor
8.5 SAS VA decision tree model with Sex and Age as predictors
8.6 Entropy graph
8.7 SAS VA decision tree variables and growth strategy
8.8 SAS VA decision tree
8.9 SAS VA decision tree model performance
8.10 SAS VA decision tree model performance – misclassification
8.11 SAS VA decision tree model advanced growth strategy
8.12 SAS VA decision tree model advanced growth strategy
8.13 SAS VA decision tree model custom growth strategy
8.14 Model comparison – selecting the models to be compared
8.15 Model comparison – logistic regression vs decision tree
8.16 Decision tree with a continuous target
8.17 Decision tree with a continuous target
8.18 Variables used to predict house price (partial)
8.19 Model performance (ROC curve)
9.1 Example of a social network diagram
9.2 Unordered and ordered divergent colour spectrums
9.3 Sample idea illustration – publication process
9.4 Sample idea generation – brainstorming for health analytics (www.flickr.com/photos/juhansonin/3093096757)
9.5 Sample DataViz – a dashboard (commons.wikimedia.org/wiki/File:Opsview_Monitor_6.0_Dashboard.jpg)
9.6 Sample visual discovery – exploring countries’ wine by price and production quantity
9.7 Sample dashboard showing a report on sales execution
9.8 First bar chart in the sample report on sales execution
9.9 Two bar charts for the sample report on sales execution

APPENDIX B: GOGET CASE STUDY

How GoGet works

Members join the GoGet community
on a subscription basis with different rates applying depending on the frequency of use (Figure B.2). Insurance is included in the hourly charge with an excess of $2,000, which can be reduced to $300 by paying a higher hourly rate.

Figure B.2 GoGet member subscriptions

Searching and booking

Members search for vehicles and then make bookings. Typically, this entails specifying the location, time required, and type of vehicle (Figure B.3). For example, a family may want a people mover that caters for children, pets, and bicycles. Once a vehicle has been located then a booking can be made (Figure B.4).

Figure B.3 GoGet vehicle search
Figure B.4 GoGet vehicle booking

Community discipline

For GoGet to work there need to be rules and norms that are enforced. Community discipline is maintained through a set of rules and etiquettes that include (GoGet, 2018):

• Returning the car late or not returning vehicle to pod ($25 plus an additional $25 per member inconvenienced)
• Less than a ¼ tank ($25 fine if next member complains)
• Tyres (if a member damages a tyre they are responsible for its replacement)
• Cleanliness (if the car is left in a dirty state then a $95 cleaning fine may be imposed)
• Animals in cars (some cars are pet-friendly – all cars must be left in a clean state regardless and cleaning charges may apply)
• Smoking in cars (strictly not allowed – first offence incurs a $95 fine and subsequent offences may result in termination of membership)
• Taking a car without a booking (minimum $25 fine)
• Call out charges (fees apply if it is a member at-fault issue)
• Investigation fee (charges of $25 per hour for investigating issues such as parking and speeding fines)

Operational aspects of GoGet

Organizational structure

The organization structure of GoGet is shown in Figure B.5. At the core of the business is the FleetCutter IT platform, through which GoGet’s vehicle operations are delivered. The platform has been developed in-house and is a source of competitive advantage for GoGet. Corporate sales are managed through a dedicated sales function that provides corporate employees access to GoGet vehicles. GoGet is also able to make its FleetCutter platform available to corporates for the management of their own vehicle fleets, allowing third parties to benefit from the fleet management and optimization capabilities developed by GoGet.

Figure B.5 GoGet organization chart (units shown: Marketing, Member services, Corporate sales, FleetCutter (IT platform), Fleet management, Fleet optimization)

Fleet management

GoGet needs to ensure that cars are available, within a convenient distance, even during peak business hours – the right car available at the right time in the right place. The Product Manager of GoGet explained the problem that the company was facing in relation to idling capacity: ‘We want more utilization but up to a certain point because if a vehicle isn’t available for usage consistently in an area then people will stop using it at all. One of the things that we do with idling capacity is to look for a mix of both residential and commercial uses because if you have just personal use, you will find that everyone wants to use the car in the evening and during the weekends.’ The company has also begun experimenting with ‘green alternatives’ such as electric cars and is keeping a watchful eye on autonomous vehicles.

Fleet optimization

GoGet invested heavily in the integration of systems to streamline business processes such as making bookings, determining car location, handling customer interactions, and providing remote customer assistance. The integrated fleet optimization system allows large companies who have fleets of vehicles to install in-car technology into their vehicles and then have their vehicles on the same booking platform as the GoGet vehicles. This gives partner companies much better visibility of how their vehicles are being used and how many
vehicles they actually need. It also means that they can use the GoGet cars when there is overflow during high usage periods – the base level of their fleet can be lower and costs can be reduced.

Internet of Things

Through technology, customer service staff can remotely view the location of all GoGet cars on their screens and give customers directions to locate cars, in addition to remotely opening the car doors, sounding the horn, and killing the engine. Through this remote governance over the fleet, GoGet was able to expand from its initial concentration within the Sydney local area to a more distributed geographic pattern across Australia, all the while being able to securely monitor and maintain its fleet.

Geographical information system (GIS)

By coupling the use of the Customer Relationship Management (CRM) system with a sophisticated geographical information system (GIS), GoGet is able to use its customer profiles and location information to provide more value-adding and targeted offerings, moving from offering just passenger cars to vans, SUVs, and even luxury cars, as well as a new minute-to-minute booking model. A GoGet executive described the use of these systems: ‘So we have heat maps on our websites that shows a couple of different locations, people and utilization levels. We use technology a lot in looking for new areas … demographics reporting and our existing data on similar profile suburbs.’

Customer relationship management (CRM)

In particular, to sustain the growth of its business, GoGet sought to leverage the critical mass of its platform to carve out new, lucrative niches within its customer base and increase the attractiveness of their market offerings. Towards this goal, GoGet implemented a CRM system, which maintains a profile of its members and helps staff create more personalized connections with them, making its members feel more privileged. A particular challenge for GoGet is to measure customer lifetime value (CLV).

Customer behaviour analysis

While the great majority of customers are responsible and safe members of the community, some customers engage in fraudulent activity and others are high risk in terms of having vehicular collisions.

Fraud can involve taking cars and abandoning them without payment. While the cars are very rarely stolen, fraudsters cost GoGet money through not paying for the service, leaving the car in an inconvenient location (which might be outside of Sydney), and are more likely to leave the vehicle in a damaged state.

Some customers have higher risks of being involved in a collision. Descriptive data analysis suggests that customers who are not used to driving on the left-hand side of the road, for example those from the USA or mainland Europe, find roundabouts and junctions difficult to negotiate. For US drivers roundabouts are a particular hazard as roundabouts are rare in the USA and this lack of experience is compounded by confusion over the direction of travel and right of way. Further, many of the drivers using GoGet are occasional drivers (hence the motivation for using the service) who use GoGet cars relatively infrequently and thus may not be as competent as drivers who drive every day. Repairing damaged cars is a significant cost both in terms of repairs and for insurance charges (insurance is only used when the damage repair is significant).

Community building and pro-social agenda

To reinforce its pro-social mission that had become diluted with the growth of its business partnerships, GoGet sought to reinforce the collective identity of its community via a variety of measures. First, it gave human names to the GoGet cars on its website, effectively anthropomorphizing its inventory (Figure B.4). The rationale behind this seemingly trivial move was that it would improve the sense of ownership and create ‘personal’ relationships between the customers and their local GoGet car. The Communications Manager of GoGet explained: ‘So they will see that resource always a bit as belonging to them, they will recognize it and know some of the other people who use the vehicle in the same street. While it is not theirs, they still have some sort of vested feeling about it.’

Second, GoGet sought to build more intimate relationships with its customers by using a variety of social media platforms and mobile booking options. As a Customer Services Manager explained: ‘It’s hard because we definitely have those two different types of members. The members that want to be in car-sharing for the green side, the environmental side, and really care about it and the other [group] don’t get car-sharing.’ This was also evident in the reviews left by customers online, with many taking pride in using the product-service. As one customer noted, they were ‘doing our bit to change the “car is king” mentality, one less car on the road’. However, the majority of positive feedback from customers (ranking GoGet’s service from ‘Ok’ to ‘Excellent’) cited cost-savings and convenience as the major value found in being a member.

With more frequent and quality interactions with the two types of customers enabled by these channels, GoGet was able to not only promote its pro-social objectives more effectively, but also monitor the opinions of its customers and subsequently tailor its offerings in response to those opinions. For instance, online reviews from customers also illustrated that some customers felt that, as the business expanded, GoGet’s prices had crept up – seeing GoGet go ‘from green to greenback’ – and found the vehicle booking options too inflexible. In response, GoGet expanded customers’ booking options to include minute-to-minute reservations and more user-friendly mobile app services, providing more flexible and convenient use of its product-service.

Challenges

GoGet has grown organically and in a sustainable way, learning about and developing the car-sharing market with patience and persistence. It
now faces challenges of how it can (1) continue to grow and develop new markets, (2) achieve greater market penetration and visibility, particularly through working with Government, and (3) integrate with public transport to provide a seamless travel experience.

Bibliography

Tan, F., Cahalane, M., Tan, B., & Englert, J. (2017). How GoGet CarShare’s Product-Service System is facilitating collaborative consumption. MIS Quarterly Executive, 16(4), 265–277.

APPENDIX C: BUSINESS ANALYTICS CAPABILITY ASSESSMENT (BACA) SURVEY

See the Excel spreadsheet BACA_2_1.xlsx for the survey items and radar chart.

Instructions to respondents

This survey is intended to record your perceptions. There are no right or wrong answers; you should trust your instinct and go with your first feeling. For each question enter a response ranging from 1 (strongly disagree) to 7 (strongly agree). If you cannot answer the question, for example you do not employ any dedicated data analytics personnel or you do not know the answer to the question, then select ‘n/a’ (not applicable). This data will remain confidential to the survey organizer and will only be reported in summary and anonymized form.

Demographics (per respondent)

• Respondent name
• Organization
• Time with organization (years)
• Job title

Table C.1 The Business Analytics Capability Assessment Survey

Each item is answered on a scale from 1 = strongly disagree to 7 = strongly agree, or n/a.

Data – quality
DQ1 Our data is accessible
DQ2 Our data is at the right level
DQ3 Our data is accurate
DQ4 Our data is consistent
DQ5 Our data is timely
DQ6 Our data is credible

Data – privacy and security
DE1 We state clearly what data is collected and stored concerning data subjects (e.g. customers, employees)
DE2 We state clearly how data that is collected and stored is used
DE3 We state clearly how data that is collected and stored might be shared with third parties
DE4 We only use data for the purpose for which it has been collected (i.e. we do not reuse data collected for one purpose for another purpose)
DE5 We have an in-depth understanding of the legal and regulatory environment of our data use
DE6 Our data is stored in a secure manner (e.g. encrypted)

Data – use
DU1 We are able to share data effectively with partners
DU2 We make effective use of external data (e.g. credit ratings)
DU3 We make effective use of open data (e.g. weather and Census data)

Organization – analytics strategy
OS1 We have a clearly articulated business analytics strategy
OS2 Our business analytics strategy makes it clear how and where value is created
OS3 Our analytics strategy is communicated effectively throughout the organization
OS4 Our analytics strategy is aligned with our business strategy

Organization – culture
OC1 People in our organization understand the value of data and evidence-based decision-making
OC2 Key decisions are made using evidence rather than ‘gut feel’ in our organization
OC3 Our senior management fully supports the use of business analytics

Organization – structure
OO1 Business analytics has a clear and appropriate place in our organization structure
OO2 The business analytics team has a deep understanding of our business
OO3 The analytics function has strong leadership
OO4 We are willing to share the value created by analytics with partners, customers, and others
OO5 We partner effectively with outside bodies (e.g. universities, professional associations) to supplement our analytics expertise

Process
PR1 We have a defined process to ensure that algorithms are used ethically
PR2 We are able to identify unintended consequences that may arise from the use of algorithms in our decision-making
PR3 We have a defined process to ensure that our use of analytics does not damage our reputation or brand
PR4 We use agile methods effectively to build analytics applications
PR5 We are able to balance the use of analytics for exploitation (improving how things are done) and exploration (doing new things)
PR6 We are able to measure the value created from business analytics projects
PR7 We manage our analytics projects as a portfolio of investments
PR8 We have an effective process for the management of analytics project risk

People – analytics personnel
HA1 Our analytics personnel are curious
HA2 Our analytics personnel are problem-oriented
HA3 Our analytics personnel use whatever tools are appropriate to solve business problems
HA4 Our analytics personnel often propose solutions to existing business problems
HA5 Our analytics personnel often propose novel uses of analytics to address new business opportunities
HA6 Our analytics personnel have excellent IT technical skills (e.g. R, SAS, Tableau)
HA7 Our analytics personnel have excellent statistical and modelling skills

People – HRM
HM1 We have no difficulty in hiring qualified data analytics personnel
HM2 Our data analytics personnel have excellent career progression opportunities
HM3 We have no difficulty in retaining data analytics personnel

Technology
TE1 We have appropriate enterprise data platforms (e.g. data warehousing, cloud storage)
TE2 We have an appropriate tool set for data analytics personnel (e.g. R, SAS)
TE3 We have excellent tools for end-user visualization of data

Value
VA1 Through our use of analytics we are able to make better decisions
VA2 As a result of using analytics we have built stronger customer relationships
VA3 Analytics has enabled us to improve the customer experience
VA4 Our use of analytics has allowed us to innovate our products and services
VA5 Through our use of analytics we are able to perform better than our competitors
VA6 We use analytics to create value for public good as well as for commercial purposes
VA7 Through our use of analytics we are able to make cost-savings
VA8 Through our use of analytics we are able to improve our profitability
VA9 Through our use of analytics we are able to improve our productivity

INDEX

a/b testing  23, 28–30, 47
accessibility  50, 56
accountable for bias  394
accountable for errors  387
adoption strategy  368–369
advanced analytics  215, 333, 355
agencies  6, 62, 299, 304
agile methods  355, 369–370, 373, 425
agile scrum  369, 371–372
agile software  26, 48
agile thinking  353, 355, 375
agility  355, 373–375
algorithm-based  376, 388, 390, 396
algorithmic approaches  289
alignment  24, 47, 344, 360, 382, 384, 395
Amazon  9–10, 12, 30, 108, 296, 376, 380, 386
analytic capabilities  193, 361
analytic models  72, 108
analytic process  200–201, 213
analytic software  193, 201
anonymisation  20, 378, 394
anonymity  318, 379, 391
ANOVA  130, 133, 138–139, 274 application programming interface (API)  286, 297 approximation  275, 290 arbitrary value  113 articulation  382, 384 artificial intelligence (AI)  4, 15, 32, 223, 297, 386 artificial neural network (ANN)  192 assignment operator  256 associations  69, 305, 424 attributes  38, 40, 49, 111, 326 automated machine learning  23, 41–43, 216–217, 219 Bayesian classifier  35, 291 Bayesian network  35–36 bias  29, 63, 317, 387–388, 394–395 big data  9–13, 17 binomial  170, 229, 272, 274 bivariate  101, 263, 266 bootstrapping  142 business model canvas  334, 336, 341–342, 346, 348 capability assessment  333, 336–337, 422–423 cardinality  60, 86–88, 90 Carshare  414–415 case study  315, 338–340, 359–360, 372, 384, 413–421 categorical data  60, 80, 82, 86–87 categorical variable  81, 88, 90, 146 Chi-square  167 classification and regression trees  35, 174–175 classification models  31, 236 cloud  5, 11–13, 43–44, 216, cloud-computing  22 clustering algorithms  111, 313 cluster matrix  117–119 coefficients  71, 131, 133, 157, 223, 245 collateralized debt obligations (CDO)  62 conditional dependencies  35 conditional probability  290 confusion matrix  161–163, 168, 184, 186–188, 223, 233–236, 247, 273–274, 278–279, 281–282, 293–294 consumer decisions  110 consumer segments  119 convergence  8, 18, 296 correlations  64, 70, 83, 96, 142, 269, 390 cross-validation  43, 227–230, 242, 249 dashboards  10, 39, 41, 44, 73, 193, 199, 201, 203 data  4–6 data acquisition  37, 216 data assets  285, 335 data cleaning  13, 37, 39, 63 data culture  16, 21 data distribution  99, 264 data exploration  25, 69–71, 73, 106, 158, 219, 252, 259, 263 data governance  17 data management  5, 11, 17, 50, 55, 59, 65 data mining  4, 26, 28, 64, 109, 381 data models  37 data-science  22, 47–48, 374, 396 data sets  41 data sources  5–6, 18–19, 71, 193, 201, 334, 390 data types  11, 59, 85, 98, 195, 257 data visualization  34, 44, 54, 69, 97, 193, 301 data warehouses  
10–11, 17, 21 DataRobot  40, 42–43, 216–217, 223, 245 decision tree  35, 175, 177, 187–188, 191, 276, 279, 282 dependence  238–239, 248–249 design thinking  353, 356–360 dichotomous relationships  307 dichotomous variable  174, 398 digital trace  3, 51, 316–317, 381 discrete event simulation  15 Durbin–Watson test   142, 269 e-commerce  6, 9, 30, 33 Econometrics  15 eigenvector  310–312 equality  256, 395 ethics  5, 16, 20, 59, 85, 110, 123, 197, 376–396 evidence-based  351, 424 experimental hypothesis  134 exploratory data analysis  39, 218, 226 exploratory multivariate analysis  70 exponential distribution  61 external data  6, 423 Facebook  8–9, 303, 316, 318, 321, 328, 377, 380, 385, 387, 391 factor analysis  291 false negatives  161–162, 186, 188, 233, 236–237 false positives  161–163, 186, 233, 235, 237, 293 filtering data  82, 100, 208 financial modelling  347 fit statistics  130–132, 139–140, 166–168 fit summary  160–161 flow chart  391–392 four V’s  10–11, 22 F-statistic  130, 268, 271 F-value  131, 136 Gaussian distribution  60 General Data Protection Regulation (GDPR)  392–397 general population  53, 293 genetic algorithms  35 geospatial data  284, 299, 302 glm  170, 172, 245, 272, 399 GoGet  338–339, 359, 369, 413 Google  4, 8, 9, 12, 30, 32, 49, 50, 108, 301, 347, 377, 380, 386 goodness of fit  131–133, 135 Hadoop  13, 22 heterogeneity  105, 377, 382 hierarchical clustering  108, 111–112, 115 hierarchical data  93, 95 histograms  87, 99, 224, 263–264 ideation  358, 361 idling capacity  418 independent variables  128, 131, 153, 160, 276 individuality  116–119, 408 Infrastructure-as-a-Service (IaaS)  12 Internet of Things (IoT)  6–9, 22, 380 join statement  260 k-folds  25 k-means  34, 108, 111, 113–115, 121, 179 Kolmogorov-Smirnov (KS)  165, 188, 237 kurtosis  60, 83 latent dirichlet analysis (LDA)  36 laws of probability  lift  161, 163–164, 184, 188, 190, 233, 237, 247 linear models  188, 245 linear regression  34, 36, 41, 99, 122–124, 127–129, 133, 
135, 137–138, 140, 142, 147–148, 152–153, 157, 165, 168, 170, 172–174, 253, 265, 267–268, 385 logarithms  153–157 logistic regression  25, 34, 41, 43–44, 46, 152–153, 155, 157–174, 184, 186–189, 192, 204, 232, 235, 253, 271–272, 274–276, 278–279, 293, 398 machine learning  4, 31–32, 43, 108, 215–216, 386 manipulation  29, 41, 44, 252–253, 260, 387 MapReduce  13 matrix  95–96 maximum likelihood estimation (MLE)  157 misclassification  161–162, 184–186, 188, 190 missing data  63, 83, 159 missing value analysis  64, 232 multicollinearity  142, 269 multiple linear regression  122, 129, 138, 140, 148, 153, 170, 246, 253, 267–268 natural language processing (NLP)  32, 36, 108, 296–297, 302 network analysis  36, 110, 290, 303–304, 307, 313, 315–318, 321–322, 325, 328–329, 412 network density  307–308 network diagram  195–196 nominal variables  80, 260 non-identifying dataset  378 non-linear  41, 100, 148, 270 null hypothesis  128, 130–131, 133–134, 137 online transaction processing (OLAP)  50 opportunity canvas  336, 353, 367–369, 375 ordinal  88 ordinal data  195 ordinal (en)coding  231–232 ordinal regression  153, 174 ordinal variable  60, 80, 175 outcome-oriented  354 outlier  25, 62–63, 71, 93, 101, 140, 142, 200 overfitting  39, 50, 147, 149, 151, 165, 187, 227, 243, 281–282 parameter estimates  130, 139, 166, 172 partition data  230 Platform-as-a-Service  12 principal components analysis (PCA)  109 prototype  39, 358–359, 365–367 p-value  128, 131, 134, 160, 268–269 Python  40–41, 43–44, 283 quantile–quantile (QQ)  263–265, 269 R  23–24, 40–41, 44–47, 253–283, 286–292, 325–328 random forests  192, 253, 279 ratio variable  60 receiver-operating characteristic (ROC)  161, 163, 184 reciprocity  309, 318 R-square  131, 135, 268, 275 sampling data  71 sampling distribution  142 SAS  23, 40–41, 43–44, 72, 193, 203 SEMMA  26–28 sentiment analysis  36, 286, 289 significance tests  129, 133 social media  3, 8–10 social network analysis  36, 303–304, 315 social 
networks  303–305, 328 soft systems methodology  336, 352 SPSS  40, 151, 259, 284 structured query language (SQL)  13, 25, 40, 44, 252, 259–260, 262 standard deviation  60, 70, 135, 143, 218 standard error  131–133, 268, 271 statistical analysis  40–41, 69, 299, 328 statistical programming  41, 253 storyboarding  364–365 subtrees  187 support vector machine (SVM)  34, 192 systems methodology  336, 344, 346, 352 Tableau  44, 71, 252, 426 text analysis  36, 240–242, 285, 323, 393 text mining  232, 240–241 transformation (data)  25, 27, 32, 37, 60, 82, 142, 148, 217, 220 transformation (organization)  16, 21, 346 true negative  161–163, 223, 233, 235 true positive  161, 163–164, 233, 235, 237, 293 t-statistic  131 t-test  131 t-values  131 Twitter  8–9, 286–288, 291–292, 303–304, 316, 318, 321–325, 328, 377, 391, 393 unique identifier  7, 60 univariate analysis  70, 101 unstructured data  284–285 validation partitions  228, 230 value creation  3, 18, 71, 334 value proposition  341, 343, 347, 376 variance inflation factor (VIF)  142, 269 visual analytics  23, 40, 69, 108 visualization software  69, 71, 107 visualization techniques  71, 107 visualize data  253 Watson  297–299 weighted average  54–55 wireframe  366 word cloud  240, 242, 285–286, 291–292, 391 z-distribution  60 z-scores  143

... 1.1: Data warehouses and data lakes
Data lakes and data warehouses
Amazon defines a data lake as ‘a centralized repository that allows you to store all your structured and unstructured data at any...

... data and analytics performance reporting
30 Safeguarding reputation e.g., reputation and brand damage caused by inappropriate use of data, data leakage, selling data
31 Working with academia...

... making data with key decision making in the business
Creating a big data and analytics strategy having a clear big data and analytics strategy that fits with the organisation’s business strategy

Date posted: 08/09/2021, 10:11

Table of contents

    LIST OF FIGURES AND TABLES

    PART I BUSINESS ANALYTICS IN CONTEXT

    A framework for business analytics

    Evidence: A/B testing

    From data to wisdom

    Production view of data quality

    Consumption view of data quality

    The dangers of assuming normally distributed data

    Data does not speak for itself

    PART II TOOLS AND TECHNIQUES
