The State of Data Analytics and Visualization Adoption
A Survey of Usage, Access Methods, Projects, and Skills

Matthew D. Sarrel

Copyright © 2017 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Nicole Tache
Production Editor: Kristen Brown
Copyeditor: Octal Publishing, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Ellie Volckhausen

September 2017: First Edition

Revision History for the First Edition
2017-09-18: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. The State of Data Analytics and Visualization Adoption, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all
responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-99942-4
[LSI]

Table of Contents

The State of Data Analytics and Visualization Adoption
    Introduction
    Data Analytics and Visualization Usage: The Big Picture
    Key Areas of Analytics by Industry
    Usage and Access of Analytics by Industry
    Working with the Data: Joining, Sourcing, Streaming
    Requisite Skills for Analytics by Industry
    The Value of Big Data Today
    Summary

The State of Data Analytics and Visualization Adoption

Introduction

Regardless of industry or company size, businesses are increasingly relying on data analytics and visualization to build a competitive advantage. Organizations are racing to gather, store, and analyze data from many different sources in many different formats. In the race toward success, businesses are transforming themselves to make data-driven decisions, and the associated technology is evolving as rapidly as the businesses themselves, if not more so.

The fast-evolving data analytics and visualization technology landscape means that businesses and individuals are scrambling to make the best technology choices. Businesses need to know that they’re choosing the right languages, products, architectures, and data sources. Individuals need to know that they’re learning the right skills to snare the right jobs. Those who choose poorly run the risk of being left behind as they fail to take advantage of the timely insights provided by well-conceived data analytics and visualization programs.

For this reason, in the spring of 2017
Zoomdata commissioned O’Reilly Media to field a survey to assess the state of data analytics and visualization adoption. A total of 875 survey respondents identified their industry, job role, company size, reasons for using analytics, technologies used in analytics programs, the perceived value of analytics programs, and more. Results indicate the following:

• Big data analytics and visualization programs are most mature in manufacturing, financial services, and technology/software companies.
• Projects are typically built for business users and business analysts who commonly rely on visual dashboards to gain the insights that they require to optimize business processes and better understand customers.
• Relational databases are the most common data source (although analytic databases and Hadoop are the most common sources of big data).
• Companies are hungry for Python, SQL, and relational database skills.
• Kafka and Spark are emerging as the streaming data technologies of choice.
• Customer 360/customer insights is the most common use case.
• After veracity (data quality), variety and then volume are the most valued characteristics of big data across all industries.

Our goal with this report is to highlight the results of this survey so that they might inform your career or organization as you embrace new technologies for data collection, storage, analysis, and visualization.

Data Analytics and Visualization Usage: The Big Picture

The 875 respondents who participated in this survey represent a variety of industries (Figure 1-1). More than 40% reported working in technology/software. This is followed by just over 10% in financial services, almost 8% in healthcare/medical technology, and roughly 5% in manufacturing, government, retail, or education/academia.

Figure 1-1. Industries represented in the survey

As shown in Figure 1-2, respondents primarily indicated that they were engineers/developers (18%), data scientists
(17%), data analysts/business analysts (15%), or architects (13%), and they work at companies of various sizes. It is interesting to note that managers and CxOs are actively engaged with these topics, accounting for 14% of respondents, compared to 8% for IT professionals.

Figure 1-2. Job roles represented in the survey

Surprisingly, small businesses with fewer than 50 employees account for a sizable share of respondents (26%). It’s refreshing to see small businesses lead the charge toward the new technologies and business processes related to data analytics and visualization (Figure 1-3).

Figure 1-3. Organizational size (by number of employees) represented in the survey

More than 50% of respondents indicated that they use analytics for customer insights/customer 360, followed by business process optimization (43%; Figure 1-4). It’s important to note that these areas directly support line-of-business activities. This supports the idea that businesses are building data analytics and visualization programs in order to make data-driven decisions and create competitive advantage.

Figure 1-4. Key areas using analytics within organizations

Key Areas of Analytics by Industry

Although aggregate survey results are interesting, when you drill down into specific industries, you begin to see some important trends. This also allows you to understand the state of data analytics and visualization use in your industry and provides guidance for developing programs that help build competitive advantage.

Picking up where we left off discussing the aggregate data, let’s take a look at the key areas of analytics use by industry (Figure 1-5). Customer insights/customer 360 is an area of focus for more than 50% of respondents in the technology/software, financial services, and retail industries, and surprisingly for more than 30% of respondents in education/academia. The potential business impact of understanding customers should not be underestimated. Understanding customer needs is likely to lead to happy customers, and happy customers are likely to lead to greater revenue.

Figure 1-5. Areas of analytics by industry

The exception is in healthcare/medical technology, where healthcare data analysis is far and away the most common key area of data analytics and visualization use. This doesn’t come as much of a surprise, though, because this is an industry-specific use case. If you’re not analyzing healthcare data, you’re probably not much of a healthcare/medical technology company. Healthcare data analysis is followed by other important business-related analyses such as customer insights/customer 360 and business process optimization.

Business process analysis is another important use of data analytics and visualization, and occupies a top-three spot in every industry as reported by survey respondents. Business process optimization is the top use of data analytics and visualization in manufacturing and government. Optimizing business processes typically results in decreased operating costs and can also lead to greater customer satisfaction, so this is a strategic way to build competitive advantage across many industries.

Similarly, the retail and manufacturing industries also place an emphasis on supply chain analytics and visualization initiatives. Uncovering supply chain problems in a timely manner gives retail and manufacturing businesses an opportunity to find alternate sources. An optimal supply chain is certainly a competitive advantage for these businesses.

Fraud detection/cyber security intelligence is an important use of data analytics and visualization in financial services and government. Fraud detection is critical to any financial service, given that this industry is rife with attempted fraud. Where there’s money, there’s likely to be attempted fraud. Detecting and eliminating fraud builds trust with customers while
decreasing operating costs. Cyber security intelligence is a focus of numerous government agencies, while preventing fraud is critical to elections and efficient ongoing operations.

Looking at the question “At what stage are big data analytics project(s) in your organization?” by industry helps us to understand how the rate of adoption varies by industry. In our top six industries (financial services, government, healthcare/medical technology, manufacturing, retail, and technology/software), we see that adoption runs the gamut from “we don’t have big data analytics projects” (18%) to “multiple projects” (22%). Manufacturing leads the “multiple projects” category, with 28%, while government lags in this category, with 7%.

Let’s examine the stage of data analytics projects by specific industry (Figure 1-6). The leading response in manufacturing is “multiple projects” at 28%, followed by “in development” at 22%. We see a similar case in the financial services industry, with about 25% of respondents reporting “multiple projects” and about 21% reporting “in development” projects. Technology/software respondents indicate that 21% are involved in multiple projects and in-development projects. In healthcare/medical technology, the picture is a little muddled in that 25% of respondents are engaged in multiple projects, whereas 26% report that they aren’t engaged in any big data analytics projects. Retail is in a similar position, with 23% reporting no projects and 25% reporting multiple projects. In government, “we don’t have big data analytics projects” leads at 33%, followed by “defining requirements” at 27%.

Figure 1-6. Stage of data analytics projects by industry

Usage and Access of Analytics by Industry

Looking at the target user for big data analytics and visualization projects (Figure 1-7), we see that in aggregate our survey respondents are developing for business users. This means that the analytics and
visualization software must be easy to use and intuitive. Business users can’t afford to spend all day focused on the mechanics of analytics. For analytics to provide competitive advantage, business users must be able to quickly and easily convert data into insights and take action.

Figure 1-7. Target users of big data analytics project(s)

This holds true across our top six industries (Figure 1-8). Digging deeper, business analysts are the second most common target in government (tied with customers), manufacturing, retail, and technology/software. Data scientists are the second most common target users in financial services and healthcare/medical technology.

Figure 1-8. Target users of big data analytics project(s) by industry

We asked survey participants, “Where would big data analytics be available to users?” The responses are split roughly evenly between analytics embedded in an application or business process and standalone business intelligence (BI) applications (Figure 1-9). Financial services (57%) and technology/software (54%) show a slight preference for embedded, whereas retail (58%) shows a slight preference for standalone BI applications.

Figure 1-9. Method for accessing big data analytics

We asked survey participants to identify how users would interact with data analytics: dashboards, embedded in applications, or operational reports (Figure 1-10). Across our top six industries, respondents showed a strong preference toward dashboards. The second most common way for users to interact with big data analytics was operational reports. However, the second most common way for financial services users to interact with big data analytics was embedded in applications.

Figure 1-10. User interaction with data analytics

Working with the Data: Joining, Sourcing, Streaming

We asked survey participants how they join data from multiple sources in order to analyze it (Figure
1-11). In our top six industries, data warehouse/datamart was the predominant response. This was especially true in retail (56%). Virtual federation/mashup (blending data on the fly without moving it into a warehouse) is most widely used in healthcare/medical technology (24%), technology/software (21%), and government (21%).

Figure 1-11. Joining data methodology

We asked survey participants to identify their main data sources (Figure 1-12). Not surprisingly, relational database is the leading response in our top six industries, topping out at 39% in healthcare/medical technology. The leading nonrelational and big data stores are ranked as analytic database, Hadoop, NoSQL database, cloud data store, in-memory database, and search database. Financial services (24%) and government (25%) make the heaviest use of analytic databases, whereas retail (11%) and technology/software (10%) make the heaviest use of cloud data stores. Hadoop usage hovers around 15%, except in government, where it drops to 9%. In-memory databases are used primarily by manufacturing (10%) and government (9%). Manufacturing (12%) is also the heaviest user of search databases.

Figure 1-12. Main data sources for analytics

Spark and Kafka are far and away the most common technologies used for analyzing streaming data (Figure 1-13). This holds true for respondents across all industries as well as respondents in our top six industries. Spark and Kafka account for over 65% of streaming data analysis in our survey. Technology/software (37%) is the leading industry for Kafka, followed by financial services (33%). Spark, popular across all industries, is led by retail (40%), healthcare/medical technology (37%), and technology/software (35%). Confluent is most widely adopted in government (30%), which is also where StreamSets (11%) is most common.

Figure 1-13. Technologies for analyzing streaming data

Requisite Skills for Analytics by Industry

Turning to the analytics-related
skills that industries are staffing for, based on the technologies that they plan to adopt, we see that overall the skills in greatest demand are Python, SQL, and relational databases, followed by Hadoop and Java (Figure 1-14). This holds true in our top six industries as well, with government leading the demand for Python (19%) and relational database (17%) skills, and healthcare/medical technology leading the demand for SQL (18%) skills.

Figure 1-14. Required analytics-related skills by industry

The Value of Big Data Today

We asked survey participants to rank the value of four characteristics of big data: veracity, velocity, variety, and volume. This gives insight into the overall use and business impact of big data analytics and visualization programs. Volume refers to the amount of data that is gathered and analyzed. Variety refers to the many different sources and types of data, both structured and unstructured, that are gathered and analyzed. Velocity refers to the pace at which data is gathered and analyzed. And last, but certainly not least, veracity refers to how closely the data approximates the “truth” and lacks biases, abnormalities, and inaccuracies. Successful big data analytics programs must consider the combination of volume, variety, velocity, and veracity in order to provide business insight. Anything less will fail to provide the competitive advantage the company desired when launching its big data analytics and visualization initiative.

Looking at the combined data across all industries, the most valued characteristic of data is veracity (Figure 1-15). This isn’t terribly surprising, given that without veracity, there wouldn’t be much value in big data projects at all. It doesn’t matter how powerful your analytics programs are if you’re feeding them biased and inaccurate data. Next in importance is variety. This indicates that analytics and visualization solutions must be able to combine
multiple sources and types of data, structured and unstructured, to provide the insights that businesses need. Next in importance is volume. Finally, velocity has the least value to survey respondents, indicating that they continue to place tremendous value in typical business data and not as much in unstructured and streaming data. This is consistent with the relative lack of adoption of streaming data analysis as reported in other questions.

Figure 1-15. Value of data characteristics (1 being most valuable, 4 being least valuable)

Digging deeper into our top six industries, we see that a high value is placed on veracity across the board, although technology/software and manufacturing don’t hold veracity in as high regard as retail, financial services, government, and healthcare/medical technology (Figure 1-16). Variety is most valued by retail, financial services, and technology/software (Figure 1-17). Interestingly, volume (Figure 1-18) shares a similar value across our top six industries. Velocity, the least valuable characteristic of big data in our overall responses, does have value for technology/software and manufacturing (Figure 1-19).

Figure 1-16. Importance of veracity

Figure 1-17. Importance of variety

Figure 1-18. Importance of volume

Figure 1-19. Importance of velocity

Summary

The survey results show that to offer business value, analytics and visualization programs are typically aimed at supplying business users and business analysts with the information they require. This information is most often embedded in an application or in a standalone BI application, and is engaged with via dashboards. The value placed on veracity tells us that this information must be accurate and unbiased.

Relational databases are the most popular main data sources for organizations. Beyond that, analytic databases and Hadoop are the most common sources of big data. This coincides with our
respondents prioritizing Python, SQL, and relational database skills for analytics workers. The emphasis on relational databases and SQL indicates that our survey respondents still place tremendous value in typical business data and not as much in unstructured and streaming data. However, those working with streaming data rely heavily on Kafka and Spark.

Manufacturing, financial services, and technology/software are furthest along the adoption curve for big data analytics and visualization technology, with companies in these verticals reporting that they are in the “multiple projects” and/or “development” phases. These three industries are followed by healthcare/medical technology and retail, while government brings up the rear, with over half of respondents indicating that they either have no big data analytics projects in progress or are currently defining requirements.

About the Author

Matthew D. Sarrel is the founder of Sarrel Group, a technical and content marketing consulting practice and product test lab. Matt has over 30 years of experience in technology analysis, implementation, testing, and marketing with a focus on security, networking, and big data. He has worked for some of the largest and smallest tech companies in the world. Matt has written for numerous publications such as PCMag, eWeek, InfoWorld, GigaOm, CIO, eSecurityPlanet, Allbusiness.com, and Backayard Magazine. Matt is passionate about cooking with fire and competes on the KCBS BBQ circuit.