Delivering Embedded Analytics in Modern Applications
A Product Manager’s Guide to Integrating Contextual Analytics

Federico Castanedo and Andy Oram

Beijing, Boston, Farnham, Sebastopol, Tokyo

Copyright © 2017 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Nicole Tache
Production Editor: Nicholas Adams
Copyeditor: Rachel Monaghan
Interior Designer: David Futato
Cover Designer: Randy Comer
Illustrator: Rebecca Demarest

May 2017: First Edition

Revision History for the First Edition
2017-04-25: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Delivering Embedded Analytics in Modern Applications, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-98442-0
[LSI]

Table of Contents

Delivering Embedded Analytics in Modern Applications
  Overview of Trends Driving Embedded Analytics
  The Impact of Trends on Embedded Analytics
  Modern Applications of Big Data
  Considerations for Embedding Visual Analytics into Modern Data Environments
  Deep Dive: Visualizations
  Conclusion
  A Self-Assessment Rating Chart

CHAPTER 1
Delivering Embedded Analytics in Modern Applications

Organizations in all industries are rapidly consuming more data than ever, faster than ever, and need it in forms that are easy to visualize and interact with. Ideally, these can be seamlessly embedded into the applications and business processes that employees are using in their everyday activities so they can make more effective data-driven decisions, instead of decisions based on intuition and guesses. Research indicates that data-driven employees and organizations outcompete and are more successful than those that are not data-driven. In keeping with these trends, business leaders increasingly require their vendors and IT organizations to embed analytics in business applications. With embedded analytics, organizations leverage vendors’ domain expertise to provide analytics guidance for the application users.

The trade press has recently focused on helping businesses become “data-driven organizations.” As the result of a study, the Harvard Business Review bluntly announced, “The more companies characterized themselves as data-driven, the better they performed on objective measures of financial and operational results” and “The evidence is clear: data-driven decisions tend to be better decisions.” The article follows up with specific examples. InfoWorld stresses speed, automation of data collection, and independent thinking among staff. The barriers to becoming data-driven are the focus of a recent Forbes article; ways forward include a compelling justification for the data and a
central organization capable of handling it, adding up to a “data-driven culture.” McKinsey (which helped conduct the HBR study) also stresses organizational transformation, which involves learning analytical skills and integrating data in decision making.

The trends and research all point to a single conclusion: organizations, customers, and employees seek a data-driven environment, and they value those applications that make it easier to make sense of all the data that is available for decision making. This report examines the architecture and characteristics that allow software vendors and developers to meet this need and increase the value of their application by embedding analytics.

What are the basic requirements of embedded analytics on a modern data platform? Essentially, to provide speed-of-thought interaction with powerful visualizations that help knowledge workers take action on data. While data comes from multiple sources (some of it historical, some of it streaming in real time) and is stored with various technologies for cost efficiencies, all these complexities must be managed by the visual platform, delivering visual analytics in a wide variety of formats and devices. The following list summarizes the requirements of an embedded analytics tool on a modern data platform:

• Integrate with the modern data architecture by accepting data from a wide variety of input sources, where each may have different data types
• Process large amounts of data quickly, and respond to interactive requests within seconds
• Embed easily into web pages or other media browsers, including applications created by third-party developers
• Scale up automatically and have the ability to process streaming data
• Adhere to security restrictions, so users see only the data to which they are supposed to have access

To begin, let’s examine in more detail the trends driving embedded analytics.

Overview of Trends Driving Embedded Analytics

For application vendors, embedding modern analytics is an opportunity to provide added value, increasing customer stickiness and revenue. For organizations, it is an opportunity to leap forward in their data-driven initiatives. These initiatives are critical for competitive advantage and rely heavily on the following trends:

Speed of change (velocity)
In business, finance, and technology, trends that used to occur over weeks or months now take place within minutes and must be responded to as fast as possible. Until recently, companies could get by with checking data every few months and changing their strategies a couple of times a year. But now, consumers and clients can get news within minutes of it happening, and change their preferences based on what’s posted to the internet. Rollouts of new products are traditionally spaced across seasons (spring, summer, etc.). But nowadays, a trend can catch fire in a matter of days, thanks to the speed at which information spreads through social media. Bad news travels even faster: if there is a problem, such as when the Galaxy Note phone started to catch fire, markets shift instantly. As the pace of change increases, having the right data and analyzing it quickly to make decisions is key to the organization’s ability to switch directions much more rapidly.

Knowledge workers
Although the term knowledge work first appeared in 1959, at that time the people whose decisions were crucial to organizational success were stuck at their desks, perhaps in isolated offices with nothing but a telephone and a pile of journals or stacks of mainframe printouts connecting them to information. It took the arrival of the internet era to provide knowledge workers with continuous streams of data about the outside world, a resource particularly exploited by young people in the workforce. These people know just as much about what is happening among clients, suppliers, and competitors as what is happening within
their own organization. And they are speaking up to demand access at speed-of-thought response times: seconds, not minutes or hours.

Availability of data
Sources of data that were unimaginable to earlier generations are now commonly used. Today companies can enhance their internal data with social, weather, credit history, census data, and a wide range of data sets that are readily available. For example, these data sets may include the dutifully logged behavior of web visitors, real-time updates on inventory and sales in retail stores, and the terabytes of data streaming in from sensors in the field.

While data may be more available, availability is only the first step in making data useful and impactful to the organization. Data becomes relevant only when it is utilized, that is, when people act or make decisions based on it. The challenge of making data useful to drive actions is even harder with so much data available. One reason for this is that it is necessary to combine multiple sources of information (such as customer interactions, real-time transactions, social communication, and location data) to obtain insights that were not available before. Another reason is that understanding the data enough to explore it requires domain expertise, and users may easily get lost when dealing with large amounts of data. Free exploration can be a gift or a waste of time. Finally, since some data may be sensitive, it is also important to restrict access to only authorized users.

Lowering costs
Decreasing costs of technology and its components, especially storage and hardware, allow organizations to do more with less. Organizations typically archive data to tape storage, which is fine for emergency recovery from system failures, but does not allow instant access or fine-tuned queries. The need for massive storage due to the large amounts of data collected, and the falling cost of disk storage, means organizations are now keeping billions of records within reach, contributing to the data explosion revolution.

As memory also shrinks in size, gets cheaper, and is distributed across clusters of commodity hardware, calculations that used to suffer from slow disk access can be carried out in primary memory. Massive data sets are now stored in memory, allowing lightning-fast random access. Analytical tools can also run in-memory on clusters of low-cost servers to process real-time streams in a timely manner.

… seen in data sources, channels of input, data storage, and visualizations.

Product managers developing new modern applications are probably faced with the complexities of Big Data and a modern data environment. Embedding visual analytics into such an environment has considerations that differ from the traditional embedded analytics or BI in organizations. The differences are related to the granularity of collected information, the use of immutable data, the way data is retrieved, the data models employed, the interactivity of the output, and the speed of data. We’ll look at these differences in the following sections.

Collection

Modern data sources go far beyond the transactional sales data that organizations historically collected and stored. In the past, customer interaction was limited to data captured in transactional business applications, but today customer interaction and engagement is captured from multiple sources (clickstream, social, transactions), and the information often streams in real time. With the Internet of Things, organizations also track sensors, each of which transmits a stream of data that updates its status every minute, or even every second. Today’s organizations need to manage data that can be transactional, real-time streams, or historical data.

To manage both history and real time, an important concept has emerged in recent years: the master data set (and, relatedly, immutable data). The master data set is the source of
truth in any system and cannot withstand corruption. In hybrid real-time and batch systems, the master data set stores both the historical data and the real-time data coming from streams. The approach is to store incoming data in the master data set as raw as possible, since rawness allows you to deduce more information.

To better understand this concept, see the intraday stock graph of Google, Amazon, and Apple on March 22 in Figure 1-2. If we consider close prices, AAPL gains 1.12%, AMZN gains 0.57%, and GOOG loses 0.10%, but at some point around 10:45 AM the three stocks were very close at 0.25%. This kind of detailed analysis cannot be done with open and close prices alone; it requires enough data granularity, which translates into collecting more data and having more storage.

Figure 1-2. Stock comparison of Google, Amazon, and Apple. Source: Google Finance, screenshot taken by Federico Castanedo.

Storage

Data storage has spawned many new types of databases in the past 15 years. Relational databases remain central, but new layouts such as Hadoop (which created its own Hadoop File System, or HDFS, format), document databases, and other formats falling under the broad umbrella of NoSQL are popular. These flexible, analytics-friendly formats are pushing flat files and spreadsheets out of the office to business processes; these have to handle ever-growing quantities of data.

As organizations face a growing amount of unstructured data, search frameworks such as Elasticsearch and Solr also become central to many applications, particularly because they can pull actionable information out of unstructured text. They can help answer questions such as “Where do people complain about food poisoning?” in restaurant reviews.

As we have seen, modern organizations use multiple sources with diverse formats, and want to keep the data in one place without copying it. In contrast, traditional BI typically copies data into a data warehouse or data mart, stored in relational databases. To manage multiple sources (variety of data) cost effectively, organizations and software vendors turn to modern and flexible storage architectures instead of relational databases. So, the use of NoSQL databases is becoming more common.

Data from different sources is usually joined through metadata that links up data that goes together: for instance, matching a customer in a data set obtained from a broker with the customer who just visited your business’s website. Each data set will have a unique ID for each customer (barring duplicates, which are common and can be found through various analytic techniques). Foreign keys are commonly used in relational databases to link different tables. But your application will have to match customers from different data sets in many formats. To do this, it’s useful to have a separate table, such as a dimension table.

Unlike with traditional databases, when you store immutable data in a NoSQL database, the information isn’t just updated but rather is added along with timestamp information. This characteristic has the effect that data will be “eternally true,” meaning that data is always true because it was true at the time it was obtained. This is also known as a fact-based system. One salient feature of storing immutable data is that you can recover your system at any point in time by referring to the last stored timestamp. Of course, this comes at the cost of increasing the storage required.

Retrieval

Modern data environments are challenged by the size and speed of data and often do not have the luxury of moving the data or replicating it before analyzing it. In traditional BI environments, data is moved from the source systems into the data warehouse in a complex task called extract, transform, and load (ETL). Traditionally, ETL is an in-between step manually coded by a data expert for each BI
application before users can explore the data. This meant that data was most likely “old” by the time business users used it for their decisions. Modern environments use data federation, which involves a much lighter-weight process of joining data from the various native formats in which it is stored, without physically moving it.

For example, a common pattern is to maintain reference data (customer name, address, etc.) in a relational database while placing incoming records about customer behavior in a more flexible and scalable data format such as Hadoop. The relational database is valuable for doing queries on relatively structured forms of data, but Hadoop is more appropriate for analytics that find new relationships among different types of customer attributes, such as age and shopping habits. Data federation will define how visual analytics should merge the data for combined insights, without moving it. The benefits are twofold: retain the integrity of the data and enable access to fresh data.

Data quality remains an issue in old and modern environments. Some data cleanup is required, for example, removing outliers or harmonizing different strings like “CA” and “California.” Modern environments automate the cleanup process in some way or another, and there is a huge business around data normalization and automatic data cataloging, with startups such as Tamr and Gamalon.

The speed of retrieving data for visualization and analytics is also impacted by organizations’ increased use of the cloud for storage. More companies are finding it more convenient to rent storage in the cloud than to keep adding servers on-premise. The varied distances traveled by data in different locations may put extra stress on speed. In addition, different cloud services offer different APIs, and the tools pulling the data have to understand and adapt to each API. For these reasons, organizations often deploy a hybrid
architecture where historical data may be stored on-premise because it is queried frequently and would tally up high costs if stored in a third-party cloud provider. So modern visualization and analytics tools should be able to run queries across hybrid cloud and on-premise data, combining results where appropriate.

Data Models

BI data warehouses develop semantic data models to establish relationships between the data in different tables. These models are just as useful in modern, real-time data processing, but the newer forms of intelligence can also discover new relationships in more loosely structured data. Instead of having a relational database, modern environments use NoSQL databases with schema-free data models. These more recent data models allow you to store unstructured or semistructured data.

Architecturally, an application should connect directly to data in its original storage format. From the 1980s on, many businesses created data warehouses that collected data from diverse sources and organized them in a single relational format. The rationale behind this copying made sense at the time, because the data warehouse could comprehensively handle all the queries businesspeople made. But in a real-time decision-making environment, duplicating and copying data is no longer viable, nor do you always want data in a single warehouse format. Hadoop, NoSQL databases, and streaming sources were invented because they are superior for many common analytics on certain kinds of data, especially real-time data. You want to take advantage of all appropriate formats.

Thus, application vendors must be flexible in relation to data sources. Even if you can support all the organization’s data sources now, you must also be able to incorporate new ones as the organization develops them. Organizations may also move data into a cloud provider, and switch cloud providers from time to time. These migrations may be a good reason for buying a solution that has already developed connections with the common data sources.

Output

Traditional BI output usually involves static charts and printed reports, whereas modern knowledge workers want interactive displays that change in response to an iterative process of asking questions of the data. For many use cases it is impossible to define all possible queries that a user would request and generate the graphs of these queries ahead of time. Instead, ad hoc queries of historical and fresh data are becoming more common and valuable to speed-of-thought analysis.

One important difference between traditional BI and modern analytics in the form of data science is the added capability to predict outcomes using predictive analytics. BI departments usually focus on descriptive analytics or reporting (analyzing historical data), whereas the goal of data science is to build predictive models that generalize well over future data, essentially predicting future outcomes. It is valuable to be able to easily transition data and analytics between your BI tool and your data science tools such as Jupyter and Zeppelin.

Speed

When a new BI report must be prepared, it often takes weeks in traditional BI systems. Even if data is drawn directly from the data warehouse and presented to the user, the warehouse itself may be updated nightly or every couple of days. With streaming data updating every second, there is a significant time lag between the data available at the source and data available at the data warehouse.

In-memory processing, which is supported by Spark and in-memory databases, is crucial for handling the speed of incoming data. This means configuring systems with adequate memory, whether on your own servers or in the cloud.

Modern visualization and analytics can perform at the speed of data at its sources, meeting the demand to work with fresh data (the latest information available about sales, web visits, or traffic movements) in order to make better and faster decisions. For instance, in the case of a pipeline company, sensors may return data once a second about the speed of a liquid’s flow along with the pressure and heat of certain components in the pipeline. The person reviewing this data may also be interested in prices and sales volumes, to determine whether to make a change in production. Thus, the pipeline example illustrates the need for both real-time and historical data. Location data and timestamps are therefore important parts of the data.

Microservices

Large data sources, on the scale of billions of rows and thousands of users, call for large-scale services as well. The tools that process and display data must be able to scale. The modern paradigm for this scaling is called microservices. On the programming level, microservices divide code rigorously into separate modules to make maintenance and upgrades easier without breaking other parts of the application. On the deployment level, microservices are run in separate containers that can be created quickly and taken out of commission cleanly as needed. Microservices make it particularly easy to incorporate third-party services, so that vendors can collaborate to provide complete solutions.

Parallelism

Parallelism is a common programming technique for speeding queries. First of all, different data stores can be consulted in parallel. Second, each data store can be partitioned in intelligent ways that anticipate how queries will run. For instance, if data for each month is stored on a separate disk, results from multiple months can be queried in parallel. Visualizations can also start with the outlines of retrieved data and fill in details gradually as they arrive, the way interlaced graphics show up gradually in some web browsers.

Interactive Visualizations

Finally, visualizations need to be interactive, because the volume and variety of data being offered is
too large to show everything in a chart. An example of interactivity was presented in the use cases shared earlier, where a user can pull up statistics about insurance quotes, then drill down into the details of a single quote. The user may also want to show a year’s worth of data and then focus on a single month that looks unusual. The visualization will not only have to pull up new data from storage to satisfy on-the-fly queries, but will also have to choose the appropriate chart or other format for display. For a deeper dive into the requirements for visualization, see the next section.

Summary

A recent article in the Harvard Business Review calls for more automation of data analysis, so that CEOs and other knowledge workers can focus on decision making rather than trying to figure out the trends in data. Modern analytics tools for organizing and presenting data need to support such sophisticated visualizations. To summarize, modern analytics environments embedded in applications need to:

• Take data from many different sources, modern and traditional, which have to be combined along one or more dimensions
• Connect directly to data in its original storage format, with no movement of data
• Support both streaming and historical data as well as the ability to explore across both
• Automatically update visualizations with new real-time data without requiring user requests
• Embed visualizations into a variety of web pages and applications
• Pull up data quickly in response to interactive queries
• Scale to allow the analysis of large input data sources (scaling is enabled through microservices and parallel implementations)
• Understand the access rights and restrictions imposed by the data stores in order to scrupulously monitor security. Organizations are very concerned with restricting access to data that provides a competitive advantage, such as the sales data for each product. They may
also need to impose restrictions to protect the privacy of their customers.

Deep Dive: Visualizations

Visualization is the component of the modern embedded analytics platform that the end users interact with the most. As such, it warrants its own section, discussing its unique requirements in the modern data environment. Visualizations in a modern environment should cover the characteristics outlined next.

Flexibility

In your application, you would want to integrate the charts into web pages or touch devices with the following flexibility:

• A user visiting her company’s product listing will want to see current sales directly on that page, rather than having to open a new page or application.
• The visualizations should also be mobile-friendly, of course, because so many people check data on the move, on mobile devices.
• The visualizations must also be adapted to real-time streaming data. For instance, a web page may pull new information once per second about the ad impressions served by the organization, and update a chart.
• The visualizations may include additional calculations. For instance, the chart may show an average number of ad impressions per second, in which case the average must be recalculated as incoming data is added.

In traditional BI, the user had to manually submit a request for each chart or table. That worked fine in a static environment where data was updated on a weekly or monthly basis. Dynamic data environments must interact with sources behind the scenes and update displays without requests from the user.

Ease of Use

Big Data should lead to big insights, but it doesn’t always work out that way. Most BI technologies are tools or workbenches that require extensive training before users can perform even rudimentary analysis. In the context of embedding analytics, ease of use, modern look-and-feel, and intuitiveness are key, as they improve the end user’s perception of the entire application. Embedded analytics on Big Data and streaming data should be simple, intuitive, and collaborative as users visually interact with data. Ease of use in embedded analytics includes these features:

• Speed-of-thought interaction. Whether it is streaming data, real-time data, Big Data, or unstructured data, users expect to work with the most up-to-date data. Users are also impatient: visualizations should respond within seconds so they don’t have to wait for the results.
• Guided visualizations and dashboards providing a high-level overview as well as allowing for easy drill-down to details.
• Support dashboards as well as exploratory data analysis, which helps users generate questions by revealing the depth, range, and content of their data stores.
• Intuitive time-series controls for streaming data, with the ability to switch between real-time and historical data.
• Easy data blending as well as the ability to add visualizations of various data sources on the fly.
• Collaboration and the ability to share visualizations with the rest of the team or organization.
• An inviting user interface that is well suited to the business and less technical users.

Filters for Guided Analytics

Data retrieval and visualization always involve some sort of filtering. A dashboard may start by presenting the user with certain commonly requested information, such as total income and top-performing stores, and then respond to requests for other information. Strategically restricting what the user sees, so as to be of maximum value, is called guard-railing. It is also key to guided discovery, where a software vendor can offer clients tremendous value.

A typical example of guided discovery is a pharmaceutical company that offers broad data access to its analysts but just a dashboard with key performance indicators to salespeople. The salespeople, however, can still expand their view of the data through such actions as drilling down into items or changing the
timeframes. Software vendors can substantially help their customers by bringing the domain knowledge to distinguish what different types of users need in an interface.

The SQL language supports the most common types of filtering. Users ask questions such as “What was the best-selling product at each store?” or “How many people bought corn chips in the 48 hours before the Super Bowl game?” These questions translate to the following typical query filters:

• Retrieving filtered fields or columns, such as geographic location, age, purchase date, and product type
• Grouping results by the same criteria, such as by city or month
• Limiting displays to the top results, such as the most recent data or the highest priced products

Switch Between Real-Time and Historical Data

Users also switch between real-time data and historical data. They may, for instance, want to look at sales data over a period of years to see what the most popular products have been (the cash cows for the company) and then look at streaming data for these products to see whether they are increasing or decreasing in popularity.

This rapid switching requires not only access to the data storage for these different types of information, but also the ability to join them as indicated earlier. In this way, if the user pulls up a product from one data source and clicks on it in the display, everything about that product can be retrieved from a new data source. Advanced streaming visualization tools introduce a slider that lets you slide along the timeline between real time and history, changing the data that is displayed.

Embedded Visualizations

Visualization is always part of a larger decision-making process and therefore must often be embedded into a web application. Simple adaptations include using an iFrame to drop a visualization into a page. But the combination of JavaScript and CSS allows application developers to integrate the
visualization in more subtle and supple ways. Vendors need this in order to provide a seamless, “white label” display that doesn’t call attention to its use of software from different sources. Customizations may be as simple as applying the font size and color of the larger page to the embedded visualization, or may fine-tune what is displayed (the type of graph, labels on axes, etc.). Zoomdata’s RESTful API, for example, allows web developers to make queries and display them in any manner that is appropriate to a particular user at a particular time.

As depicted in Figure 1-3, embedded analytics can be divided into the following categories (from light to deep):

White-Label
    Allows you to rebrand the BI and analytics application tool’s user interface and visualizations to match your organization’s established look-and-feel standards, including attributes such as colors, fonts, logos, and more.
Customized
    Allows for rapid lightweight embedding of dashboards and visualizations into your application using simple web techniques such as iFrames.
Infused Analytics
    Provides a deeper API-based integration in your application such that the embedded analytic content appears to be a native part of the application.
Extended
    Meets the specific needs of your application by adding custom visualization types and data source connectors.

Figure 1-3. Categories of embedded contextual analytics. Source: Zoomdata

Users want data quickly and in a form they can easily interpret. Retrieving the particular information they want from potentially billions of records, and formatting it within seconds, requires a modern approach to storage, retrieval, and display. To meet the needs of modern visualizations, application developers need a wide range of skills to perform data retrieval, querying, and data presentation. Product managers should also bear in mind that the skills required today are rapidly
changing with new technologies being introduced. An application might use Hadoop today and move to a cloud datastore tomorrow. Data sources available today may become legacy in the near future. With limited availability of resources, it is worthwhile to invest time in finding the right platform that delivers superb modern visual analytics. Therefore, it is also worthwhile to investigate tools that can already do these things, rather than reinvent several wheels at once.

Developers and software vendors should consider the unique challenges of the modern data environment as they assess options for embedding analytics. Developing a framework that supports a modern data environment in-house would require significant resources. The requirements may continue to change and with them the need for an ongoing investment in skills and staffing, which takes away from the focus on the application vendor’s core competency. This is why many software vendors look for third-party visual analytics platform providers who can handle their requirements for embedding analytics directly into the everyday workflow of their application.

You may use the following table as a simple guide when evaluating modern analytics tools (with 1 being most likely and the opposite end of the scale being least likely).

Criterion
Easy to embed?
Designed for integration?
Fast time to value?
Flexible deployment options (cloud, on-premise, hybrid)?
Connects to wide set of modern data sources?
Extensible: add custom data sources?
Extensible: add custom visualization types?
Speed-of-thought performance?
Support for streaming data?
Support for blending of data from multiple sources without data movement/ETL?
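
Ratings collected with the evaluation table above can be condensed into a rough comparison figure for each tool. The Python sketch below is one possible way to do that and is not part of any vendor's product. The function name `summarize`, the sample ratings, and the upper bound of the scale (`SCALE_MAX = 5`) are all assumptions made for illustration; only the criteria and the meaning of a rating of 1 come from the table.

```python
from statistics import mean

# Criteria taken from the evaluation table above.
CRITERIA = [
    "Easy to embed?",
    "Designed for integration?",
    "Fast time to value?",
    "Flexible deployment options (cloud, on-premise, hybrid)?",
    "Connects to wide set of modern data sources?",
    "Extensible: add custom data sources?",
    "Extensible: add custom visualization types?",
    "Speed-of-thought performance?",
    "Support for streaming data?",
    "Support for blending of data from multiple sources without data movement/ETL?",
]

# ASSUMPTION: the upper bound of the scale is a placeholder; adjust as needed.
SCALE_MIN = 1
SCALE_MAX = 5


def summarize(ratings):
    """Return (average rating, weakest criteria) for one tool.

    `ratings` maps every criterion to an integer rating. Because 1 means
    "most likely", a lower average is better, and the weakest criteria
    are the ones that received the highest rating.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    for criterion in CRITERIA:
        value = ratings[criterion]
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"{criterion!r}: rating {value} is off the scale")
    worst = max(ratings[c] for c in CRITERIA)
    weakest = [c for c in CRITERIA if ratings[c] == worst]
    return mean(ratings[c] for c in CRITERIA), weakest


# Hypothetical ratings for an imaginary tool, for illustration only.
example = {criterion: 2 for criterion in CRITERIA}
example["Support for streaming data?"] = 4

average, weakest = summarize(example)
print(f"average rating: {average:.1f}")
print("weakest criteria:", weakest)
```

Running the same function over the ratings for several candidate tools gives a quick way to rank them, while the list of weakest criteria shows where each one would need a closer look. An unweighted average treats every criterion as equally important, which may not match a given product's priorities.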

Conclusion

To recap, your goals as a software application vendor typically include growing revenues and retaining customers through maintaining and improving customer satisfaction with your product. In order to support these goals, ask yourself if your current solution for BI and analytics within your product is lacking, either because it was homegrown and analytics is not your core competence, or because you have previously embedded a now-outdated legacy analytics tool. If so, then you may want to consider embedding a modern data analytics platform into your application that infuses data visualization and analytics in a way that is attractive and compelling for prospective and existing customers.

As a technical organization you might think the best approach is to homegrow the visualization and analytics capabilities within your application. That may be possible, but first ask yourself the following important questions:

• Are we capable of meeting and beating the capabilities of modern analytical platforms built by vendors whose sole focus is BI and analytics?
• Is this the most efficient use of our time and resources?
• What is the opportunity cost of homegrown visualization and analytics versus investing in the core features of our application?

After reading this report you may be wondering how close your organization is to providing real-time and interactive Big Data visualizations. We suggest completing the self-assessment rating chart in Appendix A.

APPENDIX A
Self-Assessment Rating Chart

The Self-Assessment Rating Chart provides a quick checklist of requirements that will help you determine how close your organization’s BI application is to providing real-time, interactive Big Data visualizations. Try rating your organization on each criterion on a scale that runs up to 6, from unprepared at one end to totally prepared at the other.

Criterion
Able to receive data from all sources?
Easy to add new sources?
Joins data from multiple sources seamlessly?
Accepts real-time streams?
Works with common cloud providers?
Easy to scale flexibly, such as through microservices?
Recognizes underlying fine-grained security controls?
Runs in memory?
Supports common filtering, grouping, and limits?
Embeddable through JavaScript and CSS, for white labeling?
Mobile friendly, with the same features available on desktop and mobile devices?
Able to alter display in response to user interaction?
Response time of a few seconds?
Updates automatically in real time?
Able to connect historical and real-time data?

About the Authors

Federico Castanedo is the Lead Data Scientist at Vodafone Group in Spain, where he analyzes massive amounts of data using artificial intelligence techniques. Previously, he was Chief Data Scientist and cofounder at WiseAthena.com, a startup that provides business value through artificial intelligence. For more than a decade, he has been involved in projects related to data analysis in academia and industry. He has published several scientific papers about data fusion techniques, visual sensor networks, and machine learning. He holds a PhD in Artificial Intelligence from the University Carlos III of Madrid and has also been a visiting researcher at Stanford University.

Andy Oram is an editor at O’Reilly Media. An employee of the company since 1992, Andy currently specializes in programming topics. His work for O’Reilly includes the first books ever published commercially in the United States on Linux, and the 2001 title Peer-to-Peer ...

Date posted: 12/11/2019, 22:16