Artificial Intelligence for Big Data
Complete guide to automating Big Data solutions using Artificial Intelligence techniques

Anand Deshpande
Manish Kumar

BIRMINGHAM - MUMBAI

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Sunith Shetty
Acquisition Editor: Tushar Gupta
Content Development Editor: Tejas Limkar
Technical Editor: Dinesh Chaudhary
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Graphics: Tania Dutta
Production Coordinator: Aparna Bhagat

First published: May 2018
Production reference: 1170518

Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street, Birmingham B3 2PB, UK

ISBN 978-1-78847-217-3

www.packtpub.com
IBM cognitive toolkit based on Watson

IBM initially developed Watson as an engine that could play the game of Jeopardy! In this game, a human moderator asks questions in a somewhat cryptic manner in natural language. The question is heard by all the participants at the same time. The players can press a buzzer to indicate that they are ready with an answer. The first player to press the buzzer gets the chance to answer the question. Watson succeeded in outperforming the Jeopardy! world champions in 2011. As we can see, this process also goes through the Observe | Interpret | Evaluate | Decide cycle. Here is the high-level architecture of IBM Watson as an intelligent machine that can answer questions in natural language:

Figure 12.6: High-level architecture of IBM Watson as an intelligent machine

"The computer's techniques for unravelling Jeopardy!
clues sounded just like mine. That machine zeroes in on keywords in a clue, then combs its memory (in Watson's case, a 15-terabyte databank of human knowledge) for clusters of associations with those words. It rigorously checks the top hits against all the contextual information it can muster: the category name; the kind of answer being sought; the time, place, and gender hinted at in the clue; and so on. And when it feels sure enough, it decides to buzz. This is all an instant, intuitive process for a human Jeopardy! player, but I felt convinced that under the hood my brain was doing more or less the same thing."

– Ken Jennings (one of the best Jeopardy! players)

After the initial success of Watson as a Jeopardy! engine, IBM evolved Watson into Cognitive Intelligence as a Service, available on the IBM Cloud. The Cognitive System enablers that we saw earlier in this chapter (data, computation, connectivity, sensors, understanding of human brain functioning, nature, and collective intelligence) are made available with a common interface on the platform.

Watson-based cognitive apps

At the time of writing, IBM supports the following cognitive applications as services on the IBM Cloud platform:

Watson Assistant: This application was formerly named "Conversation". It makes it easy to add a natural language interface to any application. It is easy to train the model for domain-specific queries and implement customized chatbots.

Discovery: This application enables search into the user's documents, as well as a generic cognitive keyword-based search on the internet. The service delivers connections, metadata, trends, and sentiment information by default. It is possible to input data from local filesystems, emails, and scanned documents in unstructured format. It is also possible to connect to an enterprise storage repository (such as SharePoint) or a relational database store. It can seamlessly connect to content on cloud storage.

Knowledge Catalog: The
application facilitates the organization of data assets for experimenting with various data science algorithms and hypotheses. A data science project in the Knowledge Catalog contains data, collaborators, notebooks, data flows, and dashboards for visualization. The Watson Knowledge Catalog is a handy application when there are thousands of datasets and hundreds of data scientists who need simultaneous access to these datasets and need to collaborate. The Knowledge Catalog provides tools to index the data, classify the documents, and control access based on users and roles. The application supports three user roles: Administrators, with full control over the data assets; Editors, who can add content to the catalog and grant access to various users; and Viewers, who have role-based access to data assets.

Language Translator: This is an easy-to-use application that can be incorporated within mobile and web applications to provide language translation services. This can facilitate the development of multilingual applications.

Machine Learning: With this app, we can experiment and build various machine learning models in a context-sensitive, assisted manner within Watson Studio. The models are very easy to build with the model builder web application available on IBM Cloud. The flow editor provides a graphical user interface to represent the model, based on the SparkML representation of models as Directed Acyclic Graphs (DAGs).

Natural Language Understanding: This is a cognitive application that makes it easy to interpret natural language based on pre-built trained models, and it is very easy to integrate within mobile and web applications. The app supports identification of concepts, entities, keywords, categories, sentiment, emotion, and, most importantly, semantic relationships within the natural language text presented as input.

Personality Insights: This application gets as close as possible to the cognitive intelligence human beings
demonstrate while interacting with each other. We judge a person by their use of specific words, the assertiveness of certain statements, pitch, openness to ideas from others, and so on. This application applies linguistic analytics and personality theory using various algorithms, and produces Big Five, Needs, and Values scores based on the text available in Twitter feeds, blogs, or recorded speeches from a person. The output from the service is delivered in JSON format containing percentile scores on various parameters, as seen in the following screenshot:

Figure 12.7: Percentile scores on various parameters

Speech to Text and Text to Speech: These are two services that add speech recognition capabilities to enterprise applications. The services transcribe speech from various languages and a variety of dialects and tones. The services support broadband and narrowband audio formats. The text transmissions (requests and responses) support JSON format and the UTF-8 character set.

Tone Analyzer: Tone detection is another cognitive skill that we humans possess. From the tone of a speaker, we can identify the mood and the overall connotation. This determines the overall effectiveness of a specific communication session when it comes to call centers and other customer interactions, and the service offerings can be optimized based on the detected tone of the client. This service leverages cognitive linguistic analytics to identify various types of tones and categorizes emotions (anger, joy, and so on), social nature (openness, emotional range, and so on), and language styles (confident, tentative).

Visual Recognition: This service enables applications to recognize images and identify objects and faces in pictures uploaded to the service. Tagged keywords are generated with confidence scores. The service utilizes deep learning algorithms.

Watson Studio: This service makes it very easy to explore machine learning and cognitive intelligence algorithms and
embed the models into applications. The studio provides data exploration and preparation capabilities and facilitates collaboration among project teams. Data assets and notebooks can be shared, and visualization dashboards can be easily created with the Watson Studio interface.

Developing with Watson

Watson provides all the services listed previously, along with many more, on the IBM Cloud infrastructure. There is a consistent web-based user interface for all the services, which enables quick development of prototypes and tests. The cognitive services can be easily integrated within applications, since most of them work with REST API calls to the service. The interactions with Watson are secured with encryption and user authentication. Let us develop a language translator using the Watson service.

Setting up the prerequisites

In order to leverage IBM Watson services, we require an IBMid:

1. Create an IBMid at https://console.bluemix.net/registration/?target=%2Fdeveloper%2Fwatson%2Fdashboard
2. Log in to IBM Cloud with the login name and password.
3. Browse the Watson services catalog at https://console.bluemix.net/catalog/?search=label:lite&category=watson:

Figure 12.8: IBM services catalog

4. Select the Service Name (you can use the default name) and the region/location to deploy the service in, and create the service by clicking on the Create button.
5. Create the service credentials (username and password) for authenticating the requests to your language translation service:

Figure 12.9: Language translator

Once we get the service credentials along with the URL endpoint, the language translator service is ready to serve requests for translating text between the various supported languages.

Developing a language translator application in Java

We proceed as follows:

1. Create a Maven project and add the following dependency for including the Watson libraries:

    <dependency>
        <groupId>com.ibm.watson.developer_cloud</groupId>
        <artifactId>java-sdk</artifactId>
        <version>5.2.0</version>
    </dependency>

2. Write the Java code for calling various API methods of the LanguageTranslator:

    package com.aibd;

    import com.ibm.watson.developer_cloud.language_translator.v2.*;
    import com.ibm.watson.developer_cloud.language_translator.v2.model.*;

    public class WatsonLanguageTranslator {
        public static void main(String[] args) {
            // Initialize the LanguageTranslator object with your authentication details
            LanguageTranslator languageTranslator =
                new LanguageTranslator("{USER_NAME}", "{PASSWORD}");

            // Provide the URL endpoint which is supplied along with the service credentials
            languageTranslator.setEndPoint(
                "https://gateway.watsonplatform.net/language-translator/api");

            // Create a TranslateOptions object with the builder, adding the text
            // which needs to be translated
            TranslateOptions translateOptions = new TranslateOptions.Builder()
                .addText("Artificial Intelligence will soon become mainstream in everyone's life")
                .modelId("en-es")
                .build();

            // Call the translation API and collect the result in a TranslationResult object
            TranslationResult result = languageTranslator.translate(translateOptions).execute();

            // Print the JSON-formatted result
            System.out.println(result);

            // This is a supporting API to list all the identifiable languages
            IdentifiableLanguages languages =
                languageTranslator.listIdentifiableLanguages().execute();
            //System.out.println(languages);

            // This API enables identification of the language based on the entered text
            IdentifyOptions options = new IdentifyOptions.Builder()
                .text("this is a test for identification of the language")
                .build();

            // The language identification API returns a JSON object with the level of
            // confidence for all the identifiable languages
            IdentifiedLanguages identifiedLanguages =
                languageTranslator.identify(options).execute();
            //System.out.println(identifiedLanguages);

            // API to list the model properties
            GetModelOptions options1 = new GetModelOptions.Builder().modelId("en-es").build();
            TranslationModel model = languageTranslator.getModel(options1).execute();
            //System.out.println(model);
        }
    }

Output # 1: The translation output is returned in JSON format, which
contains the number of words that were translated, the character count, and the translated text in the target language based on the model that was selected:

    {
      "word_count": 9,
      "character_count": 70,
      "translations": [
        {
          "translation": "Inteligencia Artificial pronto será incorporar en la vida de todos"
        }
      ]
    }

Output # 2: The listIdentifiableLanguages API provides the list of supported languages in JSON format:

    {
      "languages": [
        { "language": "af", "name": "Afrikaans" },
        { "language": "ar", "name": "Arabic" },
        { "language": "az", "name": "Azerbaijani" },
        { "language": "ba", "name": "Bashkir" },
        { "language": "be", "name": "Belarusian" },

Output # 3: The service provides an API for identifying the language of the text that is provided as input. This is a handy feature for mobile and web applications, where the user can key in text in any language and the API detects the language and translates it into the target language. The output is presented in JSON format with a confidence score for each language. In this case, the service identifies the language as English (en) with 0.995921 confidence:

    {
      "languages": [
        { "language": "en", "confidence": 0.995921 },
        { "language": "nn", "confidence": 0.00240049 },
        { "language": "hu", "confidence": 5.5941E-4 },

Output # 4: The model properties can be displayed with the GetModelOptions API call:

    {
      "model_id": "en-es",
      "name": "en-es",
      "source": "en",
      "target": "es",
      "base_model_id": "",
      "domain": "news",
      "customizable": true,
      "default_model": true,
      "owner": "",
      "status": "available"
    }

Frequently asked questions

Q: What are the various stages of AI and what is the significance of cognitive capabilities?
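Since the cognitive services are exposed over plain REST, the SDK calls shown above ultimately issue authenticated HTTP requests. As a rough sketch of what the translate() call sends over the wire (the /v2/translate path and the JSON body shape are assumptions based on the v2 SDK and the outputs shown above; the credentials are placeholders), the same request could be composed with Java 11's built-in HTTP client:

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TranslatorRestSketch {

    // Builds (but does not send) the REST request that corresponds to the
    // SDK's translate() call. Endpoint path and JSON body shape are assumed
    // from the v2 service used in this chapter; credentials are placeholders.
    static HttpRequest buildTranslateRequest(String username, String password) {
        // The Watson service credentials are used as HTTP Basic authentication
        String auth = Base64.getEncoder().encodeToString(
                (username + ":" + password).getBytes(StandardCharsets.UTF_8));

        // JSON body mirroring the TranslateOptions built with the SDK
        String body = "{\"text\":[\"Artificial Intelligence will soon become "
                + "mainstream in everyone's life\"],\"model_id\":\"en-es\"}";

        return HttpRequest.newBuilder()
                .uri(URI.create(
                    "https://gateway.watsonplatform.net/language-translator/api/v2/translate"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildTranslateRequest("{USER_NAME}", "{PASSWORD}");
        // The request is only constructed here, not sent; sending it would
        // require live IBM Cloud service credentials
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Inspecting the constructed request this way is a convenient offline check of the method, endpoint, and headers before wiring in real credentials.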
A: In terms of applicability and resemblance to the human brain, AI can be divided into three stages. Applied AI is the application of machine learning algorithms on data assets so that smart machines can define the next course of action. These smart machines operate on models that can work within a pre-defined environmental context, as well as, to a certain degree, within stochastic environments. This level of AI is generally available and is finding use cases and applications in our day-to-day lives. Cognitively Simulated AI is the next stage in AI development. In this stage, intelligent machines are capable of interfacing with human beings in a natural format (with speech, vision, body movements and gestures, and so on). This type of interface between man and machine is seamless and natural, and the intelligent machines in this stage can start becoming complementary to human capabilities. The next stage is Strong AI, with which we intend to develop intelligent machines that match or exceed human cognitive capabilities. With the availability of large volumes of data, along with the machine's brute force, these intelligent machines can potentially fully augment human capabilities, help us define solutions for some of the most difficult problems, and open new frontiers in AI. At that point, it will be difficult to differentiate the intelligent machines from human beings in terms of their cognitively intelligent behavior.

Q: What is the goal of Cognitive Systems and what are the enablers that move the systems towards the goal?
A: The primary goal of developing Cognitive Systems is to create intelligent machines that supplement and augment human capabilities, while keeping the interface between man and machine based on the primary senses. Instead of interacting with the machine through a keyboard and mouse, we interface through the five primary senses, with the mind as the sixth organ and sense. The most important enablers for the development of Cognitive Systems that incorporate strong AI are the availability of data and the computation power to process that data.

Q: What is the significance of big data in the development of Cognitive Systems?

A: The theory of machine learning, its various algorithms, and Cognitive Systems have existed for decades. The acceleration in the field started with the advent of big data. The systems learn from past patterns that can be found in the data. Supervised learning models are more accurate with the availability of large volumes of data. Big data also allows the systems to have access to heterogeneous data assets that provide key contextual insights within the environment, which makes the intelligent machines more informed and hence enables holistic decision making. Cognitive Systems also benefit from the availability of big data assets. The knowledge that is available in unstructured format can be utilized with the use of cognitive intelligence, and this opens an entirely new frontier for Cognitive Systems.

Summary

In this chapter, we were introduced to cognitive computing as the next wave in the development of artificial intelligence. By leveraging the five primary human senses, along with the mind as the sixth sense, the new era of Cognitive Systems can be built. We have seen the stages of AI and the natural progression towards strong AI, along with the key enablers for achieving strong AI. We have also seen the history of Cognitive Systems and observed that their growth has accelerated with the availability of big data, which brings large data volumes and the processing power in a
distributed computing framework. While the human brain is far from being fully understood, the prospects look great, with pioneering work being done by some of the large companies that have access to the largest volumes of digital data. With the consistent push towards democratizing AI by enabling AI as a service, these companies are accelerating research for the entire community.

In this book, we have introduced some of the fundamental concepts in Machine Learning and AI, and discussed how big data is enabling accelerated research and development in this exciting field. As with any new tool or innovation, as long as we do not lose sight of the overall goal of complementing and augmenting human capabilities, the field is wide open for more research and for exciting new use cases that can become mainstream in the near future.