Draft of Chapter 1

With regards,
Ritesh Bhagwat
Note: The following is a draft version.

Introduction to Data Driven Computing & AI

The journey into the world of artificial intelligence is extraordinary. It is extraordinary because it shows us that just by changing our perspective on something we already know, we can learn something new and amazing. The foundation of Artificial Intelligence is built on things that most of us have already studied in school and college. If you have studied math up to high school level, chances are that none of the math in this book will be new to you. But I can assure you that we will learn something new from all the things that we already know. Everything will be built on things that we know. This is the very beauty of Artificial Intelligence. So, let us get started on this journey of a lifetime.

When I was growing up as a teenager in the 1990s, the way to stand out in a conversation was to talk about intelligent, scientific things. If we knew what a “light year” was or what “supersonic” meant, we appeared intelligent. If we could do a complex math calculation quickly, we were hailed as geniuses. All these traits were essentially about remembering data and manipulating it. For a very long time, human intelligence has been judged on our memory and on how we process the memory (data) stored in our brains.

With the arrival of smartphones and similar technology, the definition of human intelligence is going to change. With Artificial Intelligence, all the data processing can be done in a gadget as small as our phone, so the need to remember or memorize data will not be as meaningful. We as human beings will have to evolve to a higher level. Maybe knowledge and information will be for machines and wisdom will be for us humans, which is a good thing as we will evolve in consciousness.

If you are reading this book, I’m sure you have heard that AI is taking over and that AI is going to change the world. Have you ever wondered what AI is?
How is this AI, or modern computing, different from traditional computing? Let us do a fun activity to understand why we need AI and what type of questions AI tries to solve. The following table lists 10 famous personalities of the world along with their domain of work and gender.

S No   Name                   Domain of Work        Gender
1      Roger Federer          Sports                M
2      Sachin Tendulkar       Sports                M
3      Mahatma Gandhi         Leadership            M
4      Steffi Graf            Sports                F
5      Nelson Mandela         Leadership            M
6      Robert Downey Junior   Art & Movies          M
7      Tom Cruise             Art & Movies          M
8      Steve Jobs             Tech                  M
9      Scarlett Johansson     Art & Movies          F
10     Bill Gates             Tech & Philanthropy   M

Now try to answer the following questions:

1. How many of the above personalities are female?
2. Name the personalities whose domain of work is Tech.
3. Are there any female personalities in the above list whose domain of work is Art & Movies?
4. Who are the top two most popular personalities from the list?
5. Whose domain of work is the best among the above personalities?

The answers to the first three questions are very simple:

1. 2
2. Steve Jobs & Bill Gates
3. Yes (Scarlett Johansson)

How about the fourth and the fifth questions? Do we have a universal answer to them? No, we don’t. These questions are subjective, not well defined, or vague in nature. For every individual, the definition of a popular personality or of the best domain of work is different. If we have 1000 respondents answering these questions:

• When they answer correctly, all 1000 respondents will give the same answers to the first three questions.
• We will most probably get different answers to the 4th and 5th questions.

Let us look at the example in a different way. We can get the answers to the first three questions by setting rules. We can write a simple program that scans the domain of work and gender of our personalities, and we get the answers. So, we have data, we set rules, and we get answers. Now, how can we answer the fourth and the fifth questions?
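Before turning to that question, here is a minimal sketch of the rule-driven approach for the first three questions. It assumes the table is stored as a Python list of tuples; the variable names are illustrative and not part of the original activity.

```python
# Each personality is a (name, domain of work, gender) record from the table.
personalities = [
    ("Roger Federer", "Sports", "M"),
    ("Sachin Tendulkar", "Sports", "M"),
    ("Mahatma Gandhi", "Leadership", "M"),
    ("Steffi Graf", "Sports", "F"),
    ("Nelson Mandela", "Leadership", "M"),
    ("Robert Downey Junior", "Art & Movies", "M"),
    ("Tom Cruise", "Art & Movies", "M"),
    ("Steve Jobs", "Tech", "M"),
    ("Scarlett Johansson", "Art & Movies", "F"),
    ("Bill Gates", "Tech & Philanthropy", "M"),
]

# Rule for question 1: count the records whose gender is "F".
female_count = sum(1 for _, _, gender in personalities if gender == "F")

# Rule for question 2: keep the names whose domain of work mentions Tech.
tech_names = [name for name, domain, _ in personalities if "Tech" in domain]

# Rule for question 3: keep the female personalities working in Art & Movies.
female_in_movies = [
    name
    for name, domain, gender in personalities
    if gender == "F" and domain == "Art & Movies"
]

print(female_count)
print(tech_names)
print(female_in_movies)
```

Every respondent who runs these rules on the same data gets the same answers, which is exactly what makes the first three questions rule-driven.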
The best we can do is ask all our 1000 respondents to vote with their answers. The most common answer becomes the rule. If, for the fifth question, the most common answer is that the best line of work is “Art & Movies”, then that becomes our answer. Do remember that if we change the number of respondents, the answer can also change. So, what we are doing here is this: we have data, we give answers to the data, and we get a rule.

We can see the difference between the two approaches. One is rule-driven, and the other is answer-driven. A problem for which we cannot set rules to get answers qualifies as a problem that should be solved by Artificial Intelligence. In the context of AI, the problems that we can solve with rules come under the category of traditional or classical computing, and the ones where we can’t set rules are the area of Artificial Intelligence, which I loosely term modern computing.

Do note that rule-based questions can also be solved by artificial intelligence, but it is not worth solving them with AI as it is computationally expensive. It is like having a pizza, a knife and a sword. You should always use the knife to cut the pizza and not the sword. You can cut the pizza with a sword, but it is not worth it.

Now that we have a bit of understanding about AI, let us use a simple example to understand what we mean when we say that everything in AI is built on things that we already know. Suppose you run an OTT platform like Netflix or Amazon Prime and you have three loyal customers: Steve, Natasha and Tony. These customers watch movies and give them ratings. A fourth customer, Scott, logs into the platform, watches two movies, Iron Man and Jerry Maguire, and rates them. We have to recommend more movies to Scott to keep him hooked to our platform. How can we do this using the data that we have?
If we can figure out which of Steve, Tony and Natasha has movie preferences like Scott’s, then we can recommend to Scott the other movies watched by that customer. To our surprise, we can use high school math to do this! Let us see how it works. Let’s assume the ratings given by the customers are represented in the following table.

Customer          Iron Man   Jerry Maguire
Steve             2          1
Natasha           3          3
Tony              5          4
Scott (logs in)   4          2

The easiest way to see which of the three customers’ preferences is closest to Scott’s is to subtract the scores given by Scott from the scores given by the other customers. In other words, we can find the “distance” between Scott’s scores and the scores of Steve, Natasha and Tony. The simplest way to measure this distance is to subtract the corresponding movie ratings given by Scott and by the other customer. Let us say the rating of Iron Man is represented by x and the rating of Jerry Maguire by y. The distance can then be calculated by the formula:

• |x1 - x2| + |y1 - y2|

Where:

• x1: Rating of Iron Man by the old customer (Steve or Natasha or Tony)
• x2: Rating of Iron Man by Scott
• y1: Rating of Jerry Maguire by the old customer (Steve or Natasha or Tony)
• y2: Rating of Jerry Maguire by Scott

|K| represents the absolute value of K, which is never negative. For example:

• |3| = 3
• |-3| = 3

This way of computing distances using absolute values is known as the Manhattan distance. Referring to the above table, the Manhattan distance between:

• Scott and Steve = |4-2| + |2-1| = 3
• Scott and Natasha = |4-3| + |2-3| = 2
• Scott and Tony = |4-5| + |2-4| = 3

So, we can see that the Manhattan distance between Natasha and Scott is the lowest of all three, hence Natasha’s movie preferences should be the closest to Scott’s. We can go ahead and recommend to Scott all the other movies watched and highly rated by Natasha. Chances are that he will also like those movies.

This was the Manhattan distance. We are all more familiar with something known as Euclidean
distance. Euclidean distance can also be used to solve the same problem. Euclidean distance is calculated by the formula:

• sqrt((x1 - x2)^2 + (y1 - y2)^2)

Where:

• x1: Rating of Iron Man by the old customer (Steve or Natasha or Tony)
• x2: Rating of Iron Man by Scott
• y1: Rating of Jerry Maguire by the old customer (Steve or Natasha or Tony)
• y2: Rating of Jerry Maguire by Scott

The Euclidean distance between:

• Scott and Steve: sqrt(5) = 2.24
• Scott and Natasha: sqrt(2) = 1.41
• Scott and Tony: sqrt(5) = 2.24

We can see that the Euclidean distance between Natasha and Scott is the lowest.

In the case of Euclidean distance, we have another way of representing the dataset. We can represent the dataset in an X-Y coordinate system. Here the x axis represents Iron Man and the y axis represents Jerry Maguire. Steve has given a rating of 2 to Iron Man and 1 to Jerry Maguire, so his coordinate representation is (2, 1). The same concept applies to everyone’s ratings. Just by looking at the plot, we can see that Natasha and Scott are closest to each other and hence have similar movie preferences. We have two movies, so we have a two-dimensional space. If we had three movies, we would move up to a three-dimensional space, and if we have n movies, we can move to an n-dimensional space.

By the way, the problem that we just solved is known as collaborative filtering. Essentially, we just built a recommendation engine using collaborative filtering. And all that by just using school math! How cool is that!
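The two distance calculations can be sketched in a few lines of Python. This is a minimal illustration, assuming each customer's ratings are stored as an (Iron Man, Jerry Maguire) pair with the values used in the worked calculations above; the function and variable names are my own.

```python
import math

# Ratings as (Iron Man, Jerry Maguire) pairs.
ratings = {
    "Steve": (2, 1),
    "Natasha": (3, 3),
    "Tony": (5, 4),
}
scott = (4, 2)

def manhattan(a, b):
    """Sum of the absolute differences between corresponding ratings."""
    return sum(abs(p - q) for p, q in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of squared differences between ratings."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Distance of every existing customer from Scott.
for name, customer_ratings in ratings.items():
    print(name, manhattan(customer_ratings, scott),
          round(euclidean(customer_ratings, scott), 2))

# The customer with the smallest distance has the most similar taste.
closest_manhattan = min(ratings, key=lambda n: manhattan(ratings[n], scott))
closest_euclidean = min(ratings, key=lambda n: euclidean(ratings[n], scott))
print(closest_manhattan, closest_euclidean)
```

Both distances pick Natasha as the customer closest to Scott. Because the functions loop over all the ratings in a pair, the same code would work unchanged for three movies or for n movies.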
Fun Fact

“We studied two distances, namely the Manhattan distance and the Euclidean distance. These distances come from a family of distances known as the Minkowski distance. For two movies, a general formula for the Minkowski distance is:

• (|x1 - x2|^p + |y1 - y2|^p)^(1/p)

With n movies, we simply add one such term for every movie. In this formula, if:

• p = 1, then it is known as the Manhattan distance
• p = 2, then it is known as the Euclidean distance

Other distances are related to this family as well. For example, the Hamming distance, which counts the positions at which two records differ, is sometimes loosely described as the p = 0 case. You can google the other distances, as they are beyond the scope of our book. The most used distance is the Euclidean distance.”

Machine Learning: What does it mean?

At its core, machine learning is the ability of a system to learn on its own without being explicitly programmed. What sets machine learning apart from traditional computing is its “human-like” ability to learn on its own. As kids, we have all made the mistake of touching something that is very hot. That burning sensation is unforgettable. What we learn from that experience is to never touch something that is hot. In a similar way, when a machine is exposed to some data, it remembers that data and makes its decisions based on the memory it gathered from that data.

What do we mean when we say a human-like ability to make decisions? Let’s say it is raining heavily outside and a friend comes to our home and asks us to go for a picnic. How will we decide whether it’s worth going for the picnic in heavy rain? We do it based on our past experience, right? To put it down as a process, there are roughly three steps involved:

• Recall: Recall what happened in a similar scenario
• Process: Think about the current scenario
• Decide: Take a decision

Applying this 3-point technique to our picnic decision:

• Recall: Whenever it rains very heavily, traffic is hit badly. It happened last time.
• Process: It is raining very heavily now, so traffic should be hit badly.
• Decide: Let us stay at home and have a hot cup of green tea!
Humans make decisions based on experience. The experience of machines is data. Machines make decisions based on data. But how does a machine get experience? Let’s try to understand this with the following example. Suppose we have a dataset of patients with their blood pressure and whether or not each patient has heart disease.

Patient No   Blood Pressure   Heart Disease
1            High             Yes
2            High             Yes
3            High             Yes
4            High             Yes
5            Normal           No
6            High             ???

Based on these data points, we want to predict the heart condition of the 6th patient, who has high blood pressure. We pass the data to the machine and ask for an answer. The machine will scan this data and find a “pattern”: whoever has high blood pressure also has heart disease. So it is highly likely that the machine will tell us that the 6th patient has a heart problem. So, the answer is Yes.

You may notice that we could also have reached this conclusion just by using the statistical concept of correlation between blood pressure and heart disease. Statistics and statistical modeling play a very important role in the field of machine learning.

In the context of Artificial Intelligence, which as we studied earlier is answer-driven and not rule-driven, it is also important to note that to identify whether the sixth patient has heart disease or not:

• We gave the machine answers in the form of the records of patients.
• The machine in response gave us a rule: as per the data, whoever has high blood pressure also has heart disease.

It is important to note that this rule may not be medically correct, but it is correct with respect to the data to which the machine was exposed. If the data had been different, the outcome would have been different. It is generally perceived that the more data a machine has, the better the outcomes.

To sum up the above activity, here is what we did:

• We “trained” the machine with a data set. This data set is called the training data.
• We asked the machine for answers on new data on which it was not trained.

This is exactly how all machine learning
works. You train your algorithm (machine) on huge datasets, and the algorithm learns obvious and not so obvious (hidden) patterns in the dataset. When you expose the algorithm to a new dataset which it has not seen earlier, the algorithm tries to answer your question on the new dataset based on the learning it has acquired from the training data.

In practical scenarios, there is one more step before exposing your algorithm to a new dataset. This is called the testing stage. We break up our original dataset into two parts (in a ratio of 80:20 or 75:25 etc.):

• Training Data
• Testing Data

You train the algorithm on the training data and validate/test it on the testing data. In the testing data we already have the answers. So first we hide all the answers, as if they were not present. We expose our testing data to the model that was built on the training data, and we predict the outcomes on the testing data. Now we compare these predicted outcomes with the actual outcomes that we kept hidden. By comparing the predicted outcomes with the actual outcomes, we can evaluate:

• How accurate is our model?
• How big is the error in our model?
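The training and testing stages can be sketched with the blood pressure example. This is a minimal illustration in plain Python, not a real machine learning library: the "model" is simply the most common answer for each blood pressure level. The first five records come from the patient table; the last five are invented purely so that there is something to test on, and all function names are my own.

```python
from collections import Counter, defaultdict

# (blood pressure, heart disease) records. The first five rows come from the
# patient table; the last five are invented purely for illustration.
records = [
    ("High", "Yes"), ("High", "Yes"), ("High", "Yes"), ("High", "Yes"),
    ("Normal", "No"),
    ("Normal", "No"), ("High", "Yes"), ("Normal", "No"),
    ("High", "No"), ("Normal", "No"),
]

def train(rows):
    """Learn a rule: the most common answer for each blood pressure level."""
    counts = defaultdict(Counter)
    for pressure, disease in rows:
        counts[pressure][disease] += 1
    return {pressure: c.most_common(1)[0][0] for pressure, c in counts.items()}

def predict(rule, pressure):
    """Apply the learned rule; None means the level was never seen in training."""
    return rule.get(pressure)

# 80:20 split, taken in order so that the example is repeatable.
split = int(len(records) * 0.8)
training_data, testing_data = records[:split], records[split:]

rule = train(training_data)

# Hide the answers of the testing data, predict, then compare with the truth.
predictions = [predict(rule, pressure) for pressure, _ in testing_data]
actual = [disease for _, disease in testing_data]
correct = sum(p == a for p, a in zip(predictions, actual))
accuracy = correct / len(testing_data)
error_rate = 1 - accuracy

print(rule)
print(accuracy, error_rate)
print(predict(rule, "High"))  # the 6th patient from the table
```

With these invented test records the learned rule gets one of the two test answers wrong, which is a reminder that a rule learned from data is only as good as the data it has seen. In practice the records are usually shuffled before splitting; the fixed order here is only to keep the sketch deterministic.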
Once the algorithm gives good results on the testing data, it is good enough to be used on real-life problems.

Types of Machine Learning

As beginners, we need to know that there are two types of machine learning:

• Supervised Machine Learning
• Unsupervised Machine Learning

There is another type of machine learning known as reinforcement learning. Let us leave that for now, as it is outside the scope at the beginner level. To understand the difference between supervised and unsupervised machine learning, we have to understand what labelled and unlabelled data are.

Labelled Data and Unlabelled Data

Labelled data has a tag attached to it. The tag can be anything: a name, a number, a class, a type. Unlabelled data does not have a tag attached to it.

In the above picture, the unlabelled data is just a bunch of fruits (objects). Imagine that we did not know what fruits look like; for us, those would be just a bunch of objects, as there is no description of them available. For a machine (computer), the unlabelled data set is just a bunch of objects. On the other hand, the labelled data has a clear classification that those objects are apples and pears. If someone doesn't even know what an apple or a pear looks like, she can just read the label and understand that one is something called an apple and the other is something called a pear. For a machine, these are not just any objects but distinct types of objects: one is an apple and one is a pear.

Now that we've understood what labelled and unlabelled data are, let us pay attention to the following points:

• Supervised learning is all about working with labelled data.
• Unsupervised learning is all about working with unlabelled data.

Let us explore the above points in more detail.

Supervised Machine Learning

Supervised machine learning is about answering questions like the one above. Based on the colour, shape and size, the machine learning algorithm will identify whether the fruit is an apple, a grape or a banana. These attributes like colour, shape and size
are known as the features of the data.

Consider credit card transactions. Based on the location, time and amount of a transaction, a machine learning algorithm can try to identify whether it is a fraudulent or a legitimate transaction. Here a fraudulent transaction is labelled as one and a legitimate transaction is labelled as zero.

Consider another example: x-rays of lungs. A machine learning algorithm will study the patterns in an x-ray and try to predict whether the patient has a lung infection or not. An x-ray of an infected lung is classified as one and the x-ray of a healthy lung is classified as zero.

Consider another example where you are trying to buy a house and you want to predict its price. You can take features like the number of bedrooms, the area in square feet, nearness to a hospital, location and city, and predict the price of the house accordingly.

All the above examples represent supervised learning, where we know exactly what we are looking for and we have a label attached to it. Supervised learning can be further classified into two types:

• Regression
• Classification

Regression: When the label attached to a supervised learning problem is a continuous value, which can be represented on a number line, we say that we are working on a regression problem. The cost of a house is a continuous value. Similarly, the price of a company's stock, tomorrow's temperature and the salary of an employee with a particular skill set are all continuous values. All these values can be represented on a number line, hence predicting them is a regression problem.

Classification: When the label attached to a supervised learning problem is a discrete value, it is a classification problem. When we are trying to predict whether a patient has a disease or not, the outcome is represented in discrete values of zero and one; when we are trying to predict whether a credit card
transaction is fraudulent or not, the outcome is discrete; and when we are trying to predict whether a customer is going to buy a product or not, the outcome is again discrete. So all of these are classification problems.

Regression vs Classification

In the above figure, if you want to predict the temperature, it can take any value on the number line, so it's a regression problem. If we just want to find out whether tomorrow's weather is going to be rainy or sunny, then it's a classification problem, which takes discrete values.

Binary and Multi-classification: One more distinction in classification is binary versus multi-classification. When the output is discrete and can have only two outcomes, like yes or no, one or zero, rainy or sunny, it is a binary classification problem. Predicting whether a patient has a disease or not is a binary classification problem because it can have only two outcomes. If we are trying to predict whether a fruit is an apple, a grape or a pineapple, then the problem is a multi-classification problem because there are more than two discrete outcomes. Multi-classification problems are very common in the field of computer vision, where you need to detect objects in the real world.

Unsupervised Learning: Now that we are clear about supervised learning, let's talk a bit about unsupervised learning. Imagine a bag that has three types of fruits (objects). When we pass pictures of these fruits through an unsupervised machine learning algorithm, the algorithm will create three segments: one representing fruit 1, the second fruit 2 and the third fruit 3. We can give them any labels, like A, B and C. The important point to note here is that the machine learning algorithm has not called those fruits apples, grapes and bananas; it has just put them into different categories, namely A, B and C. The labels could have been anything: 0, 1 and 2, or red, green and orange, etc. These segments are created based on the colour, size and shape of the fruit. As stated earlier, colour, shape and size are
the features of the objects. This is precisely what unsupervised learning is.

Unsupervised learning has many use cases in the real world. Imagine that you are a phone manufacturer and you are planning to launch a premium phone. You have a large database of customers and you would like to target only your premium customers for that phone. If you have the spending patterns of the customers, you can easily categorise them into premium and non-premium customers. What you need to do is run an unsupervised machine learning algorithm on the customer data set and ask it to return two different categories based on the spending patterns. The feature used for segmentation here is the spending pattern.

Many times unsupervised learning is used as a precursor to supervised learning. Imagine that you want to find out whether a fruit is an apple or not, but the data set that you have consists of images of different types of fruits. What we can do is run an unsupervised learning algorithm and cluster the images into different segments. Once we inspect the segments, one can be labelled apple, another grapes and the third pineapple. As we now have labelled apple data, we can take the cluster of images labelled as apple and build a supervised learning algorithm which tries to identify an apple.

Similarly, imagine you are working on a healthcare problem where you need x-rays of lungs to identify a respiratory problem. However, the data set that you have is a huge repository of MRI and X-ray scans of distinct body parts, which include lungs, liver, brain and kidney. So before working on the actual problem, you first need to separate the lung X-rays from the non-lung scans. This can be done with unsupervised learning, where the algorithm will study the patterns in the images and create separate clusters for lung, kidney, brain and liver. Now you can just take the scans related to the lungs and start working on your problem.

**************** More to come soon ****************
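To make the customer segmentation example above concrete, here is a minimal sketch of an unsupervised algorithm splitting customers into two categories by spending. It uses a very basic version of k-means clustering, written in plain Python. The spending figures are invented, and the function name `two_means` is my own; a real project would normally use a library implementation.

```python
# Hypothetical yearly spending per customer (invented numbers).
spending = [120, 150, 90, 200, 1500, 1700, 1650, 110, 1800, 130]

def two_means(values, iterations=10):
    """Split one-dimensional values into two clusters with a basic k-means loop.

    Assumes the values contain at least two distinct numbers.
    """
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        # Assign every value to the nearer of the two centres.
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the average of its group.
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group

segment_a, segment_b = two_means(spending)
print(segment_a)
print(segment_b)
```

Notice that the algorithm only returns two segments. It does not know the words "premium" or "non-premium"; it is up to us to look at the segments and decide that the high-spending one is our premium group.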