Machine learning is a scientific discipline that explores the construction and study of algorithms that learn from data.
■ ML is used to train computers to do things that are difficult or impossible to program explicitly in advance (e.g., handwriting recognition, fraud detection).
■ ML is an important part of Data Mining, KDD, and Data Science.
■ ML has strong ties to statistics and mathematical optimization; statistical and optimization techniques are usually at the core of ML algorithms.

MACHINE LEARNING & APPLICATIONS
Nguyen Phan Bach Su
Hoa Sen University
Hoa Sen Research Seminar

Outline
■ Brief introduction to machine learning (ML)
  – Basic concepts
  – Common tasks and methods/algorithms
  – A quick demonstration
■ Key issues
■ ML, Data Mining, and Business Analytics

Machine Learning
■ Machine learning is a scientific discipline that explores the construction and study of algorithms that learn from data
■ ML is used to train computers to do things that are difficult or impossible to program explicitly in advance (e.g., handwriting recognition, fraud detection)
■ ML is an important part of Data Mining, KDD, and Data Science
■ ML has strong ties to statistics and mathematical optimization; statistical and optimization techniques are usually at the core of ML algorithms

Examples:
■ Predicting stock prices based on current and historical data
■ Predicting how much inventory to stock ahead of a hurricane (Walmart)
■ Grouping customers based on their characteristics and buying behaviors
■ Email classification (spam vs. non-spam)
■ Predicting which customers will quit using your service (MegaTelCo)

Machine Learning Tasks
■ Supervised learning
■ Unsupervised learning
■ Reinforcement learning

Supervised Learning
■ Given example inputs and their desired outputs (labeled training data), the goal of supervised learning is to learn a general rule that maps inputs to outputs
  – Classification: the target variable is discrete (e.g., spam vs. non-spam email)
  – Regression: the target variable is real-valued (e.g., stock price)

[Diagram: training set → learning algorithm → mapping function f; input x → f → output y]

Example: User's preferences

Example  Author   Thread     Length  Where Read  User Action
e1       known    new        long    home        skips
e2       unknown  new        short   work        reads
e3       unknown  follow-up  long    work        skips
e4       known    follow-up  long    home        skips
e5       known    new        short   home        reads
e6       known    follow-up  long    work        skips
e7       unknown  follow-up  short   work        skips
e8       unknown  new        short   work        reads
e9       known    follow-up  long    home        skips
e10      known    new        long    work        skips
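
As a hedged illustration of learning from the user-preference examples, the sketch below encodes the ten rows and ranks the four attributes by information gain, the entropy-based splitting criterion that the deck uses for decision trees in the write-off example. The names `EXAMPLES`, `ATTRIBUTES`, `entropy`, and `information_gain` are hypothetical helpers introduced only for this sketch, and the value spelling "follow-up" is an assumed normalization of the table entries.

```python
from collections import Counter
from math import log2

# Attribute names and rows transcribed from the user-preference table.
# The last field of each row is the target (user action).
ATTRIBUTES = ["Author", "Thread", "Length", "Where Read"]
EXAMPLES = [
    ("known", "new", "long", "home", "skips"),          # e1
    ("unknown", "new", "short", "work", "reads"),       # e2
    ("unknown", "follow-up", "long", "work", "skips"),  # e3
    ("known", "follow-up", "long", "home", "skips"),    # e4
    ("known", "new", "short", "home", "reads"),         # e5
    ("known", "follow-up", "long", "work", "skips"),    # e6
    ("unknown", "follow-up", "short", "work", "skips"), # e7
    ("unknown", "new", "short", "work", "reads"),       # e8
    ("known", "follow-up", "long", "home", "skips"),    # e9
    ("known", "new", "long", "work", "skips"),          # e10
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute_index):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    parent = entropy([row[-1] for row in rows])
    children = 0.0
    for value in {row[attribute_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attribute_index] == value]
        children += len(subset) / len(rows) * entropy(subset)
    return parent - children


gains = {name: information_gain(EXAMPLES, i) for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:<10} IG = {gain:.3f}")
```

On these ten rows, Length has the largest gain (about 0.56 bits), because every long article is skipped; a decision tree learner using this criterion would therefore split on Length first.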
These are some training and test examples obtained by observing a user deciding whether to read articles posted to a threaded discussion board, depending on attributes such as whether the author is known, whether the thread is new, how long the article is, and where it is read (source: http://artint.info/html/ArtInt_171.html)

Example: Write-off (Provost and Fawcett, 2013)

ML algorithms for Supervised Learning
■ Classification
  ➡ K-nearest neighbor
  ➡ Decision tree
  ➡ Naive Bayes
  ➡ Logistic regression
  ➡ Artificial Neural Network (ANN)
  ➡ Support Vector Machine (SVM)
■ Regression
  ➡ Linear/non-linear regression
  ➡ Symbolic regression
  ➡ ANN

Example: Write-off
■ Solving the write-off problem (binary classification) with the decision tree algorithm
  ➡ Measure the purity of the members/examples with entropy:
     entropy = -p1*log2(p1) - p2*log2(p2) - …
     where pi is the probability (relative frequency) of class i in the set
  ➡ Split based on the information gain (IG)

[Figure 3-3: Entropy of a two-class set as a function of p(+)]

entropy(S) = -[0.7 × log2(0.7) + 0.3 × log2(0.3)]
           ≈ -[0.7 × (-0.51) + 0.3 × (-1.74)]
           ≈ 0.88

IG(parent, children) = entropy(parent) - [p(c1)*entropy(c1) + p(c2)*entropy(c2) + …]

Excerpt (Provost and Fawcett, 2013): Entropy is only part of the story. We would like to measure how informative an attribute is with respect to our target: how much gain in information it gives us about the value of the target variable. An attribute segments a set of instances into several subsets. Entropy only tells us how impure one individual subset is. Fortunately, with entropy to measure how disordered any set is, we can define information gain (IG) to measure how much an attribute improves (decreases) entropy over the whole segmentation it creates. Strictly speaking, information gain measures the change in entropy due to any amount of new information being added; here, in the context of supervised segmentation, we …

Example: group similar news
http://chimpler.wordpress.com/2014/07/11/segmenting-audience-with-kmeans-and-voronoidiagram-using-spark-and-mllib/

Example: Facebook friends

Reinforcement Learning
■ Reinforcement learning is learning what to do, that is, how to
map situations to actions so as to maximize a numerical reward
■ Learning about, from, and while interacting with an external environment
■ The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them

Remarks on ML
■ Overfitting: occurs when a statistical model describes random error or noise instead of the underlying relationship
■ Testing/validation: measures the accuracy of ML prediction models and how well they generalize to unseen data
■ Scalability: the ability of ML algorithms to handle large-scale problems
■ Feature scaling/normalization: standardizes the range of the independent variables (features) of the data
■ Feature manipulation: includes feature selection and feature construction
■ Interpretability: how easily we can explain the results/models obtained by ML algorithms

A quick demonstration
■ Titanic

History
Late 19th century
  Francis Galton and Karl Pearson: quantifying the relationship between offspring and parental characteristics
Early 20th century
  Charles Spearman: factor analysis → IQ tests
1901
  Karl Pearson: principal components analysis
1930s
  Ronald Fisher: linear discriminant function analysis to solve a taxonomic problem
1930s-1940s
  Bartlett and Roy: multivariate analysis of variance
1930s-1940s
  Driver and Kroeber / Robert Tryon: cluster analysis
1950s-2000s
  Evolution of AI, machine learning, and data mining: neural networks, decision trees, genetic algorithms, support vector machines
2000s-??
  business intelligence, big data

KDD/Data Mining
■ Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [Fayyad]
■ Data Mining is a problem-solving methodology that finds a logical or mathematical description, possibly of a complex nature, of patterns and regularities in a set of data [Decker and Focardi]
■ Data Mining is often related to learning/adaptive algorithms and methods
■ KDD/DM is not a new set of techniques but rather a multi-disciplinary field of research to which many disciplines contribute (discussed later)

Business Analytics
■ Business analytics (BA) refers to the skills, technologies, and practices for continuous, iterative exploration and investigation of past business performance to gain insight and drive business planning [Bartlett, 2013]
■ Using data to make better decisions; essentially operations research with an emphasis on data

Trends
[Diagram: Econometrics, Business Analytics, Operations Research]

Trends
[Diagram: Machine Learning, Econometrics, Business Analytics, Operations Research]

Common applications of BA
Analytics competitors make expert use of statistics and modeling to improve a wide variety of functions. Here are some common applications:

FUNCTION | DESCRIPTION | EXEMPLARS
Supply chain | Simulate and optimize supply chain flows; reduce inventory and stock-outs | Dell, Wal-Mart, Amazon
Customer selection, loyalty, and service | Identify customers with the greatest profit potential; increase likelihood that they will want the product or service offering; retain their loyalty | Harrah's, Capital One, Barclays
Pricing | Identify the price that will maximize yield, or profit | Progressive, Marriott
Human capital | Select the best employees for particular tasks or jobs, at particular compensation levels | New England Patriots, Oakland A's, Boston Red Sox
Product and service quality | Detect quality problems early and minimize them | Honda, Intel
Financial performance | Better understand the drivers of financial performance and the effects of nonfinancial factors | MCI, Verizon
Research and development | Improve quality, efficacy, and, where applicable, safety of products and services | Novartis, Amazon, Yahoo

Source: Thomas H. Davenport, Competing on Analytics, Harvard Business Review, January 2006 (copyright 2005 Harvard Business School Publishing Corporation)

Discussions
■ ML can be applied to many business problems; a large number of ML algorithms have been proposed in the literature, but useful applications remain relatively few
■ ML, operations research, optimization, and decision science have much in common → now, business analytics
■ Many research directions have not been explored
■ There is a tendency to invest more in business information systems → more data (big data) → opportunities and challenges
■ We need to equip ourselves with knowledge of statistics, optimization, ML, and data mining

Opportunity
[Figure: Deep analytical talent (employment by industry), in thousands of people]
The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data (source: http://www.mckinsey.com/features/big_data)

Thank You!

[Figure: … tree and the partitions it imposes in instance space]

Example: Write-off
■ Linear discriminant functions: perceptron, logistic regression, support vector machine
  f(x) = w0 + w1x1 + w2x2 + …

Unsupervised Learning
■ Unsupervised learning studies how systems can learn to represent particular input patterns (unlabeled) in a way that reflects the statistical structure of the overall collection...
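
To make the deck's definition of unsupervised learning more concrete (representing unlabeled inputs by the structure of the whole collection), here is a minimal sketch of k-means clustering, the technique named in the news-grouping link. This is a plain-Python illustration under stated assumptions, not the method used in the linked article: the function `kmeans`, the helper `squared_distance`, and the six 2-D points (imagined as two customer groups) are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignment


# Hypothetical unlabeled data: two visibly separate groups of customers.
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (8.0, 8.5), (8.3, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

No labels are supplied; the grouping emerges only from distances between points. Which group receives label 0 or 1 depends on the random initialization, so only the grouping itself is meaningful.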
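
The linear discriminant function f(x) = w0 + w1x1 + w2x2 + … from the write-off example can be sketched with the perceptron, one of the three methods the deck lists for it. The code below is an assumption-laden toy: `predict`, `train_perceptron`, and the six two-feature samples with labels -1/+1 are hypothetical and are not taken from the write-off data.

```python
def predict(weights, x):
    """Classify x by the sign of f(x) = w0 + w1*x1 + w2*x2 + ..."""
    score = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if score >= 0 else -1


def train_perceptron(samples, labels, epochs=100, learning_rate=1.0):
    """Learn weights [w0, w1, ...] with the classic perceptron update rule."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if predict(weights, x) != y:
                # Nudge the decision boundary toward the misclassified sample.
                weights[0] += learning_rate * y
                for i, xi in enumerate(x):
                    weights[i + 1] += learning_rate * y * xi
                mistakes += 1
        if mistakes == 0:  # every training sample is classified correctly
            break
    return weights


# Hypothetical, linearly separable data: two features per sample, labels -1 / +1.
samples = [(1.0, 1.0), (2.0, 1.5), (1.5, 0.5), (4.0, 4.5), (5.0, 4.0), (4.5, 5.0)]
labels = [-1, -1, -1, 1, 1, 1]

weights = train_perceptron(samples, labels)
predictions = [predict(weights, x) for x in samples]
print(weights, predictions)
```

The perceptron only converges when the classes are linearly separable, as they are in this toy set; logistic regression and support vector machines fit the same form of f(x) with different training objectives.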