Advanced Topics: Data Preprocessing

Components of Learning

• Suppose that a bank wants to automate the process of evaluating credit card applications
  – Input $x$ (customer information used to make a credit application)
  – Target function $f: X \to Y$ (ideal formula for credit approval), where $X$ and $Y$ are the input and output space, respectively
  – Dataset $D$ of input-output examples $(x_1, y_1), \ldots, (x_n, y_n)$
  – Hypothesis (skill) with hopefully good performance: $g: X \to Y$ (the "learned" formula to be used)

Recall the Perceptron

For $\mathbf{x} = (x_1, \ldots, x_d)$ (the "features of the customer"), compute a weighted score and:

Approve credit if $\sum_{i=1}^{d} w_i x_i > \text{threshold}$,
Deny credit if $\sum_{i=1}^{d} w_i x_i < \text{threshold}$.

Perceptron: A Mathematical Description

This formula can be written more compactly as

$h(\mathbf{x}) = \mathrm{sign}\left( \sum_{i=1}^{d} w_i x_i - \text{threshold} \right)$,

where $h(\mathbf{x}) = +1$ means "approve credit" and $h(\mathbf{x}) = -1$ means "deny credit"; $\mathrm{sign}(s) = +1$ if $s > 0$ and $\mathrm{sign}(s) = -1$ if $s < 0$. This model is called a perceptron.

Perceptron: A Visual Description

[Figure: a perceptron with three input nodes $x_1, x_2, x_3$, each connected to an output node $\Sigma$ with weight 0.3 and threshold $t = 0.4$, alongside a table of binary inputs and target outputs $y \in \{-1, +1\}$; with these weights, the output is $+1$ exactly when at least two inputs equal 1.]

Perceptron Learning Process

The key computation for this algorithm is the weight update formula:

$w_j^{(k+1)} = w_j^{(k)} + \lambda \left( y_i - \hat{y}_i^{(k)} \right) x_{ij}$,

where $w_j^{(k)}$ is the weight parameter associated with the $j$th input link after the $k$th iteration, $\lambda$ is a parameter known as the learning rate, and $x_{ij}$ is the value of the $j$th feature of the training example $x_i$.

If $y_i = +1$ and $\hat{y}_i = -1$, then the prediction error is $y_i - \hat{y}_i = 2$. To compensate for the error, we need to increase the value of the predicted output. If $y_i = -1$ and $\hat{y}_i = +1$, then $y_i - \hat{y}_i = -2$, and we need to decrease the value of the predicted output.

Perceptron Limitations

A single perceptron cannot represent XOR:

x1 | x2 | y = x1 XOR x2
 0 |  0 | 0
 0 |  1 | 1
 1 |  0 | 1
 1 |  1 | 0

With output driven by $w_1 x_1 + w_2 x_2$ compared against a threshold $t$, the following cannot all be true:

$w_1 \cdot 0 + w_2 \cdot 0 < t$
$w_1 \cdot 0 + w_2 \cdot 1 > t$
$w_1 \cdot 1 + w_2 \cdot 0 > t$
$w_1 \cdot 1 + w_2 \cdot 1 < t$

Network Architectures

• Three different network architectures: single-layer feed-forward, multi-layer feed-forward, and recurrent
• The architecture of a neural network is linked with the learning algorithm used to train it
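To make the update rule concrete, here is a minimal sketch in Python of the perceptron learning loop described above; the dataset (labels are $+1$ when at least two inputs are 1, matching the visual example), learning rate, and epoch count are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def train_perceptron(X, y, lam=0.1, epochs=50):
    """Minimal perceptron trainer: y_hat = sign(w.x - t).

    X : (n, d) feature matrix; y : labels in {-1, +1}.
    The threshold t is folded in as a weight on a constant -1 input.
    """
    n, d = X.shape
    Xb = np.hstack([X, -np.ones((n, 1))])   # last column absorbs the threshold
    w = np.zeros(d + 1)
    for _ in range(epochs):
        for i in range(n):
            y_hat = 1 if Xb[i] @ w > 0 else -1
            # Update rule from the slides: w_j <- w_j + lam * (y_i - y_hat) * x_ij
            w += lam * (y[i] - y_hat) * Xb[i]
    return w

# Linearly separable toy data: y = +1 iff at least two of the three inputs are 1
X = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]])
y = np.array([-1, -1, -1, 1, -1, 1, 1, 1])
w = train_perceptron(X, y)
print(w)  # learned weights; the last entry plays the role of the threshold
```

Running the same loop on the XOR table never converges, which is exactly what the inequalities above show: no single linear threshold separates the two XOR classes.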
kNN on the Error (RMSE) Scale

[Figure: RMSE scale from erroneous to accurate – 1.1296 global average; 1.0651 user average; 1.0533 movie average; 0.9514 Cinematch (baseline); kNN reaches roughly 0.96–0.91; 0.8693 ensemble; 0.8563 Grand Prize.]

Item-Oriented kNN CF

• Problems:
  – Suppose that a particular item is predicted perfectly by a subset of the neighbors; the predictive subset should receive all the weight, but Pearson correlation cannot do this
  – Suppose the neighbor set contains three movies that are highly correlated with each other; basic neighborhood methods do not account for interactions among neighbors
  – Suppose that an item has no useful neighbors rated by a particular user; the standard formula still uses a weighted average of ratings over the uninformative neighbors

Interpolation Weights

To address the problem of arbitrary similarity measures, we can use a weighted sum rather than a weighted average:

$p_{a,i} = \bar{r}_a + \sum_{u \in K} (r_{u,i} - \bar{r}_u) \cdot w_{a,u}$

Now we can allow $\sum_{u \in K} w_{a,u} \neq 1$.

To address the other problems, we can model relationships between item $i$ and its neighbors. The weights can be learned as a least squares problem over all other users who rated $i$:

$\min_{w} \sum_{v \neq a} \left( (r_{v,i} - b_{v,i}) - \sum_{u \in K} w_{a,u} (r_{v,u} - b_{v,u}) \right)^2$

The result:
– Interpolation weights are derived based on their role; no use of an arbitrary similarity measure
– They explicitly account for interrelationships among the neighbors

Challenges:
– Dealing with missing values
– Avoiding overfitting
– Efficient implementation

From Local to Latent Trends

Inherently, nearest neighbors is a local technique. What about capturing non-local, or latent, trends?

Latent Factor Models

• Decompose user ratings on movies into separate user and movie factor matrices to capture latent factors; frequently performed using singular value decomposition (SVD)
• Estimate unknown ratings as inner products of factors (see the sketch below)

[Figure: a ratings matrix (users × movies) approximated as the product of a movie-factor matrix and a user-factor matrix.]

• Very powerful model, but can easily overfit

Factorization on the Error (RMSE) Scale

[Figure: the same RMSE scale as above, with factorization reaching roughly 0.93–0.89, between the 0.9514 Cinematch baseline and the 0.8693 ensemble.]
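As a concrete illustration of the factorization idea, and not the regularized method the Netflix Prize teams actually used, the following sketch fills missing ratings with a global-average baseline, takes a truncated SVD, and predicts ratings as inner products of factor vectors. The toy matrix and the number of factors k are assumptions for illustration.

```python
import numpy as np

# Toy ratings matrix (rows = users, columns = movies); 0 marks a missing rating.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

mask = R > 0
baseline = R[mask].mean()           # crude global-average fill for missing cells
R_filled = np.where(mask, R, baseline)

# Truncated SVD: keep k latent factors.
k = 2
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
P = U[:, :k] * np.sqrt(s[:k])       # user factor vectors
Q = Vt[:k, :].T * np.sqrt(s[:k])    # movie factor vectors

# Predicted rating for user u, movie i = inner product of their factor vectors.
R_hat = P @ Q.T
print(np.round(R_hat, 2))
```

Because the reconstruction becomes exact as k grows, this unregularized version overfits easily; practical systems fit the factors only on observed entries with a regularization penalty, which is the overfitting risk the slide warns about.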
Ensemble Creation

• Factorization and kNN models are used at various scales
• These models can be combined to form an ensemble
• Stacked generalization, or blending, is used:
  – A linear regression model can be trained over the base model predictions (a minimal blending sketch appears at the end of this document)
  – Models can be weighted differently at different scales

Combining Multi-Scale Views

[Figure: two strategies for combining multi-scale views – residual fitting and weighted averaging – unifying global effects, regional effects (factorization), and local effects (kNN) into a single model.]

Seek Alternative Perspectives

The previous models all address the movies. The problem, however, is about users!

The Third Axis: Implicit Information

• Improve accuracy by exploiting implicit feedback
• Implicit behavior is abundant and easy to collect: rental history, search patterns, browsing history, etc.
• Allows predicting personalized ratings for users who have never rated

The idea: characterize users by which movies they rated, rather than how they rated them.

The Big Picture

Where do you want to be?

• All over the global-local axis
• Relatively high on the quality axis
• All over the explicit-implicit axis

[Figure: three axes – global vs. local, ratings quality, and explicit vs. implicit (binary) feedback.]

Ensemble on the Error (RMSE) Scale

[Figure: the same RMSE scale, with the ensemble at roughly 0.89, the final 0.8693 ensemble, and the 0.8563 Grand Prize target.]

The Take-Away Messages

Solving challenging data mining and data science problems requires you to:

Think deeply – design better, more innovative algorithms
Think broadly – use ensembles of multiple predictors
Think differently – model the data from different perspectives and in different ways
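Finally, the blending sketch promised in the ensemble-creation slide: a linear regression (with intercept) fit by least squares over held-out predictions of two base models. The prediction arrays are hypothetical stand-ins for real kNN and factorization outputs.

```python
import numpy as np

# Hypothetical held-out predictions from two base models (e.g., kNN and
# factorization) for the same ratings, plus the true ratings.
pred_knn = np.array([3.8, 2.1, 4.5, 3.0, 1.9])
pred_svd = np.array([4.1, 2.4, 4.2, 3.3, 2.2])
y_true   = np.array([4.0, 2.0, 4.5, 3.0, 2.0])

# Stacked generalization: fit a linear blend (with intercept) by least squares.
X = np.column_stack([pred_knn, pred_svd, np.ones_like(y_true)])
coef, *_ = np.linalg.lstsq(X, y_true, rcond=None)

blended = X @ coef
rmse = np.sqrt(np.mean((blended - y_true) ** 2))
print(coef, rmse)  # blend weights and the blended RMSE
```

In practice the blend is fit on a held-out set so the learned weights reflect each base model's true generalization, and separate weights can be fit per scale, as the slide notes.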