Naive Bayes Classification (NBC) is a classification algorithm based on probability computation, applying Bayes' theorem, which we studied in the previous post (see the previous post here). The algorithm belongs to the Supervised Learning family. By Bayes' theorem, the probability of an event y given x is:

P(y|x) = \dfrac{P(x|y)\,P(y)}{P(x)} \quad (1)

Suppose we split an event x into n different components x_1, x_2, \dots, x_n. Naive Bayes, true to its name, relies on the naive assumption that x_1, x_2, \dots, x_n are mutually independent (given y). From this we can compute:

P(x|y) = P(x_1 \cap x_2 \cap \dots \cap x_n \mid y) = P(x_1|y)\,P(x_2|y) \dots P(x_n|y) \quad (2)

Therefore:

P(y|x) \propto P(y) \prod_{i=1}^{n} P(x_i|y) \quad (3)
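Equation (3) can be turned into code directly. Below is a minimal sketch, assuming categorical features whose class-conditional probabilities P(x_i|y) are already known; the class names, feature values, and probability tables are hypothetical toy numbers, not from any real data set.

```python
def naive_bayes_posterior(priors, likelihoods, features):
    """Score each class y by P(y) * prod_i P(x_i | y) -- equation (3) --
    then normalize by the evidence P(x) to recover equation (1)."""
    scores = {}
    for y, prior in priors.items():
        score = prior
        for i, x in enumerate(features):
            # Small floor for feature values never seen with this class,
            # so one unseen value does not zero out the whole product.
            score *= likelihoods[y][i].get(x, 1e-9)
        scores[y] = score
    total = sum(scores.values())  # P(x), the marginal / evidence
    return {y: s / total for y, s in scores.items()}

# Hypothetical toy example: two classes, two binary features.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": [{0: 0.2, 1: 0.8}, {0: 0.7, 1: 0.3}],
    "ham":  [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}],
}
print(naive_bayes_posterior(priors, likelihoods, [1, 0]))
```

Because the scores are normalized at the end, the returned values sum to 1 and can be read as posterior probabilities, even though the proportionality in (3) drops P(x).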
Gaussian Naive Bayes Classifier: Iris Data Set
Nguyen Van Hai, Nguyen Tien Dung, Dao Anh Huy, Luu Thanh Duy
Ton Duc Thang University

Outline
- Overview: Iris Data Set, Bayes Theorem, Normal distribution
- Prepare Data: Load Data, Split Data, Group Data
- Summarize Data: Mean, Standard Deviation, Summary
- Build Model: Overview, Prior Probability, Likelihood, Joint Probability, Marginal Probability, Posterior Probability
- Test Model: Get Maximum A Posteriori, Predict, Accuracy
- Code
- Results: Print the results

Overview

Iris Data Set
The data set has four independent variables and one dependent variable with three different classes, across 150 instances.
- The first four columns are the independent variables (features): sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)
- The 5th column is the dependent variable (class): Iris Setosa, Iris Versicolour, Iris Virginica

For example: [Figure: Random Row Sample]

Bayes Theorem
Naive Bayes, more technically referred to as the posterior probability, updates the prior belief about an event given new information. The result is the probability of the class occurring given the new data:

P(class|features) = \dfrac{P(class)\,P(features|class)}{P(features)}

- P(class|features): posterior probability
- P(class): class prior probability
- P(features|class): likelihood
- P(features): predictor prior probability (evidence)

Normal distribution
The probability density of the normal distribution is:

f(x \mid \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

where \mu is the mean or expectation of the distribution, \sigma is the standard deviation, and \sigma^2 is the variance.

Prepare Data

Load Data
Read in the raw data and convert each class string into an integer.

Test Model

Accuracy
Accuracy … is critical in understanding the veracity of the model.
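The "Summarize Data" step above (per-feature mean and standard deviation for each class) and the normal density formula can be sketched as follows. This is a minimal illustration using only the standard library; the function names `gaussian_pdf` and `summarize` are my own, not from the slides' code.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal probability density f(x | mu, sigma^2)."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def summarize(rows):
    """Per-column (mean, sample standard deviation) for a list of
    feature rows, e.g. the rows of one Iris class."""
    summaries = []
    for col in zip(*rows):
        n = len(col)
        mu = sum(col) / n
        var = sum((v - mu) ** 2 for v in col) / (n - 1)  # sample variance
        summaries.append((mu, math.sqrt(var)))
    return summaries
```

In the full model, `summarize` would be applied once per class after grouping the training rows by label, and `gaussian_pdf` then serves as the likelihood P(x_i|class) for each continuous feature.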
Posterior Probability
P(class|features): this is where all of the preceding class methods tie together to compute the Gaussian Naive Bayes formula, with the goal of selecting the maximum a posteriori (MAP) class.

Test Model

Get Maximum A Posteriori
The get best …
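The MAP selection described above (pick the class that maximizes the posterior) can be sketched as below. This is a minimal illustration, not the slides' implementation: `class_summaries` (per-class, per-feature (mean, std) pairs) and the toy numbers are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal probability density f(x | mu, sigma^2)."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def predict(class_summaries, priors, features):
    """MAP class: argmax_c P(c) * prod_i f(x_i | mu_ci, sigma_ci).
    P(features) is omitted since it is the same for every class."""
    best_class, best_score = None, -1.0
    for c, summaries in class_summaries.items():
        score = priors[c]
        for x, (mu, sigma) in zip(features, summaries):
            score *= gaussian_pdf(x, mu, sigma)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical two-class, one-feature example (petal length, cm).
class_summaries = {"setosa": [(1.5, 0.2)], "virginica": [(5.5, 0.6)]}
priors = {"setosa": 0.5, "virginica": 0.5}
print(predict(class_summaries, priors, [1.4]))  # prints "setosa"
```

Accuracy would then be measured by running `predict` over a held-out test split and counting the fraction of rows whose MAP class matches the true label.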