
Data Mining (Khai phá dữ liệu) Lecture: Support Vector Machine


DOCUMENT INFORMATION

Basic information
Number of pages: 77
File size: 1.43 MB

Content

Trịnh Tấn Đạt
Khoa CNTT, Đại Học Sài Gòn (Faculty of Information Technology, Saigon University)
Email: trinhtandat@sgu.edu.vn
Website: https://sites.google.com/site/ttdat88/

Contents
- Introduction
- Review of Linear Algebra
- Classifiers and Classifier Margin
- Linear SVMs: Optimization Problem
- Hard vs. Soft Margin Classification
- Non-linear SVMs

Introduction
- Competitive with other classification methods.
- Relatively easy to learn.
- Kernel methods give an opportunity to extend the idea to regression, density estimation, kernel PCA, etc.

Advantages of SVMs
- A principled approach to classification, regression, and novelty detection.
- Good generalization capabilities.
- The hypothesis has an explicit dependence on the data, via the support vectors, so the model can be readily interpreted.
- Learning involves optimization of a convex function (no local minima, unlike neural networks).
- Only a few parameters are required to tune the learning machine (unlike the many weights, learning parameters, hidden layers, and hidden units of neural networks).

Prerequisites
- Vectors, matrices, dot products.
- Equation of a straight line in vector notation.
- Familiarity with the perceptron is useful; mathematical programming will be useful; vector spaces will be an added benefit.
- The more comfortable you are with linear algebra, the easier this material will be.

What is a Vector?
- Think of a vector as a directed line segment in N dimensions: it has a "length" and a "direction".
- Basic idea: convert geometry in higher dimensions into algebra.
- Once you define a "nice" basis along each dimension (x-, y-, z-axis, ...), a vector becomes an $N \times 1$ matrix: $v = [a\ b\ c]^T$.
- Geometry then starts to become linear algebra on vectors like $v$.

Vector Addition
- $v + w = (x_1, x_2) + (y_1, y_2) = (x_1 + y_1, x_2 + y_2)$.
- Geometrically, use the head-to-tail method to combine the two vectors: $A + B = C$.

Scalar Product
- $a\,v = a\,(x_1, x_2) = (a x_1, a x_2)$: scaling changes only the length, keeping the direction fixed.
- Sneak peek: a matrix operation $Av$ can change the length, the direction, and even the dimensionality.
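The short NumPy sketch below is not part of the original slides; the vectors and the matrix are arbitrary example values chosen only to illustrate the operations just reviewed: vector addition, scalar multiplication, the dot product, and a matrix acting on a vector (which can also change its dimensionality).

    import numpy as np

    v = np.array([1.0, 2.0])      # v = (x1, x2)
    w = np.array([3.0, -1.0])     # w = (y1, y2)

    print(v + w)                  # vector addition: (x1 + y1, x2 + y2) -> [4. 1.]
    print(2.5 * v)                # scalar product: scales the length, keeps the direction
    print(np.dot(v, w))           # dot product x1*y1 + x2*y2 -> 1.0

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0],
                  [1.0,  1.0]])   # a 3x2 matrix
    print(A @ v)                  # Av changes length, direction, and dimensionality (2-D -> 3-D)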
Vectors: Magnitude (Length) and Phase (Direction)
- $v = (x_1, x_2, \ldots, x_n)^T$
- $\|v\| = \sqrt{\sum_{i=1}^{n} x_i^2}$ (magnitude, or "2-norm").
- If $\|v\| = 1$, $v$ is a unit vector.
- Alternate representations: polar coordinates $(\|v\|, \theta)$; complex numbers $\|v\| e^{j\theta}$ (a unit vector is a pure direction).
- [Figure: a vector $v$ in the x-y plane with magnitude $\|v\|$ and phase angle $\theta$.]

Consider a mapping $\Phi$ as shown below (the degree-2 polynomial feature map on $m$-dimensional inputs):

$\Phi(a) = \big[\,1,\ \sqrt{2}a_1, \ldots, \sqrt{2}a_m,\ a_1^2, \ldots, a_m^2,\ \sqrt{2}a_1 a_2,\ \sqrt{2}a_1 a_3, \ldots, \sqrt{2}a_1 a_m,\ \sqrt{2}a_2 a_3, \ldots, \sqrt{2}a_2 a_m, \ldots, \sqrt{2}a_{m-1} a_m \,\big]^T$

$\Phi(b) = \big[\,1,\ \sqrt{2}b_1, \ldots, \sqrt{2}b_m,\ b_1^2, \ldots, b_m^2,\ \sqrt{2}b_1 b_2,\ \sqrt{2}b_1 b_3, \ldots, \sqrt{2}b_1 b_m,\ \sqrt{2}b_2 b_3, \ldots, \sqrt{2}b_2 b_m, \ldots, \sqrt{2}b_{m-1} b_m \,\big]^T$

Collecting terms in the dot product
- First term $= 1$
- Next $m$ terms $= \sum_{i=1}^{m} 2 a_i b_i$
- Next $m$ terms $= \sum_{i=1}^{m} a_i^2 b_i^2$
- Rest $= \sum_{i=1}^{m} \sum_{j=i+1}^{m} 2 a_i a_j b_i b_j$
- Therefore
  $\Phi(a) \cdot \Phi(b) = 1 + 2\sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + \sum_{i=1}^{m} \sum_{j=i+1}^{m} 2 a_i a_j b_i b_j$

Out of Curiosity
$(1 + a \cdot b)^2 = (a \cdot b)^2 + 2(a \cdot b) + 1$
$= \big(\sum_{i=1}^{m} a_i b_i\big)^2 + 2\big(\sum_{i=1}^{m} a_i b_i\big) + 1$
$= \sum_{i=1}^{m} \sum_{j=1}^{m} a_i b_i a_j b_j + 2\sum_{i=1}^{m} a_i b_i + 1$
$= \sum_{i=1}^{m} (a_i b_i)^2 + 2\sum_{i=1}^{m} \sum_{j=i+1}^{m} a_i b_i a_j b_j + 2\sum_{i=1}^{m} a_i b_i + 1$

Both are the Same
- Comparing term by term, we see that $\Phi(a) \cdot \Phi(b) = (1 + a \cdot b)^2$.
- But computing the right-hand side is a lot more efficient: $O(m)$ ($m$ additions and multiplications).
- Let us call $(1 + a \cdot b)^2 = K(a, b)$ the kernel.

Φ in the "Kernel Trick" Example
- 2-dimensional vectors $x = [x_1\ x_2]$; let $K(x_i, x_j) = (1 + x_i^T x_j)^2$.
- We need to show that $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$:
  $K(x_i, x_j) = (1 + x_i^T x_j)^2 = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}$
  $= [1\ \ x_{i1}^2\ \ \sqrt{2} x_{i1} x_{i2}\ \ x_{i2}^2\ \ \sqrt{2} x_{i1}\ \ \sqrt{2} x_{i2}]^T \, [1\ \ x_{j1}^2\ \ \sqrt{2} x_{j1} x_{j2}\ \ x_{j2}^2\ \ \sqrt{2} x_{j1}\ \ \sqrt{2} x_{j2}]$
  $= \varphi(x_i)^T \varphi(x_j)$, where $\varphi(x) = [1\ \ x_1^2\ \ \sqrt{2} x_1 x_2\ \ x_2^2\ \ \sqrt{2} x_1\ \ \sqrt{2} x_2]$.

Other Kernels
- Beyond polynomials there are other high-dimensional basis functions that can be made practical by finding the right kernel function.

Examples of Kernel Functions
- Linear: $K(x_i, x_j) = x_i^T x_j$
- Polynomial of power $p$: $K(x_i, x_j) = (1 + x_i^T x_j)^p$
- Gaussian (radial-basis function network): $K(x_i, x_j) = \exp\big(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\big)$
- Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^T x_j + \beta_1)$

The function we end up optimizing is

maximize $\sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl}$, where $Q_{kl} = y_k y_l K(x_k, x_l)$,
subject to $0 \le \alpha_k \le C,\ \forall k$ and $\sum_{k=1}^{R} \alpha_k y_k = 0$.

Multi-class Classification
- One-versus-all classification.
- Multi-class SVM (illustrated with figures in the original slides).

SVM Software
- Python: scikit-learn module (a short usage sketch follows the Homework section below)
- LibSVM (C++)
- SVMLight (C)
- Torch (C++)
- Weka (Java)
- ...

Research
- One-class SVM (unsupervised learning): outlier detection.
- Weibull-calibrated SVM (W-SVM) / PI-SVM: open set recognition.

Homework
- CIFAR-10 image recognition using SVM.
- The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
- The classes in the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
- Hints: https://github.com/wikiabhi/Cifar-10 and https://github.com/mok232/CIFAR-10-Image-Classification
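As a minimal sketch that is not part of the original slides (the toy dataset and hyper-parameter values are assumptions chosen only for demonstration), the Python code below first checks the kernel identity $K(a, b) = (1 + a \cdot b)^2 = \varphi(a) \cdot \varphi(b)$ numerically for 2-D vectors, then trains a soft-margin SVM with a degree-2 polynomial kernel using the scikit-learn module listed under SVM Software. The same pipeline, with image features in place of the toy data, could serve as a starting point for the CIFAR-10 homework.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # 1) Numerical check of the kernel identity for 2-D vectors:
    #    K(a, b) = (1 + a.b)^2 equals phi(a).phi(b) with
    #    phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]
    def phi(x):
        x1, x2 = x
        return np.array([1.0, x1**2, np.sqrt(2)*x1*x2, x2**2,
                         np.sqrt(2)*x1, np.sqrt(2)*x2])

    a = np.array([0.5, -1.2])
    b = np.array([2.0, 0.3])
    print((1 + a @ b) ** 2)     # explicit kernel evaluation
    print(phi(a) @ phi(b))      # dot product in the mapped feature space (same value)

    # 2) Soft-margin SVM with a degree-2 polynomial kernel on toy 2-D data.
    #    With gamma=1 and coef0=1, scikit-learn's poly kernel is (1 + x.x')^2.
    X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                               n_redundant=0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0, C=1.0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))
    print("support vectors per class:", clf.n_support_)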

Date posted: 16/12/2023, 20:11