The lecture "Máy học nâng cao (Advanced Machine Learning): Support Vector Machine" provides learners with the following topics: introduction, review of linear algebra, classifiers and classifier margin, linear SVMs as an optimization problem, hard vs. soft margin classification, and non-linear SVMs. You are invited to consult it.
Trịnh Tấn Đạt
Khoa CNTT – Đại Học Sài Gòn
Email: trinhtandat@sgu.edu.vn
Website: https://sites.google.com/site/ttdat88/

Contents
- Introduction
- Review of Linear Algebra
- Classifiers & Classifier Margin
- Linear SVMs: Optimization Problem
- Hard vs. Soft Margin Classification
- Non-linear SVMs

Introduction
- Competitive with other classification methods.
- Relatively easy to learn.
- Kernel methods give an opportunity to extend the idea to regression, density estimation, kernel PCA, etc.

Advantages of SVMs
- A principled approach to classification, regression, and novelty detection.
- Good generalization capabilities.
- The hypothesis has an explicit dependence on the data, via the support vectors, so the model can be readily interpreted.
- Learning involves optimization of a convex function (no local minima, unlike neural nets).
- Only a few parameters are required to tune the learning machine (unlike the many weights, learning parameters, hidden layers, hidden units, etc. of neural nets).

Prerequisites
- Vectors, matrices, dot products.
- Equation of a straight line in vector notation.
- Familiarity with the Perceptron is useful.
- Mathematical programming will be useful.
- Vector spaces will be an added benefit.
- The more comfortable you are with linear algebra, the easier this material will be.

What is a Vector?
- Think of a vector as a directed line segment in N dimensions: it has a "length" and a "direction".
- Basic idea: convert geometry in higher dimensions into algebra. Once you define a "nice" basis along each dimension (x-, y-, z-axis, ...), a vector becomes an N x 1 column of coordinates, e.g. v = [a b c]^T.
- Geometry then becomes linear algebra on vectors like v.

Vector Addition: A + B
- v + w = (x1, x2) + (y1, y2) = (x1 + y1, x2 + y2)
- Geometrically, A + B = C: use the head-to-tail method to combine the two vectors.

Scalar Product: av
- a v = a (x1, x2) = (a x1, a x2)
- Multiplying by a scalar a changes only the length ("scaling") but keeps the direction fixed.
- Sneak peek: a matrix operation (Av) can change length, direction, and also dimensionality!
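As a quick illustration of these vector operations (my addition, not part of the original slides), here is a minimal NumPy sketch of vector addition, scalar scaling, the dot product, and the 2-norm; the particular vectors and the scalar are arbitrary example values:

```python
import numpy as np

# Two 2-D vectors, as in the slides: v = (x1, x2), w = (y1, y2)
v = np.array([2.0, 1.0])
w = np.array([1.0, 3.0])

# Vector addition: component-wise, the "head-to-tail" result
print(v + w)              # [3. 4.]

# Scalar product (scaling): changes only the length, keeps the direction
a = 2.5
print(a * v)              # [5.  2.5]

# Dot product and 2-norm (both used later for margins and kernels)
print(np.dot(v, w))       # 2*1 + 1*3 = 5.0
print(np.linalg.norm(v))  # sqrt(2^2 + 1^2) ~ 2.236
```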
Vectors: Magnitude (Length) and Phase (Direction)
- For $v = (x_1, x_2, \ldots, x_n)^T$, the magnitude (or "2-norm") is $\|v\| = \sqrt{\sum_{i=1}^{n} x_i^2}$.
- If $\|v\| = 1$, then v is a unit vector.
- Alternate representations: polar coordinates $(\|v\|, \theta)$ or the complex number $\|v\| e^{j\theta}$ (a unit vector is pure direction).

Consider Φ(a) and Φ(b) as shown below, for m-dimensional vectors a and b:
$$\Phi(a) = \big(1,\ \sqrt{2}a_1, \ldots, \sqrt{2}a_m,\ a_1^2, \ldots, a_m^2,\ \sqrt{2}a_1 a_2,\ \sqrt{2}a_1 a_3, \ldots, \sqrt{2}a_1 a_m,\ \sqrt{2}a_2 a_3, \ldots, \sqrt{2}a_2 a_m,\ \ldots,\ \sqrt{2}a_{m-1} a_m\big)^T,$$
$$\Phi(b) = \big(1,\ \sqrt{2}b_1, \ldots, \sqrt{2}b_m,\ b_1^2, \ldots, b_m^2,\ \sqrt{2}b_1 b_2,\ \sqrt{2}b_1 b_3, \ldots, \sqrt{2}b_1 b_m,\ \sqrt{2}b_2 b_3, \ldots, \sqrt{2}b_2 b_m,\ \ldots,\ \sqrt{2}b_{m-1} b_m\big)^T.$$

Collecting Terms in the Dot Product
- First term: $1$.
- Next m terms: $\sum_{i=1}^{m} 2 a_i b_i$.
- Next m terms: $\sum_{i=1}^{m} a_i^2 b_i^2$.
- The rest: $\sum_{i=1}^{m} \sum_{j=i+1}^{m} 2 a_i a_j b_i b_j$.
Therefore
$$\Phi(a) \cdot \Phi(b) = 1 + 2\sum_{i=1}^{m} a_i b_i + \sum_{i=1}^{m} a_i^2 b_i^2 + \sum_{i=1}^{m}\sum_{j=i+1}^{m} 2 a_i a_j b_i b_j.$$

Out of Curiosity
$$(1 + a \cdot b)^2 = (a \cdot b)^2 + 2(a \cdot b) + 1 = \Big(\sum_{i=1}^{m} a_i b_i\Big)^2 + 2\sum_{i=1}^{m} a_i b_i + 1 = \sum_{i=1}^{m}\sum_{j=1}^{m} a_i b_i\, a_j b_j + 2\sum_{i=1}^{m} a_i b_i + 1 = \sum_{i=1}^{m} (a_i b_i)^2 + 2\sum_{i=1}^{m}\sum_{j=i+1}^{m} a_i b_i\, a_j b_j + 2\sum_{i=1}^{m} a_i b_i + 1.$$

Both Are the Same
- Comparing term by term, we see $\Phi(a) \cdot \Phi(b) = (1 + a \cdot b)^2$.
- But computing the right-hand side is a lot more efficient: O(m) (m additions and multiplications).
- Let us call $(1 + a \cdot b)^2 = K(a, b)$ the kernel.

Φ in the "Kernel Trick": Example
- Take 2-dimensional vectors $x = [x_1\ x_2]$ and let $K(x_i, x_j) = (1 + x_i^T x_j)^2$.
- We need to show that $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$:
$$K(x_i, x_j) = (1 + x_i^T x_j)^2 = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} = [1\ \ x_{i1}^2\ \ \sqrt{2} x_{i1} x_{i2}\ \ x_{i2}^2\ \ \sqrt{2} x_{i1}\ \ \sqrt{2} x_{i2}]^T\, [1\ \ x_{j1}^2\ \ \sqrt{2} x_{j1} x_{j2}\ \ x_{j2}^2\ \ \sqrt{2} x_{j1}\ \ \sqrt{2} x_{j2}] = \phi(x_i)^T \phi(x_j),$$
where $\phi(x) = [1\ \ x_1^2\ \ \sqrt{2} x_1 x_2\ \ x_2^2\ \ \sqrt{2} x_1\ \ \sqrt{2} x_2]$.

Other Kernels
- Beyond polynomials there are other high-dimensional basis functions that can be made practical by finding the right kernel function.

Examples of Kernel Functions
- Linear: $K(x_i, x_j) = x_i^T x_j$
- Polynomial of power p: $K(x_i, x_j) = (1 + x_i^T x_j)^p$
- Gaussian (radial-basis function network): $K(x_i, x_j) = \exp\big(-\|x_i - x_j\|^2 / (2\sigma^2)\big)$
- Sigmoid: $K(x_i, x_j) = \tanh(\beta_0 x_i^T x_j + \beta_1)$

The function we end up optimizing is
$$\max_{\alpha} \sum_{k=1}^{R} \alpha_k - \frac{1}{2} \sum_{k=1}^{R} \sum_{l=1}^{R} \alpha_k \alpha_l Q_{kl}, \quad \text{where } Q_{kl} = y_k y_l K(x_k, x_l),$$
subject to $0 \le \alpha_k \le C$ for all $k$ and $\sum_{k=1}^{R} \alpha_k y_k = 0$.

Multi-class Classification
- One-versus-all classification
- Multi-class SVM

SVM Software
- Python: scikit-learn module
- LibSVM (C++)
- SVMLight (C)
- Torch (C++)
- Weka (Java)
- ...

Research
- One-class SVM (unsupervised learning): outlier detection
- Weibull-calibrated SVM (W-SVM) / PI-SVM: open set recognition

Homework
- CIFAR-10 image recognition using SVM.
- The CIFAR-10 dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
- The classes in the dataset are: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
- Hint: https://github.com/wikiabhi/Cifar-10 and https://github.com/mok232/CIFAR-10-Image-Classification

Width of the Margin
- What we know: $w \cdot x^+ + b = +1$ and $w \cdot x^- + b = -1$, so $w \cdot (x^+ - x^-) = 2$. Also $x^+ = x^- + \lambda w$ and $\|x^+ - x^-\| = M$.
- Hence the margin width is $M = \|x^+ - x^-\| = \|\lambda w\|$ ...

Projection: Using Inner Products
- For a unit vector $a$ (so $\|a\| = a^T a = 1$), the projection of $x$ onto $a$ is $p = a\,(a^T x)$.
- In general, $p = a\,(a^T b) / (a^T a)$. Note that the error vector $e = b - p$ is orthogonal (perpendicular) to $p$, i.e. the inner product $(b - p)^T p = 0$.
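A small numerical check of the projection formula above (my sketch, not from the slides; the vectors are arbitrary example values):

```python
import numpy as np

a = np.array([3.0, 1.0])
b = np.array([2.0, 4.0])

# Projection of b onto a: p = a (a^T b) / (a^T a)
p = a * (a @ b) / (a @ a)

# The error vector e = b - p should be orthogonal to p (and to a)
e = b - p
print(p)      # component of b along a
print(e @ p)  # ~0 up to floating-point error
```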
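To make the kernel-trick identity from the 2-D example above concrete, here is a short numerical check (my sketch, not from the slides; the test vectors are arbitrary) that $(1 + x_i^T x_j)^2$ equals $\phi(x_i)^T \phi(x_j)$ for the explicit feature map $\phi(x) = [1,\ x_1^2,\ \sqrt{2}x_1 x_2,\ x_2^2,\ \sqrt{2}x_1,\ \sqrt{2}x_2]$:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel (1 + x.y)^2 in 2-D."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

def poly_kernel(xi, xj):
    """Kernel evaluated directly in input space: only O(m) work."""
    return (1.0 + xi @ xj) ** 2

xi = np.array([1.5, -0.5])
xj = np.array([0.3, 2.0])

print(poly_kernel(xi, xj))  # direct kernel value
print(phi(xi) @ phi(xj))    # same value via the explicit feature map
```

Both prints give the same number, which is exactly the point of the kernel: we get the dot product in the higher-dimensional feature space without ever forming $\phi(x)$.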
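For the CIFAR-10 homework, a possible starting point is sketched below using the scikit-learn module listed above. This is a minimal sketch, not a reference solution: it assumes CIFAR-10 is loaded via tensorflow.keras.datasets (the slides do not prescribe a loader), and the subset sizes, kernel choice, and C are arbitrary illustrative values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.datasets import cifar10  # one convenient way to get CIFAR-10 (assumption)

# CIFAR-10: 50000 training and 10000 test color images of size 32x32x3
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Flatten images to 3072-dimensional vectors and keep a small subset,
# since kernel SVMs scale poorly with the number of training samples
n_train, n_test = 2000, 500
X_tr = x_train[:n_train].reshape(n_train, -1).astype(np.float64) / 255.0
X_te = x_test[:n_test].reshape(n_test, -1).astype(np.float64) / 255.0
y_tr = y_train[:n_train].ravel()
y_te = y_test[:n_test].ravel()

# Standardize features, then fit an RBF-kernel SVM
# (scikit-learn's SVC handles the 10 classes via one-vs-one internally)
scaler = StandardScaler().fit(X_tr)
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(scaler.transform(X_tr), y_tr)

print("test accuracy:", clf.score(scaler.transform(X_te), y_te))
```

Raw pixels are a weak representation; the linked repositories in the hint use feature extraction (e.g. HOG) before the SVM, which is a natural next step.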