
Machine Learning Clustering




Document information

Number of pages: 30
Size: 772.5 KB

Content

Nguyen Thi Thu Ha
Email: hantt@epu.edu.vn

What is clustering
• Clustering can be considered the most important unsupervised learning problem.
• Another definition of clustering is "the process of organizing objects into groups whose members are similar".
• A cluster is therefore a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters.
• In this example we identify the 4 clusters into which the data can be divided. The similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance. This is called distance-based clustering.
• Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if that cluster defines a concept common to all of those objects. In other words, objects are grouped according to how well they fit descriptive concepts, not according to simple similarity measures.

Why?
• To determine the intrinsic grouping in a set of unlabeled data.
• What constitutes a good clustering?

Applications
• Marketing: finding groups of customers with similar behavior, given a large database of customer data containing their properties and past buying records;
• Biology: classification of plants and animals given their features;
• Libraries: book ordering;
• City planning: identifying groups of houses according to their house type, value and geographical location;
• Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones;
• WWW: document classification; clustering weblog data to discover groups of similar access patterns.

Problems
• Dealing with a large number of dimensions and a large number of data items.
• The effectiveness of the method depends on the definition of "distance" (for distance-based clustering).

Classification of clustering algorithms
• Exclusive clustering
• Overlapping clustering
• Hierarchical clustering
• Probabilistic clustering

Four of the most widely used clustering algorithms are:
– K-means
– Fuzzy C-means
– Hierarchical clustering
– Mixture of Gaussians

K-Means
• K-Means algorithm properties:
  – There are always K clusters.
  – There is always at least one item in each [...]
[...]
• L1 norm: $L_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{m} |x_i - y_i|$
• Cosine distance (one minus the cosine similarity): $1 - \dfrac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}$

K-Means algorithm:
  Let d be the distance measure between instances.
  Select k random instances {s1, s2, ..., sk} as seeds.
  Until clustering converges or another stopping criterion is met:
    For each instance xi:
      Assign xi to the cluster cj such that d(xi, sj) is minimal.
    (Update the seeds to the centroid of each cluster)
    For each cluster cj [...]

Hierarchical Clustering
[Figure: agglomerative clustering runs from Step 0 to Step 4, merging the single items a, b, c, d, e into ab, de, cde and finally abcde; divisive clustering runs the same steps in reverse, from Step 4 to Step 0.]
• Start by assigning each item to a cluster, so that if you have N items [...]
• Find the closest (most similar) pair of clusters [...]
[...] steps 2 and 3 until all items are clustered into a single cluster of size N. (*)

• Input distance matrix:

      BA    FI    MI    NA    RM    TO
BA     0   662   877   255   412   996
FI   662     0   295   468   268   400
MI   877   295     0   754   564   138
NA   255   468   754     0   219   869
RM   412   268   564   219     0   669
TO   996   400   138   869   669     0

• The nearest pair of cities is MI and TO, at distance 138. These are merged into [...] = 138, and the new sequence number is m = 1.

          BA     FI  MI/TO     NA     RM
BA         0    662    877    255    412
FI       662      0    295    468    268
MI/TO    877    295      0    754    564
NA       255    468    754      0    219
RM       412    268    564    219      0

• min d(i,j) = d(NA, RM) = 219 => merge NA and RM into a new cluster called NA/RM; L(NA/RM) = 219, m = 2.

          BA     FI  MI/TO  NA/RM
BA         0    662    877    255
FI       662      0    295    268
[...]

• min d(i,j) = d(BA, NA/RM) = 255 => merge BA and NA/RM into a new cluster called BA/NA/RM; L(BA/NA/RM) = 255, m = 3.

          BA/NA/RM        FI     MI/TO
BA/NA/RM         0       268       564
FI             268         0       295
MI/TO          564       295         0

• min d(i,j) = d(BA/NA/RM, FI) = 268 => merge BA/NA/RM and FI into a new cluster called BA/FI/NA/RM; L(BA/FI/NA/RM) = 268, m = 4.

             BA/FI/NA/RM        MI/TO
BA/FI/NA/RM            0          295
MI/TO                295            0

• Finally, we merge the last two clusters at level 295.
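The distance measures listed for K-Means can be written out directly. The sketch below is illustrative: the Euclidean (L2) function is an assumption, since the slide that would define it is cut off, and the function names are hypothetical. The usage at the end illustrates the "Problems" point that results depend on the definition of distance: the same query has a different nearest candidate under L1 and under cosine distance.

```python
import math
from typing import Sequence


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """L2 (Euclidean) distance; an assumption, not in the surviving slide text."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine similarity; close to 0 for vectors pointing the same way."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


if __name__ == "__main__":
    query = [1.0, 1.0]
    candidates = [[2.0, 2.0], [1.0, 0.0]]
    # The "nearest" candidate changes with the distance measure.
    print(min(candidates, key=lambda p: l1_distance(query, p)))      # [1.0, 0.0]
    print(min(candidates, key=lambda p: cosine_distance(query, p)))  # [2.0, 2.0]
```

Cosine distance ignores vector length, so [2.0, 2.0] counts as identical in direction to the query even though it is farther away in L1 terms.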
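The K-Means pseudocode above can be turned into a small runnable sketch. Several details are assumptions because the slide text is truncated: Euclidean distance is used for d, "converges" is taken to mean that no instance changes cluster, and a cluster that ends up empty simply keeps its previous seed. The names `k_means`, `centroid` and `euclidean` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance measure d; Euclidean is an assumption, any metric could be used."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def centroid(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(data: List[Point], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Return (seeds, assignment); assignment[i] is the cluster index of data[i]."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of instances")
    rng = random.Random(seed)
    seeds: List[Point] = rng.sample(data, k)  # k random instances as seeds
    assignment: List[int] = []
    for _ in range(max_iter):  # max_iter acts as the "other stopping criterion"
        # Assign each instance x_i to the cluster c_j whose seed s_j is nearest.
        new_assignment = [min(range(k), key=lambda j: euclidean(x, seeds[j]))
                          for x in data]
        if new_assignment == assignment:
            break  # converged: no instance changed cluster
        assignment = new_assignment
        # Update each seed to the centroid of its cluster.
        for j in range(k):
            members = [x for x, c in zip(data, assignment) if c == j]
            if members:  # assumption: an empty cluster keeps its previous seed
                seeds[j] = centroid(members)
    return seeds, assignment


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5),
            (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
    centers, labels = k_means(data, k=2)
    print(centers)
    print(labels)  # the first three points share one label, the last three the other
```

Seeding with a fixed `random.Random(seed)` keeps runs reproducible; in practice K-Means is often restarted from several random seeds because the result depends on the initial choice.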
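The surviving slides do not say how the distance between a newly merged cluster and each old cluster is computed (that step is elided). However, every updated entry in the worked example's matrices equals the smaller of the two merged rows, which is the single-linkage (nearest-neighbour) rule. Assuming that rule, the update and two entries checked against the matrices are:

```latex
d\bigl(C_i \cup C_j,\; C_k\bigr) = \min\bigl\{\, d(C_i, C_k),\; d(C_j, C_k) \,\bigr\}
\qquad\text{e.g.}\quad
d(\mathrm{MI/TO}, \mathrm{BA}) = \min\{877,\, 996\} = 877,\quad
d(\mathrm{NA/RM}, \mathrm{FI}) = \min\{468,\, 268\} = 268
```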
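A minimal sketch of the agglomerative procedure applied to the input distance matrix of the worked example. It assumes single linkage, meaning the distance to a merged cluster is the minimum of the two merged rows. This is inferred from the intermediate matrices; the slides do not name the linkage. Cluster names are formed by sorting member labels and joining them with "/", a hypothetical convention chosen so that the names match those on the slides.

```python
from typing import Dict, List, Tuple

CITIES = ["BA", "FI", "MI", "NA", "RM", "TO"]
DIST = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]


def single_linkage(labels: List[str],
                   dist: List[List[float]]) -> List[Tuple[str, str, float]]:
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, level)."""
    # Step 1: every item is its own cluster; cluster distances = item distances.
    d: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            if i != j:
                d[(a, b)] = dist[i][j]
    active = list(labels)
    merges: List[Tuple[str, str, float]] = []
    while len(active) > 1:
        # Step 2: find the closest pair of clusters (the first pair wins on ties).
        a, b = min(((p, q) for i, p in enumerate(active) for q in active[i + 1:]),
                   key=lambda pair: d[pair])
        level = d[(a, b)]
        # Hypothetical naming convention: members sorted and joined with "/".
        merged = "/".join(sorted(a.split("/") + b.split("/")))
        # Step 3: single-linkage update, the new distance is the smaller old one.
        rest = [c for c in active if c not in (a, b)]
        for c in rest:
            d[(merged, c)] = d[(c, merged)] = min(d[(a, c)], d[(b, c)])
        active = rest + [merged]
        merges.append((a, b, level))
    # Step 4: the loop ends once all items form a single cluster of size N.
    return merges


if __name__ == "__main__":
    for m, (a, b, level) in enumerate(single_linkage(CITIES, DIST), start=1):
        print(f"m={m}: merge {a} and {b} at level L = {level}")
```

With this matrix the sketch merges MI and TO, then NA and RM, then BA with NA/RM, then FI with BA/NA/RM, and finally MI/TO with BA/FI/NA/RM, at levels 138, 219, 255, 268 and 295, the same sequence as the walkthrough.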

Posted: 03/07/2015, 15:27

