Lesson 8 Slides: Unsupervised Learning: K-Means & Gaussian Mixture Models
Unsupervised Learning
• Supervised learning used labeled data pairs (x, y) to learn a function f : X→Y
  – But what if we don't have labels?
• No labels = unsupervised learning
• Only some points are labeled = semi-supervised learning
  – Labels may be expensive to obtain, so we only get a few
• Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery.

K-Means Clustering
[Figure: Clustering Data]

K-Means(k, X)
• Randomly choose k cluster center locations (centroids)
• Loop until convergence
  – Assign each point to the cluster of the closest centroid
  – Re-estimate the cluster centroids based on the data assigned to each cluster

K-Means Animation
Example generated by Andrew Moore using Dan Pelleg's super-duper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.

K-Means Objective Function
• K-means finds a local optimum of the following objective function:

\[ \arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert_2^2 \]

where $S = \{S_1, \dots, S_k\}$ is a partitioning over $X = \{x_1, \dots, x_n\}$ such that $X = \bigcup_{i=1}^{k} S_i$, and $\mu_i = \mathrm{mean}(S_i)$.

Problems with K-Means
• Very sensitive to the initial points
  – Do many runs of K-Means, each with different initial centroids
  – Seed the centroids using a better method than randomly choosing the centroids
    • e.g., farthest-first sampling
• Must manually choose k
  – Learn the optimal k for the clustering
    • Note that this requires a performance measure

Fitting a Gaussian Mixture Model (Optional)

Expectation-Maximization for GMMs
Iterate until convergence. On the t'th iteration let our estimates be

\[ \lambda_t = \{ \mu_1(t), \mu_2(t), \dots, \mu_c(t) \} \]

E-step: Compute "expected" classes of all datapoints for each class. The likelihood term only requires evaluating a Gaussian at $x_k$:

\[ P(w_i \mid x_k, \lambda_t) = \frac{p(x_k \mid w_i, \lambda_t)\, P(w_i \mid \lambda_t)}{p(x_k \mid \lambda_t)} = \frac{p(x_k \mid w_i, \mu_i(t), \sigma^2 I)\, p_i(t)}{\sum_{j=1}^{c} p(x_k \mid w_j, \mu_j(t), \sigma^2 I)\, p_j(t)} \]

Here $p_i(t)$ is shorthand for the estimate of $P(w_i)$ on the t'th iteration.

M-step: Estimate µ given our data's class membership distributions:

\[ \mu_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\, x_k}{\sum_k P(w_i \mid x_k, \lambda_t)} \]

E.M. for General GMMs
Iterate. On the t'th iteration let our estimates be

\[ \lambda_t = \{ \mu_1(t), \dots, \mu_c(t),\ \Sigma_1(t), \dots, \Sigma_c(t),\ p_1(t), \dots, p_c(t) \} \]

E-step: Compute "expected" clusters of all datapoints. Again, the likelihood term only requires evaluating a Gaussian at $x_k$:

\[ P(w_i \mid x_k, \lambda_t) = \frac{p(x_k \mid w_i, \lambda_t)\, P(w_i \mid \lambda_t)}{p(x_k \mid \lambda_t)} = \frac{p(x_k \mid w_i, \mu_i(t), \Sigma_i(t))\, p_i(t)}{\sum_{j=1}^{c} p(x_k \mid w_j, \mu_j(t), \Sigma_j(t))\, p_j(t)} \]

M-step: Estimate µ, Σ, and the mixing proportions given our data's class membership distributions:

\[ \mu_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\, x_k}{\sum_k P(w_i \mid x_k, \lambda_t)} \]

\[ \Sigma_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\, [x_k - \mu_i(t+1)]\,[x_k - \mu_i(t+1)]^{T}}{\sum_k P(w_i \mid x_k, \lambda_t)} \]

\[ p_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)}{R}, \qquad R = \#\text{records} \]

(End optional section)

Gaussian Mixture Example: Start
Advance apologies: in black and white this example will be incomprehensible.
[Figures: the mixture fit after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations]
[Figure: Some Bio Assay data]
[Figure: GMM clustering of the assay data]
[Figure: Resulting Density Estimator]

... constraint (must-link) (semi-supervised)
Different-cluster constraint (cannot-link)

Gaussian Mixture Models
• Recall the Gaussian distribution:

\[ P(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d\, |\Sigma|}} \exp\left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right) \]

...
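The K-Means(k, X) procedure and its objective function from the K-Means slides can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the implementation behind the slides: the function names (`k_means`, `objective`, `squared_distance`, `mean`), the `max_iters` and `seed` parameters, and the choice to keep an old centroid when a cluster becomes empty are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments) after running K-Means on `points`."""
    rng = random.Random(seed)
    # Randomly choose k cluster centers; here they are drawn from the data.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid.
        new_assignments = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no assignment changed
        assignments = new_assignments
        # Re-estimate each centroid from the points assigned to it.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[i] = mean(members)
    return centroids, assignments


def objective(points, centroids, assignments):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[a])
               for p, a in zip(points, assignments))


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments, objective(points, centroids, assignments))
```

Because the result depends on the initial centroids, one way to act on the "do many runs" advice is to call `k_means` with several `seed` values and keep the run with the lowest `objective`.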
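The E-step and M-step of EM for general GMMs can be sketched for the one-dimensional case, where each covariance matrix reduces to a scalar variance. This is a hedged illustration under several assumptions that are not in the slides: the data are 1-D, the loop runs for a fixed number of iterations instead of testing convergence, variances are floored at a hypothetical `min_var` to avoid division by zero, and the names `gaussian_pdf` and `em_gmm_1d` are invented for this example.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean `mu` and variance `var` at `x`."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, mus, variances, weights, n_iters=50, min_var=1e-6):
    """Run EM on 1-D data `xs`, starting from the given initial estimates.

    Returns (mus, variances, weights, resp), where resp[k][i] is the
    membership probability P(w_i | x_k, lambda_t) from the last E-step.
    Assumes every point has non-zero density under at least one component
    and every component keeps some responsibility; a robust version would
    guard both cases.
    """
    c, R = len(mus), len(xs)
    mus, variances, weights = list(mus), list(variances), list(weights)
    resp = []
    for _ in range(n_iters):
        # E-step: evaluate each Gaussian at x_k, weight by p_i(t), normalize.
        resp = []
        for x in xs:
            joint = [weights[i] * gaussian_pdf(x, mus[i], variances[i])
                     for i in range(c)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate mean, variance, and mixing weight per component.
        for i in range(c):
            r_i = sum(resp[k][i] for k in range(R))
            mus[i] = sum(resp[k][i] * xs[k] for k in range(R)) / r_i
            var_i = sum(resp[k][i] * (xs[k] - mus[i]) ** 2
                        for k in range(R)) / r_i
            variances[i] = max(var_i, min_var)  # assumption: variance floor
            weights[i] = r_i / R  # R = number of records
    return mus, variances, weights, resp


xs = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mus, variances, weights, resp = em_gmm_1d(
    xs, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print(mus, variances, weights)
```

Unlike K-Means, each point receives a soft membership in every component (`resp[k][i]`), and the M-step uses those memberships as weights in the mean, variance, and mixing-proportion updates.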