CHAPTER 7: Clustering

Outline
- What is clustering
- Application examples
- Data in clustering
- Partition methods: k-means
- Model-based methods: EM
- Hierarchical methods: agglomerative nesting
- Model selection
- Conclusion

Clustering
- Cluster: a collection of data objects
  - Similar to one another within the same cluster
  - Dissimilar to the objects in other clusters

- Clustering: grouping a set of data objects into clusters
- Clustering is unsupervised: no sample category labels are provided
- Typical applications:
  - As a stand-alone tool to get insight into the data distribution
  - As a preprocessing step for other algorithms

What is a natural grouping among these objects?
- Clustering is subjective: the same objects can be grouped in different ways (e.g., the "Simpsons family" vs. "school employees", or "females" vs. "males")

What is similarity? (figure)

Defining distance measures (figure)

Examples of Clustering Applications
- Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs

- Land use: identification of areas of similar land use in an earth observation database
- Insurance: identifying groups of motor insurance policy holders with a high average claim cost
- City-planning: identifying groups of houses according to their house type, value, and geographical location

Data in clustering
- Data matrix (n objects, p attributes):

$$\begin{bmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots &        & \vdots &        & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots &        & \vdots &        & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{bmatrix}$$

- Dissimilarity matrix (pairwise distances, lower triangular):

$$\begin{bmatrix}
0      &        &        &        \\
d(2,1) & 0      &        &        \\
d(3,1) & d(3,2) & 0      &        \\
\vdots & \vdots & \vdots & \ddots \\
d(n,1) & d(n,2) & \cdots & 0
\end{bmatrix}$$

Data example (figure)

Major clustering approaches
- Partitioning algorithms: construct various partitions and then evaluate them by some criterion
- Hierarchy algorithms: create a hierarchical decomposition of the set of data (or objects) using some criterion
- Model-based: a model is hypothesized for each of the clusters, and the idea is to find the best fit of the models to the data

k-Means Clustering
- Find k reference vectors (prototypes / codebook vectors / codewords) which best represent the data
- Reference vectors: $m_j$, $j = 1, \ldots, k$
- Use the nearest (most similar) reference:

$$\|x^t - m_i\| = \min_j \|x^t - m_j\|$$

- Reconstruction error:

$$E\left(\{m_i\}_{i=1}^{k} \mid X\right) = \sum_t \sum_i b_i^t \,\|x^t - m_i\|^2,
\qquad
b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\ 0 & \text{otherwise} \end{cases}$$

Squared error (figure)

k-means clustering: algorithm and iteration example (figures)

Comments on the k-Means Method
- Strengths
  - Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations; normally k, t ≪ n
  - Often terminates at a local optimum; the global optimum may be found using techniques such as deterministic annealing and genetic algorithms

- Weaknesses
  - Applicable only when a mean is defined; what about categorical data?
  - Need to specify k, the number of clusters, in advance
  - Unable to handle noisy data and outliers
  - Not suitable for discovering clusters with non-convex shapes

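Pulling the assignment and update rules together, here is a minimal NumPy sketch of the k-means iteration (illustrative only; the function name, initialization scheme, and stopping rule are my own choices, not from the slides):

```python
# A minimal sketch of Lloyd's k-means iteration.
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """X: (n, d) data matrix. Returns (m, b): reference vectors and hard labels."""
    rng = np.random.default_rng(seed)
    # Initialize the k reference vectors m_j with randomly chosen instances.
    m = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: b_i^t = 1 for the nearest reference vector.
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)  # (n, k) distances
        b = d.argmin(axis=1)
        # Update step: each m_i becomes the mean of the instances assigned to it.
        new_m = np.array([X[b == i].mean(axis=0) if np.any(b == i) else m[i]
                          for i in range(k)])
        if np.allclose(new_m, m):  # converged: the means no longer move
            break
        m = new_m
    return m, b
```

The returned labels `b` encode the hard assignments $b_i^t$; the reconstruction error above is then `np.sum((X - m[b]) ** 2)`.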
Mixture Densities

$$p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$$

where the $G_i$ are the components/groups/clusters, $P(G_i)$ are the mixture proportions (priors), and $p(x \mid G_i)$ are the component densities.

- Gaussian mixture: $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$, with parameters $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$
- Unlabeled sample $X = \{x^t\}_t$ (unsupervised learning)

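To make the formula concrete, a short sketch evaluating a two-component Gaussian mixture density at a point (the parameter values are invented for illustration):

```python
# p(x) = sum_i P(G_i) * N(x; mu_i, Sigma_i) for a 2-component Gaussian mixture.
import numpy as np
from scipy.stats import multivariate_normal

priors = [0.6, 0.4]                                     # P(G_i), sums to 1
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]    # mu_i
covs = [np.eye(2), 2.0 * np.eye(2)]                     # Sigma_i

def mixture_density(x):
    return sum(P * multivariate_normal.pdf(x, mean=mu, cov=S)
               for P, mu, S in zip(priors, means, covs))

print(mixture_density(np.array([1.0, 1.0])))
```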
Expectation-Maximization (EM)
- Log likelihood with a mixture model:

$$\mathcal{L}(\Phi \mid X) = \log \prod_t p(x^t \mid \Phi) = \sum_t \log \sum_{i=1}^{k} p(x^t \mid G_i)\, P(G_i)$$

- Assume hidden variables z which, when known, make optimization much simpler
- Complete likelihood $\mathcal{L}_C(\Phi \mid X, Z)$, in terms of x and z
- Incomplete likelihood $\mathcal{L}(\Phi \mid X)$, in terms of x

E- and M-steps
- Iterate the two steps:
  1. E-step: estimate z given X and the current parameters $\Phi^l$
  2. M-step: find the new parameters $\Phi^{l+1}$ given z, X, and the old $\Phi^l$

$$\text{E-step:}\quad Q(\Phi \mid \Phi^l) = E\!\left[\mathcal{L}_C(\Phi \mid X, Z) \mid X, \Phi^l\right]$$

$$\text{M-step:}\quad \Phi^{l+1} = \arg\max_{\Phi}\, Q(\Phi \mid \Phi^l)$$

- An increase in Q increases the incomplete likelihood:

$$\mathcal{L}(\Phi^{l+1} \mid X) \geq \mathcal{L}(\Phi^{l} \mid X)$$

EM in Gaussian Mixtures
- $z_i^t = 1$ if $x^t$ belongs to $G_i$, 0 otherwise (the analogue of the labels $r_i^t$ in supervised learning); assume $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$
- E-step:

$$E[z_i^t \mid X, \Phi^l] = P(G_i \mid x^t, \Phi^l)
= \frac{p(x^t \mid G_i, \Phi^l)\, P(G_i)}{\sum_j p(x^t \mid G_j, \Phi^l)\, P(G_j)} \equiv h_i^t$$

- M-step (use the estimated soft labels $h_i^t$ in place of the unknown labels):

$$P(G_i) = \frac{\sum_t h_i^t}{N}, \qquad
m_i^{l+1} = \frac{\sum_t h_i^t\, x^t}{\sum_t h_i^t}, \qquad
S_i^{l+1} = \frac{\sum_t h_i^t\, (x^t - m_i^{l+1})(x^t - m_i^{l+1})^T}{\sum_t h_i^t}$$

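A compact sketch of these E- and M-steps for a full-covariance Gaussian mixture (the initialization, the small ridge added to the covariances for numerical stability, and the fixed iteration count are my own simplifications):

```python
# EM for a Gaussian mixture: alternate soft labels (E) and parameter updates (M).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    P = np.full(k, 1.0 / k)                        # mixture proportions P(G_i)
    m = X[rng.choice(n, size=k, replace=False)]    # means m_i
    S = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * k)  # covariances S_i
    for _ in range(n_iter):
        # E-step: h_i^t = P(G_i | x^t, Phi), the posterior of each component.
        h = np.column_stack([P[i] * multivariate_normal.pdf(X, mean=m[i], cov=S[i])
                             for i in range(k)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters with h in place of the unknown labels.
        Nk = h.sum(axis=0)                         # effective count per component
        P = Nk / n                                 # P(G_i) = sum_t h_i^t / N
        m = (h.T @ X) / Nk[:, None]                # weighted means
        for i in range(k):
            diff = X - m[i]
            S[i] = (h[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
    return P, m, S, h
```

Note the structural similarity to k-means: with hard 0/1 labels instead of the soft $h_i^t$, and shared spherical covariances, the M-step mean update reduces to the k-means centroid update.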
EM iteration examples (figures)

Hierarchical Clustering
- Cluster based on similarities/distances

Distance measure
- Distance measure between instances $x^r$ and $x^s$
- Minkowski ($L_p$) distance (Euclidean for p = 2):

$$d_m(x^r, x^s) = \left( \sum_{j=1}^{d} \left| x_j^r - x_j^s \right|^p \right)^{1/p}$$

- City-block distance:

$$d_{cb}(x^r, x^s) = \sum_{j=1}^{d} \left| x_j^r - x_j^s \right|$$

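These two definitions transcribe directly into code; a minimal sketch:

```python
# Minkowski and city-block distances as defined above.
import numpy as np

def minkowski(xr, xs, p=2):
    """d_m(x^r, x^s); p = 2 gives the Euclidean distance."""
    return np.sum(np.abs(xr - xs) ** p) ** (1.0 / p)

def city_block(xr, xs):
    """d_cb(x^r, x^s): the Minkowski distance with p = 1."""
    return np.sum(np.abs(xr - xs))
```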
Agglomerative Clustering
- Start with N groups, each containing one instance, and merge the two closest groups at each iteration
- Distance between two groups $G_i$ and $G_j$:
  - Single-link: $d(G_i, G_j) = \min_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$
  - Complete-link: $d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$
  - Average-link, centroid

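A naive sketch of this merge loop with single- or complete-link group distances (O(N³) per run; real implementations use cleverer data structures, and the function name is my own):

```python
# Agglomerative clustering: repeatedly merge the two closest groups.
import numpy as np

def agglomerate(X, n_clusters, link="single"):
    groups = [[i] for i in range(len(X))]                # one instance per group
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    agg = min if link == "single" else max               # single: min, complete: max
    while len(groups) > n_clusters:
        # Find the two closest groups under the chosen linkage.
        _, a, b = min((agg(D[r, s] for r in gi for s in gj), a, b)
                      for a, gi in enumerate(groups)
                      for b, gj in enumerate(groups) if a < b)
        groups[a] += groups.pop(b)                       # merge b into a
    return groups
```

Recording the distance at which each merge happens yields the dendrogram shown on the next slide.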
Dendrogram Example: Single-Link Clustering (figure)

Choosing k
- Defined by the application, e.g., image quantization
- Plot the data (after PCA) and check for clusters
- Incremental (leader-cluster) algorithm: add clusters one at a time until an "elbow" appears (in reconstruction error, log likelihood, or intergroup distances); see the sketch after this list
- Manual check for meaning
- Model selection criteria
- Automatic model selection methods

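As a sketch of the "elbow" heuristic, one can run the `k_means` function sketched earlier for increasing k and watch where the reconstruction error stops dropping sharply (the helper below is my own, not from the slides):

```python
# Elbow heuristic: reconstruction error E as a function of k.
import numpy as np

def reconstruction_error(X, m, b):
    """E = sum_t ||x^t - m_{b^t}||^2 for hard assignments b."""
    return np.sum((X - m[b]) ** 2)

# errors = [reconstruction_error(X, *k_means(X, k)) for k in range(1, 11)]
# A plot of `errors` versus k typically flattens after the "elbow".
```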