arXiv:1205.2034v1 [stat.AP] May 2012

A Self-Updating Clustering Algorithm γ-SUP Based on γ-Divergence with Application to Cryo-EM Images

Ting-Li Chen^a, Hung Hung^b, I-Ping Tu^{a,*}, Pei-Shien Wu^a, Wei-Hau Chang^c and Su-Yun Huang^a

^a Institute of Statistical Science, Academia Sinica
^b Institute of Epidemiology & Preventive Medicine, National Taiwan University
^c Institute of Chemistry, Academia Sinica

December 31, 2018

* Corresponding author, iping@stat.sinica.edu.tw

Abstract

In the past decades, cryo-electron microscopy (cryo-EM) has become a powerful tool for obtaining high-resolution three-dimensional (3-D) structures of biological macro-molecules. A cryo-EM data set usually contains at least thousands of 2-D projection images of freely oriented particles. These projections have a low signal-to-noise ratio, contain many misaligned images as outliers, and fall into a large number of clusters because of the free orientations. Clustering is therefore a necessary step to group images of similar orientation for noise reduction. In this article, we propose a clustering algorithm, called γ-SUP, which implements minimum γ-divergence estimation on a mixture of the q-Gaussian family through a self-updating process. γ-SUP copes well with cryo-EM images through the following advantages. (a) It resolves the sensitivity issue of choosing the number of clusters and the cluster initials. (b) It sets a hard influence range for each component in the mixture model and hence leads to a robust procedure for learning each of the local clusters. (c) It performs a soft rejection by down-weighting points that deviate from cluster centers, which further enhances the robustness. (d) At each iteration, it shrinks the mixture model parameter estimates toward cluster centers, which improves the efficiency of the mixture estimation.

Key words and phrases: clustering algorithm, cryo-EM images, γ-divergence, k-means, multilinear principal component analysis, q-Gaussian distribution, robust statistics, self-updating process.

1 Introduction and motivating data example

Cryo-electron microscopy (cryo-EM) has been emerging as a powerful tool for obtaining high-resolution three-dimensional (3-D) structures of biological macro-molecules (Saibil, 2000; van Heel et al., 2000; Frank, 2002; Yu et al., 2008). Traditionally, an efficient 3-D structure determination is provided by X-ray crystallography when a large assembly of crystal can be obtained, allowing signals to be recorded from the spots in the diffraction pattern. However, not every molecule can form a crystal, and sometimes it is beneficial to study a molecule as a single particle rather than in a crystal. In contrast to X-ray crystallography, cryo-EM does not need crystals but can view dispersed biological molecules embedded in a thin layer of vitreous ice (Dubochet, 2012). The electron beam transmitted through the specimen generates 2-D projections of these freely oriented molecules, and the 3-D structure can be obtained by back projection provided the angular relationships among the 2-D projections are determined (DeRosier and Klug, 1968). As biological molecules are highly sensitive to electron beams, only very limited doses are allowed for imaging. This yields a barely recognizable image for an individual molecule (Henderson, 1995). Nevertheless, when the molecule is assembled from subunits in high order, for example an icosahedral virus, the symmetry facilitates the image processing and allows near-atomic resolution structures to be attained (Liu et al., 2010; Jiang et al., 2008; Grigorieff and Harrison, 2011).
However, de-noising a particle of low or no symmetry is evidently challenging, as it requires image alignment and data clustering on a sufficient number of images in similar orientations for averaging (Chang et al., 2010). In this article, we focus on the clustering step and assume that the image alignment has already been carried out.

At present, k-means based algorithms are probably the most popular clustering methods for cryo-EM images (Frank, 2006). Modifications of k-means have been proposed to enforce balanced cluster sizes and to avoid excessively large or small clusters (Sorzano et al., 2010). However, an initial cluster assignment is still required, and it drives the final clustering result to some degree. Furthermore, k-means forces every object into some class, which may not be proper when a non-negligible portion of the image objects are outliers due to misalignment or data contamination. This enforcement may lead to serious bias in estimating the cluster representatives. On the contrary, the self-updating process (SUP, Chen and Shiu 2007; Shiu and Chen 2012) is a data-driven robust algorithm that allows extremely small clusters, consisting of only a few data points or even a singleton, to accommodate outliers. Allowing outlier clusters is important for robust clustering. SUP starts with each individual data point as a singleton cluster, so that neither random initials nor the cluster number is required. It then goes through a self-updating process of data merging and parting according to weighted averages over local neighborhoods defined by a hard influence region, where the weight can be an arbitrary function proportional to the similarity. The data points finally converge and stop at representative points, which serve as cluster centers.

In this article, we modify the original SUP and propose an information-theoretic framework that formulates the clustering algorithm, named γ-SUP, as a minimum γ-divergence estimation (Fujisawa and Eguchi, 2008; Cichocki and Amari, 2010; Eguchi et al., 2011) of a q-Gaussian mixture (Amari and Ohara, 2011; Eguchi et al., 2011) through the SUP implementation. In this framework, we parameterize the weights in SUP through a family of γ-divergences with γ > 0, and we parameterize the hard influence region through a q-Gaussian family with q < 1. As a comparison, the popular k-means is a special case of this framework with γ = 0 and q = 1, with the number of classes k given, but without the SUP implementation. The use of a q-Gaussian mixture model with q < 1 sets a hard influence range for each component in the mixture model and rejects data influence from outside this range, and hence leads to a robust procedure for learning each of the local clusters. The minimum γ-divergence with γ > 0 essentially performs a soft rejection by down-weighting points that deviate from cluster centers, which further enhances the robustness. At each iteration, the self-updating process shrinks the mixture model parameter estimates toward cluster centers; this acts as if the effective temperature, that is, the within-cluster variance over the power parameter, were continuously decreasing, so that such shrinkage updating improves the efficiency of the mixture estimation.
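The self-updating idea described above can be illustrated with a toy sketch. The snippet below (Python) uses a simple 0/1 weight within a hard influence radius; it is not the exact weight of Chen and Shiu (2007) or of γ-SUP, and the function name, data, and radius are ours, chosen only to show points merging into cluster centers while an outlier stays a singleton.

```python
import numpy as np

def sup_toy(x, radius=1.0, n_iter=100):
    """Toy 1-D self-updating process: every point starts as its own cluster and is
    repeatedly replaced by the average of the points lying within a hard influence
    radius (a simple choice of weight; SUP allows any similarity-proportional weight)."""
    mu = np.array(x, dtype=float)
    for _ in range(n_iter):
        w = (np.abs(mu[:, None] - mu[None, :]) < radius).astype(float)  # 0/1 weights
        mu = (w * mu[None, :]).sum(axis=1) / w.sum(axis=1)              # local averages
    return mu

# Two groups plus one far outlier: the groups collapse onto two centers and the
# outlier remains a singleton cluster.
x = [0.0, 0.2, 0.4, 5.0, 5.3, 5.1, 20.0]
print(np.round(sup_toy(x, radius=1.0), 3))
```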
For application, we design a simulation to investigate the performance of γ-SUP: 6400 images with 100 × 100 pixels are generated by simulating the 2-D projected cryo-EM images of a model molecule, RNA polymerase II, in 128 equally spaced (angle-wise) orientations, with iid Gaussian noise N(0, 40²) added. We consider two scenarios in the image alignment step: perfect alignment and 10% misalignment. Under perfect image alignment, k-means reaches a correct rate of 83%, while γ-SUP achieves 100%. With 10% misalignment, k-means drops to 74% accuracy, while γ-SUP still keeps 100%.

The paper is organized as follows. In Section 2, we briefly review the γ-divergence and the q-Gaussian distribution relevant to γ-SUP. In Section 3, we formulate the γ-SUP clustering algorithm as a minimum γ-divergence estimation of a q-Gaussian mixture, with k-means as a special case. In Section 4, we show γ-SUP's stability to tuning parameter selection and its efficiency. In Section 5, we apply γ-SUP to the simulated cryo-EM images. In Section 6, we summarize our conclusions.

2 A brief review of γ-divergence and q-Gaussian

In this section we briefly review the concepts of γ-divergence and the q-Gaussian distribution, which are the key technical tools for our γ-SUP clustering algorithm.

2.1 γ-divergence

The most widely used distribution divergence is probably the Kullback-Leibler divergence, due to its connection to maximum likelihood estimation (MLE). The γ-divergence is a generalization of the Kullback-Leibler divergence indexed by a power parameter γ. Let M be the collection of all positive integrable functions defined on X ⊂ R^p,

M = { f : f(x) ≥ 0, ∫ f(x) dx < ∞ }.

Definition 1 (Fujisawa and Eguchi 2008; Cichocki and Amari 2010; Eguchi et al. 2011). For f, g ∈ M, define the γ-divergence D_γ(·‖·) and the γ-cross entropy C_γ(·‖·) as follows:¹

D_γ(f‖g) = C_γ(f‖g) − C_γ(f‖f),  with  C_γ(f‖g) = −\frac{1}{γ(γ+1)} ∫ \frac{g^γ(x)}{\|g\|_{γ+1}^{γ}} f(x)\,dx,   (1)

where \|g\|_{γ+1} = \{∫ g^{γ+1}(x)\,dx\}^{1/(γ+1)} is a normalizing constant, so that the cross entropy enjoys the property of being projective invariant, i.e., C_γ(f‖cg) = C_γ(f‖g) for all c > 0. For a given f, minimizing D_γ(f‖g) over g in a certain function class is equivalent to minimizing the loss function

L_{γ,f}(g) = −\frac{1}{γ} \ln ∫ g^γ(x) f(x)\,dx + \frac{1}{γ+1} \ln ∫ g^{γ+1}(x)\,dx.   (2)

¹ See Eguchi et al. (2011) for the feasible range of γ values.

Note that in the limiting case, lim_{γ→0} D_γ(·‖·) = D_0(·‖·) reduces to the Kullback-Leibler divergence. The MLE, which corresponds to D_0(·‖·), has been shown to be optimal in parameter estimation in the sense of having minimum asymptotic covariance. This optimality comes at the cost that the MLE relies heavily on the correctness of the model specification and, hence, is not robust against model deviation and outliers. On the other hand, the minimum γ-divergence estimation has been shown to be super robust against data contamination (Fujisawa and Eguchi, 2008).

2.2 q-Gaussian

The q-Gaussian distribution is a generalization of the Gaussian distribution obtained by using the q-exponential instead of the usual exponential function, as defined below. Let S_p denote the collection of all strictly positive definite p × p symmetric matrices.

Definition 2 (modified from Amari and Ohara 2011; Eguchi et al. 2011). For a fixed q ∈ (−∞, 1 + 2/p), define the p-variate q-Gaussian distribution with parameters θ = (µ, Σ) ∈ R^p × S_p to have the probability density function

f_q(x; θ) = \frac{c_{p,q}}{(\sqrt{2π})^p \sqrt{|Σ|}} \exp_q\{u(x; θ)\},  x ∈ R^p,   (3)

where u(x; θ) = −\frac{1}{2}(x − µ)^T Σ^{−1}(x − µ), c_{p,q} is a normalizing constant so that ∫ f_q(x; θ)\,dx = 1, and \exp_q(u) = \{1 + (1 − q)u\}_{+}^{1/(1−q)}, with \{x\}_{+} = \max\{x, 0\}. Denote the q-Gaussian distribution with parameters (µ, Σ) by G_q(µ, Σ).

The constant c_{p,q} is given below (cf. Eguchi et al., 2011):

c_{p,q} = (1−q)^{p/2}\, \frac{Γ(1 + p/2 + 1/(1−q))}{Γ(1 + 1/(1−q))},  for −∞ < q < 1;
c_{p,q} = 1,  for q → 1;
c_{p,q} = (q−1)^{p/2}\, \frac{Γ(1/(q−1))}{Γ(1/(q−1) − p/2)},  for 1 < q < 1 + 2/p.   (4)

When q → 1, the q-Gaussian reduces to the usual Gaussian distribution, while it is a t-distribution for 1 < q < 1 + 2/p. For the usual Gaussian and t-distributions, the domain of X is the entire R^p and does not involve the index q. The domain of X, however, depends on q when q < 1. As will become clear later, our proposed choice of q < 1 sets a hard influence range and thereby performs data rejection. Note that if X ∼ G_q(µ, Σ) with q < 1 + 2/(p+2), then E(X) = µ and Cov(X) = \frac{2}{2 + (p+2)(1−q)} Σ.
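As a quick numerical check of Definition 2, the following sketch (Python, with the working choice Σ = σ²I_p used later and with the normalizing constant c_{p,q} omitted; the function names and toy values are ours) shows how the q-exponential vanishes outside a ball when q < 1:

```python
import numpy as np

def exp_q(u, q):
    """q-exponential {1 + (1 - q) u}_+^{1/(1-q)}; it tends to exp(u) as q -> 1."""
    if np.isclose(q, 1.0):
        return np.exp(u)
    return np.maximum(1.0 + (1.0 - q) * u, 0.0) ** (1.0 / (1.0 - q))

def q_gaussian_kernel(x, mu, sigma2, q):
    """Unnormalized q-Gaussian exp_q{u(x; mu, sigma^2 I_p)} with u = -||x - mu||^2/(2 sigma^2).
    For q < 1 it is exactly zero outside the ball ||x - mu||^2 < 2 sigma^2 / (1 - q)."""
    u = -0.5 * np.sum((np.asarray(x) - mu) ** 2, axis=-1) / sigma2
    return exp_q(u, q)

# With q = 0.5 and sigma^2 = 1 the support radius is sqrt(2/(1-q)) = 2,
# so the third point below receives exactly zero density.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
print(q_gaussian_kernel(pts, mu=np.zeros(2), sigma2=1.0, q=0.5))
```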
2.3 Minimum γ-divergence for learning a q-Gaussian

The γ-divergence is a discrepancy measure between two functions in M, and its minimum can be used as a criterion to approximate an underlying probability density function f from a certain model class M_Θ parameterized by θ ∈ Θ ⊂ R^m. By (1) and (2), the estimate of f at the population level can be written as

f* = argmin_{g ∈ M_Θ} D_γ(f‖g) = argmin_{g ∈ M_Θ} C_γ(f‖g) = argmin_{g ∈ M_Θ} L_{γ,f}(g).   (5)

In this study, we take M_θ with θ = (µ, Σ) to be the family of q-Gaussian distributions G_q(µ, Σ) introduced in Definition 2.² For fixed γ and q and true density f, plugging f_q(·; θ) into (2) gives

L_{γ,f}\{f_q(·; θ)\} = −\frac{1}{γ} \ln ∫ \Big\{\frac{c_{p,q}}{(\sqrt{2π})^p \sqrt{|Σ|}}\Big\}^{γ} [\exp_q\{u(x; θ)\}]^{γ} f(x)\,dx + \frac{1}{γ+1} \ln ∫ \Big\{\frac{c_{p,q}}{(\sqrt{2π})^p \sqrt{|Σ|}}\Big\}^{γ+1} [\exp_q\{u(v; θ)\}]^{γ+1}\,dv.

Hence, minimizing L_{γ,f}\{f_q(·; θ)\} over possible values of θ is equivalent to maximizing

∫ f(x)\, |Σ|^{−\frac{1}{2}\left(\frac{γ}{γ+1}\right)} [\exp_q\{u(x; θ)\}]^{γ}\,dx.   (6)

² For q < 1 + 2/(p+ν), the ν-th moment of X exists.

For high-dimensional data, however, it is impractical to estimate the covariance matrix Σ and its inverse. Note also that our main interest in this study is to find cluster centers. We therefore employ Σ = σ²I_p as our working model. Taking the derivative of (6) with respect to µ gives the stationarity condition for the maximizer µ*:

µ* = \frac{∫ x\, f(x)\, [\exp_q\{u(x; µ*, σ²)\}]^{γ−(1−q)}\,dx}{∫ f(x)\, [\exp_q\{u(x; µ*, σ²)\}]^{γ−(1−q)}\,dx} = \frac{∫ x\, w(x; µ*, σ²)\,dF(x)}{∫ w(x; µ*, σ²)\,dF(x)},   (7)

where w(x; µ*, σ²) = [\exp_q\{u(x; µ*, σ²)\}]^{γ−(1−q)} is the weight function and F(x) is the cumulative distribution function corresponding to f(x). Notice that for q < 1 we have

w(x; µ*, σ²) = \Big(1 − \frac{1−q}{2σ²}\, \|x − µ*\|^2\Big)^{\frac{γ−(1−q)}{1−q}}  if \|x − µ*\|^2 < \frac{2σ²}{1−q},  and  w(x; µ*, σ²) = 0  if \|x − µ*\|^2 ≥ \frac{2σ²}{1−q}.   (8)

Thus, adopting a q-Gaussian with q < 1 completely rejects data far away from µ*; that is, it sets a hard influence range.
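To make (7) and (8) concrete, here is a small sketch (Python) of the sample version for a single center: the weight of (8) and the fixed-point update µ ← Σ_i w_i x_i / Σ_i w_i, with F replaced by the empirical distribution. The toy data, the particular (γ, q, σ²) values, and the iteration cap are our own choices for illustration.

```python
import numpy as np

def hard_weight(X, mu, sigma2, q, gamma):
    """Weight (8): [exp_q{u}]^{gamma-(1-q)} with u = -||x - mu||^2 / (2 sigma^2);
    it is exactly zero once ||x - mu||^2 >= 2 sigma^2 / (1 - q)."""
    u = -0.5 * np.sum((X - mu) ** 2, axis=-1) / sigma2
    base = np.maximum(1.0 + (1.0 - q) * u, 0.0)
    return base ** ((gamma - (1.0 - q)) / (1.0 - q))

def robust_center(X, mu0, sigma2, q, gamma, n_iter=100):
    """Empirical fixed-point iteration of (7) for one cluster center."""
    mu = np.array(mu0, dtype=float)
    for _ in range(n_iter):
        w = hard_weight(X, mu, sigma2, q, gamma)
        if w.sum() == 0:                       # no point left inside the influence range
            break
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
    return mu

# A cluster near the origin plus two far outliers; the outliers receive weight zero
# and the estimate stays close to the cluster center.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)), [[8.0, 8.0], [9.0, -7.0]]])
print(robust_center(X, mu0=[0.0, 0.0], sigma2=1.0, q=0.5, gamma=0.6))
```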
3 γ-SUP

We are now in a position to introduce our clustering method, γ-SUP, which minimizes the γ-divergence over a mixture of q-Gaussian distributions.

3.1 The case with q < 1 and γ > 0

Suppose that f is a mixture of k components, i.e.,

f(x) = \sum_{h=1}^{k} π_h f_h(x).   (9)

The aim of a clustering method is to tell these k components apart. For robustness, we model each f_h as a q-Gaussian distribution and use the minimum γ-divergence criterion to develop an estimation scheme that learns all k components simultaneously, where k is automatically determined during the estimation procedure. Under this setting, when the initial value of µ lies within the influence range of some f_h, we are able to learn the component f_h via the stationary equation given by (7):

µ*_h = \frac{∫ x\, w(x; µ*_h, σ²)\,dF(x)}{∫ w(x; µ*_h, σ²)\,dF(x)}.

In Mollah et al. (2010), a learning algorithm for local PCA based on the projective power divergence (another name for the γ-divergence) is proposed. In particular, their algorithm explores one local PCA structure at a time, depending on which cluster the initial point belongs to; all the local PCA structures can be extracted sequentially by assigning proper initial points. Turning to the clustering problem, we start from (7) to develop a clustering algorithm, called γ-SUP, by employing a self-updating process. γ-SUP has the following key ingredients:

Note the difference between F̂_n^{(ℓ)} in (10) of γ-SUP and F̂_n^{(0)} in (18) of the γ-estimator. We compare the γ-SUP estimator (10), the usual robust γ-estimator (18), the sample mean, and the MLE for the t-distribution. It is known that the sample mean inherits large variation from the heavy tails of the t-distribution. On the other hand, the MLE is optimal when the model is correctly specified. Thus, the sample mean and the MLE are used as references for the worst and the best scenarios. Simulation results with (p, n) ∈ {(10, 100), (100, 100)} and 500 replicates are provided in Figure 3. Reported are the MSE curves (times n) for the four methods, where the MSE is defined as ‖µ̂ − µ‖²₂/p averaged over R replicate runs, so that

n × MSE = \frac{n}{R} \sum_{r=1}^{R} \frac{\|µ̂_r − µ\|²₂}{p}.

Not surprisingly, the best performer is the MLE and the worst is the sample mean, while γ-SUP and the γ-estimator have intermediate performance. γ-SUP performs very closely to the optimal MLE for every setting of p and choice of s, which indicates the superiority of γ-SUP in providing a robust estimate of the location parameter even when the model is not correctly specified. Although both γ-SUP and the γ-estimator adopt the same minimum γ-divergence criterion to estimate parameters, our simulation results show that γ-SUP, which weights on the model, uniformly outperforms the γ-estimator, which weights on the data. This observation reflects the potential of "weight on model" in alleviating the poor influence of outliers.

Figure 3: Efficiency comparison (two panels: p = 10, n = 100 and p = 100, n = 100; 500 replicate runs; sample size × MSE plotted against s). The γ-SUP algorithm, which is based on an iteratively reweighted model average, is more efficient than the γ-estimator, which is based on an iteratively reweighted data average. The MLE and the sample mean are used as two references.

5 Application to cryo-EM images

To evaluate the performance of γ-SUP, we use image data created from a model molecule so that we can compare our results with the known solution. A total of 128 distinct 2-D images with 100 × 100 pixels are generated by projecting the X-ray crystal structure of RNA polymerase II, filtered to 20 Angstrom, in equally spaced (angle-wise) orientations.⁴ Each image is then convolved with an electron microscopy contrast transfer function (defocus µm). Finally, 6400 images are randomly sampled with replacement from these 128 projections, with iid Gaussian noise N(0, 40²) added to reflect the signal-to-noise ratio characteristic of experimental cryo-EM images. We test two circumstances. In the first case, the noisy image replicates of the same orientation are all perfectly aligned. In the second case, 10% of the images are misaligned.

⁴ Data source: the X-ray model of RNA polymerase II is from the Protein Data Bank (PDB: 1WCM).

Both Principal Component Analysis (PCA) and Multilinear Principal Component Analysis (MPCA, Lu et al. 2008; Hung et al. 2012) are used to reduce the dimension from 10000 to 100. In PCA, each image is represented by a vector of length 10000, so a huge 10000 × 10000 covariance matrix over the 6400 data points is created and 9999 parameters are required to construct each component. By contrast, MPCA models each component with a tensor structure, as a column vector times a row vector, and becomes a powerful scheme for dimension reduction while still capturing the core information of each image (Hung et al., 2012). As such, the covariance matrices used in this algorithm are 100 × 100, and the required number of parameters for each component is 198, together greatly boosting the computation speed.
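As a rough illustration of this order-two MPCA reduction, the sketch below computes mode-wise projection bases and maps each 100 × 100 image to a 10 × 10 core (100 features). It is a simplified one-pass version, not the iterative procedure of Lu et al. (2008) or Hung et al. (2012); the function name, the ranks, and the assumption that the images are stacked in an (n, 100, 100) array are ours.

```python
import numpy as np

def mpca_features(images, r1=10, r2=10):
    """Project each image X_i to U^T X_i V, where U (100 x r1) and V (100 x r2) hold the
    leading eigenvectors of the two mode-wise scatter matrices. Each rank-one component
    u v^T costs about 100 + 100 - 2 = 198 parameters, versus 9999 for a 10000-dim PCA loading."""
    X = images - images.mean(axis=0)                 # center the image stack (n, 100, 100)
    S_row = np.einsum('nij,nkj->ik', X, X)           # sum_i X_i X_i^T   (100 x 100)
    S_col = np.einsum('nji,njk->ik', X, X)           # sum_i X_i^T X_i   (100 x 100)
    U = np.linalg.eigh(S_row)[1][:, ::-1][:, :r1]    # top-r1 eigenvectors (eigh is ascending)
    V = np.linalg.eigh(S_col)[1][:, ::-1][:, :r2]    # top-r2 eigenvectors
    core = np.einsum('pi,npq,qj->nij', U, X, V)      # (n, r1, r2) core tensors
    return core.reshape(len(images), -1)             # n x (r1*r2) feature matrix
```

The resulting 100-dimensional feature vectors are what a clustering step such as γ-SUP or k-means would then operate on.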
We apply k-means and γ-SUP to the images reconstructed by PCA and by MPCA, and summarize their accuracy rates in Table 3. For k-means, we assign the correct cluster number and report the best result among 10 sets of random initials. For γ-SUP, we choose the best tuning parameters. The results show that MPCA outperforms PCA in all situations, so we use MPCA for dimension reduction in the rest of the analysis. Table 3 also shows that γ-SUP performs much better than k-means. A more detailed analysis follows.

5.1 Clustering with perfectly-aligned images

The parameter s needs to be specified before executing γ-SUP. For this data set, the performance is quite robust for 0.01 ≤ s ≤ 0.03, and we choose s = 0.025 for the rest of the analysis of the cryo-EM images. As mentioned before, Σ is modeled as σ²I_p, where σ is determined by the scale parameter τ in (15). τ is proportional to the impact region (support region) of the q-Gaussian distribution, which allows the user to tune the similarity level inside a cluster. When the scale parameter τ is small enough, the output is always 6400 clusters (each individual cryo-EM image forms one cluster), and when τ is too large, the output is one cluster (all the images belong to the same cluster). We show the convergent cluster numbers for various values of τ in Figure 4. The cluster number remains at 128, the true number, over quite a wide range: τ ∈ [83, 105].

Figure 4: The cluster numbers obtained by γ-SUP with scale parameter τ ∈ [83, 160] (number of clusters plotted against the scale parameter).

We also observe a phase transition of γ-SUP. When τ is below 83, γ-SUP outputs 6400 clusters, which means that each cluster contains one image. When τ reaches 83, the cluster number becomes 128, a perfect result. There exists no intermediate result between these two for this data set, as shown in Figure 5. Recall that the γ-SUP updating ignores the influence of data outside a certain range determined by τ, and that this data set is constructed so that each cluster has similar within-cluster distances. When the corresponding scale parameter τ is so small that there is no influence between any two images, γ-SUP leads to 6400 clusters with each individual image as one cluster. On the other hand, when the scale parameter reaches a critical value, the images in the same cluster start attracting each other and finally merge. This explains why a phase transition occurs. We observe similar phase transition phenomena for various noise structures, some of which may not occur exactly at the perfect cluster result, but not far from it. Thus, the value at which the phase transition occurs can be treated as the starting value of a reasonable range for the scale parameter τ.

Figure 5: A phase transition occurs when the scale parameter τ reaches 83 (number of clusters, from 6400 down to 128, plotted against the scale parameter over [82, 84]).
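This phase-transition heuristic for picking τ can be sketched as a simple scan over a grid of τ values. In the sketch below, `cluster_fn` stands for any routine implementing the γ-SUP iteration of Table 1 and returning the distinct converged centers, and `features` stands for the MPCA-reduced images; both names, the grid, and the defaults are ours.

```python
import numpy as np

def tau_scan(features, cluster_fn, s=0.025, taus=np.arange(80.0, 161.0, 1.0)):
    """Count the number of converged gamma-SUP clusters over a grid of tau values.
    `cluster_fn(features, s, tau)` is assumed to return (distinct centers, labels).
    The jump from one-cluster-per-image down to a plateau (here 6400 -> 128) marks the
    phase transition, and the onset of the plateau suggests a working range for tau."""
    counts = {}
    for tau in taus:
        centers, _ = cluster_fn(features, s=s, tau=float(tau))
        counts[float(tau)] = len(centers)
    return counts
```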
We next compare the performance of γ-SUP with that of k-means. We assign three cluster numbers for k-means: 128, 150 and 200. Similarly, we try three scale parameter values for γ-SUP: 100, 110, and 120. To compare the results, we standardize the output of each method to 200 clusters by allowing empty clusters. The cluster sizes, sorted in ascending order, are presented in Figure 6. γ-SUP with scale parameter τ = 100 matches the true cluster sizes perfectly. We further checked carefully that each image is indeed perfectly clustered, not merely that the cluster sizes match; in fact, for τ ∈ [83, 100], γ-SUP matches the truth perfectly. Figure 6 also shows that when the scale parameter increases, γ-SUP may merge two or more clusters into one. For example, when the scale parameter equals 110, γ-SUP makes no mistake other than merging two clusters into one, and the cluster number is reduced to 127. When the scale parameter equals 120, there are three cases in which γ-SUP merges two clusters into one and the cluster number becomes 125; other than that, it makes no mistakes. On the other hand, k-means makes many more mistakes: it creates more mixed clusters with larger sizes even when the true cluster number 128 is given, and it gets worse when the cluster number is wrongly assigned. We further show, in Figure 7, some images that γ-SUP successfully separates but k-means fails to.

Figure 6: The cluster sizes produced by k-means and γ-SUP with various parameter values on the perfectly aligned cryo-EM data. For each algorithm, the horizontal axis is the cluster-size index; 72 empty clusters are added to the 128 data clusters for better comparison. γ-SUP-100 perfectly matches the true data clusters and is thus also labeled as true.

Figure 7: The three columns are three projections with different orientations that k-means merges into one cluster and γ-SUP separates perfectly. The first three rows show the replicate images with iid Gaussian noise. The fourth row shows the images reconstructed from the first 100 multilinear principal components.

5.2 Clustering with misaligned images

It is quite possible that cryo-EM images cannot be well aligned because of their low SNR. Here, an experiment is designed to test the performance of the clustering algorithms when misaligned images exist. From each of the 128 clusters, 10% of the images (about 5 images per cluster) are randomly chosen to be rotated by 7.5, 15, 22.5, 30, 37.5 and 45 degrees clockwise, in that order. Each rotated image is then different from the images of its original cluster. An ideal treatment is to put each of these misaligned images in a singleton cluster. Including these singleton clusters, the total cluster number becomes 771, while the meaningful cluster number remains 128. We assign the cluster numbers for k-means to be 128, 500 and 771 to check its performance. For γ-SUP, we follow the τ values used in the perfect alignment case: 100, 110 and 120. The results are presented in Figure 8. When the scale parameter equals 100, γ-SUP again matches the truth perfectly, both in cluster number, i.e., k = 771, and in cluster contents, i.e., a perfect match of the 128 clusters' contents for the well-aligned images and of the 643 singleton clusters for the misaligned images. γ-SUP with the other two scale parameter values also produces substantially better results than the k-means algorithm.

Figure 8: The cluster sizes produced by k-means and γ-SUP with various parameter values on the rotated cryo-EM images. For each algorithm, the x-axis is the cluster index rearranged by cluster size; 643 singleton clusters are added to the 128 data clusters to accommodate the misaligned images. γ-SUP-100 perfectly matches the true data clusters and is thus also labeled as true.
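The misalignment experiment just described can be reproduced along the following lines; the function name, the (n, 100, 100) image array, the per-image cluster labels, and the seed are assumptions, and the exact rotation sign convention is immaterial for the experiment.

```python
import numpy as np
from scipy import ndimage

def misalign(images, labels, frac=0.10,
             angles=(7.5, 15.0, 22.5, 30.0, 37.5, 45.0), seed=0):
    """Rotate a fraction `frac` of the images in each cluster by the listed angles,
    mimicking the 10% misalignment scenario. Returns the modified image stack and the
    indices of the rotated (now effectively singleton) images."""
    rng = np.random.default_rng(seed)
    out = images.copy()
    rotated = []
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        picks = rng.choice(members, size=int(round(frac * members.size)), replace=False)
        for k, idx in enumerate(picks):
            ang = angles[k % len(angles)]                     # cycle through the angle list
            out[idx] = ndimage.rotate(out[idx], ang, reshape=False, mode="nearest")
            rotated.append(idx)
    return out, np.asarray(rotated)
```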
6 Conclusion

γ-SUP provides the SUP clustering algorithm with an information-theoretic framework, formulating it as a minimum γ-divergence estimation for a q-Gaussian mixture. At the same time, γ-SUP provides the minimum γ-divergence estimation of a q-Gaussian mixture with an implementation via SUP. γ-SUP has several crucial advantages, summarized as follows.

• γ-SUP does not require any initials, and the number of clusters k is data-driven.

• In γ-SUP, the parameters (γ, q, σ) involved in the γ-divergence and the q-Gaussian mixture are transformed to the scale parameter τ, see (15), and the power parameter s, see (13) and (14). We show that γ-SUP has robust performance over s in many scenarios. Once s is chosen, the phase transition scheme may suggest a reasonable range for τ. This transformation to (τ, s) and the observation of the phase transition greatly reduce the difficulty in selecting the tuning parameters.

• γ-SUP adopts an iterative shrinkage estimation. At each iteration, the updating process shrinks the mixture model estimates toward cluster centers. This acts as if the effective temperature parameter in γ-SUP were continuously decreasing, and it leads to a more efficient estimation scheme.

In addition, we successfully apply γ-SUP to cluster cryo-EM images. Cryo-EM images have a low signal-to-noise ratio, contain many misaligned images as outliers, and consist of a large number of clusters due to the free orientations. γ-SUP overcomes these issues and outperforms the k-means algorithm, which is currently the most popular clustering method for analyzing cryo-EM images. The progress on clustering made by γ-SUP is anticipated to help cryo-EM reach better resolution in 3-D structure reconstruction.

References

Amari, S. and Ohara, A. (2011). Geometry of q-Exponential Family of Probability Distributions. Entropy, 13(6):1170–1185.

Banerjee, A., Merugu, S., Dhillon, I. S., and Ghosh, J. (2005). Clustering with Bregman Divergences. Journal of Machine Learning Research, 6:1705–1749.

Basu, A., Harris, I. R., Hjort, N. L., and Jones, M. C. (1998). Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika, 85(3):549–559.

Chang, W.-H., Chiu, M.-K., Chen, C.-Y., Yen, C.-F., Lin, Y.-C., Weng, Y.-P., C., J.-C., Wu, Y.-M., Cheng, H., Fu, J., and Tu, I.-P. (2010). Zernike Phase Plate Cryo-Electron Microscopy Facilitates Single Particle Analysis of Unstained Asymmetric Protein Complexes. Structure, 18:17–27.

Chen, T.-L. and Shiu, S.-Y. (2007). A Clustering Algorithm by Self-Updating Process. JSM Proceedings, Statistical Computing Section, Salt Lake City, Utah; American Statistical Association, pp. 2034–2038.

Cichocki, A. and Amari, S. (2010). Families of Alpha-, Beta- and Gamma-Divergences: Flexible and Robust Measures of Similarities. Entropy, 12(6):1532–1568.
DeRosier, D. J. and Klug, A. (1968). Reconstruction of Three-Dimensional Structures from Electron Micrographs. Nature, 217(5124):130–134.

Dubochet, J. (2012). Cryo-EM: the first thirty years. Journal of Microscopy, 245(3):221–224.

Eguchi, S., Komori, O., and Kato, S. (2011). Projective Power Entropy and Maximum Tsallis Entropy Distributions. Entropy, 13(10):1746–1764.

Field, C. and Smith, B. (1994). Robust Estimation: A Weighted Maximum-Likelihood Approach. International Statistical Review, 62(3):405–424.

Frank, J. (2002). Single-Particle Imaging of Macromolecules by Cryo-Electron Microscopy. Annual Review of Biophysics and Biomolecular Structure, 31:303–319.

Frank, J. (2006). Three-Dimensional Electron Microscopy of Macromolecular Assemblies: Visualization of Biological Molecules in Their Native State. Oxford University Press, Oxford; New York, 2nd edition.

Fujisawa, H. and Eguchi, S. (2008). Robust Parameter Estimation with a Small Bias Against Heavy Contamination. Journal of Multivariate Analysis, 99(9):2053–2081.

Grigorieff, N. and Harrison, S. (2011). Near-Atomic Resolution Reconstructions of Icosahedral Viruses from Electron Cryo-Microscopy. Current Opinion in Structural Biology, 21:265–273.

Henderson, R. (1995). The Potential and Limitations of Neutrons, Electrons and X-Rays for Atomic-Resolution Microscopy of Unstained Biological Molecules. Quarterly Reviews of Biophysics, 28(2):171–193.

Hung, H., Wu, P.-S., Tu, I.-P., and Huang, S.-Y. (2012). On Multilinear Principal Component Analysis of Order-Two Tensors. Biometrika, to appear.

Jiang, W., Baker, M., Jakana, J., Weigele, P., King, J., and Chiu, W. (2008). Backbone Structure of the Infectious Epsilon15 Virus Capsid Revealed by Electron Cryomicroscopy. Nature, 451:1130–1134.

Liu, H., Jin, L., Koh, S., Atanasov, I., Schein, S., Wu, L., and Zhou, Z. (2010). Atomic Structure of Human Adenovirus by Cryo-EM Reveals Interactions Among Protein Networks. Science, 329:1038–1043.

Lu, H., Plataniotis, K. N., and Venetsanopoulos, A. N. (2008). MPCA: Multilinear Principal Component Analysis of Tensor Objects. IEEE Transactions on Neural Networks, 19:18–39.

Mollah, M. N. H., Sultana, N., Minami, M., and Eguchi, S. (2010). Robust Extraction of Local Structures by the Minimum Beta-Divergence Method. Neural Networks, 23(2):226–238.

Saibil, H. R. (2000). Macromolecular Structure Determination by Cryo-Electron Microscopy. Acta Crystallographica Section D: Biological Crystallography, 56:1215–1222.

Shiu, S.-Y. and Chen, T.-L. (2012). Clustering by Self-Updating Process. arXiv:1201.1979.

Sorzano, C., Bilbao-Castro, J., Shkolnisky, Y., Alcorlo, M., Melero, R., Caffarena-Fernandez, G., Li, M., Xue, G., Marabini, R., and Carazo, J. (2010). A Clustering Approach to Multireference Alignment of Single-Particle Projections in Electron Microscopy. Journal of Structural Biology, 171:197–206.

van Heel, M., Gowen, B., Matadeen, R., Orlova, E. V., Finn, R., Pape, T., Cohen, D., Stark, H., Schmidt, R., Schatz, M., and Patwardhan, A. (2000). Single-Particle Electron Cryo-Microscopy: Towards Atomic Resolution. Quarterly Reviews of Biophysics, 33(4):307–369.

Windham, M. P. (1995). Robustifying Model Fitting. Journal of the Royal Statistical Society, Series B (Methodological), 57(3):599–609.

Yu, X. K., Jin, L., and Zhou, Z. H. (2008). 3.88 Angstrom Structure of Cytoplasmic Polyhedrosis Virus by Cryo-Electron Microscopy. Nature, 453(7193):415–419.
Table 1: γ-SUP clustering algorithm.

Inputs: data matrix X ∈ ℝ^{n×p} (n instances with p variables); tuning parameters (s, τ).
Outputs: number of clusters k and cluster centers {µ̂_h}_{h=1}^{k}; cluster membership assignment {c_i}_{i=1}^{n} for each of {x_i}_{i=1}^{n}.

begin
  Iter ← 0
  start with: µ_i ← x_i / τ, i = 1, ..., n
  repeat
    for i = 1 : n
      w_{ij} ← exp_{1−s}\{ −\frac{2 + (p+2)s}{2} \|µ_i − µ_j\|^2 \}
      z_i ← \sum_{j=1}^{n} \frac{w_{ij}}{\sum_{k=1}^{n} w_{ik}}\, µ_j   (update every point)
    end
    for i = 1 : n
      µ_i ← z_i
    end
    Iter ← Iter + 1
  until convergence
  output distinct {τ · µ_i, 1 ≤ i ≤ n} and the cluster membership
end

Note: The parameter τ is linearly proportional to the radius of the hard influence region that defines the similarity inside a cluster. We observe a phase transition in the cryo-EM image analysis that suggests a reasonable region for τ.

Table 2: Means (standard errors) of different methods in estimating the location parameter µ under different proportions of contamination π. γ-SUP-I uses the center of the largest cluster as the estimate of the mean parameter. γ-SUP-II resorts back to the original data and uses the sample average of the original data points assigned to the largest cluster.

Method        π = 0.1                  π = 0.3
γ-SUP-I       0.29 × 10⁻⁴ (0.011)      0.70 × 10⁻⁴ (0.108)
γ-SUP-II      0.42 × 10⁻⁴ (0.099)      0.63 × 10⁻⁴ (0.115)
RMF           9.49 × 10⁻⁴ (0.115)      2.64 × 10⁻⁴ (0.246)

Table 3: Comparison of methods for both perfect alignment and 10% misalignment.

Dim reduction        MPCA                                    conventional PCA
Algorithm            γ-SUP             k-means               γ-SUP             k-means
Perfect alignment    s = 0.025,        k = 128,              s = 0.025,        k = 128,
                     τ = 100           random initials       τ = 140           random initials
Accuracy             1.0000            0.8317                0.9609            0.7920
10% misalignment     s = 0.025,        k = 128,              s = 0.025,        k = 128,
                     τ = 100           random initials       τ = 131.6         random initials
Accuracy             1.0000            0.7361                0.8636            0.7097
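For readers who prefer code, here is a minimal NumPy sketch of the self-updating iteration in Table 1 (working model Σ = σ²I_p, parameterization (s, τ) as in the table). The convergence tolerance, the iteration cap, and the rounding used to group coincident centers are our own choices, and the O(n²) pairwise computation is kept naive for clarity.

```python
import numpy as np

def gamma_sup(X, s, tau, tol=1e-8, max_iter=1000):
    """Sketch of the gamma-SUP iteration in Table 1.
    X: (n, p) data matrix; s, tau: tuning parameters.
    Returns the distinct converged centers (scaled back by tau) and a label per point."""
    n, p = X.shape
    mu = X / tau                                                  # start: mu_i <- x_i / tau
    for _ in range(max_iter):
        d2 = np.sum((mu[:, None, :] - mu[None, :, :]) ** 2, axis=-1)  # ||mu_i - mu_j||^2
        u = -0.5 * (2.0 + (p + 2.0) * s) * d2
        w = np.maximum(1.0 + s * u, 0.0) ** (1.0 / s)             # exp_{1-s}(u); hard influence range
        z = (w @ mu) / w.sum(axis=1, keepdims=True)               # z_i <- sum_j w_ij mu_j / sum_k w_ik
        shift = np.max(np.abs(z - mu))
        mu = z                                                    # simultaneous update of every point
        if shift < tol:
            break
    centers = tau * mu                                            # scale back: tau * mu_i
    distinct, labels = np.unique(np.round(centers, 6), axis=0, return_inverse=True)
    return distinct, labels.reshape(-1)

# Example call on dimension-reduced images (hypothetical `features` array):
#   centers, labels = gamma_sup(features, s=0.025, tau=100.0)
```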