A New Local PCA SOM Algorithm

Dong Huang, Zhang Yi and Xiaorong Pu

Computational Intelligence Laboratory, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China.
E-mail: donnyhuang, zhangyi, .

Abstract

This paper proposes a Local PCA-SOM algorithm. The new competition measure is computationally efficient and implicitly incorporates both the Mahalanobis distance and the reconstruction error. Unlike previous models, no matrix inversion or PCA decomposition is needed for each data input. Moreover, the local data distribution is completely stored in the covariance matrix instead of a pre-defined number of principal components, so no a priori information about the optimal principal subspace is required. Experiments on both synthetic data and a pattern learning task are carried out to show the performance of the proposed method.

Key words: Neural Networks; Unsupervised Learning; Local Principal Component Analysis; Self-Organizing Mapping

1 Introduction

Principal component analysis (PCA) has been widely used in dimension reduction of multi-variate data, data compression, pattern recognition and statistical analysis. A PCA algorithm is designed to approximate the original high-dimensional pattern space by a low-dimensional subspace spanned by the principal eigenvectors of the data covariance matrix. In this way, the data distribution can be represented and reconstructed by the principal eigenvectors and their corresponding eigenvalues. However, PCA is globally linear and inefficient for data distributions with non-linear dependencies [1].

1 This work was supported by the National Science Foundation of China under Grant 60471055 and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant 20040614017.

Preprint submitted to Elsevier Science, 28 September 2007

Several extensions have been suggested to overcome this problem. These extensions fall into two main categories: global nonlinear approaches and locally linear descriptions. The former includes Principal Curves [2] and Kernel PCA [3]. We focus on the latter, in which the data distribution is represented by a collection of local PCA units [7,8].

When describing a data distribution with a mixture model, the question arises what kind of units the mixture should contain. In Gaussian mixture models [7], the local iso-density surface of a uniform Gaussian unit is a sphere, whereas a local PCA unit corresponds to a multivariate Gaussian with an ellipsoidal iso-density surface. Despite its greater complexity, a local PCA unit is preferable to a spherical unit for the following reasons. An ellipsoid can describe a local structure for which many spheres would be needed. Furthermore, data distributions are usually constrained locally to subspaces with fewer dimensions than the space of the training data. Thus, there are directions in which the distribution has locally zero variance (or almost zero variance because of noise). An ellipsoidal unit can extend its minor components into these additional noise dimensions, but the computational cost increases over-proportionally with the number of principal components needed.

In the recent work NGAS-PCA [4], the local PCA model is based on the soft competition scheme of the Neural Gas algorithm and on RRLSA [10] online PCA learning in each local unit. The novelty of NGAS-PCA is that its distance measure for neuron competition combines a normalized Mahalanobis distance and the squared reconstruction error. Note that storing the principal basis vectors for every unit involves rather complicated updating procedures, as in NGAS-PCA and ASSOM [9]. Another recent model, PCA-SOM [5], proposes an alternative: it stores the local information in the covariance matrix. This approach is derived directly from statistical theory and has great advantages over the ASSOM model in both computational burden and reliability of the result. However, PCA-SOM uses only the reconstruction error in the principal subspace for its neuron competition. For this reason, PCA-SOM still needs a PCA decomposition, or an update with a predefined number of principal components, each time a training sample is presented.

In this paper, we propose a new computationally efficient local PCA algorithm that combines the advantages of NGAS-PCA and PCA-SOM. Each unit is associated with its mean vector and covariance matrix. The new competition measure implicitly incorporates the reconstruction error and the distance between the input data and the unit center. In the proposed algorithm, the extra updating step for the principal subspaces is eliminated from the data distribution learning process. In addition, the local distribution of the input data is completely stored in the covariance matrix instead of a predefined number of principal components. One potential application of the proposed model is nonlinear pattern learning and recalling [11]. In most models of associative memory, memories are stored as attractive fixed points at discrete locations in state space. Discrete attractors may not be appropriate for patterns with continuous variability [13], for example the images of a three-dimensional object seen from different viewpoints [12]. After the training process, the data distribution is represented by a collection of locally linear units, and no a priori information about the optimal principal subspace is needed for the pattern representation.

The rest of this paper is organized as follows. In Section 2, the proposed competition measure and the neuron updating method are formulated. Section 3 presents the simulation results and discussions. Section 4 briefly reviews the NGAS-PCA and PCA-SOM algorithms and discusses their relations to the proposed method in detail. Finally, the paper is concluded in Section 5.

2 The Local PCA model

In this section, we present the detailed formulation of the local PCA model. The following notations are used throughout the paper. The input data set is $X = \{x_i \mid x_i \in R^d, i = 1, \dots, M\}$, where $d$ is the data dimension. Each local unit $j$ ($j = 1, \dots, N$) has a center $c_j$ and a covariance matrix $C_j$. The principal subspace of $C_j$ is characterized by $W^{(p)} = [w_1, \dots, w_p]$ and $\Lambda^{(p)} = \mathrm{diag}\{\lambda_1, \dots, \lambda_p\}$, where $w_i$ is an eigenvector and $\lambda_i$ the corresponding eigenvalue ($i = 1, \dots, p$). Throughout the paper the eigenvalues are arranged in descending order, so $p$ is the number of underlying principal components and the remaining $(d - p)$ eigenvectors span the underlying minor subspace.

2.1 The Proposed Competition Measure

We seek to formulate a local PCA algorithm that simplifies both the neuron competition and the updating. As in PCA-SOM, each unit in our model is associated with its center and covariance matrix, so the data distribution is represented by a mixture of ellipsoidal local units.

This is closely related to anisotropic K-means clustering and kernel density estimation, where we seek to minimize the Mahalanobis distance
$$\mathrm{Mahalanobis}(x) = x^T C^{-1} x,$$
or, equivalently, to maximize the normalized Gaussian density function
$$\Phi(x) = \frac{1}{(2\pi)^{d/2}\,|C|^{1/2}}\, e^{-\frac{1}{2} x^T C^{-1} x}, \qquad \int_{-\infty}^{+\infty} \Phi(x)\, dx = 1, \tag{1}$$
where $|C| \neq 0$. However, due to the matrix inversion $C^{-1}$, the computational complexity of the Mahalanobis distance is $O(d!)$. We consider an alternative neuron competition measure:
$$NR(x, C) = \frac{x^T C x}{\mathrm{tr}(C)\,(x^T x)^2}, \tag{2}$$
where the size of the hyper-ellipsoid is normalized by $\mathrm{tr}(C) = \sum_{i=1}^{d} \lambda_i$. The new measure is called the Normalized Rayleigh ratio $NR(x, C)$. It is the normalized scattering intensity of the data points. The computational burden of $NR(x, C)$ is reduced to $O(d^2)$. The definition of $NR(x, C)$ is based on the assumption that $x^T x \neq 0$.
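As an illustration (not part of the original paper), the following Python/NumPy sketch evaluates the Normalized Rayleigh ratio of (2) and, for comparison, the Mahalanobis distance; the function names are our own and the 2-D test matrix is an arbitrary example.

import numpy as np

def normalized_rayleigh(x, C):
    """Competition measure NR(x, C) of Eq. (2): no matrix inverse is needed,
    only one matrix-vector product and two inner products (O(d^2) work)."""
    return (x @ C @ x) / (np.trace(C) * (x @ x) ** 2)

def mahalanobis(x, C):
    """Mahalanobis distance x^T C^{-1} x, shown only for comparison;
    it requires solving a linear system with C."""
    return x @ np.linalg.solve(C, x)

# Small 2-D example with an elongated covariance matrix.
C = np.array([[4.0, 0.0],
              [0.0, 0.25]])
x = np.array([1.0, 0.5])
print(normalized_rayleigh(x, C), mahalanobis(x, C))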

The following analysis starts from the Rayleigh quotient. Given a Hermitian matrix $C$ and a nonzero vector $x$, the Rayleigh quotient $R(x, C)$ is defined by
$$R(x, C) = \frac{x^T C x}{x^T x}.$$

Rayleigh-Ritz Theorem: If the eigenvalues of a Hermitian (symmetric) matrix $C$ are ordered as $\lambda_{\max} = \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{d-1} \ge \lambda_d = \lambda_{\min}$, then
$$\lambda_{\min} \le \frac{x^T C x}{x^T x} \le \lambda_{\max}, \tag{3}$$
where the equalities hold when $x$ is an eigenvector of $C$. The gradient and Hessian matrix of $R(x, C)$ can be computed as
$$\frac{\partial R}{\partial x} = \frac{2}{x^T x}\,(C - R\,I)\,x, \tag{4}$$
$$H_R(x) = \frac{\partial^2 R}{\partial x\,\partial x^T} = \frac{2}{x^T x}\left[C - x\left(\frac{\partial R}{\partial x}\right)^T - \frac{\partial R}{\partial x}\,x^T - R\,I\right], \tag{5}$$
where $I$ is the identity matrix.

For the $j$th local PCA neuron unit with center $c_j$ and covariance matrix $C_j$, the Normalized Rayleigh ratio $NR(\delta, C_j)$ is then written as
$$NR(\delta, C_j) = \frac{\delta^T C_j \delta}{\mathrm{tr}(C_j)\,(\delta^T \delta)^2}, \tag{6}$$
where $\delta = x - c_j$. If $NR(\delta, C_j)$ is used as the competition measure in the probabilistic SOM training process, the winner is $v = \arg\max_j NR(x, c_j(t), C_j(t))$.

The final result of the training process is obtained when $NR(x, c_j(t), C_j)$ ($j = 1, \dots, N$) is maximized for $x \in \{x \mid E(x) = c_j,\ E(xx^T) = C_j\}$. To analyze the maximum property of $NR$, the neuron index $j$ is omitted below.

It follows from (4) and (5) that $\partial R(\delta)/\partial\delta = 0$ only at the critical points $\delta = w_i$ ($i = 1, \dots, d$), and at these critical points
$$H_R(w_i)\,w_j = \begin{cases} 0, & j = i,\\[2pt] \dfrac{2}{w_i^T w_i}\,(\lambda_j - \lambda_i)\,w_j, & j \ne i. \end{cases}$$
In real applications the covariance matrix $C$ is positive definite, i.e. $\mathrm{rank}(C) = d$. For $x \neq 0$, $R$ has no local minima or maxima except those on the edge of the domain, i.e. the maximum at $\delta = w_1$ and the minimum at $\delta = w_d$. This shows that the Rayleigh quotient $\frac{\delta^T C \delta}{\delta^T \delta}$ depends only on the directions of the input vectors.

Taking the logarithm of $NR(\delta, C)$ and computing the gradient,
$$\frac{\partial \ln NR(\delta, C)}{\partial \delta} = \frac{\partial}{\partial \delta}\Big[\ln(\delta^T C \delta) - 2\ln(\delta^T \delta)\Big] = \frac{2C\delta}{\delta^T C \delta} - \frac{4\delta}{\delta^T \delta} = \frac{2(\delta^T \delta)\,C\delta - 4(\delta^T C \delta)\,\delta}{(\delta^T C \delta)(\delta^T \delta)}. \tag{7}$$
Without loss of generality, we expand $\delta = x - c$ in the eigenvector basis of the covariance matrix $C$,
$$\frac{\delta}{\|\delta\|} = \alpha_1 w_1 + \dots + \alpha_d w_d, \qquad \sum_{i=1}^{d} \alpha_i^2 = 1.$$
The numerator in (7) (up to the factor 2) can then be written as
$$C\delta\,(\delta^T\delta) - 2(\delta^T C \delta)\,\delta = \|\delta\|^{3}\left[\Big(\sum_i \lambda_i w_i w_i^T\Big)\Big(\sum_i \alpha_i w_i\Big) - 2\Big(\sum_i \alpha_i w_i\Big)\Big(\sum_i \lambda_i \alpha_i^2\Big)\right] = \|\delta\|^{3} \sum_i \left[\lambda_i \alpha_i - 2\Big(\sum_j \lambda_j \alpha_j^2\Big)\alpha_i\right] w_i.$$
Since $\{w_i,\ i = 1, \dots, d\}$ is a basis of $R^d$, $\partial \ln NR / \partial\delta = 0$ only if
$$\lambda_i \alpha_i - 2\Big(\sum_j \lambda_j \alpha_j^2\Big)\alpha_i = 0$$
for all $i = 1, \dots, d$. However, this is not possible for positive definite $C$ and $\|\delta\| \neq 0$: multiplying the $i$th condition by $\alpha_i$ and summing over $i$ gives $\sum_j \lambda_j \alpha_j^2 = 2\sum_j \lambda_j \alpha_j^2$, which forces $\sum_j \lambda_j \alpha_j^2 = 0$ and contradicts the positive definiteness of $C$. Thus $NR(\delta, C)$ has no local maxima except on the edge of the domain.
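To sanity-check the gradient expression (7), the following NumPy snippet (our own illustration; the finite-difference step h and the random test matrix are arbitrary choices) compares the analytic gradient of ln NR with a numerical one.

import numpy as np

def grad_ln_nr(delta, C):
    # Analytic gradient of ln NR(delta, C) from Eq. (7):
    # 2*C*delta / (delta^T C delta) - 4*delta / (delta^T delta)
    return 2.0 * (C @ delta) / (delta @ C @ delta) - 4.0 * delta / (delta @ delta)

def ln_nr(delta, C):
    return np.log(delta @ C @ delta) - np.log(np.trace(C)) - 2.0 * np.log(delta @ delta)

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
C = A @ A.T                          # positive definite covariance-like matrix
delta = rng.standard_normal(3)

h = 1e-6
num = np.array([(ln_nr(delta + h * e, C) - ln_nr(delta - h * e, C)) / (2 * h)
                for e in np.eye(3)])
print(np.allclose(num, grad_ln_nr(delta, C), atol=1e-5))   # expected: True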

From the Rayleigh-Ritz Theorem (3), we have
$$\frac{1}{\lambda_{\max}} \le \frac{\delta^T \delta}{\delta^T C \delta} \le \frac{1}{\lambda_{\min}}, \qquad \frac{1}{\lambda_{\max}} \le \frac{\delta^T C^{-1} \delta}{\delta^T \delta} \le \frac{1}{\lambda_{\min}},$$
and
$$\frac{\lambda_{\min}}{\mathrm{tr}(C)} \le \frac{\delta^T C \delta}{\mathrm{tr}(C)\,\delta^T\delta} \le \frac{\lambda_{\max}}{\mathrm{tr}(C)}, \qquad \frac{\lambda_{\min}}{\mathrm{tr}(C)\,\delta^T\delta} \le \frac{\delta^T C \delta}{\mathrm{tr}(C)\,(\delta^T\delta)^2} \le \frac{\lambda_{\max}}{\mathrm{tr}(C)\,\delta^T\delta}.$$
In the probabilistic framework of SOM training, the iterative algorithm uses input data $x \in \{x \mid E(x) = c,\ E(xx^T) = C\}$ over multiple passes. We consider the property of the training result in the sense of mathematical expectation,
$$E\left[\min_{\|\delta\| \neq 0} \frac{(\delta^T\delta)^2}{\delta^T C \delta}\right] = E\left[\min_{\|\delta\| \neq 0} \delta^T C^{-1}\delta\right] = \frac{E(\delta^T\delta)}{\lambda_{\max}} = \frac{\mathrm{tr}(C)}{\lambda_{\max}} = \frac{\sum_{i=1}^{d} \lambda_i}{\lambda_{\max}}. \tag{8}$$
It can be observed from (8) that maximizing the Normalized Rayleigh ratio (2) through the SOM training process is equivalent to minimizing the Mahalanobis distance $x^T C^{-1} x$, with a constraint on the size of the hyper-ellipsoid $\mathrm{tr}(C) = \sum_{i=1}^{d} \lambda_i$.
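As an informal illustration of this equivalence (our own, not from the paper; the diagonal covariance is an arbitrary example), the sketch below evaluates NR and the Mahalanobis distance for unit-radius probes around the unit center: directions with the largest NR are exactly those with the smallest Mahalanobis distance.

import numpy as np

def nr(delta, C):
    # Normalized Rayleigh ratio of Eq. (2)/(6)
    return (delta @ C @ delta) / (np.trace(C) * (delta @ delta) ** 2)

def mahalanobis(delta, C):
    return delta @ np.linalg.solve(C, delta)

C = np.diag([9.0, 1.0, 0.25])                  # elongated local unit
angles = np.linspace(0.0, np.pi / 2, 7)
# Unit-radius probes sweeping from the major axis w1 toward the second axis w2.
probes = [np.array([np.cos(a), np.sin(a), 0.0]) for a in angles]

for p in probes:
    print(f"NR = {nr(p, C):.4f}   Mahalanobis = {mahalanobis(p, C):.4f}")
# Along the sweep NR decreases while the Mahalanobis distance increases:
# the two measures rank the directions oppositely.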

2.2 Updating of the Neurons

The neuron centers and covariance matrices are updated similarly to those of PCA-SOM. In practical situations, the Normalized Rayleigh ratio (6) is regularized as
$$NR(\delta, C_j) = \frac{\delta^T C_j \delta + \epsilon}{\mathrm{tr}(C_j)\,(\delta^T \delta)^2 + \epsilon},$$
in case $\|\delta\| = 0$ occurs, where $\delta = x - c_j(t)$ and $\epsilon > 0$ is a small constant. When an input data point $x$ is randomly selected from the data set $X$, the winning neuron $v$ is chosen as $v = \arg\max_j NR(x, c_j(t), C_j(t))$ ($j = 1, \dots, N$). Then all the units ($j = 1, \dots, N$) are updated as follows:
$$c_j(t+1) = c_j(t) + \alpha(t)\,h_{v,j}(t)\,\big[x - c_j(t)\big], \tag{9}$$
$$\hat{C}_j(t+1) = \big(x - c_j(t+1)\big)\big(x - c_j(t+1)\big)^T, \tag{10}$$
$$C_j(t+1) = C_j(t) + \alpha(t)\,h_{v,j}(t)\,\big[\hat{C}_j(t+1) - C_j(t)\big], \tag{11}$$
where $\alpha(t)$ is the learning rate and $h_{v,j}(t)$ is the neighborhood function of the $j$th neuron given the winning neuron $v$. The training process stops when the changes of $c_j$ and $C_j$ can be neglected.
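A minimal NumPy sketch of the update step (9)-(11) is given below; the function name update_units is our own, and the learning rate alpha_t and neighborhood weights h are assumed to be supplied as defined in the following paragraphs.

import numpy as np

def update_units(x, centers, covs, h, alpha_t):
    """One presentation of a sample x: apply Eqs. (9)-(11) to every unit j.
    centers: (N, d) float array of c_j; covs: (N, d, d) float array of C_j;
    h: (N,) neighborhood weights h_{v,j}(t); alpha_t: scalar learning rate."""
    for j in range(len(centers)):
        # Eq. (9): move the center toward x
        centers[j] += alpha_t * h[j] * (x - centers[j])
        # Eq. (10): one-sample covariance estimate around the new center
        d = x - centers[j]
        C_hat = np.outer(d, d)
        # Eq. (11): blend it into the stored covariance
        covs[j] += alpha_t * h[j] * (C_hat - covs[j])
    return centers, covs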

In the proposed method, the amplitude of the neuron updates (9)-(11) does not depend on the iteration number, but on how well the neural network fits the input data [6]. To achieve this, the learning rate $\alpha(t)$ is calculated as
$$\alpha(t) = \frac{\|x(t) - c_v(t)\|^2}{r(t)},$$
where
$$r(t) = \max\big\{\|x(t) - c_v(t)\|^2,\ r(t-1)\big\}, \qquad r(0) = \|x(0) - c_v(0)\|^2.$$
The neighborhood function depends on $NR(c_j(t), c_v(t), C_v(t))$ and is calculated as
$$h_{v,j}(t) = \exp\left(-\frac{NR\big(c_j(t), c_v(t), C_v(t)\big)}{\sigma}\right), \tag{12}$$
where $\sigma$ is a constant that scales the size of the neighborhood. In our experiments in Section 3, it is estimated as $1/N$ of the total variance of the data distribution.
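The adaptive learning rate and the neighborhood weights might be computed as in the sketch below (helper names are our own; sigma is assumed to be precomputed as described above, and eps is the small regularization constant).

import numpy as np

def learning_rate(x, c_v, r_prev):
    """alpha(t) = ||x - c_v||^2 / r(t), with r(t) the running maximum of ||x - c_v||^2."""
    d2 = float(np.sum((x - c_v) ** 2))
    r_t = max(d2, r_prev)
    return d2 / r_t, r_t

def neighborhood(centers, covs, v, sigma, eps=1e-8):
    """h_{v,j}(t) of Eq. (12), using the regularized NR between c_j and the winner c_v."""
    C_v = covs[v]
    h = np.empty(len(centers))
    for j, c_j in enumerate(centers):
        delta = c_j - centers[v]
        nr = (delta @ C_v @ delta + eps) / (np.trace(C_v) * (delta @ delta) ** 2 + eps)
        h[j] = np.exp(-nr / sigma)
    return h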

The proposed method is an iterative deterministic algorithm, similar to the neural networks for global PCA and to SOM. It updates the neurons and converges iteratively using the sequentially presented training data over multiple passes. New data can be incorporated by adding it to the previous training set and updating the outputs. The iterations continue until convergence, which is reached when the changes of the neuron centers and covariance matrices can be neglected.
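Putting the pieces together, a compact training loop consistent with the above description might look as follows. This is a sketch under our own assumptions (identity initial covariances, fixed number of passes, stopping tolerance tol), not the authors' reference implementation.

import numpy as np

def train_local_pca_som(X, N, sigma, n_pass=50, tol=1e-5, eps=1e-8, seed=0):
    """Sketch of the proposed algorithm: winner selection by the regularized NR,
    then updates (9)-(11) with the adaptive learning rate and neighborhood (12)."""
    rng = np.random.default_rng(seed)
    M, d = X.shape
    centers = X[rng.choice(M, size=N, replace=False)].astype(float)  # centers at data points
    covs = np.stack([np.eye(d) for _ in range(N)])                   # isotropic initial units
    r_prev = float(np.sum((X[0] - centers[0]) ** 2))                 # r(0)

    def nr(delta, C):
        return (delta @ C @ delta + eps) / (np.trace(C) * (delta @ delta) ** 2 + eps)

    for _ in range(n_pass):
        max_change = 0.0
        for x in X[rng.permutation(M)]:
            # competition: the winner maximizes the regularized NR
            scores = [nr(x - centers[j], covs[j]) for j in range(N)]
            v = int(np.argmax(scores))
            # adaptive learning rate alpha(t)
            d2 = float(np.sum((x - centers[v]) ** 2))
            r_prev = max(d2, r_prev)
            alpha = d2 / r_prev
            # neighborhood weights h_{v,j}(t) of Eq. (12)
            h = np.array([np.exp(-nr(centers[j] - centers[v], covs[v]) / sigma)
                          for j in range(N)])
            # updates (9)-(11)
            for j in range(N):
                old_c = centers[j].copy()
                centers[j] += alpha * h[j] * (x - centers[j])
                C_hat = np.outer(x - centers[j], x - centers[j])
                covs[j] += alpha * h[j] * (C_hat - covs[j])
                max_change = max(max_change, float(np.abs(centers[j] - old_c).max()))
        if max_change < tol:      # stop when the units barely move
            break
    return centers, covs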

3 Simulations and Discussion

We test the proposed local PCA method on synthetic data of two and three dimensions and on high-dimensional data in pattern completion learning and recalling tasks.

3.1 Synthetic Data

The following setting was used in the training and demonstration. The initial neuron centers are randomly located at data points. The initial covariance matrices are identity matrices multiplied by a scalar related to the spatial extension of
