arXiv:1003.3689v1 [cs.NA] 18 Mar 2010

A Highly Efficient Parallel Algorithm for Computing the Fiedler Vector

Murat Manguoglu
Purdue University

March 22, 2010

Abstract

The eigenvector corresponding to the second smallest eigenvalue of the Laplacian of a graph, known as the Fiedler vector, has a number of applications in areas that include matrix reordering, graph partitioning, protein analysis, data mining, machine learning, and web search. The computation of the Fiedler vector has been regarded as an expensive process as it involves solving a large eigenvalue problem. We present a novel and efficient parallel algorithm for computing the Fiedler vector of large graphs based on the Trace Minimization algorithm (Sameh et al.). We compare the parallel performance of our method with a multilevel scheme, designed specifically for computing the Fiedler vector, which is implemented in routine MC73_Fiedler of the Harwell Subroutine Library (HSL). In addition, we compare the quality of the Fiedler vector for the application of weighted matrix reordering and provide a metric for measuring the quality of reordering.

1 Introduction

The second smallest eigenvalue and the corresponding eigenvector of the Laplacian of a graph have been used in a number of application areas including matrix reordering [11, 10, 9, 1], graph partitioning [14, 15], machine learning [13], protein analysis and data mining [5, 18, 8], and web search [4]. The second smallest eigenvalue of the Laplacian of a graph is sometimes called the algebraic connectivity of the graph, and the corresponding eigenvector is known as the Fiedler vector, due to the pioneering work of Fiedler [3].

For a given n x n sparse symmetric matrix A, or an undirected weighted graph with positive weights, one can form the weighted Laplacian matrix L_w as follows:

    L_w(i,j) = \begin{cases} \sum_{k \neq i} |A(i,k)| & \text{if } i = j, \\ -|A(i,j)| & \text{if } i \neq j. \end{cases}    (1)

One can obtain the unweighted Laplacian by simply replacing each nonzero element of the matrix A by 1. In this paper we focus on the more general weighted case; the method we present is also applicable to the unweighted Laplacian.
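
To make the construction in Eqn. (1) concrete, here is a minimal Python/SciPy sketch (SciPy is used only for illustration; it is not the paper's implementation language) that forms L_w from a sparse symmetric matrix A, taking the absolute values of the off-diagonal entries as edge weights and ignoring the diagonal of A:

    import numpy as np
    import scipy.sparse as sp

    def weighted_laplacian(A):
        """Form the weighted Laplacian of Eqn. (1) from a sparse symmetric matrix A.

        Off-diagonal weights are |A(i, j)|; the diagonal holds the weighted degrees,
        so every row of L_w sums to zero and L_w @ ones(n) = 0 (lambda_1 = 0).
        """
        W = abs(sp.csr_matrix(A))          # |A(i, j)|
        W.setdiag(0)                       # the graph has no self-loops
        W.eliminate_zeros()
        degrees = np.asarray(W.sum(axis=1)).ravel()
        return sp.diags(degrees) - W       # L_w = D - W

    if __name__ == "__main__":
        # Small example: a weighted graph on 4 vertices.
        A = sp.csr_matrix(np.array([[ 2.0, -1.0,  0.0,  0.0],
                                    [-1.0,  3.0,  2.0,  0.0],
                                    [ 0.0,  2.0,  4.0, -5.0],
                                    [ 0.0,  0.0, -5.0,  6.0]]))
        L = weighted_laplacian(A)
        print(L.toarray())
        print(L @ np.ones(4))              # the zero vector, up to rounding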

Since the Fiedler vector can be computed independently for each connected component, we assume that the graph is connected. The eigenvalues of L_w are 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n.

Each outer iteration k of the TraceMin-Fiedler algorithm of Figure 1 ends with the following correction steps:

7. if n_conv > 0, deflate the converged eigenvectors: X_k ← X_k - X_conv (X_conv^T X_k);
8. if n_conv = 0 then solve the (diagonally perturbed) linear system L W_k = X_k approximately with relative residual ε_in via the PCG scheme using the diagonal preconditioner D; else solve the linear system L W_k = X_k approximately with relative residual ε_in via the PCG scheme using the diagonal preconditioner D;
9. form the Schur complement S_k ← X_k^T W_k;
10. solve the linear system S_k N_k = X_k^T X_k for N_k;
11. update X_{k+1} ← X_k - Δ_k = W_k N_k;

Figure 1: TraceMin-Fiedler algorithm.
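
As an illustration of steps 7-11 above, the following Python/SciPy sketch performs one TraceMin correction step on an n-by-p block X of iterate vectors. It is a simplified sketch rather than the paper's MPI implementation: the inner solve uses scipy.sparse.linalg.cg capped at 30 iterations with a diagonal (Jacobi) preconditioner in place of the PCG routine of Figure 2, and the deflation block X_conv is supplied by the caller. In the usage example the null-space eigenvector of the Laplacian (the constant vector) is treated as already converged and deflated out, so the single remaining column converges toward the Fiedler vector.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def tracemin_correction(L, X, X_conv=None):
        """One TraceMin correction step (steps 7-11 of Figure 1) on an n-by-p block X."""
        # Step 7: deflate the already-converged eigenvectors out of the current block.
        if X_conv is not None and X_conv.shape[1] > 0:
            X = X - X_conv @ (X_conv.T @ X)
        # Step 8: solve L W = X approximately, column by column, with a Jacobi preconditioner.
        D_inv = sp.diags(1.0 / L.diagonal())
        W = np.column_stack([spla.cg(L, X[:, j], M=D_inv, maxiter=30)[0]
                             for j in range(X.shape[1])])
        # Steps 9-10: Schur complement S = X^T W and the small p-by-p solve S N = X^T X.
        S = X.T @ W
        N = np.linalg.solve(S, X.T @ X)
        # Step 11: the updated block of iterate vectors X_{k+1} = W N.
        return W @ N

    if __name__ == "__main__":
        n = 30
        main = np.full(n, 2.0)
        main[[0, -1]] = 1.0
        L = sp.diags([-np.ones(n - 1), main, -np.ones(n - 1)], [-1, 0, 1], format="csr")  # path-graph Laplacian
        v = np.ones((n, 1)) / np.sqrt(n)          # known null-space eigenvector, treated as converged
        X = np.random.rand(n, 1)
        for _ in range(20):
            X = tracemin_correction(L, X, X_conv=v)
            X /= np.linalg.norm(X)
        fiedler = X[:, 0]
        print(fiedler @ (L @ fiedler), 2 * (1 - np.cos(np.pi / n)))  # Rayleigh quotient vs. exact lambda_2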

Algorithm 2:
Data: Lx = b, where L is the n x n Laplacian matrix defined in Eqn. (1), ε_in is the stopping criterion for the norm of the relative residual, b is the right-hand side, and M is the preconditioner.
Result: x, the solution of the linear system.
Solve the preconditioned system (M^{-1/2} L M^{-1/2})(M^{1/2} x) = M^{-1/2} b, with
L̄ = M^{-1/2} L M^{-1/2};  b̄ = M^{-1/2} b;  x̄ = M^{1/2} x;
x̄_0 ← (0, ..., 0)^T;  r̄_0 ← b̄ - L̄ x̄_0;  p̄_0 ← r̄_0;
for k = 0, 1, 2, ..., max_it do
  1. α_k ← (r̄_k^T r̄_k) / (p̄_k^T L̄ p̄_k);
  2. x̄_{k+1} ← x̄_k + α_k p̄_k;
  3. r̄_{k+1} ← r̄_k - α_k L̄ p̄_k;
  4. if ‖r̄_{k+1}‖ / ‖r̄_0‖ ≤ ε_in then exit;
  5. β_k ← (r̄_{k+1}^T r̄_{k+1}) / (r̄_k^T r̄_k);
  6. p̄_{k+1} ← r̄_{k+1} + β_k p̄_k;

Figure 2: Preconditioned Conjugate Gradient scheme for solving systems of the form Lx = b.
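
A direct sequential transcription of Figure 2 into Python/NumPy is given below as a minimal sketch (the paper's solver is a parallel MPI code; here the preconditioner M is the diagonal of L and everything runs on one process):

    import numpy as np

    def pcg_diag(L, b, eps_in=1e-1, max_it=30):
        """Preconditioned CG of Figure 2 with M = diag(L): solve the symmetrically
        preconditioned system (M^-1/2 L M^-1/2)(M^1/2 x) = M^-1/2 b, then map back."""
        m_inv_sqrt = 1.0 / np.sqrt(L.diagonal())
        L_bar = lambda z: m_inv_sqrt * (L @ (m_inv_sqrt * z))  # action of M^-1/2 L M^-1/2
        b_bar = m_inv_sqrt * b
        x_bar = np.zeros_like(b_bar)
        r = b_bar - L_bar(x_bar)
        p = r.copy()
        r0_norm = np.linalg.norm(r)
        for _ in range(max_it):
            Lp = L_bar(p)
            alpha = (r @ r) / (p @ Lp)                      # step 1
            x_bar = x_bar + alpha * p                       # step 2
            r_new = r - alpha * Lp                          # step 3
            if np.linalg.norm(r_new) / r0_norm <= eps_in:   # step 4
                r = r_new
                break
            beta = (r_new @ r_new) / (r @ r)                # step 5
            p = r_new + beta * p                            # step 6
            r = r_new
        return m_inv_sqrt * x_bar                           # x = M^-1/2 (M^1/2 x)

    if __name__ == "__main__":
        import scipy.sparse as sp
        n = 100
        # A nonsingular SPD tridiagonal test matrix (not a graph Laplacian).
        L = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)], [-1, 0, 1], format="csr")
        b = np.random.rand(n)
        x = pcg_diag(L, b, eps_in=1e-8, max_it=500)
        print(np.linalg.norm(L @ x - b) / np.linalg.norm(b))  # small relative residual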

We note that in TraceMin-Fiedler, after the smallest eigenvector, which corresponds to the null space of L, has converged, the preconditioned CG iteration in Figure 2 satisfies

    p̄_k^T L̄ p̄_k > 0.    (7)

Observing that v ⊥ b due to the deflation step, the proof is given below.

Theorem 2.1. Let L be symmetric positive semidefinite such that Lv = 0 (i.e. N(L) = span{v}), and let M = diag(L) (so that M^{-1/2} L M^{-1/2} M^{1/2} v = 0, i.e. L̄ v̄ = 0 where v̄ = M^{1/2} v). The following statement is true for the preconditioned conjugate gradient method in Figure 2: if v ⊥ b then v̄ ⊥ p̄_n and v̄ ⊥ r̄_n.

Proof (by induction).
Base case:
    v̄^T p̄_0 = v̄^T r̄_0 = v̄^T (b̄ - L̄ x̄_0)    (note x̄_0 = 0)
             = v̄^T b̄ = v^T M^{1/2} M^{-1/2} b = v^T b = 0.
Inductive hypothesis: assume that v̄ ⊥ p̄_k and v̄ ⊥ r̄_k for n = k.
Inductive step: then for n = k+1,
    v̄^T r̄_{k+1} = v̄^T (r̄_k - α_k L̄ p̄_k) = 0    (by the inductive hypothesis and v̄^T L̄ = 0),
and
    v̄^T p̄_{k+1} = v̄^T (r̄_{k+1} + β_k p̄_k) = 0 + β_k v̄^T p̄_k = 0    (by the inductive hypothesis).

Therefore, we do not need to use a diagonal perturbation after the smallest eigenvalue and the corresponding eigenvector have converged.
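
The invariant in Theorem 2.1 is easy to check numerically. The short script below (an illustration added here, not taken from the paper) builds a small path-graph Laplacian, deflates the right-hand side against the null vector v of all ones, runs the recurrences of Figure 2 with M = diag(L), and prints |v̄^T r̄_k| and |v̄^T p̄_k|; both stay at rounding-error level, and p̄_k^T L̄ p̄_k remains positive as stated in Eqn. (7).

    import numpy as np
    import scipy.sparse as sp

    n = 40
    main = np.full(n, 2.0)
    main[[0, -1]] = 1.0
    L = sp.diags([-np.ones(n - 1), main, -np.ones(n - 1)], [-1, 0, 1], format="csr")  # path-graph Laplacian
    v = np.ones(n)                                   # null-space vector: L v = 0

    m_inv_sqrt = 1.0 / np.sqrt(L.diagonal())
    L_bar = lambda z: m_inv_sqrt * (L @ (m_inv_sqrt * z))
    v_bar = v / m_inv_sqrt                           # v_bar = M^{1/2} v, so L_bar(v_bar) = 0

    b = np.random.rand(n)
    b -= v * (v @ b) / (v @ v)                       # deflation: enforce v orthogonal to b
    b_bar = m_inv_sqrt * b

    x = np.zeros(n)
    r = b_bar - L_bar(x)
    p = r.copy()
    for k in range(20):
        Lp = L_bar(p)
        alpha = (r @ r) / (p @ Lp)                   # p^T L_bar p stays > 0, as in Eqn. (7)
        x = x + alpha * p
        r_new = r - alpha * Lp
        if np.linalg.norm(r_new) < 1e-14 * np.linalg.norm(b_bar):
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        print(k, abs(v_bar @ r), abs(v_bar @ p))     # both ~1e-15: the theorem's invariant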

We note that our algorithm can easily compute additional eigenvectors of the Laplacian matrix by setting p to be the number of desired smallest eigenpairs.

Parallelism in the algorithm is achieved by partitioning all vectors X_k, V_k, W_k and the coefficient matrix L by block rows, where each MPI process contains one block. The matrix and the vectors are partitioned into blocks of roughly equal size. The most time consuming operation in Figure 1 is the solution of the linear systems involving L. The diagonal preconditioner does not require any communication. The sparse matrix-vector multiplications do require communication, with the amount of communication determined by the sparsity structure of the matrix; therefore, the overall scalability of the algorithm is problem dependent. In the implementation in this paper we only communicate the elements that are needed to complete the product, via asynchronous point-to-point communication (i.e. using MPI_ISEND and MPI_IRECV). The remaining operations that require communication are the inner products, which use MPI_ALLREDUCE with vectors of multiple columns.
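
The sketch below illustrates this block-row distribution with mpi4py (an assumption made here for illustration; the paper's implementation is not in Python). For brevity it assembles the whole vector with an all-gather before each multiplication, whereas the implementation described above exchanges only the required entries with MPI_ISEND/MPI_IRECV; the distributed inner product uses MPI_ALLREDUCE as described.

    # Run with, e.g.: mpiexec -n 4 python blockrow_spmv.py   (the file name is illustrative)
    import numpy as np
    import scipy.sparse as sp
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n = 1000
    # Contiguous row blocks of roughly equal size, one block per MPI process.
    counts = [n // size + (1 if r < n % size else 0) for r in range(size)]
    offsets = [sum(counts[:r]) for r in range(size)]
    lo, hi = offsets[rank], offsets[rank] + counts[rank]

    # Every rank builds the same small example Laplacian (a path graph) and keeps only its rows.
    main = np.full(n, 2.0)
    main[[0, -1]] = 1.0
    L = sp.diags([-np.ones(n - 1), main, -np.ones(n - 1)], [-1, 0, 1], format="csr")
    L_local = L[lo:hi, :]

    def matvec(x_local):
        """Local rows of L @ x; the distributed vector is assembled with an all-gather."""
        x_full = np.empty(n)
        comm.Allgatherv(x_local, (x_full, counts, offsets, MPI.DOUBLE))
        return L_local @ x_full

    def dot(u_local, w_local):
        """Global inner product of two block-distributed vectors via MPI_ALLREDUCE."""
        local = np.array([u_local @ w_local])
        total = np.empty(1)
        comm.Allreduce(local, total, op=MPI.SUM)
        return total[0]

    x_local = np.sin(np.arange(lo, hi, dtype=float))  # this rank's slice of the vector x
    y_local = matvec(x_local)                         # rows [lo, hi) of L @ x
    norm_sq = dot(y_local, y_local)                   # collective: every rank participates
    if rank == 0:
        print("||L x||^2 =", norm_sq)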

3 Numerical Results

We implement the parallel TraceMin-Fiedler algorithm [12] of Figure 1 using MPI. We compare the parallel performance of MC73_Fiedler with TraceMin-Fiedler on a cluster with an Infiniband interconnect, where each node consists of two quad-core Intel Xeon CPUs (X5560) running at 2.80 GHz (8 cores per node). For both solvers we set the stopping tolerance for the norm of the eigenvalue problem residual to 10^-5. In TraceMin-Fiedler we set the inner stopping criterion (the relative residual norm for solving the linear systems using the preconditioned CG scheme) as ε_in = 10^-1 ε_out, and the maximum number of preconditioned CG iterations to 30. For MC73_Fiedler, we use all the default parameters.

The set of test matrices is obtained from the University of Florida (UF) Sparse Matrix Collection [2]. A search for matrices in this collection which are square, real, and of order 2,000,000 ≤ N ≤ 5,000,000 returns the four matrices listed in Table 1. If the adjacency graph of A has any disconnected single vertices, we remove them, since those vertices are independent and have trivial solutions. We apply both MC73_Fiedler and TraceMin-Fiedler to the weighted Laplacian generated from the adjacency graph of the preprocessed matrix, where the weights are the absolute values of the matrix entries. After obtaining the Fiedler vector x_2 produced by each algorithm, we compute the corresponding eigenvalue λ_2 as

    \lambda_2 = \frac{x_2^T L x_2}{x_2^T x_2}.    (8)

Table 1: Matrix size (N), number of nonzeros (nnz), and type of test matrices.

  Matrix Group/Name         N           nnz          symmetric   application
  1. Rajat/rajat31          4,690,002   20,316,253   no          circuit simulation
  2. Schenk/nlpkkt          3,542,400   95,117,792   yes         nonlinear optimization
  3. Freescale/Freescale1   3,428,755   17,052,626   no          circuit simulation
  4. Zaoui/kkt_power        2,063,494   12,771,361   yes         optimum power flow

Table 2: Relative residuals ‖Lx - λx‖/‖L‖ for TraceMin-Fiedler and MC73_Fiedler, where ε_out = 10^-5.

                   TraceMin-Fiedler                                      MC73_Fiedler
  Matrix / Cores   1           8           16          32               1
  rajat31          1.1×10^-12  1.1×10^-12  1.1×10^-12  1.1×10^-12       3.03×10^-9
  nlpkkt           9.1×10^-6   9.1×10^-6   9.1×10^-6   9.1×10^-6        6.49×10^-7
  Freescale1       7.5×10^-12  7.5×10^-12  7.5×10^-12  7.5×10^-12       1.03×10^-7
  kkt_power        3.1×10^-24  3.1×10^-24  3.1×10^-24  3.1×10^-24       4.07×10^-8

We report the relative residuals ‖Lx_2 - λ_2 x_2‖/‖L‖ in Table 2. The total times required by TraceMin-Fiedler using 1, 2, and 4 nodes with 8 MPI processes (i.e. using 8 cores) per node are presented in Table 3. We emphasize that the parallel scalability results for TraceMin-Fiedler are preliminary and that there is more room for improvement. Since MC73_Fiedler is purely sequential, we have used it on a single core. The speed improvements realized by TraceMin-Fiedler on 1, 8, 16, and 32 cores over MC73_Fiedler on a single core are depicted in Figure 3, with the actual solve times and the speed improvement values given in Tables 3 and 4. Note that on 32 cores, our scheme realizes speed improvements over MC73_Fiedler that range between 4 and 641 for our four test matrices.

Next, we compute the Fiedler vector of a symmetric matrix of dimension 11,333,520 x 11,333,520 with 61,026,416 nonzeros. The matrix is obtained from a 3D Finite Volume Method (FVM) discretization of a MEMS device. MC73_Fiedler consumes 75.5 seconds on a single core. The speed improvement of TraceMin-Fiedler is given in Figure 4. We note that a run using a single core per node has much more memory bandwidth available than one using 8 cores per node; therefore, the speed improvement from 1 to 8 cores (all on a single node) is not ideal. TraceMin-Fiedler is 44.2 times faster than MC73_Fiedler using 256 cores.

Table 3: Total time in seconds (rounded to the first decimal place) for TraceMin-Fiedler and MC73_Fiedler, and the number of outer TraceMin iterations (with the average number of inner PCG iterations) for TraceMin-Fiedler.

                                              TraceMin-Fiedler                      MC73_Fiedler
  Matrix / Cores   # Outer (Avg. Inner) its.  1        8       16      32           1
  rajat31          2 (1)                      5.6s     1.4s    0.7s    0.4s         81.5s
  nlpkkt           2 (30)                     100.5s   24.9s   15.3s   10.8s        83.9s
  Freescale1       2 (30)                     61.5s    23.5s   16.0s   12.5s        52.8s
  kkt_power        2 (1)                      4.8s     1.0s    0.7s    0.5s         341.6s

Figure 3: Speed improvement of TraceMin-Fiedler compared to uniprocessor MC73_Fiedler for the four test problems (y-axis: speed improvement T_MC73_Fiedler / T_TraceMin-Fiedler; x-axis: number of cores, 8 cores per node; curves: rajat31, nlpkkt120, Freescale1, kkt_power).

Table 4: Speed improvement over MC73_Fiedler (T_MC73_Fiedler / T).

                   TraceMin-Fiedler                     MC73_Fiedler
  Matrix / Cores   1       8        16       32         1
  rajat31          14.5    59.2     116.5    227.5      1.0
  nlpkkt           0.8     3.4      5.5      7.8        1.0
  Freescale1       0.9     2.2      3.3      4.2        1.0
  kkt_power        71.2    332.3    501.0    641.4      1.0

Figure 4: Speed improvement of TraceMin-Fiedler compared to uniprocessor MC73_Fiedler for the fvm matrix, T_MC73_Fiedler = 75.5s (y-axis: speed improvement T_MC73_Fiedler / T_TraceMin-Fiedler; x-axis: number of cores, 8 cores per node, from 1 to 256).

4 Using the Fiedler vector for permuting the elements of a matrix

One of the applications of the Fiedler vector is matrix reordering and bandwidth reduction. One can obtain a permutation that reduces the (weighted or unweighted) bandwidth of the matrix by sorting the elements of the Fiedler vector (see [1, 10] for details).
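
A minimal sketch of this reordering in Python/SciPy is given below. Here the Fiedler vector is obtained with scipy.sparse.linalg.eigsh (shift-invert Lanczos) rather than with either of the two solvers compared in the paper, and the test matrix is a small synthetic example: a randomly scrambled banded matrix whose band structure the reordering should (approximately) recover.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def fiedler_permutation(A):
        """Permutation obtained by sorting the Fiedler vector of the weighted Laplacian of A."""
        W = abs(sp.csr_matrix(A))
        W.setdiag(0)
        W.eliminate_zeros()
        L = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W
        # Two smallest eigenpairs via shift-invert; a small negative shift keeps L - sigma*I nonsingular.
        vals, vecs = spla.eigsh(L, k=2, sigma=-1e-3, which="LM")
        fiedler = vecs[:, np.argsort(vals)[1]]
        return np.argsort(fiedler)

    def half_bandwidth(M):
        coo = sp.coo_matrix(M)
        return int(np.max(np.abs(coo.row - coo.col)))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 200
        # A tridiagonal matrix with positive off-diagonal weights, scrambled by a random permutation.
        off = rng.random(n - 1) + 0.1
        banded = sp.diags([off, np.ones(n), off], [-1, 0, 1], format="csr")
        scramble = rng.permutation(n)
        A = banded[scramble, :][:, scramble]
        perm = fiedler_permutation(A)
        B = A[perm, :][:, perm]                            # symmetrically reordered matrix
        print(half_bandwidth(A), "->", half_bandwidth(B))  # large -> small (nearly banded again)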

In this section we propose a metric to measure the quality of the reordering, namely the relative bandweight, and we compare the quality of the Fiedler vectors using this metric. We define the relative bandweight of a specified band of the matrix as follows:

    w_k(A) = \frac{\sum_{i,j:\,|i-j| \le k} |A(i,j)|}{\sum_{i,j} |A(i,j)|}.    (9)

In other words, the bandweight of a matrix A with respect to an integer k is equal to the fraction of the total magnitude of the entries that is encapsulated in a band of half-width k. To be able to visualize the effect of reordering, we randomly selected matrices of smaller dimension from the UF Sparse Matrix Collection; they are listed in Table 5.
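
Eqn. (9) translates into a few lines on a sparse matrix; a minimal sketch (the name relative_bandweight and the random test matrix are illustrative):

    import numpy as np
    import scipy.sparse as sp

    def relative_bandweight(A, k):
        """w_k(A) of Eqn. (9): fraction of the total entry magnitude lying within half-bandwidth k."""
        coo = sp.coo_matrix(A)
        inside = np.abs(coo.data[np.abs(coo.row - coo.col) <= k]).sum()
        return inside / np.abs(coo.data).sum()

    if __name__ == "__main__":
        A = sp.random(300, 300, density=0.05, format="csr", random_state=1)
        # Sweeping k over the half-bandwidths produces bandweight curves like those in the figures below.
        print([round(relative_bandweight(A, k), 3) for k in (10, 50, 150, 299)])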

The relative residuals for the Fiedler vectors computed by both methods, and the number of iterations for TraceMin-Fiedler, are given in Table 6. In 2 cases out of 10, namely bcsstk22 and cvxbqp1, the relative residual of the Fiedler vector from MC73_Fiedler did not reach the stopping tolerance of 10^-5. In Figures 5 and 12 we depict the relative bandweight comparison for these two cases and the resulting reordered matrices; in both cases TraceMin-Fiedler produces a better reordering. The relative residual of MC73_Fiedler (3.5×10^-10) is significantly better than that of TraceMin-Fiedler (2.3×10^-6) for sparsine. However, the quality of the reordering is better for TraceMin-Fiedler according to both our bandweight metric and the sparsity plots of the reordered matrices.

Table 5: Properties of test matrices.

  Matrix      n        nnz        application
  bcsstk22    138      696        structural mechanics
  problem1    414      2,779      FEMLAB test matrix
  rail_1357   1,357    8,985      heat transfer
  c-19        2,327    21,817     nonlinear optimization
  eurqsa      7,245    46,142     economics
  tuma2       12,992   49,365     mine model
  smt         25,710   3,749,582  structural mechanics
  cvxbqp1     50,000   349,968    nonlinear optimization
  sparsine    50,000   1,548,988  structural optimization
  F2          71,505   5,294,285  structural mechanics

For 6 cases out of 10, TraceMin-Fiedler generated a better reordering based on the sparsity plots and bandweights (see Figures 14, 13, 12, 11, 7, and 5), while in 3 cases (see Figures 10, 8, and 6) both methods produce reorderings of comparable quality. Finally, for eurqsa, even though the bandweight measure indicates that the reordering is slightly better if one uses MC73_Fiedler, the sparsity plots indicate better clustering of the large elements with TraceMin-Fiedler.

Table 6: Relative residuals and the approximate eigenvalue (λ_2).

                         TraceMin-Fiedler                                           MC73_Fiedler
  Matrix      ‖L‖        Relative Residual   λ_2        # Outer (Avg. Inner) its.   Relative Residual   λ_2
  bcsstk22    5.3×10^6   4.7×10^-6           6.0×10^-2  3 (30)                      2.2×10^-3           2.8×10^-4
  problem1    1.7×10^1   6.7×10^-6           4.6×10^-2  3 (30)                      2.7×10^-6           4.6×10^-2
  rail_1357   9.1×10^5   8.2×10^-6           2.8×10^-9  4 (30)                      5.4×10^-6           2.9×10^-8
  c-19        1.2×10^4   1.6×10^-6           3.8×10^-1  3 (29)                      8.2×10^-6           4.0×10^-1
  eurqsa      1.3×10^7   5.3×10^-8           9.2×10^-1  2 (30)                      2.9×10^-7           4.3×10^-1
  tuma2       1.0×10^1   2.6×10^-6           8.9×10^-4  8 (30)                      9.5×10^-6           8.6×10^-4
  smt         1.8×10^7   8.3×10^-7           4.9×10^-2  2 (30)                      5.2×10^-6           2.0×10^-4
  cvxbqp1     7.0×10^5   6.2×10^-6           7.5×10^0   2 (30)                      1.7×10^-2           9.4×10^-3
  sparsine    3.2×10^6   2.3×10^-6           1.4×10^-3  4 (23)                      3.5×10^-10          1.0×10^-5
  F2          4.2×10^7   1.5×10^-8           1.0×10^-4  3 (30)                      8.8×10^-6           4.7×10^-2

Figure 5: Sparsity plots of bcsstk22 ((a) bandweight, (b) original matrix, (c) MC73_Fiedler, (d) TraceMin-Fiedler); red and blue indicate the largest and the smallest elements, respectively, in the sparsity plots.

Figure 6: Sparsity plots of problem1 ((a) bandweight, (b) original matrix, (c) MC73_Fiedler, (d) TraceMin-Fiedler); red and blue indicate the largest and the smallest elements, respectively, in the sparsity plots.

Figure 7: Sparsity plots of rail_1357 ((a) bandweight, (b) original matrix, (c) MC73_Fiedler, (d) TraceMin-Fiedler); red and blue indicate the largest and the smallest elements, respectively.

Figure 8: Sparsity plots of c-19 ((a) bandweight, (b) original matrix, (c) MC73_Fiedler, (d) TraceMin-Fiedler); red and blue indicate the largest and the smallest elements, respectively.
