
An evaluation of low-cost heuristics for matrix bandwidth and profile reductions


Abstract

Hundreds of heuristics have been proposed since the 1960s to solve the bandwidth and profile reduction problems. In reviews of the literature, we found 132 heuristics that have been applied to these problems. Among them, 14 were selected because, in the articles analyzed, no simulation or comparison showed the heuristic to be superseded by any other algorithm with respect to bandwidth or profile reduction. The computational costs of the heuristics were also considered during this selection. These 14 heuristics were therefore selected as potentially the best low-cost methods for solving the bandwidth and/or profile reduction problems, and their results are evaluated in this work. For the evaluation on the set of test problems, a metric based on the relative percentage distance to the best possible bandwidth or profile is proposed. The most promising heuristics for several application areas are identified. Moreover, the FNCHC and GPS heuristics showed the best overall results among the evaluated heuristics in reducing the bandwidth of symmetric and asymmetric matrices, respectively. In addition, the NSloan and MPG heuristics showed the best overall results among the evaluated heuristics in reducing the profile of symmetric and asymmetric matrices, respectively.



References

  • Benzi M (2002) Preconditioning techniques for large linear systems: a survey. J Comput Phys 182:418–477

  • Benzi M, Szyld DB, Van Duin A (1999) Orderings for incomplete factorization preconditioning of nonsymmetric problems. SIAM J Sci Comput 20(5):1652–1670

  • Bernardes JAB, Gonzaga de Oliveira SL (2015) A systematic review of heuristics for profile reduction of symmetric matrices. In: Koziel S, Leifsson L, Lees M, Krzhizhanovskaya VV, Dongarra J, Sloot PM (eds) International conference on computational science, ICCS 2015, vol 51, pp 221–230. Elsevier, Reykjavik, Iceland

  • Burgess DA, Giles M (1997) Renumbering unstructured grids to improve the performance of codes on hierarchical memory machines. Adv Eng Softw 28(3):189–201

  • Burgess IW, Lai PKF (1986) A new node renumbering algorithm for bandwidth reduction. Int J Numer Methods Eng 23:1693–1704

  • Chagas GO, Gonzaga de Oliveira SL (2015) Metaheuristic-based heuristics for symmetric-matrix bandwidth reduction: a systematic review. In: Koziel S, Leifsson L, Lees M, Krzhizhanovskaya VV, Dongarra J, Sloot PM (eds) International conference on computational science, ICCS 2015, vol 51, pp 211–220. Elsevier, Reykjavik, Iceland

  • Cuthill EH (1972) Several strategies for reducing the bandwidth of matrices. In: Proceedings of a symposium on sparse matrices and their applications, pp 157–166. The IBM Research Symposia Series, New York, USA

  • Das R, Mavriplis DJ, Saltz JH, Gupta SK, Ponnusamy R (1992) The design and implementation of a parallel unstructured Euler solver using software primitives. Tech. Rep. AD-A249 437, Institute for Computer Applications in Science and Engineering—NASA, Virginia, USA

  • Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans Math Softw 38(1):1–25

  • Duff IS, Grimes RG, Lewis JG (1989) Sparse matrix test problems. ACM Trans Math Softw 15(1):1–14

  • Duff IS, Meurant GA (1989) The effect of ordering on preconditioned conjugate gradients. BIT Numer Math 29(4):635–657

  • Esposito A, Catalano MSF, Malucelli F, Tarricone L (1998) A new matrix bandwidth reduction algorithm. Oper Res Lett 23:99–107

  • Everstine GC (1979) A comparison of three resequencing algorithms for the reduction of matrix profile and wavefront. Int J Numer Methods Eng 14:837–853

  • Felippa CA (1975) Solution of linear equations with skyline-stored symmetric matrix. Comput Struct 5(1):13–29

  • Fenves SJ, Law KH (1983) A two-step approach to finite element ordering. Int J Numer Methods Eng 19(6):891–911

  • George A (1971) Computer implementation of the finite element method. Ph.D. thesis, Stanford University, Stanford, USA

  • George A, Liu JW (1981) Computer solution of large sparse positive definite systems. Prentice-Hall, Englewood Cliffs

  • George A, Liu JWH (1979) An implementation of a pseudoperipheral node finder. ACM Trans Math Softw 5(3):284–295

  • Gibbs NE, Poole WG, Stockmeyer PK (1976) An algorithm for reducing the bandwidth and profile of a sparse matrix. SIAM J Numer Anal 13(2):236–250

  • Gibbs NE, Poole WG, Stockmeyer PK (1976) A comparison of several bandwidth and profile reduction algorithms. ACM Trans Math Softw 2(4):322–330

  • Gonzaga de Oliveira SL, Chagas GO (2015) A systematic review of heuristics for symmetric-matrix bandwidth reduction: methods not based on metaheuristics. In: Proceedings of the Brazilian symposium on operations research (SBPO 2015). Sobrapo, Pernambuco, Brazil

  • Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49(36):409–436

  • Hu Y, Scott JA (2001) A multilevel algorithm for wavefront reduction. SIAM J Sci Comput 23(4):1352–1375

  • Johnson D (2002) A Theoretician’s guide to the experimental analysis of algorithms. In: Goldwasser M, Johnson DS, McGeoch CC (eds) Proceedings of the 5th and 6th DIMACS implementation challenges. American Mathematical Society, Providence

  • Kaveh A, Sharafi P (2012) Ordering for bandwidth and profile minimization problems via charged system search algorithm. Iran J Sci Technol Trans Civil Eng 36(2):39–52

  • King IP (1970) An automatic reordering scheme for simultaneous equations derived from network systems. Int J Numer Methods Eng 2(4):523–533

  • Koohestani B, Poli R (2011) A hyper-heuristic approach to evolving algorithms for bandwidth reduction based on genetic programming. In: Research and development in intelligent systems XXVIII, pp 93–106. Springer, London, UK

  • Kumfert G, Pothen A (1997) Two improved algorithms for envelope and wavefront reduction. BIT Numer Math 37(3):559–590

  • Lanczos C (1952) Solutions of systems of linear equations by minimized iterations. J Res Natl Bur Stand 49(1):33–53

  • Lewis JG (1982) Implementation of the Gibbs–Poole–Stockmeyer and Gibbs–King algorithms. ACM Trans Math Softw 8:180–189

  • Lim A, Rodrigues B, Xiao F (2003) A new node centroid algorithm for bandwidth minimization. In: Proceedings of the 18th international joint conference on artificial intelligence, IJCAI'03. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 1544–1545

  • Lim A, Rodrigues B, Xiao F (2004) A centroid-based approach to solve the bandwidth minimization problem. In: Proceedings of the Hawaii international conference on system sciences. Institute of electrical and electronics engineers Inc., Big Island, HI, United States, vol 37, pp 1203–1208

  • Lim A, Rodrigues B, Xiao F (2007) A fast algorithm for bandwidth minimization. Int J Artif Intell Tools 3:537–544

  • Lin YX, Yuan JJ (1994) Profile minimization problem for matrices and graphs. Acta Math Appl Sin 10(1):107–122

  • Martí R, Laguna M, Glover F, Campos V (2001) Reducing the bandwidth of a sparse matrix with tabu search. Eur J Oper Res 135(2):450–459

  • Medeiros SRP, Pimenta PM, Goldenberg P (1993) Algorithm for profile and wavefront reduction of sparse matrices with a symmetric structure. Eng Comput 10(3):257–266

  • Mladenovic N, Urosevic D, Pérez-Brito D, García-González CG (2010) Variable neighbourhood search for bandwidth reduction. Eur J Oper Res 200:14–27

  • Papadimitriou CH (1976) The NP-completeness of the bandwidth minimization problem. Computing 16(3):263–270

  • Piñana E, Plana I, Campos V, Martí R (2004) GRASP and path relinking for the matrix bandwidth minimization. Eur J Oper Res 153(1):200–210

  • Reid JK, Scott JA (1999) Ordering symmetric sparse matrices for small profile and wavefront. Int J Numer Methods Eng 45(12):1737–1755

  • Reid JK, Scott JA (2006) Reducing the total bandwidth of a sparse unsymmetric matrix. SIAM J Matrix Anal Appl 28(3):805–821

  • Rodriguez-Tello E, Kao HJ, Torres-Jimenez J (2008) An improved simulated annealing algorithm for bandwidth minimization. Eur J Oper Res 185:1319–1335

  • Saad Y, Schultz MH (1986) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J Sci Stat Comput 7:856–869

  • Sloan SW (1989) A Fortran program for profile and wavefront reduction. Int J Numer Methods Eng 28(11):2651–2679

  • Snay RA (1976) Reducing the profile of sparse symmetric matrices. Bull Geod 50(4):341–352

  • STFC: The Science and Technology Facilities Council (2015) HSL: a collection of Fortran codes for large-scale scientific computation. http://www.hsl.rl.ac.uk. Accessed Dec 2015

  • The MathWorks, Inc. (1994–2016) MATLAB. http://www.mathworks.com/products/matlab/index.html. Accessed Mar 2016

  • Torres-Jimenez J, Izquierdo-Marquez I, Garcia-Robledo A, Gonzalez-Gomez A, Bernal J, Kacker RN (2015) A dual representation simulated annealing algorithm for the bandwidth minimization problem on graphs. Inf Sci 303:33–49

  • Wang C, Xu C, Lisser A (2014) Bandwidth minimization problem. In: 10th International conference on modeling, optimization and simulation (MOSIM 2014). Nancy, France

  • Wang Q, Guo YC, Shi XW (2009) A generalized GPS algorithm for reducing the bandwidth and profile of a sparse matrix. Prog Electromagn Res 90:121–136


Acknowledgements

This work was undertaken with the support of FAPEMIG—Fundação de Amparo à Pesquisa do Estado de Minas Gerais (Minas Gerais Research Support Foundation, Brazil). We would like to thank Prof. Dr. Dragan Urosevic, from the Mathematical Institute SANU, for sending us the VNS-band executable programs. We would also like to thank Prof. Dr. Fei Xiao, from Beepi, for providing us with the C programming language source code of the FNCHC heuristic. In addition, we would like to thank the reviewers for their valuable comments and suggestions.

Author information

Corresponding author

Correspondence to Sanderson L. Gonzaga de Oliveira.

Additional information

Communicated by Jose Alberto Cuminato.

Appendices

Appendix A: Implementation of the heuristics, testing, and calibration

This appendix describes the testing and calibration performed to ensure that our implementations were comparable to the codes used by the original proposers of the 14 heuristics. Regarding the VNS-band heuristic, 32-bit and 64-bit executable programs (kindly provided by one of the heuristic's authors) were used. Mladenovic et al. (2010) developed a heuristic based on the variable neighborhood search metaheuristic to solve the bandwidth minimization problem. This metaheuristic investigates increasingly distant neighbors of a solution. An initial solution is generated by applying a random breadth-first search procedure. Then, a main loop carries out three steps: shaking, local search, and neighborhood change. First, the neighborhood index k is initialized with \(k_{{\mathrm{min}}}\), the minimum neighborhood size. Second, in the shaking step, the metaheuristic produces a solution S within the current neighborhood. Using the solution S, a local-search method produces a local-optimum solution \(O_l\). Third, \(O_l\) replaces the current solution if its bandwidth is narrower than the bandwidth of the current solution, and k is reset to \(k_{{\mathrm{min}}}\); otherwise, the neighborhood is expanded by incrementing k while \(k < k_{\mathrm{max}}\). The VNS-band heuristic terminates when the execution time exceeds a previously established timeout.
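The structure of this loop can be summarized in the following sketch. It is a minimal illustration of the shaking, local-search, and neighborhood-change steps described above, not the authors' implementation; the four callbacks (initial solution, shaking move, local search, and the bandwidth objective) are placeholders for routines the heuristic assumes.

```cpp
#include <chrono>
#include <functional>
#include <vector>

using Labeling = std::vector<int>; // labeling[v] = new label of vertex v

// Minimal VNS loop for bandwidth minimization. The callbacks stand in for
// the routines described in the text: a random-BFS initial solution, a
// shaking move within the k-th neighborhood, a local search returning a
// local optimum, and the bandwidth objective.
Labeling vnsBand(std::function<Labeling()> initial,
                 std::function<Labeling(const Labeling&, int)> shake,
                 std::function<Labeling(const Labeling&)> localSearch,
                 std::function<int(const Labeling&)> bandwidth,
                 int kMin, int kMax, std::chrono::seconds timeout) {
    const auto start = std::chrono::steady_clock::now();
    Labeling best = localSearch(initial());
    int k = kMin; // neighborhood index starts at the minimum size
    while (std::chrono::steady_clock::now() - start < timeout) {
        Labeling s  = shake(best, k); // shaking: solution S in neighborhood k
        Labeling ol = localSearch(s); // local-optimum solution O_l
        if (bandwidth(ol) < bandwidth(best)) {
            best = ol;                // O_l replaces the current solution
            k = kMin;                 // and the neighborhood index is reset
        } else if (k < kMax) {
            ++k;                      // otherwise, expand the neighborhood
        }
    }
    return best;
}
```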

The FNCHC-heuristic source code was also kindly provided by one of the heuristic's authors; the source code was converted to the C++ programming language in the present work. The FNCHC heuristic is a fast variation of the node-centroid hill-climbing (NCHC) heuristic (Lim et al. 2003, 2004, 2007). It is based on a direct node adjustment method with hill climbing for solving the bandwidth minimization problem.
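The node-centroid idea can be sketched as follows. This is a schematic reading of the adjustment step, under the assumption that each vertex is moved toward the mean label of its neighbors and the vertices are then re-ranked, with hill climbing keeping the new labeling only when it reduces the bandwidth; it is not Lim et al.'s exact code.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

using Graph = std::vector<std::vector<int>>; // adjacency lists

// One node-centroid adjustment pass: move every vertex toward the centroid
// (mean) of its neighbors' labels and re-rank to obtain a new labeling.
std::vector<int> centroidStep(const Graph& g, const std::vector<int>& label) {
    const int n = static_cast<int>(g.size());
    std::vector<double> pos(n);
    for (int v = 0; v < n; ++v) {
        if (g[v].empty()) { pos[v] = label[v]; continue; }
        double sum = 0.0;
        for (int u : g[v]) sum += label[u];
        pos[v] = sum / g[v].size(); // centroid of the neighbors' labels
    }
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return pos[a] < pos[b]; });
    std::vector<int> next(n);
    for (int i = 0; i < n; ++i) next[order[i]] = i; // rank becomes the label
    return next; // hill climbing keeps `next` only if the bandwidth improves
}
```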

We asked all the other heuristics' authors for the sources and/or executables of their algorithms. However, some authors responded that they no longer had the code, some did not answer, and some explained that the programs could not be provided. The remaining 12 heuristics were therefore also implemented in the C++ programming language so that the computational costs of the heuristics could be compared appropriately. Specifically, the g++ version 4.8.2 compiler was used.

It should be noted that these heuristics are simple to implement:

  • the RCM–GL method (George and Liu 1981) (based on breadth-first search),

  • Snay’s heuristic (Snay 1976) (based on RCM method),

  • Sloan’s algorithm (Sloan 1989),

  • the Medeiros–Pimenta–Goldenberg (MPG) heuristic (Medeiros et al. 1993) (based on Sloan’s algorithm),

  • the Normalized Sloan (NSloan) heuristic (Kumfert and Pothen 1997) (based on Sloan’s algorithm),

  • Sloan’s algorithm with pseudo-peripheral vertex given by the modified GPS algorithm (Sloan–MGPS) (Reid and Scott 1999) (based on Sloan’s algorithm),

  • the heuristic based on a genetic programming hyper-heuristic (hGPHH) (Koohestani and Poli 2011) (based on the RCM method).

Details about the heuristics implemented are given below. Sections A.1 and A.2 show the testing and calibration performed with regard to the RCM–GL and hGPHH-GL methods, and the GPS algorithm, respectively.

Sloan’s (1989), the NSloan (Kumfert and Pothen 1997), Sloan–MGPS (Reid and Scott 1999), and Charged System Search for bandwidth reduction (CSS-band) (Kaveh and Sharafi 2012) heuristics have important parameters that may change the results. Sections A.3 and A.4 show the testing and calibration performed with regard to Sloan’s, NSloan, and Sloan–MGPS heuristics, and the MPG heuristic, respectively. Section A.5 shows the testing and calibration performed with regard to the CSS-band heuristic. The other heuristics do not have parameters that affect the results.

To our knowledge, results of the heuristics of Snay (1976), Burgess and Lai (1986), the wonder bandwidth reduction algorithm (WBRA) (Esposito et al. 1998), Generic GPS (GGPS) (Wang et al. 2009), and CSS-band (Kaveh and Sharafi 2012) have only been published in their original papers and, unfortunately, we could not obtain the instances to which these five heuristics were applied. Since an efficient implementation comes at a cost in programming effort, we strictly followed the descriptions of the algorithms provided by their authors to obtain reasonably efficient implementations of all these heuristics.

It should be noted that our intention was not for the C++ programming language versions of the heuristics to outperform the original implementations, but rather to produce reasonably efficient implementations that make an appropriate comparison of the results of the 14 heuristics possible.

1.1 A.1: The RCM–GL and hGPHH-GL methods

In short, the RCM is a breadth-first search-based procedure in which the vertices on each level are labeled in ascending order of degree; the final labeling is then reversed. Similarly, the hGPHH is a breadth-first search-based procedure in which the vertices on each level are labeled using a specific formula provided by the genetic programming hyper-heuristic (GPHH) (Koohestani and Poli 2011).

These two heuristics are highly dependent on the starting vertex. Since Koohestani and Poli (2011) provided no description of the pseudo-peripheral vertex finder used, we employed the George–Liu algorithm (George and Liu 1979) to find a pseudo-peripheral vertex; hence, we call this method hGPHH-GL, i.e., an RCM-based heuristic developed through a genetic programming hyper-heuristic. Thus, in both algorithms, the starting vertex is given here by the George–Liu algorithm.
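A minimal sketch of the labeling loop common to both methods follows, assuming a connected graph and a starting vertex produced by the George–Liu finder; the RCM–GL variant sorts the neighbors by ascending degree, whereas hGPHH-GL would rank them by the evolved formula instead.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<int>>; // adjacency lists

// RCM labeling from a given starting vertex (assumed pseudo-peripheral,
// e.g., produced by the George-Liu algorithm); assumes a connected graph.
std::vector<int> rcm(const Graph& g, int start) {
    const int n = static_cast<int>(g.size());
    std::vector<int> label(n, -1);
    std::vector<char> seen(n, 0);
    std::queue<int> q;
    q.push(start);
    seen[start] = 1;
    int next = 0;
    while (!q.empty()) {
        int v = q.front(); q.pop();
        label[v] = next++;
        std::vector<int> nbrs(g[v]); // visit neighbors in ascending degree
        std::sort(nbrs.begin(), nbrs.end(), [&](int a, int b) {
            return g[a].size() < g[b].size();
        });
        for (int u : nbrs)
            if (!seen[u]) { seen[u] = 1; q.push(u); }
    }
    for (int v = 0; v < n; ++v)
        label[v] = n - 1 - label[v]; // reverse the final labeling (RCM)
    return label;
}
```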

Table 5 exhibits results of our implementation of the RCM–GL and hGPHH-GL methods applied to 11 instances of the Harwell-Boeing sparse-matrix collection. These results are compared to the results of the GHH (Koohestani and Poli 2011) and of the RCM method contained in the MATLAB library (RCMM) (MATLAB 2016), a SPARSPAK-based, highly enhanced version of the RCM method described by George and Liu (1981); Koohestani and Poli (2011) published these results. In general, the results can be considered similar, and the differences can be explained by the use of the George–Liu algorithm (George and Liu 1979) to find pseudo-peripheral vertices.

Table 5 Results of the C++ programming language versions of the RCM–GL (George and Liu 1981) and hGPHH-GL methods and results of the RCMM (MATLAB 2016) and GHH (Koohestani and Poli 2011) methods in bandwidth reduction applied to 11 instances of the Harwell-Boeing sparse-matrix collection

1.2 A.2: The GPS algorithm

In short, the GPS algorithm (Gibbs et al. 1976) creates a level structure rooted at a pseudo-peripheral vertex. It also considers an end pseudo-peripheral vertex when building this structure. The final labeling starts at the end node of the rooted level structure (RLS), and the vertices on each level are labeled in decreasing order of degree. Let \(G = (V, E)\) be a connected and simple graph. Given a vertex \(v\in V\), the level structure rooted at v, with depth (the eccentricity of v) \(\ell (v)\), is a partitioning \(\mathscr {L}(v)\) \(=\) \(\{L_0(v), L_1(v),\ \dots , L_{\ell (v)}(v) \}\), where \(L_0(v) = \{v\}\), \(L_i(v) = \mathrm{Adj}(L_{i-1}(v)) - \bigcup _{j=0}^{i-1}L_j(v)\) for \(i = 1, 2,\ 3,\ \dots , \ell (v)\), and \(\mathrm{Adj}(\cdot)\) returns the vertices adjacent to the argument. The width of \(\mathscr {L}(v)\) is defined as \(b(\mathscr {L}(v)) = \max \nolimits _{0 \le i \le \ell (v)}|L_i(v)|\).
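The rooted level structure, together with the eccentricity \(\ell (v)\) and the width \(b(\mathscr {L}(v))\) defined above, can be computed with a single breadth-first pass; a minimal sketch, assuming a connected graph:

```cpp
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<int>>; // adjacency lists

// Builds L(v) = {L_0(v), ..., L_l(v)} by breadth-first search. The number
// of levels minus one is the eccentricity l(v), and the size of the largest
// level is the width b(L(v)).
std::vector<std::vector<int>> rootedLevelStructure(const Graph& g, int v) {
    std::vector<int> level(g.size(), -1);
    std::vector<std::vector<int>> levels{{v}};
    level[v] = 0;
    while (true) {
        std::vector<int> nextLevel;
        for (int u : levels.back())
            for (int w : g[u])
                if (level[w] < 0) { // first time w is reached
                    level[w] = static_cast<int>(levels.size());
                    nextLevel.push_back(w);
                }
        if (nextLevel.empty()) break;
        levels.push_back(std::move(nextLevel));
    }
    return levels; // levels.size() - 1 == eccentricity of v
}
```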

Results of the GPS algorithm applied to the Harwell-Boeing sparse-matrix collection were provided by Martí et al. (2001), Piñana et al. (2004), Lim et al. (2007), and Rodriguez-Tello et al. (2008). These researchers compared the results of their heuristics with the results of the GPS algorithm applied to instances of this sparse-matrix collection; their papers show the average bandwidth over the instances in each dataset. Nevertheless, comparing the average bandwidth (over the instances in each set) obtained by the C++ programming language version of the GPS algorithm with the results shown in these publications is not adequate because the datasets used here contain instances of very different sizes, i.e., a heuristic that returns a very large bandwidth on a small instance would not be penalized appropriately.
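The \(\rho \) metrics are defined in the body of the paper (not reproduced in this appendix). The sketch below assumes one plausible form, namely the per-instance relative distance to the best result accumulated over a dataset, which avoids the size bias of averaging raw bandwidths; the actual definition may differ in details such as scaling to a percentage.

```cpp
#include <cstddef>
#include <vector>

// Accumulated relative distance of one heuristic's results to the best
// results on each instance of a dataset (assumed form of the rho metric).
double rho(const std::vector<long>& result, const std::vector<long>& best) {
    double sum = 0.0;
    for (std::size_t i = 0; i < result.size(); ++i)
        sum += static_cast<double>(result[i] - best[i]) / best[i];
    return sum;
}
```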

We therefore closely followed the recommendations given by Lewis (1982), in a Fortran programming language code, to implement the GPS algorithm (Gibbs et al. 1976) in the C++ programming language. One of the main results of Lewis (1982) was to present a low-cost implementation of the GPS algorithm. Lewis (1982) did not explicitly present results of the GPS algorithm with respect to bandwidth and profile reductions; however, this author showed results of the Gibbs–King heuristic with regard to bandwidth and profile reductions and stated that the results of the GPS algorithm were as good as those of the Gibbs–King algorithm, while also showing that the Gibbs–King heuristic is much slower than the GPS algorithm. Lewis (1982) presented these results because the bandwidth and profile reductions obtained using the GPS algorithm were the same as those obtained by Everstine (1979), who showed only results concerning wavefront and profile reductions. Thus, Table 6 shows results with respect to bandwidth reductions obtained using the Gibbs–King heuristic (the same results obtained using the GPS algorithm) (Lewis 1982), results with regard to profile reductions attained by the GPS algorithm (Everstine 1979; Lewis 1982), and results concerning bandwidth and profile reductions achieved by the C++ programming language version of the GPS algorithm applied to 13 instances of the Harwell-Boeing sparse-matrix collection.

Table 6 Results with respect to bandwidth (Lewis 1982) and profile (Everstine 1979; Lewis 1982) reductions obtained using the Fortran programming language version of GPS algorithm and results concerning bandwidth and profile reductions achieved by the C++ programming language version of the GPS algorithm applied to 13 instances of the Harwell-Boeing sparse-matrix collection

In general, the C++ programming language version of the GPS algorithm obtained better bandwidth reductions than the Fortran programming language version of this algorithm (Lewis 1982). On the other hand, although the C++ version obtained the largest number of best results (eight instances) in this set, the Fortran version (Lewis 1982) was the best one for reducing profile when considering the \(\rho _p\) metric. Nevertheless, the difference in the \(\rho _p\) metric is small and can be explained by the choices of pseudo-peripheral vertices in these runs. Therefore, the C++ programming language version used here can be considered a reasonably efficient implementation of the GPS algorithm (Gibbs et al. 1976).

1.3 A.3: The Sloan, NSloan, and Sloan–MGPS heuristics

Similar to the GPS algorithm (Gibbs et al. 1976), the first step of Sloan’s algorithm (Sloan 1989) returns a starting (s) and an end (e) pseudo-peripheral vertex. Then, Sloan’s algorithm creates a level structure rooted at s. The final labeling starts at the vertex s, and the vertices are labeled using a max-priority queue according to the priority \(p(v)=w_1d(v,e)-w_2(\mathrm{deg}(v)+1)\), where \(w_1\) and \(w_2\) are integer weights, \(d(v,e)\) is the distance from vertex v to the vertex e, and \(\mathrm{deg}(v)\) is the degree of vertex v. Sloan (1989) suggested assigning these two weights as \(w_1=1\) and \(w_2=2\). Setting \(w_1 \gg w_2 \ge 1\) places more emphasis on the global criterion of distance from the target end vertex and forces the vertices to be labeled level by level; in this case, the vertex-labeling algorithm is the method of King (1970). When setting \(w_1=0\) and \(w_2=1\), the vertex-labeling algorithm is similar to Snay’s heuristic (Snay 1976).
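A simplified sketch of this labeling loop follows. The max-priority queue is reduced to a linear scan, the distances \(d(v,e)\) are assumed precomputed (e.g., from the level structure rooted at e), and the static degree is used in place of the current degree; the original algorithm also updates priorities dynamically as vertices change status.

```cpp
#include <algorithm>
#include <vector>

using Graph = std::vector<std::vector<int>>; // adjacency lists

// Simplified Sloan labeling: repeatedly label the candidate vertex with
// maximum priority p(v) = w1*d(v,e) - w2*(deg(v)+1); dist[v] holds d(v,e).
// Assumes a connected graph and a starting pseudo-peripheral vertex s.
std::vector<int> sloanLabel(const Graph& g, const std::vector<int>& dist,
                            int s, int w1 = 1, int w2 = 2) {
    const int n = static_cast<int>(g.size());
    auto priority = [&](int v) {
        return w1 * dist[v] - w2 * (static_cast<int>(g[v].size()) + 1);
    };
    std::vector<int> label(n, -1);
    std::vector<char> queued(n, 0);
    std::vector<int> candidates{s};
    queued[s] = 1;
    for (int next = 0; next < n; ++next) {
        auto it = std::max_element(candidates.begin(), candidates.end(),
                                   [&](int a, int b) {
                                       return priority(a) < priority(b);
                                   });
        int v = *it;
        candidates.erase(it);
        label[v] = next; // label the highest-priority candidate
        for (int u : g[v])
            if (label[u] < 0 && !queued[u]) {
                queued[u] = 1;
                candidates.push_back(u); // its neighbors become candidates
            }
    }
    return label;
}
```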

One can see why the RCM method (George 1971) obtains better bandwidth results than Sloan’s algorithm (Sloan 1989) (and its variations): in the RCM method, if \((u,v)\in E\), then \(|s(u)-s(v)|\) is small, since the vertices are labeled level by level with respect to the level structure rooted at the starting vertex. In turn, Sloan’s algorithm may provide a labeling with small \(|s(u)-s(v)|\) but large \(d(u,v)\) when placing more emphasis on the current degree, a local criterion, as the principal measure of priority (i.e., \(w_2>w_1\)). On the other hand, Sloan’s algorithm (and its variations) obtains better profile results than the RCM method because the RCM method may produce \(\beta _i\) (\(1\le i\le |V|\)) similar to \(\beta (A)\) and, consequently, a labeling with a large profile. In turn, Sloan’s algorithm (and its variations) produces small \(\beta _i\) (\(1\le i\le |V|\)) by choosing to label vertices with smaller degrees first, so that it produces a labeling with a small profile [although, in each iteration, Sloan’s algorithm (and its variations) forces vertices to be labeled level by level, similarly to the RCM method, by adding \(w_2\) to the priority of the candidate vertices to be labeled].
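For reference, the quantities \(\beta _i\) and \(\beta (A)\) used above follow the definitions that are standard in this literature: for a symmetric matrix \(A=[a_{ij}]\) of order n,

\[ \beta_i(A) = i - \min\{\, j : a_{ij} \neq 0 \,\}, \qquad \beta(A) = \max_{1 \le i \le n} \beta_i(A), \qquad \mathrm{profile}(A) = \sum_{i=1}^{n} \beta_i(A), \]

so that the bandwidth is the largest of the row quantities \(\beta _i\) and the profile is their sum; small values of every \(\beta _i\) matter for the profile, whereas only the maximum matters for the bandwidth.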

We strictly followed the recommendations given by Sloan (1989), as well as those described at http://www.hsl.rl.ac.uk/archive/specs/mc40 (STFC 2015) in a Fortran programming language code, to implement this heuristic in the C++ programming language. Likewise, we studied the Fortran programming language code of the Sloan–MGPS heuristic, available at http://www.hsl.rl.ac.uk/catalogue/mc60.html (STFC 2015), to implement it in the C++ programming language.

Sloan’s algorithm (Sloan 1989) for finding pseudo-peripheral vertices follows three main steps. First, it selects a vertex v of minimum degree in the graph. Second, it scans \(L_{\ell (v)}(v)\) and considers a set containing only one vertex of each degree. Finally, considering the eccentricity and width of the RLSs, the algorithm returns a starting (s) and an end (e) pseudo-peripheral vertex. This algorithm is a modified version of the algorithm of Gibbs et al. (1976) (which considers only the eccentricity of vertices) and is also applied together with the MPG (Medeiros et al. 1993) (see Sect. A.4) and NSloan (Kumfert and Pothen 1997) heuristics.

Reid and Scott (1999) proposed two modifications to Sloan’s algorithm (Sloan 1989) for finding pseudo-peripheral vertices: it scans \(L_{\ell (v)}(v)\) and considers a set containing only five vertices that are not adjacent to each other (instead of a set containing only one vertex of each degree) and, before returning the pseudo-peripheral vertices, it swaps s and e if \(\ell (e)>\ell (s)\). Therefore, the Sloan–MGPS heuristic (Reid and Scott 1999) is Sloan’s vertex-labeling algorithm with pseudo-peripheral vertices given by this modified GPS (MGPS) algorithm.

Sloan (1989) proposed employing two weights to label the vertices of the instance: \(w_1\), associated with the distance \(d(v,e)\) of the vertex v to the target end vertex e in \(\mathscr {L}(s)\), and \(w_2\), associated with the degree of each vertex. To be specific, the priority \(p(v)=w_1d(v,e)-w_2(\mathrm{deg}(v)+1)\) employed in Sloan’s algorithm (Sloan 1989) presents different scales for the two criteria. The value of \(\mathrm{deg}(v)+1\) ranges from 1 to \(m+1\) (where m is the maximum degree of the graph \(G=(V,E)\)), and \(d(v,e)\) ranges from 0 (when \(v=e\); \(d(v,e)>0\) for \(v\ne e\)) to the eccentricity \(\ell (e)\) of the target end vertex e. Regarding Sloan’s (1989), NSloan (Kumfert and Pothen 1997), and Sloan–MGPS (Reid and Scott 1999) heuristics, we set the two weights as described in the original papers. When more than one pair of values was suggested in the original papers, we performed exploratory investigations to determine the pair of values that obtains the best profile results. Accordingly, the two weights were assigned as \(w_1=1\) and \(w_2=2\) for the Sloan–MGPS heuristic (Reid and Scott 1999), and as \(w_1=2\) and \(w_2=1\) for the NSloan heuristic (Kumfert and Pothen 1997). More specifically, even with the two weights assigned as \(w_1=1\) and \(w_2=2\), Sloan’s (1989), MPG (Medeiros et al. 1993) (see Sect. A.4), and Sloan–MGPS (Reid and Scott 1999) heuristics place more emphasis on the distance \(d(v,e)\) (related to \(\mathscr {L}(s)\)) when applied to sparse graphs because the scale of this criterion is larger than that of the other criterion. Indeed, in sparse graphs, \(\ell (e)\) is usually larger than m, which is why Kumfert and Pothen (1997) normalized these two criteria to propose the Normalized Sloan (NSloan) heuristic.

Results of Sloan’s (Sloan 1989) and Sloan–MGPS (Reid and Scott 1999) heuristics applied to 17 symmetric instances of the University of Florida sparse-matrix collection were provided by Reid and Scott (1999). Table 7 exhibits the results of the C++ programming language versions of Sloan’s algorithm and the Sloan–MGPS heuristic, alongside the results of these heuristics provided by Reid and Scott (1999), applied to these 17 symmetric instances. In this dataset, one can verify that the C++ programming language versions of the two heuristics obtained better profile results when considering the \(\rho _p\) metric. Disregarding the skirt instance (for which Reid and Scott (1999) report profiles two orders of magnitude larger than those of the C++ programming language versions), one finds \(\sum \rho _p=0.84\) for Sloan’s algorithm (Sloan 1989), against \(\sum \rho _p=0.58\) obtained by the C++ programming language version of this heuristic, and the \(\rho _p\) metric of both versions of the Sloan–MGPS heuristic is also similar when disregarding this instance.

Table 7 Results of implementations in the C++ programming language of Sloan’s (1989), NSloan (Kumfert and Pothen 1997), and Sloan–MGPS (Reid and Scott 1999) heuristics and results of these heuristics provided by Kumfert and Pothen (1997) (KP97) and Reid and Scott (1999) (RS99) in profile reductions applied to 17 symmetric instances of the University of Florida sparse-matrix collection

Kumfert and Pothen (1997) provided results of the NSloan heuristic applied to instances of the University of Florida sparse-matrix collection. Kumfert and Pothen (1997) normalized the results of the NSloan heuristic with respect to the RCM method (George 1971) with pseudo-peripheral vertex given by the method of Duff et al. (1989), which is similar to Sloan’s algorithm for finding pseudo-peripheral vertices (Sloan 1989). Table 7 replicates these results. In addition, Table 7 shows results of the C++ programming language version of the NSloan heuristic normalized with respect to the results obtained using the C++ programming language version of the RCM–GL method (George and Liu 1981) applied to 17 symmetric instances of the University of Florida sparse-matrix collection. In this dataset, one finds \(\sum \rho _p=0.13\) for the original NSloan heuristic (Kumfert and Pothen 1997), against \(\sum \rho _p=1.64\) obtained by the C++ programming language version of this heuristic. The results of both implementations are similar, and the differences can be explained by the use of different algorithms for finding pseudo-peripheral vertices.

1.4 A.4: The MPG heuristic

The MPG heuristic (Medeiros et al. 1993) employs two max-priority queues: t contains candidate vertices to be labeled, and q contains the vertices in t as well as vertices that can be inserted into t. Similar to Sloan’s algorithm (Sloan 1989) (and its variations), the current degree of a vertex is the number of its adjacencies to vertices that neither have been labeled nor belong to q. A main loop performs three steps. First, a vertex v that maximizes a specific priority function is inserted into q. Second, the current degree \(\mathrm{currdeg}_v\) of each vertex \(v\in t\) is examined: the algorithm labels a vertex v if \(\mathrm{currdeg}_v = 0\) and removes from t a vertex v (i.e., \(t\leftarrow t - \{v\}\)) if \(\mathrm{currdeg}_v > 1\). Third, if t is empty, the algorithm inserts into t each vertex \(u\in q\) with priority \(p_u \ge p_{\mathrm{max}}(q)-1\), where \(p_{\mathrm{max}}(q)\) returns the maximum priority among the vertices in q.

Table 8 exhibits results of the C++ programming language version of the MPG heuristic and the results of this heuristic provided by Medeiros et al. (1993) applied to 30 symmetric instances of the Harwell-Boeing sparse-matrix collection. In this dataset composed of 30 instances, one can verify that the C++ programming language version of the MPG heuristic obtained better profile results than the original version of this heuristic.

Table 8 Results of an implementation in the C++ programming language of the MPG heuristic (Medeiros et al. 1993) and results of this heuristic provided by Medeiros et al. (1993) in profile reductions applied to 30 instances of the Harwell-Boeing sparse-matrix collection

1.5 A.5: The CSS-band heuristic

Regarding the charged system search for bandwidth reduction (CSS-band) heuristic (Kaveh and Sharafi 2012), 500 iterations were used as the termination criterion, and one charged particle per 100 vertices was used in the tests performed. These are the same values used by Kaveh and Sharafi (2012). This heuristic was also tested with fewer iterations and fewer charged particles, which considerably decreased its computational cost; however, the bandwidth and profile reductions also decreased substantially. In addition, tests with more iterations and more charged particles were performed: although this slightly improved the bandwidth and profile reductions, the computational cost increased considerably.

Appendix B: Results

The first and second parts of Table 9 contain the results of 14 heuristics applied to reduce the bandwidth and profile, respectively, of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection. Numbers in boldface are the best results. Table 10 shows the average execution times of the 14 heuristics applied to these 15 symmetric instances. Like the other tables of average execution times shown below, Table 10 also shows the number of best results, the rank order according to the \(\sum \rho _t\) metric, and the mode of the order of magnitude of the execution times of these 14 heuristics. In particular, to achieve the maximal precision of the metrics, we used several decimal places to avoid ties.

Table 9 Results of 14 heuristics applied to reduce bandwidth and profile of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection
Table 10 Average execution times, in seconds, of 14 heuristics applied to 15 symmetric instances of the Harwell-Boeing sparse-matrix collection
Table 11 Results of 14 heuristics applied to reduce bandwidth of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection
Table 12 Results of 14 heuristics applied to reduce profile of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection
Table 13 Average execution times, in seconds, of 14 heuristics applied to 35 symmetric instances of the Harwell-Boeing sparse-matrix collection

Tables 11 and 12 contain the results of 14 heuristics applied to reduce the bandwidth and profile, respectively, of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection. In particular, Table 11 contains the results of the VNS-band heuristic set at 7 and 8 s; this shows that 8 s was the lowest timeout required for the VNS-band heuristic to obtain better results than the FNCHC heuristic, the second best in reducing bandwidth according to the \(\rho _b\) metric. Table 12 also contains the results of the VNS-band heuristic set at 500 s. Even so, when considering profile reductions, the VNS-band heuristic was dominated by the MPG (Medeiros et al. 1993), Sloan–MGPS (Reid and Scott 1999), Sloan’s (1989), and NSloan (Kumfert and Pothen 1997) heuristics in this set of 35 symmetric instances. Table 13 shows the average execution times of the 14 heuristics applied to these 35 symmetric instances.

Table 14 Results of 14 heuristics applied to reduce bandwidth and profile of 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection
Table 15 Average execution times, in seconds, of 14 heuristics applied to 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection
Table 16 Results of 14 heuristics applied to reduce bandwidth of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection

Table 14 contains the results of 14 heuristics applied to reduce the bandwidth and profile of 18 asymmetric instances of the Harwell-Boeing sparse-matrix collection. In particular, for the GENT113 instance, it was only possible to apply the 64-bit executable program of the VNS-band heuristic with a limit of 6 s; therefore, the bandwidth and profile values presented for this instance were obtained using the VNS-band (6 s) heuristic. Table 15 shows the average execution times of the 14 heuristics applied to these 18 asymmetric instances.

Table 17 Results of 14 heuristics applied to reduce profile of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection
Table 18 Average execution times, in seconds, of 14 heuristics applied to 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection
Table 19 Results of 10 heuristics applied to reduce bandwidth and profile of 17 symmetric instances of the University of Florida sparse-matrix collection

Tables 16 and 17 contain the results of 14 heuristics applied to reduce the bandwidth and profile, respectively, of 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection. In particular, it was not possible to apply the 64-bit executable program of the VNS-band heuristic to the SHERMAN4 instance; for this instance, the values shown were obtained using the 32-bit executable program of the VNS-band heuristic with the same runtime limit. Table 18 shows the average execution times of the 14 heuristics applied to these 45 asymmetric instances.

Table 19 contains the results of 10 heuristics applied to reduce the bandwidth and profile of 17 symmetric instances of the University of Florida sparse-matrix collection. Table 20 shows the average execution times of the 10 heuristics applied to 17 symmetric instances of the University of Florida sparse-matrix collection.

Table 21 contains the results of 10 heuristics applied to reduce the bandwidth and profile of five asymmetric instances of the University of Florida sparse-matrix collection. Table 21 also shows the average execution times of the 10 heuristics applied to five asymmetric instances of the University of Florida sparse-matrix collection.

Table 20 Average execution times, in seconds, of 10 heuristics applied to 17 symmetric instances of the University of Florida sparse-matrix collection
Table 21 Results of 10 heuristics applied to reduce bandwidth and profile of five asymmetric instances of the University of Florida sparse-matrix collection

Appendix C: Application areas

Tables 22 and 23 show several instances of the University of Florida and Harwell-Boeing sparse-matrix collections divided by application area, respectively.

Table 22 Twenty-two instances of the University of Florida sparse-matrix collection divided by application area
Table 23 One hundred and thirteen instances of the Harwell-Boeing sparse-matrix collection divided by application area

Appendix D: The most promising low-cost heuristics for bandwidth and profile reductions of symmetric and asymmetric matrices

Table 24 summarizes the results of the heuristics that showed the best overall performance on the six sets of test matrices, together with the lowest-cost heuristics tested on these sets. Furthermore, despite the small number of executions of each heuristic on each instance, Table 24 shows the largest standard deviation \(\sigma \) and coefficient of variation attained for the execution times of the 14 heuristics over these six datasets.

Table 24 Best heuristics in four sets of instances of the Harwell-Boeing and in two sets of instances of the University of Florida sparse-matrix collections

It should be noted that, when execution times are crucial for the application, a reasonable bandwidth or profile reduction achieved by a very fast heuristic may be preferable to a very large bandwidth or profile reduction provided by a heuristic with high computational cost. Accordingly, Table 24 shows the rank order of the heuristics and the order of magnitude of their computational costs.

Unlike the RCM–GL, hGPHH-GL, GPS, and GGPS heuristics, the heuristics designed for profile reduction tested here do not label vertices exclusively level by level with respect to the level structure rooted at a starting vertex. This favors profile reduction by allowing a vertex of small degree to be labeled before other vertices, as is done in Snay’s (1976), Sloan’s (1989), MPG (Medeiros et al. 1993), NSloan (Kumfert and Pothen 1997), and Sloan–MGPS (Reid and Scott 1999) heuristics.

Sloan’s algorithm (Sloan 1989) showed a lower computational cost than the Sloan–MGPS heuristic (Reid and Scott 1999) in all tests performed [e.g., Sloan’s algorithm achieved its results at a computational cost one order of magnitude lower than that of the Sloan–MGPS heuristic when applied to the 17 symmetric instances of the University of Florida sparse-matrix collection (see Table 20)]. It should be noted that Sloan’s (1989) and MPG (Medeiros et al. 1993) heuristics were implemented with a linked list, whereas the NSloan (Kumfert and Pothen 1997) and Sloan–MGPS (Reid and Scott 1999) heuristics were implemented with a binary heap (as proposed in the original algorithms). Specifically, a version of Sloan’s algorithm implemented here with a binary heap was more expensive than the version implemented with a linked list: the number of vertices in the max-priority queue in each iteration may be small, so the operations on the linked list are less expensive than the operations on the max heap (which would show a different asymptotic behavior for a sufficiently large number of vertices). On the other hand, Sloan’s algorithm (Sloan 1989) was dominated by the Sloan–MGPS heuristic (Reid and Scott 1999) with respect to bandwidth and profile reductions in all tests performed: the MGPS method (Reid and Scott 1999) may have selected pseudo-peripheral vertices with larger eccentricity (or generating an RLS of smaller width) than the pseudo-peripheral vertices given by Sloan’s algorithm (Sloan 1989) for this task.
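The design choice can be illustrated by the two pop operations below: with only a handful of candidates per iteration, the O(|q|) scan of a linked list can beat the O(log |q|) heap pop in practice, even though the heap wins asymptotically. The pairs are (priority, vertex); this is an illustration of the trade-off, not the code used in the experiments.

```cpp
#include <iterator>
#include <list>
#include <queue>
#include <utility>

// Pop the maximum-priority vertex from a linked-list queue: linear scan.
int popMaxList(std::list<std::pair<int, int>>& q) {
    auto best = q.begin();
    for (auto it = std::next(q.begin()); it != q.end(); ++it)
        if (it->first > best->first) best = it; // O(|q|) scan
    int v = best->second;
    q.erase(best);
    return v;
}

// Pop the maximum-priority vertex from a binary heap: O(log |q|).
int popMaxHeap(std::priority_queue<std::pair<int, int>>& q) {
    int v = q.top().second;
    q.pop();
    return v;
}
```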

Sections D.1 and D.2 describe the best heuristics for bandwidth reduction of symmetric and asymmetric matrices, respectively. Finally, Sect. D.3 addresses the best low-cost heuristic for profile reduction of symmetric and asymmetric matrices.

1.1 D.1: Best heuristic for bandwidth reduction of symmetric matrices

When low VNS-band timeouts were set and the results were compared with those of the low-cost heuristics, the VNS-band (8 s) heuristic provided the best bandwidth results in the set composed of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection (see Fig. 3c). The FNCHC heuristic was the second best in reducing bandwidth in this dataset, according to the \(\rho _b\) metric (see Fig. 3a and Table 11). The VNS-band heuristic achieved these results at a computational cost two orders of magnitude higher than that of the FNCHC heuristic (see Table 13).

Additionally, in the tests presented in this paper, the FNCHC heuristic obtained the best results in 12 of the instances in the set composed of 15 symmetric instances (see Fig. 1c). Moreover, the FNCHC heuristic obtained the best results in the 17 symmetric instances of the University of Florida sparse-matrix collection (see Fig. 9a, c in Sect. 4.2.1 and the first part of Table 19). Therefore, the FNCHC heuristic (Lim et al. 2003, 2004, 2007) was considered the algorithm that achieved the most promising overall results for reducing the bandwidth of symmetric matrices.

1.2 D.2: Best heuristic for bandwidth reduction of asymmetric matrices

Although the FNCHC and VNS-band heuristics obtained the best results with respect to bandwidth reduction in the sets composed of 18 and 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection, respectively (see Figs. 5a, 7a, c, and the first parts of Tables 14 and 16), when applied to the dataset composed of five asymmetric instances of the University of Florida sparse-matrix collection, the GPS algorithm obtained the best results according to the \(\rho _{\beta }\) metric (see Fig. 11a, the first part of Table 21, and Sect. 4.2.2). Moreover, in general, the GPS algorithm achieved the third (see Sects. 4.1.1, 4.1.2) and fourth (see Sects. 4.1.3, 4.1.4) best results according to the \(\rho _{\beta }\) metric when applied to the four datasets composed of symmetric and asymmetric instances of the Harwell-Boeing sparse-matrix collection. The GPS algorithm was outperformed by the FNCHC and VNS-band (with its timeout set at 8, 12, and 500 s) heuristics when applied to the 113 instances of the Harwell-Boeing sparse-matrix collection (see Figs. 1a–c, 3a–c, 5a, 7a–c, and Tables 11, 14, 16). On the other hand, the GPS algorithm outperformed the VNS-band (500 s) heuristic when applied to the 22 instances of the University of Florida sparse-matrix collection (see Figs. 9a, 11a; Tables 19, 21).

It should be noted that the FNCHC and VNS-band heuristics generate several solutions and employ local-search procedures to provide local-optimum solutions. The GPS algorithm, in contrast, generates only one solution and employs no local-search procedure.

The FNCHC and VNS-band heuristics achieved their results at a computational cost one order of magnitude higher than that of the GPS algorithm in the set composed of five asymmetric instances of the University of Florida sparse-matrix collection (see Table 21). Thus, the GPS algorithm was considered the most promising low-cost heuristic for bandwidth reduction of asymmetric matrices.

1.3 D.3: Best low-cost heuristic for profile reduction of symmetric and asymmetric matrices

The MPG (Medeiros et al. 1993) and NSloan (Kumfert and Pothen 1997) heuristics obtained the best profile results in 11 and 9 instances, respectively, of the set composed of 35 symmetric instances of the Harwell-Boeing sparse-matrix collection (see Fig. 3d; Table 12). Moreover, the NSloan heuristic (Kumfert and Pothen 1997) obtained the best profile results in the set composed of 15 symmetric instances of the Harwell-Boeing sparse-matrix collection (see Fig. 1b, d and the second part of Table 9) and in the set composed of 17 symmetric instances of the University of Florida sparse-matrix collection (see Fig. 9b, d and Table 19). Therefore, the NSloan heuristic (Kumfert and Pothen 1997) was considered the best low-cost heuristic for profile reduction of symmetric matrices.

With respect to asymmetric instances, the MPG heuristic (Medeiros et al. 1993) obtained the best results in the sets composed of 18 and 45 asymmetric instances of the Harwell-Boeing sparse-matrix collection (see Figs. 5b, 7b, d; Tables 14, 17) and in the set composed of five asymmetric instances of the University of Florida sparse-matrix collection (see Fig. 11b and the second part of Table 21). Therefore, the MPG heuristic (Medeiros et al. 1993) was considered the best heuristic for profile reduction of asymmetric matrices.

Usually, in sparse graphs, \(\ell (e)\) is larger than m. Thus, Sloan’s (1989) and Sloan–MGPS (Reid and Scott 1999) heuristics place more emphasis on the distance of a vertex v to the target end vertex (a global criterion) than on the degree of v (a local criterion), and therefore tend to label vertices level by level in a level structure rooted at a starting vertex. By normalizing the two criteria, the NSloan heuristic (Kumfert and Pothen 1997) places similar emphasis on both. On the other hand, a vertex-labeling algorithm that labels vertices purely in ascending order of degree produces poor bandwidth reduction at a high computational cost because it considers many candidate vertices in each iteration (we verified this in exploratory investigations). Thus, the MPG heuristic (Medeiros et al. 1993) controls the number of candidate vertices and (although assigning the two weights as \(w_1=1\) and \(w_2=2\)) offers a trade-off between the two criteria mentioned. These characteristics of the NSloan (Kumfert and Pothen 1997) and MPG (Medeiros et al. 1993) heuristics can explain their best results in profile reduction of symmetric and asymmetric instances, respectively, when compared to the other heuristics for this task.


About this article

Cite this article

Gonzaga de Oliveira, S.L., Bernardes, J.A.B. & Chagas, G.O. An evaluation of low-cost heuristics for matrix bandwidth and profile reductions. Comp. Appl. Math. 37, 1412–1471 (2018). https://doi.org/10.1007/s40314-016-0394-9

