HPC acceleration of large (min, +) matrix products to compute domination-type parameters in graphs

The computation of the domination-type parameters is a challenging problem in Cartesian product graphs. We present an algorithmic method to compute the 2-domination number of the Cartesian product of a path with small order and any cycle, involving the (min,+)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\min ,+)$$\end{document} matrix product. We establish some theoretical results that provide the algorithms necessary to compute that parameter, and the main challenge to run such algorithms comes from the large size of the matrices used, which makes it necessary to improve the techniques to handle these objects. We analyze the performance of the algorithms on modern multicore CPUs and on GPUs and we show the advantages over the sequential implementation. The use of these platforms allows us to compute the 2-domination number of cylinders such that their paths have at most 12 vertices.


Introduction
The (min, +) matrix algebra [1], also called tropical algebra, replaces addition and multiplication with minimization and addition, respectively. The use of this algebra is currently in expansion and it is involved in several disciplines of great interest, for instance finite automata [1], statistics [2], phylogenetics [3], optimization of graph parameters [4], integer programming [5], and other optimization problems [6]. However, the computational demands of such computations are unapproachable when the dimensions of the corresponding matrices are large. To overcome this drawback the modern multicore CPUs and GPUs can be exploited as High-Performance Computing (HPC) platforms to accelerate and widen the dimensions of such operations. In this work, the analysis of the domination-type parameters in graphs is chosen as an interesting example where sequences of large (min, +) matrix products are involved.
The use of graphs as a tool to model problems in networks has been widely studied. Among such problems, the efficient location of resources in a network can be approached by means of the domination-type parameters in graphs. A dominating set in a graph G is a vertex subset S such that each vertex not in S has at least one neighbor in it. The domination number of G, denoted by (G) , is the cardinal of a minimum dominating set. We refer to [7] for general information about these topics and, in particular, about their applications to network problems. Among the variations of this concept that can be found in the literature, we focus on the 2-domination. A 2-dominating set is vertex subset S ⊆ V(G) such that each vertex not in S has at least two neighbors in it. The 2-domination number 2 (G) is the minimum cardinal of a 2-dominating set of G [8]. Some interesting applications of the 2-domination in graphs such as the optimization of fault tolerant sensor networks, the facility location problem and the data collection problem can be found in [9]. Given a graph G and a positive integer k ≤ |V(G)| , the decision problem "Is there a dominating set of G with at most k vertices?" is NP-complete [10], even in bipartite and chordal graphs. However, it has been shown to be polynomial in trees and interval graphs [7]. In a similar way, the 2-domination decision problem is to decide whether G has a 2-dominating set of cardinal at most k ≤ |V(G)| . It is known that it is an NP-complete problem [11], again even in bipartite and chordal graphs [12]. Moreover, linear-time algorithms to compute this parameter in trees and series-parallel graphs can also be found in [11].
A family of interest for the domination-type parameters are the Cartesian product graphs since the Vizing's conjecture was formulated [13]. This conjecture proposes a general inequality that relates the domination number of both a Cartesian product graph and its factors. This conjecture is still open and a survey about this subject can be found in [14], while a recent new approach is in [15]. Recall that the Cartesian product of two graphs G□H is the graph with vertex set V(G) × V(H) and such that two vertices (g 1 , h 1 ), (g 2 , h 2 ) are adjacent in G□H if either g 1 = g 2 and h 1 , h 2 are adjacent in H, or g 1 , g 2 are adjacent in G and h 1 = h 2 . We refer to [16] as a general reference about this topic. It is well known that domination-type parameters are difficult to handle in Cartesian product graphs and there is no general relationship between the value of such parameters in the product graph and its factor graphs. Even in the simplest cases of the Cartesian product of two graphs, that is, two paths 1 3 (grid), a path and a cycle (cylinder) and two cycles (torus) specific procedures are needed to compute such parameters.
The domination-type parameters in Cartesian product graphs are among the variety of graph parameters that can be computed by using matrix operations. This approach appeared for the first time in [4] and has been used in different Cartesian products, such as grids and cylinders, and also in different parameters, such as domination, independent domination and Roman domination (see for instance [17][18][19][20][21]). Unlike other parameters, those of domination-type do not use the usual matrix product but the (min, +) matrix product, which is also called the tropical product [1]. The (min, +) matrix product is defined over the semi-ring of tropical numbers P = (ℝ ∪ {∞}, min, +, ∞, 0) in the following way: Graph algorithms involving tropical algebra operations can be found in the literature [22]. The computational side of this approach leads to interesting challenges bearing in mind the large size of the matrices involved in such algorithms and both, special properties of the matrices and regular structures of the graphs, have been taken into account in order to reduce the complexity of the matrix computations [23,24]. Moreover, optimal implementations of the matrix operations in multicore and GPU platforms have proven to be suitable for these problems [25][26][27].
A contribution to the problem of the computation of the 2-domination number in cylinders can be found in [28], where this parameter was obtained in cylinders with a small cycle and any path, by using algorithms involving the (min, +) matrix-vector product. We now focus the complementary problem of computing this parameter in cylinders with a small path and any cycle, which is unknown. The technique we use here requires performing the (min, +) matrix-matrix product, which has higher computational requirements.
The goal of this work is twofold. From the computational point of view, efficient routines to compute (min, +) matrix products on multicore CPUs and GPUs are developed. Moreover, the matrices involved in the analysis of domination-type parameters in graphs are used to evaluate such implementations on modern HPC platforms. It is relevant to underline that, beyond this particular graph analysis, these efficient implementations are useful to accelerate the wide range of applications which are expressed in terms of (min, +) matrix products. To allow the scientific community to access to these efficient implementations of (min, +) matrix products, they are available at https://github.com/hpcjmart/2domination.
From the perspective of the graph analysis, our objective is to conjecture a formula for the 2-domination number in cylinders with path and cycle of unbounded order. Obtaining the value of the 2-domination number in cylinders with one small factor, either the path or the cycle, is the first step to addressing the general case. The reason is the regular behavior that is expected, except for the smallest cases. Making such regularity apparent provides the key information to look for the general formula.
In Sect. 2, we present the theoretical results that give support to the algorithms shown in Sect. 3 along with their computational analysis. Such algorithms will provide the desired values of the 2-domination number in cylinders with small path and any cycle, which we present in Sect. 4, as well as our conclusions from the computational point of view.

3
HPC acceleration of large (min, +) matrix products to compute…

The 2-domination number in cylindrical graphs with small paths
In this section, we describe our approach to compute the 2-domination number of cylinders P m □C n with small paths. Such approach, involving the (min, +) matrixmatrix product has also been used to obtain similar results for the Roman domination number [20]. We first describe the general ideas involved in this method and then, we particularize the case of 2 .

General construction
We focus on the following result from [29], that we quote from [4] in the version related to the (min, +) matrix product.
Let D be a digraph with vertex set V(D) = {v 1 , v 2 , … , v s } together with a labeling function which assigns an element of the semi-ring P = (ℝ ∪ {∞}, min, +, ∞, 0) to every arc of the digraph D . A path of length n in D is a sequence of n consecutive The labeling can be easily extended to paths: The application of these results to the computation of domination-type parameters in Cartesian product graphs follows a common approach which uses the fact that these kinds of parameters are defined as the minimum cardinal of a set having a certain property. We now describe this general procedure.
Let G be a graph and let a(G) be a parameter defined as the minimum cardinal of a vertex subset of G having a certain property A. First of all, we have to define a direct graph D such that there exists a bijective correspondence between the vertex subsets U ⊆ V(G) having the property A and the closed paths Q of D with fixed length n, that we denote by U ↔ Q . As a second step, we have to define a labeling of the arcs of D such that if U ↔ Q then, |U| = (Q) . With such digraph and its associated labeling we can now use Theorem 1 to obtain A restriction that occurs when using this approach to compute a parameter a(G) is that graph G needs some structure that allows us to identify the vertex subsets U ⊆ V(G) having the property A and the closed paths Q of D with fixed length n. The Cartesian products of paths and cycles have such structure, as we now briefly sketch. The be a vertex subset having the property A and let us consider U j the j − th column of P m □C n , taking into account whether or not its vertices belong to U (by using a labeling of the vertices). The vertices of the digraph D are all possible U j obtained in such way, for every vertex subset having property A. Moreover, there is an arc from U r to U r+1 , that is, there is an arc from a vertex of D to another one if they are consecutive columns in P m □C n for the same vertex subset U having property A. Then, U can be identified with the closed path The key point of the construction above is the column structure of the cylinder P m □C n and additional requirements are needed in such construction depending on the studied parameter a(G). In this paper we focus on 2-domination number 2 of the cylinder P m □C n and a suitable digraph D will be defined. The (min, +) powers of the matrix A(D) have to be computed and this matrix is expected to be quite large, to such an extent as digraph D is much larger than the cylinder P m □C n . Indeed, the matrix size exponentially grows with the order of the cylinder and for this reason, this approach is useful just in cylinders P m □C n with small enough values of both m and n. An additional procedure involving well-known properties of the (min, +) matrix product allows the removal of one of such size restrictions.

Specific construction for the 2-domination number
Let P m □C n be a cylinder and let S ⊆ V(P m □C n ) a 2-dominating set. We label the vertices in the cylinder according to the following rules: and v has at least 2 neighbors in S in its column or the previous one, and v has just 1 neighbor in S in its column or the previous one.
We now identify each column with a word p = (p 1 , p 2 , … , p m ) with length m in the alphabet {0, 1, 2} and containing neither the sequences 020, 111, 211, 112, 212 in any position, nor the sequences 11, 12 at the beginning (that is, for the letters p 1 p 2 ) nor the sequences 11, 21 at the end (that is, for the letters p m−1 p m ). These restrictions come from the fact that S is a 2-dominating set and from the definition of the labeling. We call correct m-words to words of length m in the alphabet {0, 1, 2} fulfilling all the conditions above. We define the vertex set of the digraph D m as the set of all correct m-words.
We now focus on the definition of the arcs in the digraph D m . Given two correct m-words p = (p 1 , p 2 , … , p m ) and q = (q 1 , q 2 , … , q m ) , we say that p can follow a q if they can be consecutive columns (in the order qp) in some 2-dominating set, that is, they follow the rules of the labeling: , q i is equal to 0 and if i = m then exactly one among p i−1 , q i is equal to 0), • if p i = 1 then at least two among p i−1 , p i+1 , q i is equal to 0 (the same comment as above for cases i = 1 and i = m).
Finally, there is an arc from a word q to a word p if and only if p can follow q. This concludes the construction of the digraph D m , and it is clear that every 2-dominating set S of P m □C n is univocally identified with a closed path Q of length n, that is, We now need to define a labeling of the arcs of D m fulfilling that if S ↔ Q then, |S| = (Q) . To this end, for an arc (q, p) we define its label as (q, p) =number of zeros of p, which obviously gives the desired property. We illustrate the definitions above with an example. Fig. 1 a 2-dominating set of P 4 □C 5 is shown (black vertices). Moreover, the list of correct words representing the columns of such 2-dominating sets are in Fig. 1.

Example 1 In
Clearly p i+1 can follow p i for i ∈ {1, 2, 3, 4} and p 1 can follow p 5 so Q = (p 1 , p 2 ), (p 2 , p 3 ), (p 3 , p 4 ), (p 4 , p 5 ), (p 5 , p 1 ) is a closed path in the digraph D 4 . The label of each arc of Q is the number of zeros in the second word, that is, (p 1 , p 2 ) = 2, (p 2 , p 3 ) = 2, (p 3 , p 4 ) = 2, (p 4 , p 5 ) = 1, (p 5 , p 1 ) = 3 . Hence (Q) = 2 + 2 + 2 + 1 + 3 = 10 , that reflects that the 2-dominating set has 10 vertices.   Roughly speaking, Theorem 1 says that the entry (i, j) of the matrix A(D m ) n gives the minimum label among all paths in D m with length n, beginning in p i and ending in p j . Therefore, the entry (i, i) on the main diagonal shows the minimum label among all closed n-paths that begin and end in p i . Each closed path represents a 2-dominating set of P m □C n and its label is the cardinal of such set (see Fig. 1). Hence, Theorem 2 says that the minimum entry of the main diagonal gives the minimum cardinal among all 2-dominating sets, that is, the 2-dominating number.
Using Theorem 2 to compute the 2-domination number of P m □C n is subject to certain restrictions for both m and n. On the one hand, the path order m determines the number of correct m-words and therefore, the size of the matrix A(D m ) that is expected to be of the order of 3 m . On the other hand, the cycle order n is the number of (min, +) matrix powers that have to be computed to obtain the value of the 2-domination number. The first limitation is intrinsic to this approach. However, there are some properties of the (min, +) matrix product that can avoid the second one.
Assuming that m is small enough to apply Theorem 2 and that n 0 , a, b have been obtained for m then, the boundary values of the finite difference equation above, that is, 2 (P m □C n ) for n 0 ≤ n ≤ n 0 + a − 1 can be computed by using Theorem 2 and the finite difference equation can be easily solved to obtain the formula for the 2-domination number 2 (P m □C n ) , for n ≥ n 0 . Moreover, the remaining values 2 (P m □C n ) for n < n 0 , if any, can also be computed by Theorem 2. Thus, if m is small enough to apply Theorem 2 and the conditions of Theorem 3 hold, then 2 (P m □C n ) can be obtained for any n ≥ 3.

Algorithms and computational analysis
In this section, we present the algorithms we have used to compute the 2-domination number of P m □C n , with 2 ≤ m ≤ 12 and n ≥ 3 . We also study the performance of such algorithms in sequential and parallel implementations on a CPU AMD EPYC Rome 7642 with 48 cores and, in addition, on a GPU NVIDIA Tesla V100-PCIE with 32 GB of memory, 80 multiprocessors with 128 cores in each multiprocessor (10240 cores CUDA). Algorithms from 1 to 4 come from Theorem 3 and they allow us to pose the finite difference equation involving the 2-domination number of P m □C n , with m small enough. Moreover, Theorem 2 provides Algorithm 5 to compute the boundary values of the finite difference equations. Our first target is to obtain the suitable values a m , b m , n m 0 to pose such equation for each m ∈ {2, … , 12} and first of all, we compute the matrix A(D m ) in Algorithm 1. In order to obtain the set C m of all correct m-words, we first obtain all the m-element variations of 3-elements 0, 1, 2, with repetition allowed. Then, we select those of them not containing the forbidden sequences of the correct m-words.
Algorithm 1 is only useful for small values of m. As we said before, the size of the matrix A(D m ) is expected to exponentially grow with m, as do the necessary computational resources to get and manage such matrix.
In Table 1 we show the matrix sizes and the memory requirements in cases 2 ≤ m ≤ 13 , by using 16 bits arithmetic types of integers. The memory size of the matrix in the case m = 13 makes it unfeasible to allocate it into the GPU memory, which is the processor we have used to accelerate our algorithms. This is the reason we have analyzed, in this paper, the cases 2 ≤ m ≤ 12 . We have run Algorithm 1 in the CPU and it takes 2 minutes in the larger case m = 12 . This running time is small compared with the following algorithms and moreover, the algorithm does not use any matrix operations whose analysis is our objective. Therefore, we have not parallelized this process and the matrix A(D m ) is an input data for the remaining algorithms.
We now need enough (min, +) powers of the matrix A(D m ) in order to look for the recurrence relationship. We obtain the desired powers with Algorithm 2. There exist sufficient but not necessary conditions ensuring that the hypotheses in Theorem 3 are true (see [30]). However, such conditions provide a non-minimum value for n 0 in the order of the square of the matrix size that is not practical. We have run Algorithm 2 with K = 50 , which has proven to be enough in cases 2 ≤ m ≤ 12.
Due to the high requirements to sequentially compute the powers, we have modified this routine in two ways to accelerate it on modern multicore CPU and GPUs. On the one hand, we have used the directives of OpenMP [31] to parallelize the (min, +) matrix multiplication on multicore CPUs. Specifically, we use the OpenMP directives to accelerate the computation of each product, so the outer loop that iterates through the rows of the first matrix of the product is parallelized. This technique is straightforward, and it allows to efficiently develop the (min, +) matrix product to leverage the resources of the CPU multicore processors. Moreover, the performance achieved is enough for the purpose of our work when the dimensions of the matrices are moderated.
On the other hand, the powers have also been carried out by a modification of the routine MatrixMul, available in the NVIDIA CUDA TOOLKIT 11 [32] and described in the CUDA C Programming Guide (see [33], Chapter 3), to adapt it to the (min, +) multiplication. In this case, we use a different parallelization strategy than the one used in OpenMP. It is based on a tiled matrix multiplication to optimize the GPU hierarchy memory management. So, this method takes advantage of the lower latency, the higher bandwidth shared memory within GPU thread blocks and the number of slow accesses to memory device, which are minimized. For details of the memory access pattern of MatrixMul see Chapter 3 of [33].
We show in Table 2 the running times of Algorithm 2 in cases 7 ≤ m ≤ 12 while in the remaining cases the algorithm needs less than 1 second, even with the sequential implementation. Table 2 shows that the running time of computing 50 (min, +) powers of matrix A(D m ) exponentially grows as the matrix size increases. In order to address large cases in reasonable time we have run an OpenMP parallel implementation with 48 cores/threads. Such implementation provides small running times in cases m = 8 and m = 9 but it grows fast for m ≥ 10 . In order to increase the efficiency of this algorithm, we have run a version of the (min, +) matrix product in CUDA for NVIDIA GPU and we have obtained a significant improvement in terms of running times compared to the sequential and the parallel OpenMP versions.
The following step to apply Theorem 3 is to find the appropriate recurrence relationship between two powers of matrix A(D m ) . Even though such matrix is sparse, we have noted that its powers become dense, that is, with no infinite entries, from the third one. Therefore, the hypothesis in Theorem 3, that is, A(D m ) n 0 +a = b ⊠ A(D m ) n 0 is equivalent to A(D m ) n 0 +a − A(D m ) n 0 being a constant matrix with entries equal to b m . We use this fact in Algorithm 3. The results are shown in Table 3 ( K = 50).
It is expected that the values of r m 0 are not minimum because we have found a recurrence relationship with r m 0 + a m = 50 , for every m. But in any case, we have confirmed that matrix A(D m ) meets the hypothesis of Theorem 3 and the finite difference equation can be posed for n ≥ r m 0 . We now show how to obtain the minimum value n m 0 such that A(D m ) n+a m = b m ⊠ A(D m ) n for every n ≥ n m 0 , in Algorithm 4 . Finding this optimal value could be interesting in order to try to reduce the number of (min, +) powers required to ensure the hypothesis of Theorem 3.  2  4  2  2  8  25  3  10   3  7  6  8  9  22  3  11   4  9  8  14  10  21  3  12   5  31  7  15  11  24  3  13   6  19  11  28  12  26  3  14   7  23 18 53 We show the values of n m 0 obtained with Algorithm 4 in Table 4, together with the values of a m , b m shown before. Such values provide the finite difference equation Finally, we compute the boundary values needed to solve the finite difference equations and to obtain the formulae of the 2-domination number in the studied cases, with Algorithm 5, by using Theorem 2.
Algorithm 5 uses the minimization operation over the main diagonal of the matrix A(D m ) i , which can be seen as a vector with a length of the number of rows of the matrix. This matrix operation is less computationally demanding given that the number of the operations needed here is on the order of the number of rows of the matrix while in Algorithms 3 and 4 the order is the square of that number. Indeed, the CPU needs less than 1 second if m ≤ 11 and 11.8 seconds in the largest case m = 12 . Our program to compute the 2-domination number of cylindrical graphs with small paths consists of consecutive run Algorithms from 2 to 5 and we have implemented it in four ways. The first one runs every algorithm on the CPU and we have here completed the computation of cases m ≤ 10 , due to high running times of Algorithm 2.
In the second version, we have used the OpenMP directives to parallelize the execution of the (min, +) matrix product routines in Algorithm 2 and the matrix difference in Algorithm 3 and 4 because they are the most computationally demanding matrix operations. We have computed until case m = 11 with 48 cores and although the speedup for Algorithm 2 is over 40 in the last case, the running time is still huge. The third program runs Algorithms 2, 3 and 4 on the GPU and cases m ≤ 12 have been obtained. Algorithm 2 presents here a very noticeable improvement in terms of running time, but the huge matrix size does not allow us to approach large cases given that from m = 13 the matrix cannot be allocated on the GPU memory.
In order to test the goodness of the implementation of Algorithms 3 and 4 on the CPU compared to the GPU, we have done the fourth version that uses the GPU just in Algorithm 2 and the OpenMP parallelization for Algorithms 3 and 4. This is slightly faster than version 3 because of the communication costs to allocate matrices on the GPU memory to perform matrix operations with little computational cost. The total running times of the four versions are shown in Table 5.

Conclusions
According to Theorem 3, values in Table 4 allow us to pose the finite difference equation 2 (P m □C n+a m ) − 2 (P m □C n ) = b m , n ≥ n m 0 , for each 2 ≤ m ≤ 12 . The boundary values 2 (P m □C n ) , n m 0 ≤ n ≤ n m 0 + a m − 1 , have been obtained with Algorithm 5. Therefore, the solution is 2 (P m □C n ) = ⌈ b m ⋅n a n ⌉ + m k , where n ≡ k (mod a m ) and m k depends on the boundary values for each m. Moreover, the remaining values of 2 (P m □C n ) , for 3 ≤ n < n m 0 , have also been computed with Algorithm 5 and most of them follow the general formula.
In the same way as in other domination parameters in grids and cylinders (see [17,18]), these results show a non-regular behavior for the smallest values of m, but it becomes regular for m ≥ 8 . Note that if 8 ≤ m ≤ 12 then, a m = 3 and b m = m + 2 . In such cases 2 (P m □C n ) = ⌈ (m+2)n 3 ⌉ + m k , where n ≡ k (mod 3) and m k again depends on the boundary values 2 (P m □C n m 0 +k ) . In order to complete the formulae, in Table 6 we show the values of m k , for each m ∈ {2, … , 12} and k ∈ {0, … , a m − 1}.
The only exceptions are n = 5 , for m ∈ {8, 10, 12} , where m k = 2 . This value is coherent with the results obtained in [28]: In spite of obtaining that m k ≤ 2 for m ≤ 12 , we think that such numbers will increase for some values of n as m grows because they would depend on m in some way. Our results cover the cases 2 ≤ m ≤ 12 , 3 ≤ n ≤ 15 already studied in [28], and all the results match. In addition, for 8 ≤ m ≤ 12 and n ≡ 0 (mod 3) we have shown that 2 (P m □C n ) = (m+2)n 3 . The same formula for n = 3, 6, 9, 12, 15 and m ≥ 8 is obtained in [28] and we have now extended this result to every n ≡ 0 (mod 3) , for 8 ≤ m ≤ 12 . Also note that our formulae for m ≤ 7 and n ≡ 0 (mod 3) show that such small cases do not follow the same formula, in general. Our results together with those in [28] give us support to conjecture that 2 (P m □C n ) = (m+2)n 3 , if m ≥ 8 and n ≡ 0 (mod 3).
Regarding the computational point of view, our main target was to develop efficient routines to compute (min, +) matrix products on multicore CPUs and GPUs. Such routines have application to the computation of the 2-domination number of cylindrical graphs with small paths of order m. Our approach has as a limitation the size of the involved matrices that exponentially grows as m does. This condition has led us to focus on cases 2 ≤ m ≤ 12 that meet the requirements of our computational resources on both the CPU and the GPU.
Once we have obtained the matrices for cases 2 ≤ m ≤ 12 , we have divided the routines in Algorithms from 2 to 5 and three of them, Algorithms 3, 4 and 5, can be run on the CPU in a reasonable time. Moreover, the OpenMP parallelization with 48 cores slightly improves such running times, which are negligible compared to the total ones. However, the CPU has shown to be non-sufficient to run Algorithm 2 in the most interesting cases, which are the largest ones, to find the desired regular behavior of the 2-domination number. The matrix operation used by this algorithm is the (min, +) matrix product and we explore two improvement options to reduce its running time: a parallelization of the algorithm with OpenMP with 48 cores and an implementation of this matrix product in CUDA for NVIDIA GPU. The OpenMP parallel version with 48 cores of Algorithm 2 has shown a speedup over 40 regarding the sequential version in case m = 10 . However, the running times are so high that the parallelization is not enough for m ≥ 11 , where more than 6 hours are needed. In contrast, the GPU version computes 50 powers of the matrix A(D m ) in considerably less time, with a speedup over 60 compared to the OpenMP version for m = 12.
We think it would be possible to improve the efficiency of Algorithm 2 by reducing the number of computed powers while the finite difference equation can still be solved. In addition, some parallelization of the (min, +) product allowing to distribute the product of two matrices in small sets of rows and columns would give the opportunity of computing some cases larger than m = 12 . Such improvements would perhaps allow us to conjecture a general formula of the 2-domination number of the cylinder P m □C n with n ≡ 1, 2 (mod 3).
To sum up, we have solved the graph problem of computing the 2-domination number of some cylinders with a small path in a reasonable time by exploiting the benefits of the GPU's to run algorithms involving the (min, +) matrix product while the rest of matrix operations involved, such as the matrix difference or the minimization of the main diagonal of a matrix, demand fewer computational resources and they can be addressed on the multicore CPU in a short time. Finally, we have conjectured that 2 (P m □C n ) if n ≡ 0 (mod 3).