1 Introduction

Biclustering is a data analysis technique that searches for interesting submatrices of a given matrix. The resultant submatrix, referred to as a bicluster, can be defined as an ordered pair, consisting of a subset of rows and a subset of columns of the given matrix. This approach was first used in the 1970s by Hartigan [1].

Boolean reasoning [2] is a paradigm for solving computational tasks. Typically, the original problem is encoded as a Boolean formula, and the results of transforming that formula can be decoded into solutions of the original problem. Such an approach is widely applied in Rough Set Theory [2, 3]; however, it is also used for decision tree induction [4].

Among many other approaches [2, 5,6,7], a new approach to biclustering based on Boolean reasoning was presented in 2018 [8]. The promising results obtained for discrete and binary data also led to the development of Boolean-reasoning-based biclustering methods for continuous data [9].

The primary disadvantage of methods based on Boolean reasoning is their high computational complexity, due to checking the satisfiability of Boolean functions. This problem has given rise to the use of heuristics to accelerate the computations. In [10], a simple sequential covering strategy is proposed. The approach searches for a set of biclusters that together contain all ones in the binary data and relies on a modified version of Johnson’s strategy of prime implicant approximation [11]. Some scenarios, however, may require wider biclusters, potentially including biclusters that overlap one another, that are more general and do not necessarily contain all ones in the binary data.

The need to search for wide biclusters has motivated the development of new bicluster induction heuristics. The above-mentioned modified version of Johnson’s strategy uses single prime implicant approximation to produce an iterative, sequential coverage of the ones in a binary matrix. In general, searching only among the ones not yet covered by previously generated biclusters yields smaller biclusters (i.e., biclusters with fewer rows or columns) as the number of iterations increases.

This paper presents a new hierarchical, heuristic strategy for binary data biclustering. The heuristic used is the modified version of Johnson’s strategy of prime implicant approximation [10]. The use of the term hierarchical refers to the “tabu search” [12] paradigm: following the discovery of a solution (node), solutions that are similar but not equivalent are found (subnodes). This process then iterates. The similarity condition is satisfied by random data modification: several subnodes are invoked, and for each of them a different element of the input data is altered from one to zero (only ones in the bicluster discovered by the given node are altered in this manner).

This paper is organized as follows: it begins with a brief review of existing approaches to biclustering; following this, the essential notions of Boolean-reasoning-based biclustering are defined and presented; the subsequent section develops the central concept of the hierarchical, heuristic strategy for binary biclustering induction, and provides abstract examples and pseudo-code; the penultimate section reports the results of experimental tasks using artificial data; and the final section offers conclusions and a perspective on possible further work in this area.

2 Related work

Biclustering was first used in the 1970s [1]. Since then, the technique has been applied in many disciplines, including biomedical data analysis [13] and text mining [14], leading to the development of multiple approaches to biclustering.

Tanay et al. [5] describe several biclustering methods, including Cheng and Church’s algorithm [15], coupled two-way clustering [16], iterative approaches [17, 18], SAMBA [19], spectral approaches [20], and plaid models [21]. Pontes et al. [22] provide a comprehensive classification of biclustering methods, divided into:

  • greedy strategies [1, 15, 23],

  • stochastic approaches [24, 25],

  • meta-heuristics [26, 27],

  • clustering-based approaches [28, 29],

  • graph-based approaches [19, 30],

  • one-way clustering-based approaches [16, 31],

  • probabilistic models [21, 32],

  • linear-algebra-based models [17, 20], and

  • row and column reordering approaches [33].

Beyond methods dedicated strictly to biclustering, other data analysis paradigms share similar characteristics. For example, the search for an inclusion-maximal bicluster of ones in a binary matrix is comparable to the extraction of the concept lattice for a given context [34]. In the domain of basket analysis, the generation of a frequent itemset corresponds to the generation of an exact bicluster [35].

3 Boolean-reasoning-based biclustering

This section defines the objects and concepts used throughout the paper and provides the background of Boolean-reasoning-based biclustering.

3.1 Definitions

Definition 1

(Bicluster) Let M be a matrix with rows R and columns C. The bicluster \({\mathcal {R}}{\mathcal {C}}\ \equiv ({\mathcal {R}},{\mathcal {C}})\) is an ordered pair of a subset of rows \({\mathcal {R}}\subseteq R\) and a subset of columns \({\mathcal {C}}\subseteq C\).

Definition 2

(Exact bicluster) Let \({\mathcal {R}}{\mathcal {C}}\) be a bicluster. \({\mathcal {R}}{\mathcal {C}}\) is exact iff

$$\begin{aligned} \forall _{r_i, r_j \in {\mathcal {R}}}\ \forall _{c_u, c_v \in {\mathcal {C}}}\ M(r_i, c_u) = M(r_j, c_v), \end{aligned}$$

where M(r, c) is the element (cell) of matrix M in row r and column c.

Definition 3

(Inclusion–maximality of exact bicluster) Let M be a binary matrix and let \({\mathcal {R}}{\mathcal {C}}\) be an exact bicluster derived from M. The bicluster is inclusion-maximal if and only if there exists no row \(r \in R \setminus {\mathcal {R}}\) or column \(c \in C \setminus {\mathcal {C}}\) such that any of the extended biclusters

$$\begin{aligned} ({\mathcal {R}}\cup \{r\}){\mathcal {C}}\ \text {or}\ {\mathcal {R}}({\mathcal {C}}\cup \{c\})\ \text {or}\ ({\mathcal {R}}\cup \{r\})({\mathcal {C}}\cup \{c\}) \end{aligned}$$

are also exact biclusters.
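To make Definitions 2 and 3 concrete, the following minimal Python sketch (the matrix representation and function names are ours, not taken from the paper) checks whether a candidate bicluster is exact and whether it is inclusion-maximal.

```python
# A matrix is represented as a dict mapping (row, column) pairs to cell values;
# a bicluster is a pair of sets (R_sub, C_sub).

def is_exact(M, R_sub, C_sub):
    """Definition 2: all cells of the bicluster hold the same value."""
    values = {M[(r, c)] for r in R_sub for c in C_sub}
    return len(values) <= 1

def is_inclusion_maximal(M, rows, cols, R_sub, C_sub):
    """Definition 3: no additional row or column keeps the bicluster exact.
    Checking single-row and single-column extensions suffices, because any
    exact (R u {r}) x (C u {c}) extension would make (R u {r}) x C exact too."""
    if not is_exact(M, R_sub, C_sub):
        return False
    if any(is_exact(M, R_sub | {r}, C_sub) for r in rows - R_sub):
        return False
    if any(is_exact(M, R_sub, C_sub | {c}) for c in cols - C_sub):
        return False
    return True
```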

Definition 4

(Implicant) Let \(f(a_1, \ldots , a_n)\) be a Boolean function of n Boolean variables. The expression

$$\begin{aligned} P_f(A=\{a_m, \ldots , a_{p}\}) = a_m \wedge \ldots \wedge a_{p},\ \ A \subseteq \{a_1, \ldots , a_n\}, \end{aligned}$$

such that

$$\begin{aligned} P_f(A) = 1 \Rightarrow f(a_1, \ldots , a_n) = 1, \end{aligned}$$

is an implicant of the function f.

Definition 5

(Prime implicant) Let \(f(a_1, \ldots , a_n)\) be a Boolean function of n Boolean variables and let \(P_f(A)\) be an implicant of f. \(P_f(A)\) is a prime implicant if and only if, for all \(A' \subset A\), \(P_f(A')\) is not an implicant of f.
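Although Definitions 4 and 5 are stated for arbitrary Boolean functions, the formulas used later in the paper (Definition 8) are positive CNFs, for which both conditions can be checked directly. The sketch below uses our own representation (a formula as a list of clauses, each clause a set of variable labels) and is only an illustration:

```python
def is_implicant(clauses, A):
    """A set A of variables (all set to 1) is an implicant of a positive CNF
    iff every clause contains at least one variable from A."""
    return all(clause & A for clause in clauses)

def is_prime_implicant(clauses, A):
    """Definition 5: A is an implicant, but no proper subset of A is.
    For positive (monotone) formulas, dropping single variables suffices."""
    return is_implicant(clauses, A) and \
        not any(is_implicant(clauses, A - {a}) for a in A)

# Example with the formula of Sect. 3.2: f = (1 v b) ^ (2 v b) ^ (2 v c)
clauses = [{'1', 'b'}, {'2', 'b'}, {'2', 'c'}]
print(is_prime_implicant(clauses, {'2', 'b'}))       # True
print(is_prime_implicant(clauses, {'1', '2', 'b'}))  # False: implicant, but not prime
```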

Definition 6

(Row/column corresponding variable) Let M be a matrix with rows R and columns C. Each row r (column c) has a corresponding Boolean variable \(r'\) (\(c'\)).

Definition 7

(Implicant and bicluster correspondence) Let M be a matrix with rows R and columns C. Bicluster \({\mathcal {R}}{\mathcal {C}}\) and implicant \(P_f(A)\) correspond to one another if and only if

$$\begin{aligned} A = \{a' : a \in (R \cup C)\setminus ({\mathcal {R}}\cup {\mathcal {C}})\}. \end{aligned}$$

That is to say, the implicant and the bicluster correspond if and only if the implicant contains exactly the Boolean variables that correspond to the rows and columns that are not elements of the bicluster. Such a corresponding implicant is denoted as

$$\begin{aligned} P_f(A) = {\mathcal {R}}'{\mathcal {C}}'. \end{aligned}$$
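A small illustration of Definition 7 (the helper names are ours; as in the worked example below, each row or column label is treated directly as its Boolean variable):

```python
def bicluster_to_implicant(rows, cols, R_sub, C_sub):
    """Definition 7: the implicant's variable set A consists of all rows and
    columns that do NOT belong to the bicluster."""
    return (rows | cols) - (R_sub | C_sub)

def implicant_to_bicluster(rows, cols, A):
    """Inverse mapping: keep exactly the rows and columns absent from A."""
    return rows - A, cols - A

# For rows {'1','2','3'} and columns {'a','b','c'}, the bicluster
# ({'1','3'}, {'a','c'}) corresponds to the implicant 2 ∧ b, i.e. A = {'2','b'}.
```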

3.2 Boolean reasoning in binary biclustering

Michalak and Ślȩzak [8] provide the mathematical background for bicluster induction from discrete and binary data in the context of Boolean reasoning. Two theorems relate biclusters of binary matrices to implicants of precisely defined (and data-dependent) Boolean formulas. As written here, the definition and theorems are used to find exact biclusters of ones against a background of zeros in a matrix. Swapping zero and one throughout the text yields a definition and theorems for finding exact biclusters of zeros in binary data.

Definition 8

(Zero-encoding Boolean function) Let M be a binary matrix with rows R and columns C. The zero-encoding Boolean function is the conjunction of disjunctions of the corresponding variables of row \(r \in R\) and column \(c \in C\), such that \(M(r, c) = 0\):

$$\begin{aligned} f(M) = \bigwedge _{\substack{r \in R,\ c \in C:\\ M(r,c) = 0}} \left( r' \vee c'\right) . \end{aligned}$$
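A minimal sketch (using the same dictionary representation as in the earlier sketches) that builds the zero-encoding function as a list of two-literal clauses, one clause per zero of M:

```python
def zero_encoding(M, rows, cols):
    """Definition 8: one clause (r' v c') for every cell with M(r, c) = 0.
    A clause is represented simply as the pair (r, c) of variable labels."""
    return [(r, c) for r in rows for c in cols if M[(r, c)] == 0]

# Example: an assumed 3x3 matrix consistent with the worked example below
# (Table 1 itself is not reproduced here, only its zeros are implied by f(M)).
zeros = {('1', 'b'), ('2', 'b'), ('2', 'c')}
M = {(r, c): 0 if (r, c) in zeros else 1
     for r in ['1', '2', '3'] for c in ['a', 'b', 'c']}
print(zero_encoding(M, ['1', '2', '3'], ['a', 'b', 'c']))
# -> [('1', 'b'), ('2', 'b'), ('2', 'c')]   i.e. (1 v b) ^ (2 v b) ^ (2 v c)
```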

Based on the above function definition, Michalak and Ślȩzak [8] prove two theorems; here, only the theorems themselves are stated. The first theorem details the correspondence between implicants of f(M) and exact biclusters of M.

Theorem 1

(Exact bicluster and implicant correspondence theorem) Let M be a binary matrix with rows R and columns C. Bicluster \({\mathcal {R}}{\mathcal {C}}\) is an exact bicluster of ones in M if and only if \({\mathcal {R}}'{\mathcal {C}}'\) is an implicant of f(M).

The second theorem demonstrates the correspondence between exact, inclusion-maximal biclusters of ones in M and prime implicants of f(M).

Theorem 2

(Exact, inclusion-maximal bicluster and prime implicant correspondence theorem) Let M be a binary matrix with rows R and columns C. Bicluster \({\mathcal {R}}{\mathcal {C}}\) is an exact, inclusion-maximal bicluster of ones in M if and only if \({\mathcal {R}}'{\mathcal {C}}'\) is a prime implicant of f(M).

Consider the binary matrix M presented in Table 1. The goal is to find all exact, inclusion-maximal biclusters of ones. The formula f(M) can be expressed as the logical multiplication of two-literal clauses. A given two-literal clause consists of the Boolean variables that correspond to the row and column indices of a zero-valued element of M (Table 1). Consider the matrix element \(M(1,b)=0\). The corresponding two-literal clause has the form:

$$\begin{aligned} (1 \vee b). \end{aligned}$$

Note that the same notation is used both for rows and columns and for the Boolean variables that correspond to them; the meaning is context-dependent. For example, b can represent either an index, as in M(1, b), or a Boolean variable, as in \(1\vee b\).

The formula that encodes all zeros in the matrix M has the following form:

$$\begin{aligned} f(M) = ( 1\vee b)\wedge ( 2\vee b)\wedge ( 2 \vee c) \end{aligned}$$

Transforming this into a function that consists only of prime implicants gives:

$$\begin{aligned} f(M) = ( 1\wedge 2)\vee ( 2\wedge b)\vee ( b \wedge c) \end{aligned}$$

The result is a function of three prime implicants, each of which corresponds (via Theorem 2) to an exact, inclusion-maximal bicluster of ones. A visualization of the biclusters corresponding to the prime implicants of f(M) is presented in Table 2. Note that none of the biclusters contains a zero, and none of them can be extended by a row or column without then containing a zero.
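The correspondence can also be verified mechanically. The brute-force sketch below assumes Table 1 is the 3×3 matrix with rows {1, 2, 3}, columns {a, b, c} and zeros exactly at (1, b), (2, b) and (2, c), which is what f(M) implies (this reconstruction is an assumption, not the original table). It enumerates all exact, inclusion-maximal biclusters of ones and recovers the three biclusters corresponding to the prime implicants above.

```python
from itertools import combinations

# Assumed reconstruction of Table 1: only its zeros are implied by f(M).
rows, cols = ['1', '2', '3'], ['a', 'b', 'c']
zeros = {('1', 'b'), ('2', 'b'), ('2', 'c')}
M = {(r, c): 0 if (r, c) in zeros else 1 for r in rows for c in cols}

def all_ones(R_sub, C_sub):
    return all(M[(r, c)] == 1 for r in R_sub for c in C_sub)

def inclusion_maximal(R_sub, C_sub):
    if not all_ones(R_sub, C_sub):
        return False
    if any(all_ones(R_sub | {r}, C_sub) for r in set(rows) - R_sub):
        return False
    if any(all_ones(R_sub, C_sub | {c}) for c in set(cols) - C_sub):
        return False
    return True

for i in range(1, len(rows) + 1):
    for R_sub in combinations(rows, i):
        for j in range(1, len(cols) + 1):
            for C_sub in combinations(cols, j):
                R_set, C_set = set(R_sub), set(C_sub)
                if inclusion_maximal(R_set, C_set):
                    # Theorem 2: the prime implicant conjoins the absent labels.
                    A = (set(rows) | set(cols)) - (R_set | C_set)
                    print(sorted(R_set), sorted(C_set), '<->', ' ^ '.join(sorted(A)))
# prints the three biclusters corresponding to 1 ^ 2, 2 ^ b and b ^ c
```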

Table 1 An example binary matrix M
Table 2 Prime implicants of the f(M) function and their corresponding biclusters

4 Heuristic and hierarchical search of wide biclusters in binary data

The above approach to binary data biclustering has high computational complexity due to the satisfiability problem of Boolean formulas. By Theorem 1, each implicant of the Boolean formula f(M) encodes an exact bicluster of the matrix M. By exploiting this relationship, heuristic strategies can be applied to find implicants of Boolean formulas.

A popular approach to approximating prime implicants, based on the frequency with which literals occur, is Johnson’s strategy [11]. However, Michalak et al. [10] prove that this strategy may induce implicants whose corresponding biclusters are empty (i.e., biclusters with rows but no columns, or vice versa). To avoid such situations, Michalak et al. [10] propose a new heuristic for implicant induction. In addition, they present a sequential coverage approach to bicluster induction that covers all ones in the data.

The sequential coverage strategy ensures that all ones in the data are eventually covered by a bicluster. However, as the process progresses, the size of the newly found biclusters decreases. As a result, only the biclusters found in the initial phase of the process may be general (i.e., wide). The new heuristic presented in this work adopts a different approach to finding biclusters.

Consider searching a binary matrix for biclusters of ones that are as wide as possible in both directions. The heuristic of Michalak et al. [10] provides a Boolean function implicant that encodes one exact bicluster of the data; this is the widest bicluster that the heuristic can find. Now consider the effect of replacing a one inside the bicluster with a zero and invoking the heuristic again. The result would not be the same bicluster as was originally found: the zero inside the original bicluster would violate the exactness condition. A visualization of two iterations of such an approach is presented in Table 3. It assumes a given heuristic for finding an exact bicluster of ones (not necessarily the heuristic above).

Table 3 Modifying data to invoke another iteration of bicluster searching: (a) original binary data; (b) bicluster found by a given heuristic; (c) a single one in the bicluster is replaced by a zero; (d) from the altered data a new bicluster is found

The example shown in Table 3 proceeds as follows. From the binary data (a), a bicluster is found using a given heuristic (b). An arbitrarily chosen one (third row, fourth column) is replaced by a zero (c). From the modified data, the same heuristic is used to find another bicluster (d). The newly found bicluster cannot be the same as the one previously found.

For a bicluster consisting of r rows and c columns, up to \(r \cdot c\) different modifications to the original data can be made, and up to \(r \cdot c\) new biclusters can be found heuristically. Moreover, each of the biclusters found in the modified data can be used as an input for further processing. This forms the general hierarchical strategy of bicluster induction.
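A minimal sketch of this modification step (the dictionary representation of the data follows the earlier sketches; names are ours): for a found bicluster, it yields the up to r · c perturbed copies of the data that seed the next level of the hierarchy.

```python
def perturbed_copies(data, R_sub, C_sub):
    """For every one inside the bicluster, yield a copy of the data in which
    that single one has been replaced by a zero (cf. Table 3c)."""
    for r in R_sub:
        for c in C_sub:
            if data[(r, c)] == 1:
                child = dict(data)      # shallow copy is enough for int cells
                child[(r, c)] = 0
                yield child
```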

Intuitively, such a recursive strategy can be executed until a given stop criterion is fulfilled. At least four stop criteria can be considered:

  • a maximum number of iterations (recursive invocations),

  • a maximum number of found biclusters,

  • a maximum total coverage of the data,

  • a minimum assumed coverage of the data.

The pseudocode for this heuristic is presented in Algorithm 1.

[Algorithm 1: pseudocode of the hierarchical bicluster induction strategy]

The queue (Algorithm 1, line 4) is used as part of the breadth-first search strategy: each iteration of the while loop adds new data based on the found biclusters. The stop condition continue can take the form of any of the above criteria. The FindSingleBicluster method is an implementation of the heuristic from [10]. To avoid the need for postprocessing of the biclustering results (i.e., the removal of biclusters that are fully covered by others), the AlreadyCovered method (line 11) tests whether each newly found bicluster is contained (by rows and columns) in the union of the biclusters already in the list. If this is not the case, the newly found bicluster is appended to the list of biclusters, and the data modified on the basis of this bicluster are inserted into the queue as new tasks.
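A minimal Python sketch of the strategy described above (our reading, not the original Algorithm 1; FindSingleBicluster stands in for the heuristic of [10] and is left as a stub, and the data representation follows the earlier sketches):

```python
from collections import deque

def find_single_bicluster(data):
    """Stub for the heuristic of [10]: returns (R_sub, C_sub) or None."""
    raise NotImplementedError

def already_covered(R_sub, C_sub, biclusters):
    """True iff every cell of the new bicluster lies inside some stored bicluster."""
    return all(any(r in R0 and c in C0 for R0, C0 in biclusters)
               for r in R_sub for c in C_sub)

def hierarchical_biclustering(M, keep_going):
    """Breadth-first hierarchical bicluster induction; `keep_going` implements
    one of the stop criteria listed above."""
    biclusters = []
    queue = deque([M])                          # the task queue (line 4)
    while queue and keep_going(biclusters):
        data = queue.popleft()
        found = find_single_bicluster(data)
        if found is None:
            continue
        R_sub, C_sub = found
        if not already_covered(R_sub, C_sub, biclusters):    # line 11
            biclusters.append((R_sub, C_sub))
            # each one inside the new bicluster spawns a perturbed child task
            for r in R_sub:
                for c in C_sub:
                    if data[(r, c)] == 1:
                        child = dict(data)
                        child[(r, c)] = 0
                        queue.append(child)
    return biclusters
```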

In practice, in addition to stop conditions, limitations on tree generation are required. The first limitation deals with non-exhaustive subtree induction: it limits the percentage of the bicluster’s ones that are replaced by zeros and processed further. The second limitation restricts the maximal tree depth. The pseudocode for this limited heuristic is presented in Algorithm 2.

[Algorithm 2: pseudocode of the limited hierarchical bicluster induction strategy]

Following the preparation of a new task in the main while loop, if the queue is not empty, the depth of the task is checked. Tasks with a depth greater than a given threshold are omitted (line 11). If the queue is empty, a new root with zero depth is built from the remaining, uncovered original data (line 13) and appended to the queue.
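The sketch below extends the previous one (reusing find_single_bicluster and already_covered) with the two limitations described above; it is our reading of the description, with the depth threshold, the perturbation fraction and the minimal number of subiterations as parameters.

```python
import random
from collections import deque

def limited_hierarchical_biclustering(M, keep_going, max_depth,
                                      fraction=0.01, min_children=3):
    biclusters = []
    queue = deque([(M, 0)])                 # each task carries its tree depth
    while keep_going(biclusters):
        if not queue:
            # Line 13: a new zero-depth root is built from the original data
            # with every already-covered one replaced by a zero.
            root = {(r, c): 0 if any(r in R and c in C for R, C in biclusters)
                    else v for (r, c), v in M.items()}
            if 1 not in root.values():      # everything is covered: stop
                break
            queue.append((root, 0))
            continue
        data, depth = queue.popleft()
        if depth > max_depth:               # line 11: task too deep, omit it
            continue
        found = find_single_bicluster(data)
        if found is None:
            continue
        R_sub, C_sub = found
        if already_covered(R_sub, C_sub, biclusters):
            continue
        biclusters.append((R_sub, C_sub))
        # Line 20: perturb only a limited fraction of the bicluster's ones,
        # but never fewer than `min_children` of them.
        ones = [(r, c) for r in R_sub for c in C_sub if data[(r, c)] == 1]
        k = min(len(ones), max(min_children, int(fraction * len(ones))))
        for r, c in random.sample(ones, k):
            child = dict(data)
            child[(r, c)] = 0
            queue.append((child, depth + 1))
    return biclusters
```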

5 Experiments

Experiments were performed on an example data set. The data set took the form of three binary matrices, presented in Fig. 1. The matrices are those used by Michalak and Ślȩzak [8]. The three binary matrices were derived from a single data matrix (Fig. 1, left) that contained three discrete values. Each binary matrix was assigned one of the discrete values: if the value of a given element in the original data matrix was equal to the value assigned to a binary matrix, the corresponding element in that binary matrix was set to one. All other elements of the binary matrices were set to zero.
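A minimal sketch of this binarization step (the discrete values #0, #77 and #237 are those from Fig. 1; the helper name is ours):

```python
import numpy as np

def binarize(data, values=(0, 77, 237)):
    """For each discrete value, build a binary matrix whose cell is 1 iff the
    corresponding cell of the discrete matrix equals that value."""
    return {v: (data == v).astype(int) for v in values}

# usage: binary_matrices = binarize(np.asarray(discrete_matrix))
```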

Fig. 1 The discrete data matrix (left) and the three binary matrices derived from it. The binary matrices were created using three discrete values: #0 (left center), #77 (right center) and #237 (right)

The set of all exact, inclusion-maximal biclusters in each of the three binary matrices can be found by using the BiMax algorithm [2] or an exhaustive Boolean reasoning strategy [8]. The results of applying these methods are presented in Table 4.

Table 4 The results of applying an exhaustive strategy for binary bicluster induction to an example data set

The total coverage is the ratio of the number of ones that are covered by at least one bicluster to the total number of ones. This value must be equal to unity, as both methods used are exhaustive: they find all inclusion-maximal biclusters, which necessitates that every one in the data is covered by at least one bicluster. The overlap is the ratio of the summed area of all biclusters (where the area of a bicluster is its number of rows multiplied by its number of columns) to the total number of ones in the data. It represents the average number of biclusters that cover a single one in the data.
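For reference, the two measures can be computed directly from a set of biclusters (a sketch using the same dictionary representation of the data as in the earlier sketches):

```python
def total_coverage(M, biclusters):
    """Fraction of the ones in M covered by at least one bicluster."""
    ones = {cell for cell, v in M.items() if v == 1}
    covered = {(r, c) for R, C in biclusters for r in R for c in C}
    return len(ones & covered) / len(ones)

def overlap(M, biclusters):
    """Summed bicluster area (rows times columns) divided by the number of ones."""
    total_ones = sum(1 for v in M.values() if v == 1)
    return sum(len(R) * len(C) for R, C in biclusters) / total_ones
```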

Table 5 The results of applying a modified version of Johnson’s strategy for binary bicluster induction to an example data set

To provide a comparison to the exhaustive search and hierarchical heuristics strategies, Table 5 presents the results of sequential covering using a modified version of Johnson’s strategy.

The application of the hierarchical heuristics strategy to the data was carried out with the following assumptions and parameter settings:

  • the total coverage of ones in the data should be approximately 90%,

  • the maximal depth of the search tree is set to three, to enforce searching outside of the root bicluster, and

  • up to 1% of the elements of a newly found bicluster are selected for modification (Algorithm 2, line 20), but no fewer than three subiterations are invoked.

Originally, ten experiments were performed on each binary matrix, with the number of generated biclusters set to values between 100 and 1000, in steps of 100. However, because the experiments with 1000 biclusters provided unsatisfactory results, an additional three experiments per matrix were performed, with the number of generated biclusters set to values between 1100 and 1300, in steps of 100.

Table 6 The results of applying a hierarchical heuristics for binary bicluster induction

The results of using the exhaustive strategy (Table 4) provide a reference for the subsequent results. The sequential coverage strategy (Table 5) generated a set of biclusters covering all ones; however, the size of the newly found biclusters decreased as the coverage increased. The results of using the hierarchical heuristics strategy, presented in Table 6, show a compromise between high generalization (biclusters that are wide in both directions) and computation time.

Figures 2, 3 and 4 present histograms of bicluster area for each of the three binary matrices, when using each of the three bicluster induction strategies. The histograms show the compromise between high generalization and computation time more clearly.

Fig. 2 Histograms of bicluster area for the #0 data: exhaustive strategy (left), modified version of Johnson’s strategy (center) and hierarchical heuristics (right)

Fig. 3 Histograms of bicluster area for the #77 data: exhaustive strategy (left), modified version of Johnson’s strategy (center) and hierarchical heuristics (right)

Fig. 4 Histograms of bicluster area for the #237 data: exhaustive strategy (left), modified version of Johnson’s strategy (center) and hierarchical heuristics (right)

For each set of data the same observations can be made:

  • the exhaustive strategy finds biclusters with a wider range of areas,

  • the sequential coverage strategy finds biclusters with smaller areas,

  • the hierarchical heuristics strategy finds biclusters with a wider range of areas than those found by the sequential coverage strategy.

These observations demonstrate that the hierarchical heuristics strategy meets its expectations: the strategy can find more general biclusters in less time, compared with the modified version of Johnson’s strategy.

Figure 5 presents the relationship between total coverage and number of biclusters generated, when using the hierarchical heuristics strategy.

Fig. 5 Total coverage as a function of the number of biclusters generated by the hierarchical heuristics strategy

As the hierarchical heuristics strategy implements sequential coverage (only newly found biclusters that cover previously uncovered data are added to the final set), coverage increases as the number of generated biclusters increases.

Fig. 6 Bicluster area as a function of the iteration number

Figure 6 presents the relationship between bicluster area and iteration number when using the hierarchical heuristics strategy. The observed relationship validates the central concept of the strategy. When using the modified version of Johnson’s strategy, all covered ones are replaced with zeros, and the updated data are used as input for further analysis. When using the hierarchical heuristics strategy, only a small number of ones are replaced with zeros. Invoking the process recursively allows the induction of biclusters that can cover both previously covered and previously uncovered ones, increasing bicluster generalization.

The results of applying the hierarchical heuristics strategy to the #237 data (Fig. 6, bottom) provide further insight. The strategy generates biclusters with small areas between iteration 200 and approximately iteration 400. At approximately iteration 400, a steep increase in the area of the generated biclusters can be observed. A similar situation occurs at approximately iteration 800. This is caused by the strategy “jumping” to unexploited regions of the matrix (as a new root bicluster is generated) and inducing biclusters from those uncovered areas.

The hierarchical strategy is capable of achieving a total coverage value of unity. Table 7 presents the number of iterations required to do so, in addition to the overlap.

Table 7 The results of the hierarchical heuristics strategy for binary bicluster induction with complete coverage of ones

6 Conclusions and further work

Exhaustive search generates the widest possible exact biclusters; however, such an approach has high computational complexity (with regard to both processing time and memory). In [10], an attempt was made to decrease the computation time while retaining the theoretical background of the approach, by modifying Johnson’s strategy of prime implicant approximation. Although this was successful, the sequential coverage approach limited the area of the generated biclusters. Based on the modified version of Johnson’s strategy, the hierarchical heuristics strategy introduced in this work is capable of more general bicluster induction. Experimental results obtained with the hierarchical heuristics strategy are promising, demonstrating an ability to find wide biclusters covering a substantial portion of the binary data.

Further modifications to the strategy could be considered. The use of a task queue provides a straightforward way to further decrease the computation time: as all tasks are independent (with regard to single-task analysis), they can be executed concurrently on different processor cores or computing nodes. Moreover, the initial results of the hierarchical heuristics strategy could be postprocessed in order to remove small and insignificant biclusters. This confirms that the application of the Boolean reasoning paradigm to binary data biclustering continues to provide challenges to be solved.