Triadic Formal Concept Analysis and triclustering: searching for optimal patterns
Abstract
This paper presents several definitions of “optimal patterns” in triadic data and the results of an experimental comparison of five triclustering algorithms on real-world and synthetic datasets. The evaluation is carried out according to such criteria as resource efficiency, noise tolerance, and quality scores involving cardinality, density, coverage, and diversity of the patterns. An ideal triadic pattern is a totally dense maximal cuboid (formal triconcept). The relaxations of this notion under consideration are: OAC-triclusters; triclusters optimal with respect to the least-squares criterion; and graph partitions obtained by using spectral clustering. We show that searching for an optimal tricluster cover is an NP-complete problem, whereas determining the number of such covers is #P-complete. Our extensive computational experiments lead us to a clear strategy for choosing a solution for a given dataset, guided by the principle of Pareto-optimality according to the proposed criteria.
Keywords
Formal Concept Analysis, Triclustering, Triadic data, Multi-way set, Tripartite graphs, Pattern mining, Suboptimal solutions
1 Introduction and related work

Bibsonomy data from bibsonomy.org (Benz et al. 2010) capturing a ternary relation among three sets: (i) users, (ii) bookmarks, (iii) tags (topics);

Movies database IMDb (www.imdb.com) capturing, say, a binary relation of “relevance” between a set of movies and a set of keywords, or a ternary relation between sets of movies, keywords and celebrities;

job banks comprising at least four itemsets (jobs, job descriptions, job seekers, seeker skills).
Obviously, biclusters form a set of clumps in the data, so that further learning can be organized within them. Biclustering techniques and Formal Concept Analysis machinery have been developed independently, in different communities and using different mathematical frameworks. Specifically, the mainstream of Formal Concept Analysis is based on order structures, lattices and semilattices (Ganter and Wille 1999; Poelmans et al. 2013a), whereas biclustering relies more on conventional optimization approaches and matrix algebra (Madeira and Oliveira 2004; Eren et al. 2013). Yet these frameworks overlap considerably in applications. Among those are: finding co-regulated genes in gene expression data (Madeira and Oliveira 2004; Besson et al. 2005; Barkow et al. 2006; Tarca et al. 2007; Hanczar and Nadif 2010; Kaytoue et al. 2011; Eren et al. 2013), prediction of biological activity of chemical compounds (Blinova et al. 2003; Kuznetsov and Samokhin 2005; DiMaggio et al. 2010; Asses et al. 2012), summarization and classification of texts (Dhillon 2001; Cimiano et al. 2005; Banerjee et al. 2007; Ignatov and Kuznetsov 2009; Carpineto et al. 2009), structuring web-search results and browsing navigation in Information Retrieval (Carpineto and Romano 2005; Koester 2006; Eklund et al. 2012; Poelmans et al. 2012), finding communities in two-mode networks in Social Network Analysis (Duquenne 1996; Freeman 1996; Latapy et al. 2008; Roth et al. 2008; Gnatyshak et al. 2012), and Recommender Systems (Boucher-Ryan and Bridge 2006; Symeonidis et al. 2008; Ignatov and Kuznetsov 2008; Nanopoulos et al. 2010; Ignatov et al. 2014).
It is worth noting that Formal Concept Analysis helped to algebraically rethink several models and methods in Machine Learning, such as version spaces (Ganter and Kuznetsov 2003), learning from positive and negative examples (Blinova et al. 2003; Kuznetsov 2004), and decision trees (Kuznetsov 2004). It was also shown that the concept lattice is a perfect search space for learning globally optimal decision trees (Belohlávek et al. 2009). Moreover, since the early 1990s, both supervised and unsupervised machine learning techniques and applications based on Formal Concept Analysis have been introduced in the machine learning community. For example, Carpineto and Romano (1993, 1996) reported results on concept-lattice-based clustering in the GALOIS system, which was suited for information retrieval via browsing. Fu et al. (2004) compared seven FCA-based classification algorithms. Rudolph (2007) and Tsopzé et al. (2007) independently proposed using FCA to design neural network architectures. In Outrata (2010) and Belohlávek et al. (2014), FCA was used as a data preprocessing technique that transforms the attribute space to improve the results of decision tree induction. Visani et al. (2011) proposed Navigala, a navigation-based approach to supervised classification, and applied it to noisy symbol recognition. Lattice-based approaches have also been used successfully for finding frequent (closed) itemsets (Pasquier et al. 1999; Kuznetsov and Obiedkov 2002; Zaki and Hsiao 2005), for classifying data with complex descriptions such as graphs or trees (Kuznetsov and Samokhin 2005; Zaki and Aggarwal 2006), and for sequential pattern mining (Zaki 2001; Buzmakov et al. 2013). A recent survey of theoretical advances and applications of FCA can be found in Poelmans et al. (2013a, b).
Bicluster and tricluster from Mirkin and Kramarenko (2011)
Clump | Movies | Keywords | Genres
Bicluster | {12 Angry Men (1957), To Kill a Mockingbird (1962), Witness for the Prosecution (1957)} | {Murder, Trial} | n/a
Tricluster | {12 Angry Men (1957), Double Indemnity (1944), Chinatown (1974), The Big Sleep (1946), Witness for the Prosecution (1957), Dial M for Murder (1954), Shadow of a Doubt (1943)} | {Murder, Trial, Widow, Marriage, Private detective, Blackmail, Letter} | {Crime, Drama, Thriller, Mystery, Film-Noir}
Therefore, it can be useful to extend the concepts and techniques of biclustering and Formal Concept Analysis to data describing a relation among more than two sets. A few attempts in this direction have been published. For example, Zhao and Zaki (2005) proposed the Tricluster algorithm for mining biclusters extended by a time dimension in real-valued gene expression data. A triclustering method was designed in Li and Tuck (2009) to mine gene expression data using black-box functions and parameters coming from the domain. In the Formal Concept Analysis framework, theoretical papers (Wille 1995; Lehmann and Wille 1995) introduced the so-called Triadic Formal Concept Analysis. In Krolak-Schwerdt et al. (1994), triadic formal concepts were applied to analyse small datasets in a psychological domain. Jäschke et al. (2006) proposed a rather scalable method, Trias, for mining frequent triconcepts in folksonomies. Simultaneously, a less efficient method for mining closed cubes in ternary relations was proposed by Ji et al. (2006). There are several recent efficient algorithms for mining closed ternary sets (triconcepts), as well as algorithms more general than Trias. Thus, Data-Peeler (Cerf et al. 2009) is able to mine \(n\)-ary formal concepts, and its descendant mines fault-tolerant \(n\)-sets (Cerf et al. 2013); the latter was compared with the DCE algorithm for fault-tolerant \(n\)-set mining from Georgii et al. (2011). Spyropoulou et al. (2014) generalise \(n\)-ary formal concept mining to the multi-relational setting in databases.
The goal of this paper is to investigate extensions of the notions of bicluster and formal concept to data representing a yes/no relation among three, rather than two, sets of entities, i.e. the notions of tricluster and formal triconcept. This allows us to bring forward both lattice-based and linear-algebraic approaches: Formal Concept Analysis using lattices of closed sets (see Ganter and Wille 1999; Lehmann and Wille 1995; Jäschke et al. 2006) and density/approximation-based methods from linear algebra (see Mirkin and Kramarenko 2011). Formal triconcepts are such subsets of each of the three sets of entities that all the triples within them are in the “yes” relation, whereas the algebraic methods allow some of the triples to be unrelated. Each approach has its advantages and disadvantages, but they have never been compared experimentally.
1. Evaluation criteria. In our study we use the following six criteria: the average density, the coverage, the diversity and the number of triclusters, as well as the computation time and the noise tolerance of the algorithms.
2. Benchmark datasets. We use triadic datasets from publicly available Internet data as well as synthetic datasets with various noise models.
The remainder of the paper is organised as follows. In Sect. 2 we give the main definitions of Formal Concept Analysis and describe the Trias algorithm for triadic concept generation. Section 3 introduces the notion of OAC-tricluster as a relaxation of the triadic formal concept and presents two associated OAC-triclustering methods, OAC-box and OAC-prime. Section 4 introduces the notion of box tricluster based on the conventional least-squares criterion and describes the TriBox approach. In Sect. 5 we present the SpecTric triclustering approach, based on an adaptation of spectral clustering to the triadic setting. Section 6 describes the evaluation criteria for tricluster collections and for comparing the algorithms; it also contains complexity results for a related problem, the search for an optimal tricluster cover. Section 7 describes the datasets selected or generated for our experiments. Section 8 presents the experimental results and discusses them. The last section concludes the paper and indicates some directions for further research.
2 Triadic Formal Concept Analysis and TRIAS method
2.1 Binary and n-ary contexts
First, we recall some basic notions from Formal Concept Analysis (FCA) (Ganter and Wille 1999). Let \(G\) and \(M\) be sets, called the sets of objects and attributes, respectively, and let \(I\subseteq G\times M\) be a relation: for \(g\in G\), \(m\in M\), \(gIm\) holds iff the object \(g\) has the attribute \(m\). The triple \(\mathbb {K}=(G,M,I)\) is called a (formal) context.
A triadic context \(\mathbb {K}=(G,M,B,Y)\) consists of sets \(G\) (objects), \(M\) (attributes), and \(B\) (conditions), and ternary relation \(Y\subseteq G \times M \times B\) (Lehmann and Wille 1995). An incidence \((g, m, b) \in Y\) shows that object \(g\) has attribute \(m\) under condition \(b\).
An \(n\)-adic context is an \((n + 1)\)-tuple \(\mathbb {K}= (X_1,X_2, \ldots ,X_n, Y)\), where \(Y\) is an \(n\)-ary relation between the sets \(X_1, \ldots , X_n\) (Voutsadakis 2002).
2.2 Concept forming operators and formal concepts
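For \(A\subseteq G\) and \(B\subseteq M\), the concept-forming (derivation) operators are defined by \(A'=\{m\in M\mid gIm \text{ for all } g\in A\}\) and \(B'=\{g\in G\mid gIm \text{ for all } m\in B\}\). A pair \((A,B)\) with \(A\subseteq G\), \(B\subseteq M\), \(A' = B\), and \(B' = A\) is called a (formal) concept (of the context \(\mathbb {K}\)) with extent \(A\) and intent \(B\) (in this case we also have \(A'' = A\) and \(B'' = B\)).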
The concepts, ordered by \((A_1,B_1)\ge (A_2,B_2) \iff A_1\supseteq A_2\) form a complete lattice, called the concept lattice \(\underline{{\mathfrak B}}(G,M,I)\).
2.3 Formal concepts in triadic and in n-ary contexts
For convenience, a triadic context is denoted by \((X_1,X_2,X_3,Y)\). A triadic context \(\mathbb {K}=(X_1,X_2,X_3,Y)\) gives rise to the following dyadic contexts:
\(\mathbb {K}^{(1)}=(X_1, X_2\times X_3, Y^{(1)})\), \(\mathbb {K}^{(2)}=(X_2, X_1\times X_3, Y^{(2)})\), \(\mathbb {K}^{(3)}=(X_3, X_1\times X_2, Y^{(3)})\),
where each of \(x_1 Y^{(1)} (x_2,x_3)\), \(x_2 Y^{(2)} (x_1,x_3)\), and \(x_3 Y^{(3)} (x_1,x_2)\) holds iff \((x_1,x_2,x_3)\in Y\).
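The flattening into \(\mathbb {K}^{(1)}\), \(\mathbb {K}^{(2)}\), \(\mathbb {K}^{(3)}\) is mechanical. The minimal Python sketch below assumes that the ternary relation is given as a set of 3-tuples; `dyadic_contexts` and `derive_in_first` are hypothetical helpers introduced only for illustration. The second one computes the derivation \(A^{(1)}\) in \(\mathbb {K}^{(1)}\), i.e. the pairs related to every element of \(A\).

```python
def dyadic_contexts(Y):
    """Incidences of K^(1), K^(2), K^(3) for a ternary relation Y of 3-tuples."""
    Y1 = {(x1, (x2, x3)) for x1, x2, x3 in Y}  # x1 Y^(1) (x2, x3)
    Y2 = {(x2, (x1, x3)) for x1, x2, x3 in Y}  # x2 Y^(2) (x1, x3)
    Y3 = {(x3, (x1, x2)) for x1, x2, x3 in Y}  # x3 Y^(3) (x1, x2)
    return Y1, Y2, Y3


def derive_in_first(A, Y1, X2, X3):
    """A^(1) in K^(1): all pairs (x2, x3) related to every x1 in A."""
    return {(x2, x3) for x2 in X2 for x3 in X3
            if all((x1, (x2, x3)) in Y1 for x1 in A)}
```

For a folksonomy-like relation, \(\mathbb {K}^{(1)}\) is simply the users-by-(tag, resource) context, which is the kind of flattening the Trias algorithm starts from.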
One may introduce \(n\)-adic formal concepts without \(n\)-ary concept-forming operators. The \(n\)-adic concepts of an \(n\)-adic context \((X_1, \ldots ,X_n, Y)\) are exactly the \(n\)-tuples \((A_1, \ldots , A_n)\in 2^{X_1} \times \cdots \times 2^{X_n}\) with \(A_1 \times \cdots \times A_n \subseteq Y\) that are maximal with respect to componentwise set inclusion (Voutsadakis 2002). The notion of \(n\)-adic concept lattice can be introduced in a similar way to the triadic case (Voutsadakis 2002).
2.4 NextClosure algorithm extended
Trias (Jäschke et al. 2006) is a method for finding (frequent) triadic formal concepts, i.e. closed 3-sets. Since we consider triadic formal concepts as the starting point of our search for optimal tri-patterns and absolutely dense triclusters, we include this method in our study.
Formally, Trias solves the following problem:
Problem 1
(Mining all frequent triconcepts) Let \(\mathbb {K}=(G, M,B, I )\) be a triadic context, and let \(g\text{-minsup}, m\text{-minsup}, b\text{-minsup} \in [0, 1]\). The task of mining all frequent triconcepts consists in determining all triconcepts \((X,Y,Z)\) of \(\mathbb {K}\) with \(|X| \ge \tau _G\), \(|Y| \ge \tau _M\), and \(|Z| \ge \tau _B\), where \(\tau _G = |G| \cdot g\text{-minsup}\), \(\tau _M= |M| \cdot m\text{-minsup}\), and \(\tau _B = |B| \cdot b\text{-minsup}\).
Trias is based on the NextClosure algorithm (Ganter 1987; Ganter and Wille 1999), which enumerates all formal concepts of a dyadic context in lectic order, i.e. the lexicographic order on the bit vectors describing subsets of objects (or attributes).
In Trias this approach is extended to the triadic case and minimal support constraints are added (triclusters with too small extent, intent or modus are skipped).
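To make the lectic enumeration concrete, here is a minimal Python sketch of plain NextClosure for a dyadic context, written from the textbook description (Ganter and Wille 1999) rather than taken from the Trias code. It has no support thresholds, and the name `next_closure_concepts` is hypothetical. Attributes are sorted to fix the linear order \(m_0 < m_1 < \cdots\), and each step computes the lectically next closed intent; Trias extends this scheme to the triadic case and adds minimal-support pruning.

```python
def next_closure_concepts(objects, attributes, incidence):
    """All concepts (extent, intent) of the context (G, M, I), intents in lectic order."""
    attrs = sorted(attributes)                      # fixes the linear order m_0 < m_1 < ...
    pos = {m: k for k, m in enumerate(attrs)}

    def extent(B):                                  # B': objects having every attribute of B
        return frozenset(g for g in objects if all((g, m) in incidence for m in B))

    def intent(A):                                  # A': attributes shared by every object of A
        return frozenset(m for m in attrs if all((g, m) in incidence for g in A))

    def closure(B):                                 # B'' is the smallest intent containing B
        return intent(extent(B))

    concepts = []
    B = closure(frozenset())                        # lectically first closed set
    while B is not None:
        concepts.append((extent(B), B))
        successor = None
        for i in range(len(attrs) - 1, -1, -1):     # try the largest attribute first
            m = attrs[i]
            if m in B:
                continue
            prefix = frozenset(x for x in B if pos[x] < i)
            C = closure(prefix | {m})
            # lectic test: C must not gain any attribute smaller than m
            if all(pos[x] > i for x in C - prefix - {m}):
                successor = C
                break
        B = successor                               # None once B = M has been reached
    return concepts
```

On the context with objects 1: {a, b}, 2: {b, c}, 3: {b}, the sketch yields the intents {b}, {b, c}, {a, b}, {a, b, c}, in this order, with extents {1, 2, 3}, {2}, {1}, and the empty set.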
The Trias algorithm was designed to mine so-called folksonomies (Vander Wal 2007) in resource-sharing systems, e.g. in social bookmarking systems like Delicious and BibSonomy.
Formally, a folksonomy is a tricontext \(\mathbb {F}=(U,T,R,H)\) with \(H \subseteq U \times T \times R\), where \(U\) is a set of users, \(T\) is a set of tags, and \(R\) is a set of resources. A triple \((u,t,r) \in H\) means that the user \(u\) assigned the tag \(t\) to the resource \(r\).
Trias has a precursor, the Tripat algorithm (Krolak-Schwerdt et al. 1994), for analysing triadic data from psychological studies.
The Trias algorithm uses two other functions, FirstFreqCon and NextFreqCon, as subroutines. First, it composes the new binary relation \(\tilde{I}:=\{(g,(m,b)) \mid (g,m,b)\in I\}\) (line 2) and then finds the first frequent concept of the corresponding formal context \((G,M\times B,\tilde{I})\) (line 3) w.r.t. the lectic order on concept extents and the minimal support \(\tau _G\).
The main advantages of the Trias algorithm are that it does not generate the same triconcept more than once and that it uses main memory almost exclusively for storing the input data.
Let us discuss the time complexity of the Trias algorithm. The function \(NextFreqCon((X,J),(G,M\times B,\tilde{I}),\tau _G=0)\) produces the set of all concepts of \(\mathbb {K}_{\tilde{I}}\) in time \(O(|G|^2|M||B||L_{\tilde{I}}|)\) with polynomial delay \(O(|G|^2|M||B|)\), and \(NextFreqCon((Y,Z),(M,B,J),\tau _M=0)\) produces the set of all concepts of \(\mathbb {K}_{J}\) in time \(O(|M|^2|B||L_{J}|)\) with polynomial delay \(O(|M|^2|B|)\), where \(L_{\tilde{I}}\) and \(L_{J}\) are the sets of all concepts of the contexts \(\mathbb {K}_{\tilde{I}}\) and \(\mathbb {K}_{J}\), respectively. These worst-case bounds are based on those of the NextClosure algorithm reported in Kuznetsov and Obiedkov (2002). Note that the upper bounds on \(|L_{\tilde{I}}|\) and \(|L_{J}|\) are \(2^{\min \{|G|,|M||B|\}}\) and \(2^{\min \{|M|,|B|\}}\), attained when each of these lattices is isomorphic to a Boolean lattice of the corresponding size. However, this case is rare in practice, given the high sparsity of real datasets.
In Biedermann (1998), the upper bound on the size of the concept trilattice \(\mathfrak {T}(X,X,X,Y_X)\) is given for \(Y_X= (X\times X \times X) \setminus \{(x,x,x) \mid x \in X\}\): \(|\mathfrak {T}|=3^{|X|}\). Hence, the worst-case upper bound for an arbitrary tricontext \(\mathbb {K}=(G, M, B, I)\) is \(|\mathfrak {T}|=3^{\min \{|G|,|M|,|B|\}}\).
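For very small tricontexts, triconcepts can be enumerated directly from the maximality characterization of Sect. 2.3, which also makes Biedermann's count easy to check. The Python sketch below is a brute-force illustration under that characterization (exponential in the size of the tricontext, so for toy inputs only); `triconcepts` and `biedermann_context` are hypothetical names. It relies on the fact that every sub-box of a box contained in \(Y\) is also contained in \(Y\), so a triple is maximal as soon as no single element can be added to any of its components.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def triconcepts(X1, X2, X3, Y):
    """Maximal (A1, A2, A3) with A1 x A2 x A3 contained in Y; exponential, toy inputs only."""
    universes = (X1, X2, X3)

    def inside(A):
        return all(t in Y for t in product(*A))  # an empty component gives an empty box

    result = []
    for A in product(subsets(X1), subsets(X2), subsets(X3)):
        if not inside(A):
            continue
        # sub-boxes of a box inside Y are inside Y, so one-element extensions decide maximality
        extendable = any(inside(A[:i] + (A[i] | {x},) + A[i + 1:])
                         for i in range(3) for x in universes[i] - A[i])
        if not extendable:
            result.append(A)
    return result


def biedermann_context(n):
    """Biedermann's tricontext: X^3 without the diagonal triples (x, x, x)."""
    X = frozenset(range(n))
    Y = {t for t in product(X, X, X) if not t[0] == t[1] == t[2]}
    return X, Y
```

For \(|X| = 1, 2, 3\) the sketch finds 3, 9 and 27 triconcepts, matching \(3^{|X|}\): each element of \(X\) must be missing from exactly one of the three components of a maximal triple.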
3 Relaxed object-attribute-condition patterns: OAC triclusters
Guided by the idea of finding scalable and noise-tolerant triconcepts, we looked at the triclustering paradigm in general for triadic binary data, i.e. for tricontexts as input datasets.
3.1 Ternary patterns and their density
Let \(\mathbb {K}=(G,M,B,I)\) be a triadic context, where \(G\), \(M\), and \(B\) are sets, and \(I\) is a ternary relation: \(I\subseteq G\times M\times B\).
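Suppose \(X\), \(Y\), and \(Z\) are some subsets of \(G\), \(M\), and \(B\), respectively. The density of the triple \((X,Y,Z)\) is the fraction of its cells that belong to \(I\): \(\rho (X,Y,Z)=\frac{|I\cap (X\times Y\times Z)|}{|X||Y||Z|}\).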
Definition 1
Suppose \(\mathbb {K}=(G,M,B,I)\) is a triadic context and \(X \subseteq G\), \(Y \subseteq M\), \(Z \subseteq B\). A triple \(T=(X,Y,Z)\) is called an OAC-tricluster. Traditionally, its components are called (tricluster) extent, (tricluster) intent, and (tricluster) modus, respectively.
Definition 2
The tricluster \(T\) is called dense iff its density is not less than some predefined threshold, i.e. \(\rho (T)\ge \rho _{min}\).
The collection of all triclusters for a given tricontext \(\mathbb {K}\) is denoted by \(\mathcal {T}\).
Since we deal with all possible cuboids in the Cartesian product \(G\times M\times B\), the number of all OAC-triclusters, \(|\mathcal {T}|\), is evidently equal to \(2^{|G|+|M|+|B|}\). However, not all of them are dense, especially for real data, which are often quite sparse. Below we discuss two possible OAC-tricluster definitions that give an efficient way to find, in polynomial time, a number of (dense) triclusters not greater than the number \(|I|\) of triples in the initial data.
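The density used in Definition 2 can be computed directly. The sketch below assumes the usual definition \(\rho (X,Y,Z)=|I\cap (X\times Y\times Z)|/(|X||Y||Z|)\) and a relation stored as a Python set of triples; `density` and `is_dense` are illustrative names, not part of any of the discussed tools.

```python
def density(T, I):
    """Share of the cells of the cuboid X x Y x Z that belong to the ternary relation I."""
    X, Y, Z = T
    volume = len(X) * len(Y) * len(Z)
    if volume == 0:
        raise ValueError("density is undefined for an empty cuboid")
    inside = sum((g, m, b) in I for g in X for m in Y for b in Z)
    return inside / volume


def is_dense(T, I, rho_min):
    """Definition 2: T is dense iff rho(T) >= rho_min."""
    return density(T, I) >= rho_min
```

For example, a \(2\times 2\times 1\) cuboid containing three triples of \(I\) has density \(3/4\), so it is dense for \(\rho _{min}=0.7\) but not for \(\rho _{min}=0.8\).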
3.2 Bounding operator box
Here we define the box operators and describe box OAC-triclustering. We introduce the main notions of Triadic Concept Analysis (TCA) in a slightly different way, since they are used for technical purposes later on.
Definition 3
Suppose \(\mathbb {K}=(G,M,B,I)\) is a triadic context. For a triple \((g,m,b)\in I\) a triple \(T=(g^\square ,m^\square ,b^\square )\) is called a box operator based OACtricluster. Traditionally, its components are respectively called extent, intent, and modus.
The proposed OAC-tricluster definition has a useful property (see Proposition 1): for every triconcept of a given tricontext, there exists a tricluster of the same tricontext that contains it. This means that there is no information loss, since all the triconcepts are kept in the resulting tricluster collection.
Proposition 1
(Ignatov et al. 2013) Let \(\mathbb {K}=(G,M,B,Y)\) be a triadic context and \(\rho _{min}=0\). For every \(T_c=(A_c,B_c,C_c) \in \mathfrak {T}(G,M,B,Y)\) there exists a box OACtricluster \(T=(A,B,C) \in \mathbf {T}_{\square }(G,M,B,Y)\) such that \(A_c \subseteq A,B_c \subseteq B, C_c \subseteq C\).
3.3 Prime operator applied to pairs
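The second author of the paper proposed prime OAC-triclustering, which extends the biclustering method from Ignatov et al. (2012) to the triadic case. It uses prime operators (Eq. 6) to generate triclusters: for a pair \((m,b)\in M\times B\), \((m,b)'=\{g\in G\mid (g,m,b)\in I\}\), and the primes \((g,b)'\subseteq M\) and \((g,m)'\subseteq B\) are defined analogously.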
Definition 4
Suppose \(\mathbb {K}=(G,M,B,I)\) is a triadic context. For a triple \((g,m,b)\in I\) a triple \(T=\left( (m,b)^\prime ,(g,b)^\prime ,(g,m)^\prime \right) \) is called a prime operator based OACtricluster. Its components are called respectively extent, intent, and modus.
A similar property holds for the prime based OACtriclusters:
Proposition 2
Let \(\mathbb {K}=(G,M,B,Y)\) be a triadic context and \(\rho _{min}=0\). For every \(T_c=(A_c,B_c,C_c) \in \mathfrak {T}(G,M,B,Y)\) there exists a prime OACtricluster \(T=(A,B,C) \in \mathbf {T}_{\prime }(G,M,B,Y)\) such that \(A_c \subseteq A,B_c \subseteq B, C_c \subseteq C\).
3.4 Tricluster generating algorithms
3.4.1 OACtriclustering based on box operators
The idea of box OAC-triclustering is to enumerate all triples of the ternary relation \(I\) of a context \(\mathbb {K}\), generating a box operator based tricluster for each of them. If the generated tricluster \(T\) has not been added to the set of all triclusters \(\mathcal {T}_{\Box }\) at a previous step, then \(T\) is added to \(\mathcal {T}_{\Box }\). Hash functions for triclusters can be used to decrease the computation time significantly by simplifying tricluster comparison. A minimal density threshold can be used as well.
Proposition 3
For a given triadic context \(\mathbb {K}=(G,M,B,I)\) and \(\rho _{min}\ge 0\), the largest number of box OAC-triclusters is equal to \(|I|\); all OAC-triclusters can be generated in time \(O(|I|\cdot (|M||B|+|G||B|+|G||M|))\) if \(\rho _{min}= 0\), or \(O(|I||G||M||B|)\) if \(\rho _{min}>0\).
Note that a post-processing step eliminating duplicate triclusters would add \(O(|I|\log |I|)\) time to the estimates in Proposition 3.
3.4.2 OACtriclustering based on primes of pairs
To avoid duplicate tricluster generation we suggest the usage of hash functions.
A similar property can be proved.
Proposition 4
For a given triadic context \(\mathbb {K}=(G,M,B,I)\) and \(\rho _{min}\ge 0\), the largest number of prime OAC-triclusters is equal to \(|I|\); all prime OAC-triclusters can be generated in time \(O(|I|\cdot (|G|+|M|+|B|))\) if \(\rho _{min}= 0\), or \(O(|I||G||M||B|)\) if \(\rho _{min}> 0\).
So, from the time complexity point of view, prime OAC-triclustering may have an advantage over box OAC-triclustering at \(\rho _{min}=0\).
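A minimal Python sketch of prime OAC-triclustering is given below, assuming \((m,b)'=\{g\mid (g,m,b)\in I\}\) and analogously for \((g,b)'\) and \((g,m)'\). The primes of all pairs are precomputed in one pass over \(I\), and triclusters are stored as tuples of frozensets in a Python set, which plays the role of the hash-based duplicate elimination mentioned above. The function name `prime_oac_triclusters` and its inlined density helper are hypothetical, not the authors' implementation.

```python
from collections import defaultdict


def prime_oac_triclusters(I, rho_min=0.0):
    """Prime OAC-triclusters ((m,b)', (g,b)', (g,m)') for every triple (g,m,b) of I."""
    I = set(I)
    extent_of = defaultdict(set)   # (m, b) -> (m, b)', a subset of G
    intent_of = defaultdict(set)   # (g, b) -> (g, b)', a subset of M
    modus_of = defaultdict(set)    # (g, m) -> (g, m)', a subset of B
    for g, m, b in I:
        extent_of[(m, b)].add(g)
        intent_of[(g, b)].add(m)
        modus_of[(g, m)].add(b)

    def density(T):
        X, Y, Z = T
        inside = sum((x, y, z) in I for x in X for y in Y for z in Z)
        return inside / (len(X) * len(Y) * len(Z))

    triclusters = set()            # hashing tuples of frozensets drops duplicate triclusters
    for g, m, b in I:
        T = (frozenset(extent_of[(m, b)]),
             frozenset(intent_of[(g, b)]),
             frozenset(modus_of[(g, m)]))
        if rho_min > 0 and density(T) < rho_min:
            continue
        triclusters.add(T)
    return triclusters
```

On a complete \(2\times 2\times 1\) cuboid, each of the four triples generates the same tricluster, so the set collapses them into a single one.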
4 Approximate triclusters: TriBox method
4.1 Individual tricluster approximation model
The TriBox method (Mirkin and Kramarenko 2011) implements an optimization approach to tricluster generation. Suppose \(\mathbb {K}=(G,M,B,I)\) is a triadic context. The idea is to select a triple of \(I\), take it as the initial tricluster, and then modify its extent, intent, and modus so that they cover a significant part of the context while maintaining high density. TriBox aims at finding a set of triclusters \(\mathcal {T}=\{T_t=(X_t,Y_t,Z_t)\}\) that maximize criterion (12). The notation of the underlying model is as follows:
1. \(\lambda _t\) is a parameter (some measure for the tricluster \(T_t\));
2. \([(g, m, b) \in X_t \times Y_t \times Z_t]\) equals \(1\) if \((g, m, b) \in X_t \times Y_t \times Z_t\), and \(0\) otherwise;
3. \(\lambda _0\), \( 0 \le \lambda _{0} \le 1\), is a constant that plays the role of an intercept in linear data models;
4. \(\varepsilon _{g,m,b}\) is a residual.
4.2 Equivalent criterion and parameters
Its optimization leads to \(\lambda =\rho (T) - \lambda _0\) (the density of \(T\) minus \(\lambda _0\)). Therefore, \(f(T)\) in criterion (12) is a particular case of \(g(T)\) when \(\lambda _0=0\).
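To see where \(\lambda =\rho (T)-\lambda _0\) comes from, fix a single box \(T=(X,Y,Z)\) and \(\lambda _0\), and assume (notation introduced here for illustration only) that the data entries are the indicators \(r_{gmb}=1\) if \((g,m,b)\in I\) and \(r_{gmb}=0\) otherwise. Only the entries inside the box depend on \(\lambda \), so setting the derivative of their squared residuals to zero gives:

```latex
\[
\frac{\partial}{\partial\lambda}\sum_{(g,m,b)\in X\times Y\times Z}\bigl(r_{gmb}-\lambda_0-\lambda\bigr)^2
 = -2\sum_{(g,m,b)\in X\times Y\times Z}\bigl(r_{gmb}-\lambda_0-\lambda\bigr) = 0
\]
\[
\Longrightarrow\quad
\lambda=\frac{1}{|X||Y||Z|}\sum_{(g,m,b)\in X\times Y\times Z} r_{gmb}-\lambda_0
 =\frac{|I\cap(X\times Y\times Z)|}{|X||Y||Z|}-\lambda_0
 =\rho(T)-\lambda_0 .
\]
```

The second derivative, \(2|X||Y||Z|>0\), confirms that this stationary point is a minimum of the squared residuals.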
4.3 Local optimization: TriBox method
Model (14) can be fitted by running the TriBox algorithm from each of the triples and retaining only the distinct solutions with the largest contributions. Recall that the contribution of a box tricluster is just the value of criterion (19).
At \(\lambda _0=0\), the value of \(\lambda \) can be interpreted as the box tricluster density. The resulting box tricluster provably contrasts with its surroundings:
Proposition 5
If a box tricluster \((X,Y,Z)\) is found with the TriBox algorithm, then for any entity outside the box, its average density on the two counterpart entity sets from \(\{X, Y, Z\}\) is less than half of the within-box density \(\lambda \); in contrast, for any entity belonging to the box, its average density on the counterpart entity sets is greater than or equal to half of the within-box density \(\lambda \).
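Proposition 5 can be checked numerically for any candidate box. The Python sketch below is a diagnostic only, not part of TriBox: assuming \(\lambda _0=0\), so that \(\lambda =\rho (T)\), it computes each entity's average density on the two counterpart sets of the box and reports the entities that break the stated contrast property. The name `contrast_violations` is hypothetical.

```python
def contrast_violations(I, G, M, B, X, Y, Z):
    """Entities breaking the contrast property of Proposition 5 for the box (X, Y, Z), lambda_0 = 0."""
    I = set(I)
    box = (set(X), set(Y), set(Z))
    universes = (G, M, B)
    volume = len(X) * len(Y) * len(Z)
    lam = sum((g, m, b) in I for g in X for m in Y for b in Z) / volume  # lambda = rho(T)
    violations = []
    for axis, mode in enumerate("GMB"):
        first, second = (box[k] for k in range(3) if k != axis)  # the two counterpart sets
        for e in sorted(universes[axis]):
            hits = 0
            for u in first:
                for v in second:
                    t = [u, v]
                    t.insert(axis, e)          # put e back at its own position in the triple
                    hits += tuple(t) in I
            average = hits / (len(first) * len(second))
            inside = e in box[axis]
            if (inside and average < lam / 2) or (not inside and average >= lam / 2):
                violations.append((mode, e))
    return violations
```

For instance, if a fully dense \(2\times 2\times 2\) box is extended by an object related to only one of its four (attribute, condition) pairs, the density drops to \(9/12=0.75\), and the added object, with average density \(0.25<0.375\), is reported as a violation.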
Proposition 6
For a given triadic context \(\mathbb {K}=(G,M,B,I)\) and \(\lambda _0=0\), the largest number of box triclusters found by TriBox is equal to \(|I|\); all box triclusters can be generated in time \(O(|I|\cdot (|G|+|M|+|B|)^2|G||M||B|)\).
Once again, if hash functions are used to avoid duplicate triclusters, an additional term \(O(|I|\log |I|)\) for the last loop should be added.
In comparison to Trias and to box and prime OAC-triclustering, TriBox is able to work with real-valued data without significant modifications (one only needs to change the initialization step at line 3).
5 Spectral approach extended to triclustering: SpecTric method
5.1 Adjacency matrix and Laplace transformation for the triadic hypergraph
The spectral triclustering method (Ignatov et al. 2013) is based on the spectral graph partitioning approach. The idea is to represent the given triadic context as a tripartite graph and then recursively divide it into parts, minimizing some objective function through the solution of a corresponding eigenvalue problem. To find an optimal partition, spectral clustering uses the eigenvector corresponding to the second smallest eigenvalue of the Laplacian matrix (Fiedler 1973).
Let us elaborate on this technique. Suppose \(\mathbb {K}=(G,M,B,I)\) is a triadic context. First, we need to transform \(\mathbb {K}\) into a tripartite graph \(\varGamma =\langle V,E\rangle \). Since \(I\) is a ternary relation, \(\mathbb {K}\) can be represented without information loss only as a tripartite hypergraph. We therefore use the following transformation: \(V:=G\sqcup M\sqcup B\), and for each triple \((g,m,b)\in I\) the edges \(\{g,m\}\), \(\{g,b\}\) and \(\{m,b\}\) are added to \(E\), forming an undirected unweighted tripartite graph with the adjacency matrix \(\mathbf {A}\).
As a result, some additional triples will be added to \(I\) by the inverse transformation. However, these triples are added only in “dense” areas of \(I\), thus possibly filling in missing values and “smoothing” the tricontext for methods aiming at finding formal triconcepts. Thus, this technique is acceptable for our problem.
The eigenvector of \(L\) corresponding to the second smallest eigenvalue is an optimal solution to a relaxed version of the optimal partition problem for \(\varGamma \) (finding a minimal set \(\tilde{E}\subseteq E\) such that the graph \(\tilde{\varGamma }=(V,E\setminus \tilde{E})\) is disconnected). The sign of each component of this vector indicates which of the two new connected components the corresponding node belongs to. The solution vector \(v\) is used to partition the graph by placing the nodes with \(v_i > 0\) into one part and those with \(v_i < 0\) into the other.
Every partition can then be recursively split by solving a new eigenvalue problem for the corresponding submatrix.
Also, some minimum size constraint can be used to avoid too deep partitioning. Since spectral triclustering is not able to generate the same tricluster more than once, it is not necessary to use hash functions to avoid duplicates.
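A single bisection step can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the SpecTric implementation: it uses the unnormalised Laplacian \(L=D-\mathbf {A}\) (the section does not fix a normalisation), assumes the graph is connected, and approximates the Fiedler vector by power iteration on \(cI-L\) with the constant vector projected out, instead of calling a Lanczos or Arnoldi solver. Vertices are tagged with their mode (`"G"`, `"M"`, `"B"`) to keep \(G\sqcup M\sqcup B\) disjoint; all function names are hypothetical.

```python
import random
from collections import defaultdict


def tripartite_graph(triples):
    """Adjacency sets on G ⊔ M ⊔ B: each triple (g, m, b) adds the edges gm, gb and mb."""
    adj = defaultdict(set)
    for g, m, b in triples:
        u, v, w = ("G", g), ("M", m), ("B", b)  # tag vertices so the three sets stay disjoint
        for x, y in ((u, v), (u, w), (v, w)):
            adj[x].add(y)
            adj[y].add(x)
    return adj


def fiedler_vector(adj, iterations=2000, seed=0):
    """Approximate Fiedler vector of L = D - A via power iteration on cI - L (graph assumed connected)."""
    nodes = sorted(adj)
    c = 2 * max(len(adj[v]) for v in nodes)  # c >= largest eigenvalue of L, so cI - L is PSD
    rng = random.Random(seed)
    x = {v: rng.gauss(0.0, 1.0) for v in nodes}
    for _ in range(iterations):
        # (cI - L)x at v equals (c - deg(v)) * x_v + sum of x over the neighbours of v
        y = {v: (c - len(adj[v])) * x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        mean = sum(y.values()) / len(nodes)  # remove the constant eigenvector (eigenvalue 0 of L)
        y = {v: y[v] - mean for v in nodes}
        norm = sum(t * t for t in y.values()) ** 0.5
        x = {v: y[v] / norm for v in nodes}
    return x


def spectral_bisection(triples):
    """One SpecTric-style cut: split the vertices by the sign of the Fiedler vector."""
    x = fiedler_vector(tripartite_graph(triples))
    positive = {v for v, value in x.items() if value > 0}
    return positive, set(x) - positive  # zero entries (if any) go to the second part
```

Choosing \(c=2\max _v \deg (v)\) guarantees \(c\ge \lambda _{\max }(L)\), so after removing the constant direction the dominant eigenvector of \(cI-L\) is the Fiedler vector. On two dense \(3\times 3\times 1\) blocks linked by one bridging triple, the sign split separates the two blocks; a recursive procedure would call `spectral_bisection` again on each part until the \(C_{void}\) or \(C_{\lnot size}\) constraint of Sect. 5.2 applies.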
5.2 The spectral triclustering algorithm
The pseudocode for SpecTric is provided (Algorithm 7). It uses the following constraints:
1. \(C_{void}\) constraint: in the tricluster \(T=(X,Y,Z)\) corresponding to \(\mathbf {A}\), at least one of the parts (extent, intent, or modus) is empty;
2. \(C_{\lnot size}\) constraint: \(Size(T) < s_{min}\), where \(Size(X,Y,Z)=\frac{|X|+|Y|+|Z|}{|G|+|M|+|B|}\) or another size-related measure of the tricluster.
The method recursively splits the input graph (matrix, context) into two parts and checks the constraints for both parts; if one of them fails, the previous subgraph (submatrix, subcontext) is added to \(\mathcal {T}\) as a tricluster and the corresponding branch is terminated.
Standard matrix diagonalization methods require \(O(|V|^3)\) operations, where \(|V|\) is the number of nodes in the graph, and are impractical for large datasets. However, we can take advantage of the sparsity of the graph by using iterative methods (the Lanczos or Arnoldi algorithms; Golub and van Loan 1989), especially since only one eigenvector has to be computed. The complexity of a Lanczos-type algorithm is only \(O(k|E|)\), where \(|E|\) is the number of edges in the graph and \(k\) is the number of iterations required for convergence (see Shi and Malik 2000; Golub and van Loan 1989). In practice, usually \(k \ll \sqrt{|V|}\).
Taking this into account, the worst-case complexity of the SpecTric algorithm varies from \(O(k|E||V|)\) to \(O(|E||V|^3)\), or, in terms of the formal tricontext, from \(O(k|I|(|G|+|M|+|B|))\) to \(O(|I|(|G|+|M|+|B|)^3)\), depending on the eigenvalue solver and on data sparsity. Since we deal with a recursive partitioning scheme, the number of generated triclusters cannot be greater than \(|I|\); however, in the worst case the number of cuts performed is \(|I|-1\), since SpecTric cannot guarantee equally sized triclusters at each split.
6 Criteria for evaluation of triclusters
6.1 Criteria for cluster sets
To evaluate the quality of the whole tricluster collection obtained by a triclustering method, we propose using the following four criteria: the number of triclusters, average density, coverage and the diversity.
Cardinality and density. For a given tricluster collection \(\mathcal {T}\), the cardinality is simply the number of its members, \(|\mathcal {T}|\). The average collection density is \(\rho _{av}(\mathcal {T})=\frac{1}{|\mathcal {T}|}\sum \nolimits _{T\in \mathcal {T}}\rho {(T)}\).
Diversity is an important measure in Information Retrieval for diversified search results and in Machine Learning for ensemble construction (Tsymbal et al. 2005).
Coverage is defined as the fraction of the triples of the context (alternatively, of objects, attributes, or conditions) included in at least one of the triclusters of the resulting set.
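The cardinality, average density, and triple-based coverage criteria can be computed together. The Python sketch below assumes a relation stored as a set of triples, triclusters with non-empty components, and the density \(\rho (X,Y,Z)=|I\cap (X\times Y\times Z)|/(|X||Y||Z|)\); diversity is omitted because its formula is not restated here. `collection_scores` is a hypothetical name.

```python
def collection_scores(triclusters, I):
    """Cardinality, average density and triple coverage of a tricluster collection."""
    I = set(I)
    if not triclusters or not I:
        raise ValueError("need a non-empty collection and a non-empty relation")
    densities, covered = [], set()
    for X, Y, Z in triclusters:
        cells = {(g, m, b) for g in X for m in Y for b in Z}  # |cells| = |X||Y||Z|
        hits = cells & I
        densities.append(len(hits) / len(cells))
        covered |= hits                                       # triples of I inside some tricluster
    return {
        "cardinality": len(triclusters),
        "average_density": sum(densities) / len(densities),
        "coverage": len(covered) / len(I),
    }
```

Coverage over objects, attributes, or conditions can be obtained the same way by projecting `covered` onto the corresponding component.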
6.1.1 Complexity of an optimal tricluster set
The discrete optimization task of “finding an optimal tricluster solution” can be formalized in the following way:

\((1) \quad \forall T \in \mathcal {T}_{cov}: \rho (T)\ge \rho _{min}, \)

\((2) \quad \forall (g,m,b) \in I \quad \exists (X,Y,Z) \in \mathcal {T}_{cov}: (g,m,b)\in X\times Y\times Z\)

or

\((2') \quad coverage(\mathcal {T}_{cov})\ge \alpha , \text{ where } 0 \le \alpha \le 1,\)

\((3) \quad \forall (X,Y,Z) \in \mathcal {T}_{cov}: |X|\ge minsup_{G}, |Y|\ge minsup_{M}, |Z|\ge minsup_{B}.\)
1. To devise an algorithm that directly tries to find an optimal triclustering solution (w.r.t. a particular tricluster definition) for a given tricontext.
2. To reduce the collection obtained by one of the triclustering methods to one of its subsets from the corresponding Pareto set.
Assume that we already have a tricluster collection \(\mathcal {T}\) for a given tricontext \(\mathbb {K}\), obtained by one of the discussed triclustering techniques. We can also assume that the triclusters are dense and large enough, but that the collection is excessively large because the triclusters overlap, so it can be reduced without violating condition (2). For the sake of simplicity, we omit the second optimized criterion, the diversity of the tricluster collection, and thus arrive at the problem of an optimal tricluster cover, whose computational complexity is discussed below.
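Since reducing a collection while preserving condition (2) is a set cover problem over the triples of \(I\), a natural baseline is the classical greedy set-cover heuristic, which repeatedly picks the tricluster covering the most still-uncovered triples. The Python sketch below is that textbook heuristic, offered as an illustration rather than a method from this paper; it does not guarantee a minimum cover (the standard guarantee is only a logarithmic approximation factor), and `greedy_tricluster_cover` is a hypothetical name.

```python
def greedy_tricluster_cover(triclusters, I):
    """Indices of triclusters chosen greedily until every triple of I is covered."""
    I = set(I)
    covers = [{(g, m, b) for g in X for m in Y for b in Z} & I for X, Y, Z in triclusters]
    uncovered, chosen = set(I), []
    while uncovered:
        # pick the tricluster with the largest number of still-uncovered triples
        best = max(range(len(covers)), key=lambda k: len(covers[k] & uncovered), default=None)
        if best is None or not covers[best] & uncovered:
            raise ValueError("the collection does not cover every triple of I")
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```

Ties are broken in favour of the earlier tricluster in the input order, so the result depends on how the collection is listed.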
Let us recall some decision problems and introduce auxiliary constructions.
A vertex cover of a graph \(\varGamma =(V,E)\) is a subset of vertices \(V_1 \subseteq V\) such that for every edge \((u,v) \in E\) we have \(v \in V_1\) and/or \(u \in V_1\).
Definition 5
For an arbitrary graph \(\varGamma =(V,E)\), the associated bipartite graph is the graph \(\varDelta =(X \cup Y, E_1)\), where \(X=V\), the vertices of \(X\) are in one-to-one correspondence with the vertices of \(V\), the vertices of \(Y\) are in one-to-one correspondence with the edges of \(E\), and \((x_i,y_j) \in E_1\) if the vertex \(v_i \in V\) is incident to the edge \(e_j \in E\).
We say that in a bipartite graph \(\varDelta =(X \cup Y, E)\) a set of vertices \(X_1 \subseteq X\) dominates vertices from \(Y_1 \subseteq Y\) if each vertex from \(Y_1\) is adjacent to a vertex from \(X_1\).
Lemma 1
Let \(\varGamma =(V,E)\) be a graph and \(\varDelta \) be its associated bipartite graph. \(\varGamma \) has a vertex cover of size \(k\) iff in the bipartite graph \(\varDelta \) there is a pair \((Z,Y)\), where \(Z \subseteq X\), \(Z\) dominates all vertices from \(Y\), and \(|Z|=k\).
Proof
The proof directly follows from the construction of the graph \(\varDelta \). \(\square \)
Definition 6
The tricluster bipartite cover graph corresponding to a bipartite graph \(\varDelta =(X \cup Y, E_1)\) is the graph \(\varTheta (\varDelta )=(\mathcal {T} \cup I,J)\), where all triclusters from \(\mathcal {T}\) are in one-to-one correspondence with vertices from \(X\), all triples \((g,m,b) \in I\) are in one-to-one correspondence with vertices from \(Y\), and \((T,(g,m,b)) \in J\) if \((T, y_{(g,m,b)}) \in E_1\). The internal tricluster structure \(T=(G_T,M_T,B_T)\) is defined by adjacent edges in \(J\), i.e. \(\forall (g,m,b) \in I\ \exists T \in \mathcal {T}: (g,m,b) \in G_T \times M_T \times B_T\).
Without loss of generality let \(\mathcal {T}=X\).
Lemma 2
1. There is a pair \((Z,Y)\) of sets of vertices of the graph \(\varDelta \) such that \(Z \subseteq X\) and \(Z\) dominates all vertices from \(Y\).
2. \(Z\) is a tricluster cover of the tricluster bipartite cover graph \(\varTheta (\varDelta )\), i.e. for every \((g,m,b)\in I\) there exists \(T=(G_T,M_T,B_T)\in Z\) such that \((g,m,b) \in G_T\times M_T\times B_T\).
Proof
The proof directly follows from the construction of the graph \(\varTheta (\varDelta )\). \(\square \)
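The cover condition in statement 2 of Lemma 2 is straightforward to test directly. A minimal Python sketch, assuming each tricluster is given as a tuple \((G_T, M_T, B_T)\) of Python sets and \(I\) as a set of tuples; the function name `is_tricluster_cover` and the toy data are hypothetical:

```python
def is_tricluster_cover(cover, triples):
    """True iff every triple (g, m, b) lies in G_T x M_T x B_T for some T in cover.

    Each tricluster is a tuple (G_T, M_T, B_T) of Python sets, so each
    membership test is an expected O(1) hash lookup.
    """
    cover = list(cover)
    return all(
        any(g in G_T and m in M_T and b in B_T for G_T, M_T, B_T in cover)
        for g, m, b in triples
    )

# Hypothetical toy data: users x tags x resources.
I = {("u1", "t1", "r1"), ("u1", "t2", "r1"), ("u2", "t1", "r2")}
T1 = ({"u1"}, {"t1", "t2"}, {"r1"})
T2 = ({"u2"}, {"t1"}, {"r2"})

print(is_tricluster_cover([T1, T2], I))  # True
print(is_tricluster_cover([T1], I))      # False: ("u2", "t1", "r2") is uncovered
```

With hash sets, one call costs \(O(|I|\cdot|\mathcal{T}|)\) expected time, in line with the bound for \(\varTheta\) used in the proof of Theorem 1.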
Theorem 1
The following problem is NP-complete.

Instance: Triadic context \(\mathbb {K}=(G,M,B,I)\), tricluster bipartite cover graph \(\varTheta =(\mathcal {T}\cup I,J)\), and positive integer \(k\).

Question: Does there exist a tricluster cover \(\mathcal {T}_{cov}\subseteq \mathcal {T}\) such that \(|\mathcal {T}_{cov}|\le k\)?
Proof
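The problem belongs to NP: for each potential solution, i.e., a subset of triclusters \(S\subseteq \mathcal {T}\), one needs to check whether each \((g,m,b)\in I\) belongs to at least one tricluster \(T\in S\) and whether \(|S|\le k\). The first condition can be verified within \(O(|I|\cdot |\mathcal {T}| \cdot (|G|+ |M|+ |B|))\) using the tricontext or within \(O(|I|\cdot |\mathcal {T}|)\) using \(\varTheta \). To show NP-hardness, we use the NP-complete VERTEX COVER problem (Garey and Johnson 1979):

Instance: Graph \(\varGamma =(V,E)\), positive integer \(k \le |V|\).

Question: Does there exist a set \(W\subseteq V\) such that \(|W|\le k\) and \(u \in W\) or \(v \in W\) for each edge \(e=(u,v) \in E\)?
By Lemma 1, \(\varGamma \) has a vertex cover of size at most \(k\) iff in its associated bipartite graph \(\varDelta \) some \(Z\subseteq X\) with \(|Z|\le k\) dominates \(Y\); by Lemma 2, this holds iff \(Z\) is a tricluster cover of \(\varTheta (\varDelta )\). Both \(\varDelta \) and \(\varTheta (\varDelta )\) are built from \(\varGamma \) in polynomial time, which completes the reduction. \(\square \)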
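The contrast behind Theorem 1, cheap verification versus hard search, shows up in a brute-force sketch: checking one candidate subset \(S\) takes \(O(|I|\cdot|S|)\) hash lookups, but the search below may inspect every subset of \(\mathcal{T}\) of size at most \(k\), which is exponential in the worst case. This is an illustration only, not an algorithm from the paper; the names `covers` and `find_cover` and the toy instance are hypothetical.

```python
from itertools import combinations

def covers(cover, triples):
    """Polynomial-time certificate check: every triple lies in some tricluster."""
    return all(any(g in G and m in M and b in B for G, M, B in cover)
               for g, m, b in triples)

def find_cover(triclusters, triples, k):
    """Exhaustive search for a tricluster cover of size at most k.

    Tries all subsets of size 0, 1, ..., k, so the number of candidates grows
    exponentially with the number of triclusters. Returns None if none exists.
    """
    for size in range(k + 1):
        for subset in combinations(triclusters, size):
            if covers(subset, triples):
                return list(subset)
    return None

# Hypothetical toy instance.
I = {("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m2", "b2")}
T = [
    ({"g1"}, {"m1", "m2"}, {"b1"}),
    ({"g2"}, {"m2"}, {"b2"}),
    ({"g1", "g2"}, {"m2"}, {"b1", "b2"}),
]
print(find_cover(T, I, 1))  # None: no single tricluster covers I
print(find_cover(T, I, 2))  # the first two triclusters
```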
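Theorem 2
The following problem, counting all minimal tricluster covers, is #P-complete.
Proof
We reduce from the #P-complete problem of counting minimal vertex covers:

Input: Graph \(\varGamma =(V,E)\).

Output: \(\#\{W \subseteq V \mid \forall (u,v) \in E\colon (u \in A) \vee (v \in A) \text{ holds for } A= W \text{ but not for any } A \subset W\}\).
By Lemmas 1 and 2, a set \(W\subseteq V\) is a minimal vertex cover of \(\varGamma \) iff it is a minimal tricluster cover of \(\varTheta (\varDelta )\), where \(\varDelta \) is the bipartite graph associated with \(\varGamma \); hence the reduction preserves the number of solutions. \(\square \)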
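For intuition about the counting problem used in the proof of Theorem 2, here is a brute-force counter of minimal vertex covers (exponential time, suitable only for tiny graphs). It relies on the fact that supersets of vertex covers are again vertex covers, so a cover \(W\) is minimal as soon as no \(W\setminus\{x\}\) is a cover. The function names and example graphs are hypothetical.

```python
from itertools import combinations

def is_vertex_cover(w, edges):
    """True iff every edge has at least one endpoint in w."""
    return all(u in w or v in w for u, v in edges)

def count_minimal_vertex_covers(vertices, edges):
    """Brute-force count of inclusion-minimal vertex covers (exponential time).

    Covers are closed under supersets, so W is minimal iff W is a cover and
    no W minus a single vertex is a cover.
    """
    count = 0
    for k in range(len(vertices) + 1):
        for w in map(set, combinations(vertices, k)):
            if is_vertex_cover(w, edges) and not any(
                is_vertex_cover(w - {x}, edges) for x in w
            ):
                count += 1
    return count

print(count_minimal_vertex_covers("abc", [("a", "b"), ("b", "c")]))              # 2
print(count_minimal_vertex_covers("abc", [("a", "b"), ("b", "c"), ("a", "c")]))  # 3
```

For the path a-b-c the two minimal covers are \(\{b\}\) and \(\{a,c\}\); for the triangle they are the three 2-element subsets.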
6.2 Criteria for algorithms
Time complexity of the algorithms
Algorithm  Time complexity  Comment

OAC (\(\square \))  \(O(|I|\cdot (|M||B|+|G||B|+|G||M|))\)  \(\rho _{min}=0\)
OAC (\(\square \))  \(O(|I||G||M||B|)\)  \(\rho _{min}>0\)
OAC (\(\prime \))  \(O(|I|\cdot (|G|+|M|+|B|))\)  \(\rho _{min}=0\)
OAC (\(\prime \))  \(O(|I||G||M||B|)\)  \(\rho _{min}>0\)
SpecTric  \(O(k|I|(|G|+|M|+|B|))\)  Lanczos and Arnoldi algorithms on sparse data, \(k \ll |G|+|M|+|B|\)
SpecTric  \(O(|I|(|G|+|M|+|B|)^3)\)  General diagonalization methods
TriBox  \(O(|I|\cdot (|G|+|M|+|B|)^2|G||M||B|)\)
Trias  \(O(|G|^2|M|^3|B|^2L_{\tilde{I}}\max L_{J})\)  The upper bounds of \(L_{\tilde{I}}\) and \(L_{J}\) are \(2^{\min \{|G|,|M||B|\}}\) and \(2^{\min \{|M|,|B|\}}\); this worst case is rare in practice, since the data are usually sparse
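The \(O(|I|\cdot(|G|+|M|+|B|))\) bound for OAC(\(\prime \)) with \(\rho_{min}=0\) can be reached with hash indices built in one pass over \(I\). The Python sketch below assumes the prime-based OAC-tricluster definition, in which a generating triple \((\tilde g,\tilde m,\tilde b)\) yields \(((\tilde m,\tilde b)^\prime,(\tilde g,\tilde b)^\prime,(\tilde g,\tilde m)^\prime)\). It is not the authors' C# implementation, and the function name and toy data are hypothetical. Duplicate triclusters are merged, so at most \(|I|\) patterns are returned.

```python
from collections import defaultdict

def oac_prime_triclusters(triples):
    """Prime-based OAC-triclusters with rho_min = 0 (no density filtering).

    Assumed definition: a generating triple (g, m, b) yields the tricluster
    ((m, b)', (g, b)', (g, m)'), where e.g. (m, b)' = {g' : (g', m, b) in I}.
    """
    # One pass over I builds the three prime-operator indices.
    by_mb, by_gb, by_gm = defaultdict(set), defaultdict(set), defaultdict(set)
    for g, m, b in triples:
        by_mb[(m, b)].add(g)
        by_gb[(g, b)].add(m)
        by_gm[(g, m)].add(b)
    # Second pass: copying three sets costs O(|G| + |M| + |B|) per triple.
    result = set()
    for g, m, b in triples:
        result.add((frozenset(by_mb[(m, b)]),
                    frozenset(by_gb[(g, b)]),
                    frozenset(by_gm[(g, m)])))
    return result  # duplicates merged, so len(result) <= len(triples)

# Hypothetical toy context.
I = {("u1", "t1", "r1"), ("u2", "t1", "r1"), ("u1", "t2", "r1")}
for G_T, M_T, B_T in oac_prime_triclusters(I):
    print(sorted(G_T), sorted(M_T), sorted(B_T))
```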
7 Selection of triadic datasets for experiments
Contexts used in the experiments with the five chosen evaluation measures
Context  \(|G|\)  \(|M|\)  \(|B|\)  # Triples  Density

Uniform  30  30  30  2660  0.0985 
Gaussian  30  30  30  3604  0.1335 
IMDB  250  795  22  3818  0.00087 
BibSonomy  51  924  2844  3000  0.000022 
Mobile  16  113  20  1225  0.0339 
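The Density column equals \(|I|/(|G|\,|M|\,|B|)\). A small Python check that recomputes it from the other columns of the table; the helper name `context_density` is hypothetical:

```python
def context_density(n_triples, n_g, n_m, n_b):
    """Density of a triadic context: |I| / (|G| * |M| * |B|)."""
    return n_triples / (n_g * n_m * n_b)

# (# triples, |G|, |M|, |B|) copied from the table above.
contexts = {
    "Uniform": (2660, 30, 30, 30),
    "Gaussian": (3604, 30, 30, 30),
    "IMDB": (3818, 250, 795, 22),
    "BibSonomy": (3000, 51, 924, 2844),
    "Mobile": (1225, 16, 113, 20),
}
for name, sizes in contexts.items():
    print(f"{name:<10} {context_density(*sizes):.6f}")
```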
7.1 Real datasets
Mobile operators We select the 16 mobile operators with the highest revenue^{1}. As attributes we consider the countries where a particular mobile operator operates. The network type (technology) is chosen as a condition. Thus, each triple in the dataset has the following structure: “operator”, “country”, “technology”.
Movies We compose a context of the top 250 popular movies from www.imdb.com: objects are movie titles, attributes are tags, and conditions are genres.
Bibsonomy We selected a random sample of 3000 triples from the first 100,000 triples of the bibsonomy.org dataset; objects are users, attributes are tags, and conditions are bookmark names. The Bibsonomy resource-sharing system was developed for collecting, organising, and sharing bookmarks and publications, and it relies on a folksonomy as its data structure.
7.2 Synthetic datasets
Noisy contexts (\(p\) is the inversion probability)
Context  # Triples  Density 

\(p=0\)  3000  0.1111 
\(p=0{.}1\)  5069.6  0.1873 
\(p=0{.}2\)  7169.4  0.2645 
\(p=0{.}3\)  9290.2  0.3440 
\(p=0{.}4\)  11,412.8  0.4222 
\(p=0{.}5\)  13,533.4  0.5032 
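The averages in this table are close to what one obtains when every cell of the \(30\times 30\times 30\) cube is inverted independently with probability \(p\): the expected number of triples is then \(|I|(1-p)+(|G||M||B|-|I|)p = 3000+21000p\), e.g. 5100 for \(p=0{.}1\) and 13,500 for \(p=0{.}5\). The Python sketch below assumes that noise model and, as a purely hypothetical initial context, three disjoint \(10\times 10\times 10\) cuboids on the diagonal (which reproduces the 3000 triples and density 0.1111 of the \(p=0\) row). The exact generator used in the experiments may differ, and the function names are hypothetical.

```python
import random

def invert_noise(triples, n_g, n_m, n_b, p, rng):
    """Flip every cell of G x M x B independently with probability p.

    This is an assumed noise model; cells are indexed by integers.
    """
    noisy = set()
    for g in range(n_g):
        for m in range(n_m):
            for b in range(n_b):
                present = (g, m, b) in triples
                if rng.random() < p:  # invert this cell
                    present = not present
                if present:
                    noisy.add((g, m, b))
    return noisy

def expected_size(n_triples, n_cells, p):
    """Expected |I| after inversion: surviving triples plus newly created ones."""
    return n_triples * (1 - p) + (n_cells - n_triples) * p

# Hypothetical initial context: three disjoint 10x10x10 cuboids on the diagonal.
cuboids = {(g, m, b)
           for k in range(3)
           for g in range(10 * k, 10 * k + 10)
           for m in range(10 * k, 10 * k + 10)
           for b in range(10 * k, 10 * k + 10)}

rng = random.Random(42)
for p in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    noisy = invert_noise(cuboids, 30, 30, 30, p, rng)
    print(p, len(noisy), expected_size(len(cuboids), 30 ** 3, p))
```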
Random uniform triple generation Let \(\mathbb {K}=(G,M,B,I)\) be an initial tricontext with \(I=\emptyset \). Each triple is then added to \(I\) independently with probability 0.1, i.e., we produce a uniform context of size \(30 \times 30 \times 30\) such that \(\forall (g,m,b)\,P((g,m,b)\in I) = 0{.}1\).
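The uniform scheme amounts to one independent Bernoulli trial per cell. A minimal Python sketch; the function name `uniform_context` and the seed are hypothetical. The expected size is \(0{.}1\cdot 27{,}000=2700\) triples, and the Uniform context listed in the table of contexts has 2660.

```python
import random

def uniform_context(n_g, n_m, n_b, prob, seed=None):
    """Include each cell (g, m, b) in I independently with probability prob."""
    rng = random.Random(seed)
    return {(g, m, b)
            for g in range(n_g)
            for m in range(n_m)
            for b in range(n_b)
            if rng.random() < prob}

I = uniform_context(30, 30, 30, 0.1, seed=1)
print(len(I), len(I) / 30 ** 3)  # size near 2700, density near 0.1
```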
8 Experimental comparison of the methods
Experimental results, with graphs and tables, are reported in Sect. 8.1. A method-by-method and overall discussion of the results, with examples of the triclusters found, is given in Sect. 8.2.
8.1 Experimental results
All the methods have been implemented by the authors and incorporated into a single triclustering toolbox, written in C# using MS Visual Studio 2010/2012. All the experiments have been performed on a Windows 7 SP1 x64 system equipped with an Intel Core i7-2600 @ 3.40GHz processor and 8 GB of RAM. The AlgLib^{2} library was used for eigenvalue decomposition.
Every method managed to find the initial cuboids, but for most methods the results deteriorate quickly as the inversion probability grows. TriBox showed the best results, as it optimizes the density-volume trade-off (which is most likely the best choice for the regions of the former cuboids when the error probability is small). Prime OAC-triclustering was also rather noise-tolerant, but it generated significantly more triclusters (this high number of triclusters most likely explains its results). The other methods were unable to provide significant results on noisy contexts. Moreover, as expected, none of the methods generated adequate triclusters for the contexts with inversion probability \(0{.}5\).
Results of the experiments: computation time (\(t\), ms), number of triclusters (\(n\)), average density (\(\rho _{av}\), %), coverage (\(Cov\), %), and diversity (\(Div\), %)
Algorithm  \(t\)  \(t_{par}\)  \(n\)  \(\rho _{av}\)  \(Cov\)  \(Div\)  \(Div_G\)  \(Div_M\)  \(Div_B\) 

Uniform random context  
OAC (\(\square \))  407  196  73  10  100  0  0  0  0 
OAC (\(\prime \))  312  877  2659  32  100  93  60  60  60 
SpecTric  277  –  5  9  9  100  100  100  100 
TriBox  6218  1722  1011  74  96  97  66  80  85 
Trias  29,367  –  38,356  100  100  \(\approx \)100  \(\approx \)100  4  4 
Nonuniform random context (Gaussiangenerated)  
OAC (\(\square \))  334  135  276  14  100  \(\approx \)100  0  0  0 
OAC (\(\prime \))  382  1391  3604  38  100  59  33  22  33 
SpecTric  131  –  7  24  2  100  100  100  100 
TriBox  1,128,449  309,640  77  92  40  98  71  86  87 
Trias  130,013  –  16,685  100  100  99  96  18  14 
IMDB  
OAC (\(\square \))  2314  1573  1500  2  100  16  10  1  8 
OAC (\(\prime \))  547  2376  1274  54  100  97  95  92  29 
SpecTric  98,799  –  21  17  21  100  100  100  100 
TriBox  197,136  55,079  328  92  99  99  99  95  31 
Trias  102,554  –  1956  100  100  \(\approx \)100  \(\approx \)100  53  26 
BibSonomy  
OAC (\(\square \))  19,297  6803  398  4  100  80  67  43  80
OAC (\(\prime \))  13,556  9400  1289  9466  100  \(\approx \)100  89  \(\approx \)100  \(\approx \)100 
SpecTric  5,906,563  –  2  50  100  100  100  100  100 
TriBox  \(>\)24 h  
Trias  110,554  –  1305  100  100  \(\approx \)100  92  \(\approx \)100  \(\approx \)100 
Parameter settings:
 1.
OAC-triclustering: \(\rho _{min}=0\)
 2.
SpecTric: \(s_{min}=0\)
 3.
TriBox: \(\lambda _0=\rho (\mathbb {K})\)
 4.
Trias: \(\tau _G=\tau _M=\tau _B=0\)
Results of the experiments on the mobile operators dataset
Algorithm  Param.  \(t\), ms  \(n\)  \(Cov\)  \(Cov_G\)  \(Cov_M\)  \(Cov_B\)  \(Div\)  \(Div_G\)  \(Div_M\)  \(Div_B\)  \(\rho _{av}\)

OAC (\(\square \))  0  470  173  100  100  100  100  5  4  1  0  15
OAC (\(\square \))  0.2  1365  39  86  94  83  100  50  47  18  0  39
OAC (\(\square \))  0.4  1373  9  41  63  42  100  81  78  53  0  70
OAC (\(\square \))  0.6  1363  5  35  50  42  100  100  100  70  0  88
OAC (\(\square \))  0.8  1366  3  32  19  41  70  100  100  33  0  100
OAC (\(\square \))  1  1371  3  32  19  41  70  100  100  33  0  100
OAC (\(\prime \))  0  180  133  100  100  100  100  62  57  36  0  56
OAC (\(\prime \))  0.2  128  133  100  100  100  100  62  57  36  0  56
OAC (\(\prime \))  0.4  93  100  100  100  100  100  71  66  43  0  63
OAC (\(\prime \))  0.6  95  37  100  100  100  100  84  83  60  0  83
OAC (\(\prime \))  0.8  98  18  100  100  100  100  97  97  65  0  99
OAC (\(\prime \))  1  93  16  100  100  100  100  99  99  63  0  100
SpecTric  0  351  8  16  100  100  100  100  100  100  100  67
SpecTric  0.2  37  7  17  100  100  100  100  100  100  100  58
SpecTric  0.4  40  3  38  100  100  100  100  100  100  100  14
SpecTric  0.6  33  3  38  100  100  100  100  100  100  100  14
SpecTric  0.8  26  2  54  100  100  100  100  100  100  100  8
SpecTric  1  3  1  100  100  100  100  NaN  NaN  NaN  NaN  3
TriBox  –  73,077  24  96  81  96  90  71  66  45  0  80
Trias  \(\langle 0, 0, 0\rangle \)  519  100  100  100  100  100  94  85  58  2  100
Reading the pairwise graphs of the triclustering results for the Mobile operators dataset (Fig. 4), one can conclude that there is no winning approach. However, these graphs make it possible to find suboptimal solutions. The points for OAC(\(\prime \)) lie in the top right corner of each diagram, which is not the case for any of the other algorithms; Trias, for example, loses on the number of triclusters. Another suboptimal approach is TriBox, since its points are close to the top right corner in all graphs. Guided by these pairwise plots and by knowing which quality measures matter most, an analyst can decide which method is most suitable for her dataset.
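The "top right corner" reasoning is Pareto dominance. The Python sketch below, which is not from the paper, computes the non-dominated methods for four rows of the mobile operators table, minimizing \(n\) and maximizing \(Cov\), \(Div\) and \(\rho_{av}\). Restricted to these rows and criteria, OAC(\(\prime \)) with Param. 1 dominates both TriBox and Trias, while SpecTric with Param. 0 survives thanks to its smaller \(n\) and higher \(Div\); the plots in Fig. 4 cover more settings and pairwise views. The row labels and function names are hypothetical.

```python
# (n, Cov, Div, rho_av) for four rows of the mobile operators table.
scores = {
    "OAC(prime), Param. 1": (16, 100, 99, 100),
    "SpecTric, Param. 0": (8, 16, 100, 67),
    "TriBox": (24, 96, 71, 80),
    "Trias": (100, 100, 94, 100),
}

def to_maximize(s):
    """Negate n so that every criterion is 'larger is better'."""
    n, cov, div, rho = s
    return (-n, cov, div, rho)

def dominates(a, b):
    """a dominates b: no worse on every criterion and strictly better on one."""
    a, b = to_maximize(a), to_maximize(b)
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores):
    """Names of the methods not dominated by any other method."""
    return [name for name, s in scores.items()
            if not any(dominates(t, s) for other, t in scores.items() if other != name)]

print(pareto_front(scores))  # ['OAC(prime), Param. 1', 'SpecTric, Param. 0']
```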
The graphs for the synthetic datasets also show that there is no single winning approach. However, for the uniform triple generation scheme, Trias is the best with respect to three criteria: \(Diversity\), \(Coverage\) and \(Density\). The trade-off between \(Density\) and \(Coverage\) for OAC(\(\prime \)) is also visible. The weakness of SpecTric is revealed: it has low density, while the other measures are high. OAC(\(\square \)) has extremely poor diversity. Similarly, for the Gaussian triple generation scheme, Trias found the best patterns with respect to \(Diversity\), \(Coverage\) and \(Density\). One more trade-off appears for OAC(\(\prime \)), between \(Diversity\) and \(Coverage\). The drawbacks of OAC(\(\square \)) and SpecTric remain the same.
For the IMDB dataset, Trias again produces highly diverse, fully dense patterns with 100 % \(Coverage\), but the number of patterns is too large for analysis. Two suboptimal solutions are OAC(\(\prime \)) and TriBox. An advantage of OAC(\(\prime \)) is that it shows no trade-off between \(Cardinality\) and \(Diversity\), nor between \(Density\) and \(Coverage\).
The Bibsonomy dataset is the largest one and is affected by the intrinsic noise of the tagging process, so it is not surprising that Trias and OAC(\(\prime \)) discovered many patterns. OAC(\(\prime \)) can produce fewer patterns than Trias while keeping the best level of \(Diversity\) and \(Coverage\). If she needs fewer patterns than these two suboptimal methods produce, an analyst may tune the density threshold of OAC(\(\square \)) to find a balance between \(Density\) and \(Coverage\), or between \(Coverage\) and \(Diversity\).
8.2 Discussion of the results
Trias is one of the most time-consuming algorithms considered in the paper for large contexts, along with TriBox and SpecTric. On the pairwise criteria graphs, the Trias point lies in the upper right corner of three plots, (a), (c) and (e), and close to the origin along the \(Cardinality\) axis in the other three. Although each of the resulting triclusters (triconcepts) is easy to interpret, their large number and small sizes make it difficult to see the general structure of the dataset. Since the triconcepts are generated so that every triple is covered, the coverage equals \(1\). Because the concepts are small, the overall diversity is rather high. Still, the diversity for each set (\(Div_G\), \(Div_M\), \(Div_B\)) depends on the size of the corresponding set: the smaller the set, the greater the chance of intersection and the lower the diversity.
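The behaviour of coverage and diversity described above can be made concrete in code. The Python sketch below assumes the following definitions: density \(\rho(T)=|I\cap(G_T\times M_T\times B_T)|/(|G_T|\,|M_T|\,|B_T|)\); coverage as the share of triples of \(I\) lying in at least one tricluster; and diversity as one minus the share of tricluster pairs that intersect in all three components, with a set diversity such as \(Div_G\) counting intersections in one component only. These definitions are stated here as assumptions; the function names and toy data are hypothetical. Diversity is undefined for a single tricluster, which matches the NaN entries in the mobile operators table.

```python
from itertools import combinations, product

def density(t, triples):
    """Share of the cells of G_T x M_T x B_T that are triples of I."""
    G, M, B = t
    inside = sum(1 for cell in product(G, M, B) if cell in triples)
    return inside / (len(G) * len(M) * len(B))

def coverage(triclusters, triples):
    """Share of the triples of I lying in at least one tricluster."""
    covered = sum(1 for g, m, b in triples
                  if any(g in G and m in M and b in B for G, M, B in triclusters))
    return covered / len(triples)

def diversity(triclusters, dims=(0, 1, 2)):
    """1 minus the share of tricluster pairs intersecting in every dimension in dims.

    dims=(0, 1, 2) gives Div; dims=(0,) gives the set diversity Div_G, and so on.
    """
    pairs = list(combinations(triclusters, 2))
    if not pairs:
        return float("nan")  # undefined for fewer than two triclusters
    overlapping = sum(1 for s, t in pairs if all(s[d] & t[d] for d in dims))
    return 1 - overlapping / len(pairs)

# Hypothetical toy data.
I = {("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b2")}
T1 = ({"g1"}, {"m1", "m2"}, {"b1"})
T2 = ({"g2"}, {"m1"}, {"b2"})
print(density(T1, I), coverage([T1, T2], I))          # 1.0 1.0
print(diversity([T1, T2]), diversity([T1, T2], (1,)))  # 1.0 0.0
```

The toy pair shows the effect noted above: the triclusters are disjoint as wholes (\(Div=1\)) but share an attribute, so the attribute diversity drops to 0.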
 1.
{The Princess Bride (1987), Pirates of the Caribbean: The Curse of the Black Pearl (2003)}, {Pirate}, {Fantasy, Adventure}
 2.
{V for Vendetta (2005)}, {Fascist, Terrorist, Government, Secret Police, Fight}, {Action, Sci-Fi, Thriller}
 1.
\(\rho =23{.}08\,\%\), {Alien (1979), The Shining (1980), The Thing (1982), The Exorcist (1973)}, {Spaceship, Egg, Parasite, Creature, Caretaker, Colorado, Actress, Blood, Helicopter, Scientist, Priest, Washington D.C., Faith}, {Horror}
 2.
\(\rho =2{.}09\,\%\), {The Shawshank Redemption (1994), The Godfather (1972), The Godfather: Part II (1974), ..., Bonnie and Clyde (1967), Arsenic and Old Lace (1944)}, {Prison, Cuba, Business, 1920s, ..., Texas, Cellar}, {Crime, Thriller}
 1.
\(100\,\%\), {Million Dollar Baby (2004), Rocky (1976), Raging Bull (1980)}, {Boxer, Boxing}, {Drama, Sport}
 2.
\(83{.}33\,\%\), {The Sixth Sense (1999), The Exorcist (1973), The Silence of the Lambs (1991)}, {Psychiatrist}, {Drama, Thriller}
 3.
\(33{.}33\,\%\), {Platoon (1986), All Quiet on the Western Front (1930), Glory (1989), Apocalypse Now (1979), Lawrence of Arabia (1962), Saving Private Ryan (1998), Paths of Glory (1957), Full Metal Jacket (1987)}, {Army, General, Jungle, Vietnam, Soldier, Recruit}, {Drama, Action, War}
 1.
\(0{.}9\,\%\), {The Shawshank Redemption (1994), The Godfather (1972), Ladri di biciclette (1948), Unforgiven (1992), Batman Begins (2005), Die Hard (1988), ..., The Green Mile (1999), Sin City (2005), The Sting (1973)}, {Prison, Murder, Cuba, FBI, Serial Killer, Agent, Psychiatrist,..., Window, Suspect, Organized Crime, Revenge, Explosion, Assassin, Widow}, {Crime, Drama, Sci-Fi, Fantasy, Thriller, Mystery}
 2.
\(1{.}07\,\%\), {The Great Escape (1963), Star Wars: Episode VI - Return of the Jedi (1983), Jaws (1975), Batman Begins (2005), Blade Runner (1982), Die Hard (1988),..., Metropolis (1927), Sin City (2005), Rebecca (1940)}, {Prison, Murder, Cuba, FBI, Serial Killer, Agent, Psychiatrist,..., Shower, Alimony, Phoenix Arizona, Assassin, Widow}, {Drama, Thriller, War}
 1.
\(56.67\,\%\), {The Godfather: Part II (1974), The Usual Suspects (1995)}, {Cuba, New York, Business, 1920s, 1950s}, {Crime, Drama, Thriller}
 2.
\(60\,\%\), {Toy Story (1995), Toy Story 2 (1999)}, {Jealousy, Toy, Spaceman, Little Boy, Fight}, {Fantasy, Comedy, Animation, Family, Adventure}
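The density values quoted for the fully listed examples can be checked against the sizes of their extents, intents and modi, assuming \(\rho(T)=|I\cap(G_T\times M_T\times B_T)|/(|G_T|\,|M_T|\,|B_T|)\). The triple counts in the numerators below (for the horror, psychiatrist, war, Godfather: Part II and Toy Story examples, in that order) are inferred from the reported percentages and are not given in the source:

```latex
\[
\rho(T)=\frac{|I\cap(G_T\times M_T\times B_T)|}{|G_T|\,|M_T|\,|B_T|}
\]
\[
\frac{12}{4\cdot 13\cdot 1}\approx 23{.}08\,\%,\quad
\frac{5}{3\cdot 1\cdot 2}\approx 83{.}33\,\%,\quad
\frac{48}{8\cdot 6\cdot 3}\approx 33{.}33\,\%,\quad
\frac{17}{2\cdot 5\cdot 3}\approx 56{.}67\,\%,\quad
\frac{30}{2\cdot 5\cdot 5}=60\,\%
\]
```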
9 Conclusion
In this paper, we presented a general view of triclustering for binary triadic datasets that unifies formal triconcepts, density-based heuristics and approximation frameworks. In addition to the conventional computation time criterion, we presented a set of evaluation criteria for the results, oriented towards finding interpretable solutions. These criteria (density, coverage, diversity, noise tolerance, and cardinality) represent different aspects of interpretability. Cardinality is an issue because the number of triclusters should correspond to the structure of the dataset under investigation, which is usually unknown. We refer the reader to the analogous issue of “the right number of clusters” in the conventional setting, which has not found a satisfactory solution as yet. We took a number of triclustering algorithms developed by the authors, including the novel algorithm OAC-prime, and a representative formal triconcept finding algorithm, Trias, and presented a number of theoretical results that explore their efficiency and help make them more efficient in some cases. We designed a comprehensive experimental testing framework, including a rich setup for generating structure and noise.
The investigation of the resource efficiency of the proposed methods shows that OAC-box, OAC-prime, TriBox, and SpecTric have polynomial computation time in the input size, and the number of output patterns is at most the number of triples in the input data. This contrasts with the formal triconcept algorithm Trias, whose worst-case computation time, like the number of triconcepts, is exponential. Yet experiments on both synthetic and real data show that no single method wins according to the introduced criteria. For example, the maximally dense patterns with maximal coverage found by Trias come with less than optimal diversity and a very large number of output patterns. The multi-criteria choice allows an expert to decide which criteria are most important in a specific case and choose accordingly. Overall, our experiments show that our TriBox and OAC-prime algorithms can be reasonable alternatives to triadic formal concepts and lead to Pareto-effective solutions. Although TriBox is better with respect to noise tolerance and the number of clusters, OAC-prime is the best in scalability to large real-world datasets.
Promising directions for future work include:

developing a unified theoretical framework for \(n\)-clustering,

finding bridges between probabilistic (Meulders et al. 2002) and algebraic approaches,

combining several constraint-based approaches to triclustering (e.g., mining dense triclusters first and then frequent trisets in them),

finding better approaches for estimating tricluster density,

taking into account features of real-world data in optimization procedures (their sparsity, value distribution, etc.) and online data processing,

extending various biclustering approaches to triadic data,

shifting from the binary case to arbitrary numeric or interval datasets [continuing the work (Kaytoue et al. 2014)],

applying triclustering in recommender systems and social network analysis.
Footnotes
 1.
The information was collected from open sources on the Internet. Supplementary materials and several datasets are available at http://bit.ly/triMLData.
 2.
Notes
Acknowledgments
We would like to thank our colleagues Jonas Poelmans, Leonid Zhukov, and Svetlana Vasukova. Special thanks for help in coding the triclustering algorithms go to former students Andrey Kramarenko (TriBox), Ruslan Magizov (box OAC-triclustering) and Zarina Sekinaeva (SpecTric). We are especially thankful for useful discussions and suggestions to Jean-François Boulicaut and his research group, Radim Belohlávek, Loïc Cerf, Vincent Duquenne, Bernhard Ganter, Bart Goethals, Robert Jäschke, Mehdi Kaytoue, Rokia Missaoui, Amedeo Napoli, Engelbert Mephu Nguifo, Lhouari Nourine, Mykola Pechenizkiy, Arno Siebes, Dominik Ślęzak, David Barber and Saira Mian. The study was conducted in the framework of the Basic Research Program at the National Research University Higher School of Economics in 2013-2015 and in the Laboratory of Intelligent Systems and Structural Analysis. This work was also partially supported by the Russian Foundation for Basic Research, grant no. 13-07-00504.
References
 Asses, Y., Buzmakov, A., Bourquard, T., Kuznetsov, S. O., & Napoli, A. (2012). A hybrid classification approach based on FCA and emerging patterns—an application for the classification of biological inhibitors. In Proceedings of the 9th international conference on concept lattices and their applications, pp. 211–222.Google Scholar
 Banerjee, A., Dhillon, I. S., Ghosh, J., Merugu, S., & Modha, D. S. (2007). A generalized maximum entropy approach to Bregman coclustering and matrix approximation. Journal of Machine Learning Research, 8, 1919–1986.MathSciNetMATHGoogle Scholar
 Barkow, S., Bleuler, S., Prelic, A., Zimmermann, P., & Zitzler, E. (2006). BicAT: a biclustering analysis toolbox. Bioinformatics, 22(10), 1282–1283.CrossRefGoogle Scholar
 Belohlávek, R., & Vychodil, V. (2010). Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences, 76(1), 3–20.MathSciNetCrossRefMATHGoogle Scholar
 Belohlávek, R., Baets, B. D., Outrata, J., & Vychodil, V. (2009). Inducing decision trees via concept lattices. International Journal of General Systems, 38(4), 455–467.MathSciNetCrossRefMATHGoogle Scholar
 Belohlávek, R., Glodeanu, C., & Vychodil, V. (2013). Optimal factorization of threeway binary data using triadic concepts. Order, 30(2), 437–454.MathSciNetCrossRefMATHGoogle Scholar
 Belohlávek, R., Outrata, J., & Trnecka, M. (2014). Impact of boolean factorization as preprocessing methods for classification of boolean data. Annals of Mathematics and Artificial Intelligence, 72(1–2), 3–22.MathSciNetCrossRefMATHGoogle Scholar
 Benz, D., Hotho, A., Jäschke, R., Krause, B., Mitzlaff, F., Schmitz, C., et al. (2010). The social bookmark and publication management system Bibsonomy—A platform for evaluating and demonstrating web 2.0 research. VLDB Journal, 19(6), 849–875.CrossRefGoogle Scholar
 Besson, J., Robardet, C., Boulicaut, J. F., & Rome, S. (2005). Constraintbased concept mining and its application to microarray data analysis. Intelligent Data Analysis, 9(1), 59–82.Google Scholar
 Biedermann, K. (1998). Powerset trilattices. Conceptual structures: Theory, tools and applications, LNCS (Vol. 1453, pp. 209–221). Berlin: Springer.CrossRefGoogle Scholar
 Blinova, V. G., Dobrynin, D. A., Finn, V. K., Kuznetsov, S. O., & Pankratova, E. S. (2003). Toxicology analysis by means of the JSMmethod. Bioinformatics, 19(10), 1201–1207.CrossRefGoogle Scholar
 Buzmakov, A., Egho, E., Jay, N., Kuznetsov, S.O., Napoli, A., & Raïssi, C. (2013). On projections of sequential pattern structures (with an application on care trajectories). In: Proceedings of the 10th international conference on concept lattices and their applications, pp. 199–208.Google Scholar
 Carpineto. C., & Romano, G. (1993). Galois: An ordertheoretic approach to conceptual clustering. In: Proceeding of ICML93, Amherst, (pp. 33–40).Google Scholar
 Carpineto, C., & Romano, G. (1996). A lattice conceptual clustering system and its application to browsing retrieval. Machine Learning, 24, 95–122.Google Scholar
 Carpineto, C., & Romano, G. (2005). Concept data analysis—theory and applications. New York: Wiley.MATHGoogle Scholar
 Carpineto, C., Michini, C., & Nicolussi, R. (2009). A concept latticebased kernel for SVM text classification. In: ICFCA 2009, (vol LNAI 5548, pp. 237–250). Berlin: Springer.Google Scholar
 Cerf, L., Besson, J., Robardet, C., & Boulicaut, J. F. (2009). Closed patterns meet nary relations. ACM Transactions on Knowledge Discovery from Data, 3, 3:1–3:36.CrossRefGoogle Scholar
 Cerf, L., Besson, J., Nguyen, K. N., & Boulicaut, J. F. (2013). Closed and noisetolerant patterns in nary relations. Data Mining and Knowledge Discovery, 26(3), 574–619.MathSciNetCrossRefMATHGoogle Scholar
 Cimiano, P., Hotho, A., & Staab, S. (2005). Learning concept hierarchies from text corpora using formal concept analysis. Journal of Artificial Intelligence Research, 24, 305–339.MATHGoogle Scholar
 Dhillon, I. S. (2001). Coclustering documents and words using bipartite spectral graph partitioning. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, KDD’01, pp. 269–274.Google Scholar
 DiMaggio, P. A., Subramani, A., Judson, R. S., & Floudas, C. A. (2010). A novel framework for predicting in vivo toxicities from in vitro data using optimal methods for dense and sparse matrix reordering and logistic regression. Toxicological Sciences, 118(1), 251–265.CrossRefGoogle Scholar
 du BoucherRyan, P., & Bridge, D. G. (2006). Collaborative recommending using formal concept analysis. KnowledgeBased Systems, 19(5), 309–315.CrossRefGoogle Scholar
 Duquenne, V. (1996). Lattice analysis and the representation of handicap associations. Social Networks, 18(3), 217–230.CrossRefGoogle Scholar
 Eklund, P., Ducrou, J., & Dau, F. (2012). Concept similarity and related categories in information retrieval using Formal Concept Analysis. International Journal of General Systems, 41(8), 826–846.MathSciNetCrossRefGoogle Scholar
 Eren, K., Deveci, M., Kucuktunc, O., & Catalyurek, U. V. (2013). A comparative analysis of biclustering algorithms for gene expression data. Briefings in Bioinformatics, 14(3), 279–292.Google Scholar
 Fiedler, M. (1973). Algebraic connectivity of graphs. Czechosloval Mathematical Journal, 23(98), 298–305.MathSciNetMATHGoogle Scholar
 Freeman, L. C. (1996). Cliques, Galois lattices, and the structure of human social groups. Social Networks, 18, 173–187.CrossRefGoogle Scholar
 Fu, H., Fu, H., Njiwoua, P., & Nguifo, E. M. (2004). A comparative study of FCAbased supervised classification algorithms. In: Proceedings of 2nd International Conference on Formal Concept Analysis, ICFCA 2004, Sydney, Australia, February 23–26, 2004, pp. 313–320.Google Scholar
 Ganter, B. (1987). Algorithmen zur formalen begriffsanalyse. In: Ganter B, Wille R, Wolff KE (eds) Beiträge zur Begriffsanalyse, B.I.Wissenschaftsverlag, Mannheim, pp. 241–254.Google Scholar
 Ganter, B., & Kuznetsov, S. O. (2003). Hypotheses and version spaces. In: A. de Moor, W. Lex, & B. Ganter (Eds.), ICCS, lecture notes in computer science, Vol. 2746, pp. 83–95. Berlin: Springer.Google Scholar
 Ganter, B., & Wille, R. (1999). Formal Concept Analysis: Mathematical foundations (1st ed.). Secaucus, NJ: Springer.CrossRefMATHGoogle Scholar
 Gao, B., Liu, T. Y., Zheng, X., Cheng, Q. S., & Ma, W. Y. (2005). Consistent bipartite graph copartitioning for starstructured highorder heterogeneous data coclustering. In: Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, ACM, New York, NY, KDD ’05, pp. 41–50.Google Scholar
 Garey, M. R., & Johnson, D. S. (1979). Computers and intractability: A guide to the theory of NPcompleteness. New York: W. H. Freeman.MATHGoogle Scholar
 Georgii, E., Tsuda, K., & Schölkopf, B. (2011). Multiway set enumeration in weight tensors. Machine Learning, 82(2), 123–155.MathSciNetCrossRefMATHGoogle Scholar
 Gnatyshak, D., Ignatov, D. I., Semenov, A., & Poelmans, J. (2012). Gaining insight in social networks with biclustering and triclustering. In: BIR, Springer, Lecture Notes in Business Information Processing, vol. 128, pp. 162–171.Google Scholar
 Gnatyshak, D., Ignatov, D. I., & Kuznetsov, S. O. (2013). From triadic FCA to triclustering: Experimental comparison of some triclustering algorithms. In: Proceedings of the tenth international conference on concept lattices and their applications, La Rochelle, France, October 15–18, 2013, pp. 249–260.Google Scholar
 Golub, G., & van Loan, C. (1989). Matrix computations. Baltimore: The John Hopkins University Press.MATHGoogle Scholar
 Hanczar, B., & Nadif, M. (2010). Bagging for biclustering: Application to microarray data. In: Machine learning and knowledge discovery in databases, LNCS, Vol. 6321, pp. 490–505. Berlin: Springer.Google Scholar
 Ignatov, D. I., & Kuznetsov, S. O. (2008). Conceptbased recommendations for internet advertisement. In Belohlavek, R., Kuznetsov, S.O. (Eds.), Proceedings of the sixth international conference concept lattices and their applications (CLA’08), (pp. 157–166). Olomouc: Palacky University.Google Scholar
 Ignatov, D. I., & Kuznetsov, S. O. (2009). Frequent itemset mining for clustering near duplicate web documents. In Rudolph, S., Dau, F., Kuznetsov, S.O. (Eds.), ICCS, lecture notes in computer science, Vol. 5662, pp. 185–200. Berlin: Springer.Google Scholar
 Ignatov, D. I., Kuznetsov, S. O., Magizov, R. A., & Zhukov, L. E. (2011). From triconcepts to triclusters. In Rough sets, fuzzy sets, data mining and granular computing, LNCS, Vol. 6743, pp. 257–264. Berlin: Springer.Google Scholar
 Ignatov, D. I., Kuznetsov, S. O., & Poelmans, J. (2012). Conceptbased biclustering for internet advertisement. In: IEEE computer society ICDM workshops, pp. 123–130.Google Scholar
 Ignatov, D. I., Kuznetsov, S. O., Poelmans, J., & Zhukov, L. E. (2013). Can triconcepts become triclusters? International Journal of General Systems, 42(6), 572–593.MathSciNetCrossRefMATHGoogle Scholar
 Ignatov, D. I., Nenova, E., Konstantinova, N., & Konstantinov, A. V. (2014). Boolean Matrix Factorisation for Collaborative Filtering: An FCABased Approach. In Artificial intelligence: Methodology, systems, and applications, LNCS, Vol. 8722, pp. 47–58. Berlin: Springer.Google Scholar
 Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., & Stumme, G. (2006). TRIASan algorithm for mining iceberg trilattices. In Proceedings of the sixth international conference on data mining, IEEE computer society, Washington, DC, ICDM ’06, pp. 907–911.Google Scholar
 Ji, L., Tan, K. L., & Tung, A. K. H. (2006). Mining frequent closed cubes in 3D datasets. In Proceedings of the 32nd international conference on Very large data bases, VLDB ’06, pp. 811–822.Google Scholar
 Kaytoue, M., Kuznetsov, S. O., Napoli, A., & Duplessis, S. (2011). Mining gene expression data with pattern structures in formal concept analysis. Information Sciences, 181(10), 1989–2001.MathSciNetCrossRefGoogle Scholar
 Kaytoue, M., Kuznetsov, S. O., Macko, J., & Napoli, A. (2014). Biclustering meets triadic concept analysis. Annals of Mathematics and Artificial Intelligence, 70(1–2), 55–79.MathSciNetCrossRefMATHGoogle Scholar
 Koester, B. (2006). Conceptual knowledge retrieval with FooCA: Improving web search engine results with contexts and concept hierarchies. In Proceedings on sixth industrial conference on data mining, ICDM 2006, pp. 176–190.Google Scholar
 KrolakSchwerdt, S., Orlik, P., & Ganter, B. (1994). Tripat: A model for analyzing threemode binary data. Information systems and data analysis, studies in classification, data analysis, and knowledge organization (pp. 298–307). Berlin: springer.Google Scholar
 Kuznetsov, S. (2004). Machine learning and Formal Concept Analysis. In Concept lattices, LNCS, Vol. 2961, pp. 287–312. Berlin: Springer.Google Scholar
 Kuznetsov, S., & Samokhin, M. (2005). Learning closed sets of labeled graphs for chemical applications. In ILP 2005, LNCS (LNAI), Vol. 3625, pp. 190–208. Berlin: Springer.Google Scholar
 Kuznetsov, S. O., & Obiedkov, S. A. (2002). Comparing performance of algorithms for generating concept lattices. Journal of Experimental & Theoretical Artificial Intelligence, 14(2–3), 189–216.CrossRefMATHGoogle Scholar
 Latapy, M., Magnien, C., & Vecchio, N. D. (2008). Basic notions for the analysis of large twomode networks. Social Networks, 30(1), 31–48.CrossRefGoogle Scholar
 Lehmann, F., & Wille, R. (1995). A triadic approach to Formal Concept Analysis. In Proceedings of the third international conference on conceptual structures: Applications implementation and theory (pp. 32–43). London: Springer.Google Scholar
 Li, A., & Tuck, D. (2009). An effective triclustering algorithm combining expression data with gene regulation information. Gene Regulation and Systems Biology, 3, 49–64.Google Scholar
 Liu, K., Fang, B., & Zhang, W. (2010). Unsupervised tag sense disambiguation in folksonomies. Journal of Computers, 5(11), 1715–1722.Google Scholar
 Madeira, S. C., & Oliveira, A. L. (2004). Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1), 24–45.CrossRefGoogle Scholar
 Meulders, M., DeBoeck, P., Kuppens, P., & Van Mechelen, I. (2002). Constrained latent class analysis of threeway threemode data. Journal of Classification, 19(2), 277.MathSciNetCrossRefMATHGoogle Scholar
 Miettinen, P. (2011). Boolean tensor factorization. In Cook, D., Pei, J., Wang, W., Zaïane, O., & Wu, X. (Eds.), ICDM 2011, 11th IEEE international conference on data mining, IEEE computer society (pp. 447–456). Vancouver: CPS.Google Scholar
 Mirkin, B. (1996). Mathematical classification and clustering. Dordrecht: Kluwer.CrossRefMATHGoogle Scholar
 Mirkin, B. G., & Kramarenko, A. V. (2011). Approximate bicluster and tricluster boxes in the analysis of binary data. In Rough sets, fuzzy sets, data mining and granular computing, LNCS, Vol. 6743, (pp. 248–256). Berlin: Springer.Google Scholar
 Nanopoulos, A., Gabriel, H. H., & Spiliopoulou, M. (2009). Spectral clustering in socialtagging systems. In Vossen, G., Long, D.D.E., Yu, J.X. (Eds.), WISE, Springer, lecture notes in computer science, Vol. 5802, (pp. 87–100).Google Scholar
 Nanopoulos, A., Rafailidis, D., Symeonidis, P., & Manolopoulos, Y. (2010). Musicbox: Personalized music recommendation based on cubic analysis of social tags. IEEE Transactions on Audio, Speech & Language Processing, 18(2), 407–412.CrossRefGoogle Scholar
Outrata, J. (2010). Boolean factor analysis for data preprocessing in machine learning. In The ninth international conference on machine learning and applications, ICMLA 2010, 12–14 December 2010 (pp. 899–902). Washington, DC.
Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Efficient mining of association rules using closed itemset lattices. Information Systems, 24(1), 25–46.
Poelmans, J., Ignatov, D. I., Viaene, S., Dedene, G., & Kuznetsov, S. O. (2012). Text mining scientific papers: A survey on FCA-based information retrieval research. In Perner, P. (Ed.), ICDM, lecture notes in computer science, Vol. 7377 (pp. 273–287). Berlin: Springer.
Poelmans, J., Ignatov, D. I., Kuznetsov, S. O., & Dedene, G. (2013a). Formal Concept Analysis in knowledge processing: A survey on applications. Expert Systems with Applications, 40(16), 6538–6560.
Poelmans, J., Kuznetsov, S. O., Ignatov, D. I., & Dedene, G. (2013b). Formal Concept Analysis in knowledge processing: A survey on models and techniques. Expert Systems with Applications, 40(16), 6601–6623.
Roth, C., Obiedkov, S. A., & Kourie, D. G. (2008). On succinct representation of knowledge community taxonomies with Formal Concept Analysis. International Journal of Foundations of Computer Science, 19(2), 383–404.
Rudolph, S. (2007). Using FCA for encoding closure operators into neural networks. In Proceedings on 15th international conference on conceptual structures, ICCS 2007, July 22–27, 2007 (pp. 321–332). Sheffield.
Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Spyropoulou, E., De Bie, T., & Boley, M. (2014). Interesting pattern mining in multirelational data. Data Mining and Knowledge Discovery, 28(3), 808–849.
Symeonidis, P., Nanopoulos, A., Papadopoulos, A. N., & Manolopoulos, Y. (2008). Nearest-biclusters collaborative filtering based on constant and coherent values. Information Retrieval, 11(1), 51–75.
Tarca, A. L., Carey, V. J., Chen, X.-w., Romero, R., & Drăghici, S. (2007). Machine learning and its applications to biology. PLOS Computational Biology, 3(6), e116.
Tsopzé, N., Nguifo, E. M., & Tindo, G. (2007). CLANN: Concept lattice-based artificial neural network for supervised classification. In Proceedings of the 5th international conference on concept lattices and their applications, CLA 2007.
Tsymbal, A., Pechenizkiy, M., & Cunningham, P. (2005). Diversity in search strategies for ensemble feature selection. Information Fusion, 6(1), 83–98.
Valiant, L. G. (1979). The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3), 410–421.
 Vander Wal, T. (2007). Folksonomy coinage and definition. http://vanderwal.net/folksonomy.html. Accessed on 12 03 2012.
Visani, M., Bertet, K., & Ogier, J. (2011). NAVIGALA: An original symbol classifier based on navigation through a Galois lattice. IJPRAI, 25(4), 449–473.
Voutsadakis, G. (2002). Polyadic concept analysis. Order, 19(3), 295–304.
Wille, R. (1995). The basic theorem of Triadic Concept Analysis. Order, 12, 149–158.
Zaki, M. J. (2001). SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 42, 31–60.
Zaki, M. J., & Aggarwal, C. C. (2006). XRules: An effective algorithm for structural classification of XML data. Machine Learning, 62(1–2), 137–170.
Zaki, M. J., & Hsiao, C. (2005). Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4), 462–478.
Zhao, L., & Zaki, M. J. (2005). TRICLUSTER: An effective algorithm for mining coherent clusters in 3D microarray data. In Özcan, F. (Ed.), SIGMOD Conference (pp. 694–705). New York: ACM.