A common data structure is an N × N matrix, X = [x_ij] ∈ ℜ^(N×N), where x_ij is some measure of proximity (similarity or dissimilarity) between objects i and j, for all 1 ≤ i ≤ N and 1 ≤ j ≤ N. In some instances, X is a symmetric matrix, with x_ij = x_ji for all 1 ≤ i ≤ N and 1 ≤ j ≤ N. For example, when the N objects are measured on a collection of metric variables, a nonnegative N × N symmetric dissimilarity version of X can be established on the basis of the pairwise Euclidean distances between pairs of objects. Symmetry is present because the Euclidean distance between objects i and j is the same as the distance between j and i. The dissimilarity interpretation is appropriate because larger matrix elements (i.e., distances) imply greater dissimilarity (rather than greater similarity) among the pairs of objects.

Although symmetric matrices are common, there are also many psychological applications for which X is an asymmetric matrix. For example, in experimental psychology, when studying confusion among a set of stimulus objects, x_ij (for i ≠ j) is commonly a measure of the number of instances (or the percentage of instances) for which subjects mistakenly responded with stimulus object j when stimulus object i was actually presented (see Brusco & Steinley, 2006). For any given pair of objects (i, j), asymmetry can potentially occur because a presented stimulus i could be mistaken more frequently for j than stimulus j would be mistaken for i, or vice versa. Likewise, for brand-switching applications in consumer psychology, where x_ij (for i ≠ j) is a measure reflecting the degree to which the consumers in a sample switch from brand i to brand j, asymmetry occurs because switches from i to j might be more or less frequent than those from j to i. Asymmetry is also apt to be present in matrices associated with social network ties among schoolchildren (in developmental psychological studies—e.g., Anderson, Wasserman, & Crouch, 1999; Parker & Asher, 1993), social groups (in social psychological studies—e.g., Gibbons & Olk, 2003), and organizational members (in studies from industrial/applied psychology—e.g., Totterdell, Wall, Holman, Diamond, & Epitropaki, 2004). Regardless of whether these network examples concern friendship, trust, advice seeking, information sharing, or some other type of relational tie, the potential for asymmetry arises because ties need not be reciprocated: Actor i could identify actor j as a friend, someone they trust, or someone from whom they seek advice, but actor j might not reciprocate such ties.

Although the rows and columns of the matrix pertain to the same set of objects, the asymmetry in X generally stems from the objects having two different roles. In confusion matrices, the role of the row objects is the “presented stimulus,” whereas the role of the column objects is the “response to a presented stimulus.” Similarly, for brand-switching applications, the role of the row objects is “previously purchased brand” (or “switched from”), with the role of the column objects being “most recently purchased brand” (or “switched to”). In social network situations, the roles of the row and column actors are commonly “tie senders” and “tie receivers,” respectively. For journal citation networks, the row journals could be “cited journals” (or “producers”), with the column journals being “citing journals” (or “consumers”).

There are a variety of methods for analyzing asymmetric proximity data (a recent review of some of these approaches has been provided by Vicari, 2014). Moreover, regardless of the type of method selected, transformation or decomposition processes are sometimes applied prior to analysis. For example, one popular approach is to decompose X into its symmetric (Ψ) and skew-symmetric (Λ) components and to pursue analysis of these components independently or jointly (Brusco & Stahl, 2005a; Hubert, 1987, chap. 4; Hubert, Arabie, & Meulman, 2001, chap. 4; Zielman & Heiser, 1996). The elements of the symmetric component are ψ_ij = ψ_ji = (x_ij + x_ji)/2, whereas the elements of the skew-symmetric component are λ_ij = (x_ij − x_ji)/2 for all 1 ≤ i, j ≤ N. The decomposition of X is then provided by X = Ψ + Λ.
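
As a small illustration of this decomposition, the following MATLAB sketch computes Ψ and Λ for an arbitrary made-up matrix and verifies that they reconstruct X.

```matlab
% Decompose an asymmetric matrix X into its symmetric (Psi) and
% skew-symmetric (Lambda) components, X = Psi + Lambda.
X = [0 5 1;
     2 0 4;
     7 3 0];                        % illustrative asymmetric matrix (made up)
Psi    = (X + X') / 2;              % psi_ij    = (x_ij + x_ji)/2
Lambda = (X - X') / 2;              % lambda_ij = (x_ij - x_ji)/2
assert(norm(X - (Psi + Lambda), 'fro') < 1e-12)   % X is recovered exactly
```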

The methods for analyzing asymmetric data include graphical approaches, seriation, unidimensional scaling, multidimensional scaling, hierarchical clustering, and nonhierarchical clustering. Graphical representation procedures have been described by Constantine and Gower (1978) and Chino (1978). Seriation methods, which seek to develop orderings of the objects associated with an asymmetric matrix, have also been proposed by numerous authors (Baker & Hubert, 1977; Brusco, 2001; Brusco & Stahl, 2005b; DeCani, 1972; Flueck & Korsh, 1974; Hubert, 1976; Hubert et al., 2001, chap. 4; Ranyard, 1976). Multidimensional-scaling approaches for asymmetric matrices have been developed by Harshman, Green, Wind, and Lundy (1982), Okada and Imaizumi (1987), and Zielman and Heiser (1996). Here, however, we restrict attention to methods for clustering or partitioning the objects associated with the one-mode proximity data. Hubert (1973) provided one of the earliest investigations of this topic in his discussion of the application of hierarchical clustering methods to asymmetric data. A more recent review of hierarchical methods was provided by Takeuchi, Saito, and Yadohisa (2007), and Brusco and Steinley (2006) discussed approaches within the framework of partitioning.

The presence of asymmetry in a one-mode proximity matrix presents a challenge when one is seeking to identify a partition of the N objects. One approach is to collapse X into a symmetric proximity matrix using some type of transformation procedure, and then to apply traditional one-mode partitioning methods to obtain a solution (see, e.g., Brusco & Steinley, 2006, for a discussion of this approach within the context of confusion data). Unfortunately, this approach discards information associated with the asymmetry. For example, two journals might have very similar roles as producers for the journals that cite them, but markedly different roles as consumers of the journals they cite (see, e.g., Brusco, 2011; Brusco & Doreian, 2015b). A second approach is to sequentially establish partitions based on the row objects and column objects. In the first stage, a symmetric nonnegative matrix of Euclidean distances between the row objects could be established by treating the columns as variables, and a partition of the row objects could then be obtained using hierarchical or nonhierarchical clustering. This process would then be repeated for the column objects by treating the rows as variables. A caveat associated with this approach is that the main diagonal elements are often arbitrary and, therefore, not appropriate for use when computing the distance matrices.
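
For readers who want to see the sequential approach concretely, a minimal MATLAB sketch is given below. It treats the columns (and then the rows) as variables and applies Ward's hierarchical clustering to the resulting Euclidean distances; the data matrix, the cluster numbers, and the use of the Statistics and Machine Learning Toolbox functions pdist, linkage, and cluster are all assumptions made for the illustration, and the sketch does not address the caveat about the main diagonal.

```matlab
% Sequential (two-stage) approach: cluster the row objects treating columns
% as variables, then the column objects treating rows as variables.
% Illustrative only; the arbitrary main diagonal enters both distance matrices.
X = rand(10);                                  % hypothetical 10 x 10 asymmetric matrix
K = 3;  L = 3;                                 % assumed numbers of clusters
rowTree = linkage(pdist(X),  'ward');          % Euclidean distances between rows
rowPart = cluster(rowTree, 'maxclust', K);     % K-cluster partition of row objects
colTree = linkage(pdist(X'), 'ward');          % Euclidean distances between columns
colPart = cluster(colTree, 'maxclust', L);     % L-cluster partition of column objects
```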

A preferable approach is to adopt a biclustering (Madeira & Oliveira, 2004; Prelić et al., 2006; Van Uitert, Meuleman, & Wessels, 2008; Wilderjans, Depril, & Van Mechelen, 2013) perspective that seeks to simultaneously establish two distinct partitions of the objects: (i) one based on their role as row objects, and (ii) one based on their role as column objects. The term biclustering is used broadly here, and it is noted that alternative terminology, such as two-mode clustering or two-mode partitioning, is also used in some instances. As with the sequential method, biclustering creates flexibility by allowing different numbers of clusters for the row and column objects. However, unlike the sequential approach, the entire asymmetric proximity matrix is used when establishing the row-object and column-object partitions.

Succinctly, the approach adopted herein is to treat the one-mode asymmetric matrix X as though it is two-mode. Strictly, a two-mode matrix has data in which the N row objects are completely distinct from the M column objects. Psychological examples of two-mode matrices include memberships of CEOs on boards of directors and the scores of examinees on test items. A formidable research effort has been devoted to two-mode partitioning problems (Brusco & Doreian, 2015a, b; Brusco, Doreian, Lloyd, & Steinley, 2013; Brusco, Doreian, Mrvar, & Steinley, 2013; Brusco & Steinley, 2007, 2009, 2011; Doreian, Batagelj, & Ferligoj, 2004, 2005; Doreian, Lloyd, & Mrvar, 2013; Schepers, Ceulemans, & Van Mechelen, 2008; Schepers & Van Mechelen, 2011; Schepers, Van Mechelen, & Ceulemans, 2011; Van Mechelen, Bock, & DeBoeck, 2004; van Rosmalen, Groenen, Trejos, & Castillo, 2009; Vichi, 2001; Wilderjans, Depril, & Van Mechelen, 2013). An especially important aspect of applying a biclustering approach to a one-mode matrix concerns the handling of the main diagonal. A two-mode matrix has no main diagonal, but a one-mode matrix clearly does. In some situations, having main diagonal elements is nonsensical, as in the case of social network ties (e.g., are people friends with themselves?). However, in brand-switching applications, the main diagonal can reflect the degree of retention of customers. Similarly, in confusion data, the main diagonal exemplifies correct responses to the presented stimulus.

Ideally, psychological researchers would have easy access to biclustering procedures that could be used for one-mode data and that would have the flexibility to include or exclude the main diagonal, depending on the goals of their studies. Unfortunately, such methods are generally unavailable in commercial software packages. Accordingly, in the spirit of other attempts to identify and make accessible the clustering models and methods that are important for psychological applications (Brusco & Steinley, 2006; Köhn, Steinley, & Brusco, 2010; Schepers & Hofmans, 2009; Schepers & Van Mechelen, 2011), we seek to achieve three interrelated objectives in this article: (i) to briefly describe several methods that can be used for biclustering one-mode matrices, (ii) to present a suite of MATLAB programs that implement these biclustering methods, and (iii) to provide two psychologically oriented examples to demonstrate and compare the procedures and programs. The next section presents three different useful methods for biclustering and describes MATLAB m-files for their implementation. These m-files are available in the Web supplement associated with this article. This is followed by a description of a general process for model selection with biclustering based on ideas gleaned from Schepers et al. (2008), Wilderjans, Ceulemans, and Meers (2013), and Brusco and Steinley (2014). Subsequent sections provide illustrations of the software programs for a real-valued asymmetric matrix and a binary asymmetric matrix, respectively. The article concludes with a brief summary.

Biclustering methods

Two-mode KL-means partitioning

Two-mode KL-means partitioning (TMKLMP) is a generalization of K-means partitioning. Whereas a K-means method (see Steinley, 2006, for a review) partitions a single set of objects into K clusters on the basis of minimization of the sum of squared errors, TMKLMP simultaneously establishes distinct partitions of the row and column objects using an analogous criterion. Although similar approaches have been discussed using different names (Brusco & Steinley, 2007; Hartigan, 1972; van Rosmalen et al., 2009), TMKLMP was recently adopted by Brusco and Doreian (2015a, b), such that the “KL” component of the name reflects the fact that the number of row clusters (K) need not be the same as the number of column clusters (L). The goal is to find row-object and column-object partitions that minimize the within-submatrix sum of squared deviations from the within-cluster means. Formally, we denote P = {S_1, . . . , S_K} as a K-cluster partition of the N objects associated with the rows of a data matrix, X, where S_k is the set of objects assigned to cluster k, for all 1 ≤ k ≤ K. The standard conditions of a partition apply: the clusters are nonempty (S_k ≠ ∅ for all 1 ≤ k ≤ K), mutually exclusive (S_k ∩ S_l = ∅ for all 1 ≤ k < l ≤ K), and exhaustive (S_1 ∪ . . . ∪ S_K contains all N objects). Similarly, we define Q = {T_1, . . . , T_L} as an L-cluster partition of the N column objects of X, where T_l is the set of objects assigned to cluster l for all 1 ≤ l ≤ L. The standard conditions of a partition also hold for Q.

Denoting Π as the set of all partitions of N objects into K clusters and Ω as the set of all partitions of N objects into L clusters, the optimization problem associated with TMKLMP can be specified as follows:

$$ \underset{P\in \Pi,\; Q\in \Omega}{\mathrm{Minimize}}:\; f\left(P,Q\right)=\sum_{k=1}^{K}\sum_{l=1}^{L}\sum_{i\in S_k}\sum_{j\in T_l}\left(x_{ij}-\bar{x}_{kl}\right)^2, $$
(1)

where

$$ \bar{x}_{kl}=\frac{\sum_{i\in S_k}\sum_{j\in T_l}x_{ij}}{\left|S_k\right|\left|T_l\right|}, $$
(2)

and |S_k| (1 ≤ k ≤ K) and |T_l| (1 ≤ l ≤ L) are the numbers of row objects and column objects in clusters k and l, respectively. Together, the row objects (S_k) in cluster k and the column objects (T_l) in cluster l define a submatrix of X. The value of x̄_kl is the mean of the elements in that submatrix, and the homogeneity of the submatrix is measured by the sum of the squared deviations of its elements from that mean, (x_ij − x̄_kl)². Perfect homogeneity of a submatrix is achieved when all of its elements are the same (e.g., all 0s or all 1s, in the case in which X is defined by the presence or absence of network ties). From an interpretive standpoint, the row objects (or column objects) in the same cluster can be perceived as being similar, in the sense that they tend to have comparable measures within the submatrices.
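
To make the criterion concrete, the following hypothetical MATLAB helper (not part of the authors' suite; the function name and label conventions are assumptions) evaluates f(P, Q) from Eqs. 1 and 2 for given row and column cluster labels, with the main diagonal included.

```matlab
function f = tmklmp_objective(X, rowLab, colLab, K, L)
% Evaluate the TMKLMP criterion of Eq. 1 for given partitions.
% rowLab (N x 1) and colLab (N x 1) are assumed to hold labels 1..K and 1..L.
f = 0;
for k = 1:K
    for l = 1:L
        B    = X(rowLab == k, colLab == l);    % submatrix defined by (S_k, T_l)
        xbar = mean(B(:));                     % submatrix mean (Eq. 2)
        f    = f + sum((B(:) - xbar).^2);      % within-submatrix sum of squares
    end
end
end
```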

Brusco and Doreian (2015b) recently developed an exact algorithm for the TMKLMP that can be successfully applied to problems of size N = 20. A number of heuristic algorithms have been designed for TMKLMP, including simulated annealing (Trejos & Castillo, 2000), genetic algorithms (Brusco & Doreian, 2015a; Hansohm, 2002), and variable neighborhood search (Brusco & Steinley, 2007). However, a two-mode generalization of the K-means algorithm designed by Baier, Gaul, and Schader (1997) has been shown to be competitive with these more sophisticated procedures (van Rosmalen et al., 2009). Accordingly, we make available the MATLAB m-file tmklmp.m, which implements this procedure. The key inputs to tmklmp.m are X, K, and L. The default setting is for 500 restarts of the algorithm from different initial partitions, which is based on the implementations in previous studies (Brusco & Steinley, 2007; van Rosmalen et al., 2009). The principal outputs are the partitions of the row and column objects. Also, the raw objective function value corresponding to Eq. 1 is reported, as well as a normalized measure representing the total variance accounted for (vaf) by the partition, computed as follows:

$$ vaf=1-\frac{f\left(P,Q\right)}{\sum_{i=1}^{N}\sum_{j=1}^{N}\left(x_{ij}-\bar{x}\right)^2}, $$
(3)

where

$$ \bar{x}=\frac{\sum_{i=1}^{N}\sum_{j=1}^{N}x_{ij}}{N^2}. $$
(4)

The program tmklmp.m is applicable directly to either two-mode matrices or asymmetric one-mode matrices. However, when it is applied to the latter data structures, it is important to recognize that the main diagonal elements of X are included in the computation of the submatrix means in Eq. 2, as well as in the sum-of-squares objective function in Eq. 1. As we noted previously, a compelling argument for ignoring the main diagonal elements can be made for many applications. Therefore, we also developed an alternative version, tmklmp_nodiag.m, which ignores the main diagonal elements when computing the submatrix means in Eq. 2 and the sum of squares in Eq. 1. The main diagonal is also ignored when computing the overall mean in Eq. 4 and the denominator term in Eq. 3. The inputs and outputs of this program are identical to those for tmklmp.m.
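
For readers who want to see the mechanics of such a two-mode K-means heuristic, a minimal sketch is given below. It is not the authors' tmklmp.m: it performs a single run from a random start, omits the multiple-restart strategy and empty-cluster safeguards, includes the main diagonal, and reuses the tmklmp_objective helper sketched earlier to track the criterion. A nodiag variant would simply exclude the diagonal cells when computing the submatrix means and sums of squares.

```matlab
function [rowLab, colLab, f] = two_mode_kmeans_sketch(X, K, L)
% One run of an alternating two-mode K-means heuristic (in the spirit of
% Baier, Gaul, & Schader, 1997).  Illustrative sketch only.
[N, M] = size(X);
rowLab = randi(K, N, 1);                       % random initial row partition
colLab = randi(L, M, 1);                       % random initial column partition
fOld = inf;
while true
    Mu = zeros(K, L);                          % submatrix means (Eq. 2)
    for k = 1:K
        for l = 1:L
            B = X(rowLab == k, colLab == l);
            Mu(k, l) = mean(B(:));
        end
    end
    for i = 1:N                                % reassign rows, columns fixed
        cost = zeros(K, 1);
        for k = 1:K
            cost(k) = sum((X(i, :) - Mu(k, colLab)).^2);
        end
        [~, rowLab(i)] = min(cost);
    end
    for j = 1:M                                % reassign columns, rows fixed
        cost = zeros(L, 1);
        for l = 1:L
            cost(l) = sum((X(:, j) - Mu(rowLab, l)).^2);
        end
        [~, colLab(j)] = min(cost);
    end
    f = tmklmp_objective(X, rowLab, colLab, K, L);   % Eq. 1
    if f >= fOld - 1e-10, break; end           % stop when no further improvement
    fOld = f;
end
end
```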

Nonnegative matrix factorization

The second approach for biclustering of one-mode asymmetric matrices has its roots in pioneering work on matrix decomposition (Eckart & Young, 1936; Young & Householder, 1938). More specifically, low-dimensional least-squares approximations of a two-mode matrix can be obtained using singular value decomposition (SVD) or principal component analysis (PCA). Given that the one-mode asymmetric matrix can be treated as two-mode data, these methods are applicable here, as well. However, two limitations are readily apparent: (i) the potential requirement of ignoring the main diagonal and (ii) the possibility of negative elements in the factors, which can hinder interpretability (Lee & Seung, 1999).

As with the SVD and PCA approaches, nonnegative matrix factorization (NMF) seeks a minimum-sum-of-squares, low-dimensionality approximation of X. However, in contrast to the eigenvectors obtained by SVD and PCA, which can assume both positive and negative values, the components in NMF are constrained to be nonnegative. The rationale for this constraint, as described by Lee and Seung (1999), is to allow only additive combinations, so that the whole is represented as the sum of its parts. The arbitrary sign of the elements of a PCA factorization presents the following difficulty, noted by Lee and Seung (1999, p. 789): “As the eigenfaces are used in linear combinations that generally involve complex cancellations between positive and negative numbers, many individual eigenfaces lack intuitive meaning.” Expanding on this point, Fogel, Hawkins, Beecher, Luta, and Young (2013, p. 207) found that NMF “often leads to substantial improvements in the interpretability of the factors,” relative to attempts to transform PCA scores of mixed signs into meaningful factors.

To formalize NMF, we denote D as the desired dimensionality of the factorization. The problem requires the determination of two nonnegative matrices, G and H, the product of which will generate a D-dimensional least-squares approximation of X. Matrix G = [g_id] is an N × D matrix of nonnegative coefficients for the row objects (1 ≤ i ≤ N) on each of the D dimensions (1 ≤ d ≤ D). Similarly, H = [h_dj] is a D × N matrix of nonnegative coefficients for the column objects (1 ≤ j ≤ N) on each of the D dimensions (1 ≤ d ≤ D). The optimization problem for NMF is

$$ \mathrm{Minimize}:\; Z=\left\Vert \mathbf{X}-\mathbf{G}\mathbf{H}\right\Vert^2=\sum_{i=1}^{N}\sum_{j=1}^{N}\left(x_{ij}-\sum_{d=1}^{D}g_{id}h_{dj}\right)^2, $$
(5)
$$ \mathrm{Subject\ to}:\; g_{id}\ge 0, \quad \mathrm{for}\ 1\le i\le N\ \mathrm{and}\ 1\le d\le D, $$
(6)
$$ h_{dj}\ge 0, \quad \mathrm{for}\ 1\le d\le D\ \mathrm{and}\ 1\le j\le N. $$
(7)

Essentially, the optimization problem is to obtain G and H while minimizing the least-squares loss function in Eq. 5, subject to the nonnegativity constraints in Eqs. 6 and 7. Lee and Seung (2001) described a rescaled gradient descent algorithm for solving the optimization problem stated in Eqs. 5–7 (see also Brusco, 2011, for a description and extension to the case of an asymmetric one-mode matrix with an arbitrary main diagonal). Once the optimization problem has been solved by obtaining G and H, there are several possibilities for constructing partitions of the row and column objects. Perhaps the simplest approach is to assign each row object i to the cluster d for which g_id is maximum, and each column object j to the cluster d for which h_dj is maximum. Of course, this approach imposes the restriction that K = L = D. An alternative approach, which is employed herein, is to apply K-means partitioning (with, say, 5,000 restarts) to the G and H matrices in order to obtain K-cluster and L-cluster partitions of the row and column objects, respectively.
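
To make the estimation step concrete, the following MATLAB sketch implements Lee and Seung's (2001) multiplicative update rules (the rescaled-gradient scheme mentioned above) for the least-squares loss in Eq. 5 and then clusters the rows of G and the columns of H with K-means. It is not the authors' nmf.m: the function name, the fixed number of update sweeps, the small constant guarding against division by zero, and the use of the Statistics Toolbox kmeans function are assumptions made for the illustration, and the main diagonal is not treated specially.

```matlab
function [rowLab, colLab, G, H] = nmf_bicluster_sketch(X, D, K, L)
% D-dimensional NMF of a nonnegative N x N matrix X via multiplicative
% updates (Lee & Seung, 2001), followed by K-means on the factor matrices.
N  = size(X, 1);
ep = 1e-9;                                     % guard against division by zero
G  = rand(N, D);                               % random nonnegative start
H  = rand(D, N);
for iter = 1:500                               % fixed number of update sweeps
    H = H .* (G' * X) ./ (G' * G * H + ep);    % update H; loss does not increase
    G = G .* (X * H') ./ (G * (H * H') + ep);  % update G; loss does not increase
end
rowLab = kmeans(G,  K, 'Replicates', 100);     % partition row objects from G
colLab = kmeans(H', L, 'Replicates', 100);     % partition column objects from H
end
```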

We developed a MATLAB m-file implementation of the NMF algorithm, nmf.m. The nmf.m program is similar to the module that is available in the MATLAB system, yet it is differentiated by distinct parameter settings for the descent algorithm and, more importantly, by the inclusion of a K-means clustering subroutine to cluster the rows and columns on the basis of G and H, respectively. The inputs to the program are X, K, L, and D. The outputs of the nmf.m program are the percentage reduction of error (PRE) provided by the D-dimensional factorization, the partition of the row objects, and the partition of the column objects. As was the case for TMKLMP, a second version of the NMF program, nmf_nodiag.m, was prepared that ignores the main diagonal elements in the estimation process. The inputs and outputs of this program are identical to those of nmf.m.

Two-mode blockmodeling

Although the previously described methods can be applied to binary data matrices (i.e., x_ij ∈ {0, 1} for all i and j), some analysts might prefer a method that is explicitly designed for such data. Two-mode blockmodeling methods are especially well-suited for binary data (Brusco & Steinley, 2007, 2011; Brusco, Doreian, Lloyd, & Steinley, 2013; Brusco, Doreian, Mrvar, & Steinley, 2013; Doreian et al., 2004, 2005). These methods seek partitions of the row and column objects such that the submatrices formed by the row and column clusters are, to the greatest extent possible, either complete (all 1s) or null (all 0s). Accordingly, the typical objective is to obtain partitions minimizing the number of inconsistencies with this ideal structure, as measured by a count of the total number of violations. More formally, the objective of the two-mode blockmodeling problem (TMBP) that we consider is to minimize the sum of the 1s in submatrices that are mostly 0s plus the sum of the 0s in submatrices that are mostly 1s. The optimization problem is:

$$ \underset{P\in \Pi,\; Q\in \Omega}{\mathrm{Minimize}}:\; g\left(P,Q\right)=\sum_{k=1}^{K}\sum_{l=1}^{L}\min\left\{\eta_{kl},\,\rho_{kl}\right\}, $$
(8)

where

$$ \eta_{kl}=\sum_{i\in S_k}\sum_{j\in T_l}x_{ij}, \qquad \forall\ 1\le k\le K\ \mathrm{and}\ 1\le l\le L, $$
(9)

and

$$ \rho_{kl}=\sum_{i\in S_k}\sum_{j\in T_l}\left(1-x_{ij}\right), \qquad \forall\ 1\le k\le K\ \mathrm{and}\ 1\le l\le L. $$
(10)

Brusco, Doreian, Lloyd, and Steinley (2013) designed an exact algorithm for TMBP that is scalable for applications in which N ≤ 20. Heuristic methods for TMBP include tabu search (Brusco & Steinley, 2011) and variable neighborhood search (Brusco, Doreian, Mrvar, & Steinley, 2013). However, recent computational results reported by Brusco, Doreian, Mrvar, and Steinley have shown that a two-mode relocation heuristic designed by Doreian et al. (2004, 2005) often produces solutions that are as good as those obtained by these more sophisticated procedures. For this reason, we prepared a MATLAB m-file, tmbp.m, implementing the relocation heuristic procedure. The key inputs to tmbp.m are X, K, and L. The default setting is for 5,000 restarts of the algorithm from different initial partitions. The principal outputs are the objective function value (Eq. 8) and the partitions of the row and column objects.
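
The block-inconsistency count itself is simple to compute. The following hypothetical helper (not the authors' tmbp.m; the function name and label conventions are assumptions) evaluates g(P, Q) from Eqs. 8–10 for a binary matrix and given row and column cluster labels, with the main diagonal included.

```matlab
function g = tmbp_objective(X, rowLab, colLab, K, L)
% Evaluate the TMBP criterion of Eq. 8 for a binary matrix X and given
% partitions; rowLab and colLab are assumed to hold labels 1..K and 1..L.
g = 0;
for k = 1:K
    for l = 1:L
        B   = X(rowLab == k, colLab == l);     % submatrix for block (k, l)
        eta = sum(B(:));                       % number of 1s in the block (Eq. 9)
        rho = numel(B) - eta;                  % number of 0s in the block (Eq. 10)
        g   = g + min(eta, rho);               % inconsistencies contributed
    end
end
end
```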

Model selection

All three of the methods described in the previous section require the selection of a model. In the case of TMKLMP and TMBP, model selection is generally limited to the selection of K and L. We recommend running these algorithms for all possible combinations of K and L obtained from the intervals K_1 ≤ K ≤ K_2 and L_1 ≤ L ≤ L_2. The total number of clusters for any given combination is ξ = K + L, which represents the level of complexity of the model. To measure the improvement in the objective function with respect to an increase in complexity, we employ the convex hull (CHull) approach, which has been widely adopted in the context of multimode clustering (Ceulemans & Van Mechelen, 2005; Schepers et al., 2008; Schepers & Van Mechelen, 2011; Wilderjans, Ceulemans, & Meers, 2013).
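
A grid search of this kind is straightforward to script. The sketch below loops over all (K, L) combinations, runs a biclustering heuristic for each, and records the vaf values needed for the deviance plot; it reuses the two_mode_kmeans_sketch function given earlier in place of tmklmp.m, and the data matrix and cluster ranges are assumptions made for the illustration.

```matlab
% Collect vaf over a grid of (K, L) combinations for the deviance plot.
X  = rand(20);                                 % hypothetical asymmetric matrix
K1 = 2; K2 = 5; L1 = 2; L2 = 5;                % assumed search ranges
totSS = sum((X(:) - mean(X(:))).^2);           % denominator of Eq. 3
vafGrid = nan(K2, L2);
for K = K1:K2
    for L = L1:L2
        [~, ~, f]     = two_mode_kmeans_sketch(X, K, L);   % one heuristic run
        vafGrid(K, L) = 1 - f / totSS;                     % Eq. 3
    end
end
```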

To illustrate the CHull approach for TMKLMP, we consider the vaf as the objective criterion of interest. The process begins with a deviance plot of the vaf values obtained from all combinations of K and L. The vertical axis of this plot is vaf, and the horizontal axis is the total number of clusters (ξ = K + L). The second step retains only those solutions falling on the upper boundary of the convex hull and places them in an ordered list based on complexity. We use B to denote the total number of solutions falling on the boundary, vaf(b) to denote the vaf for solution b (1 ≤ b ≤ B), and ξ(b) to denote the complexity of solution b (1 ≤ b ≤ B). The upper boundary is used because the goal is to maximize vaf. For a minimization objective function, the lower boundary would be used. The third step is to select one of the B solutions on the basis of visual inspection of the deviance plot and/or the use of measures based on the slopes of segments of the convex hull. A visually based selection of a solution from the boundary is made via a search for an “elbow” in the plot, in a manner similar to the use of a scree plot in factor analysis. Ceulemans and Van Mechelen (2005) have provided two slope-based measures to augment visual inspections. The difference measure, DiffCH, is used to choose the solution b that maximizes

$$ \frac{\left|vaf(b)-vaf(b-1)\right|}{\left|\xi(b)-\xi(b-1)\right|}-\frac{\left|vaf(b+1)-vaf(b)\right|}{\left|\xi(b+1)-\xi(b)\right|}. $$
(11)

Similarly, the ratio measure, RatioCH, leads to a choice of the solution b maximizing

$$ \frac{\left|vaf(b)-vaf(b-1)\right|\,/\,\left|\xi(b)-\xi(b-1)\right|}{\left|vaf(b+1)-vaf(b)\right|\,/\,\left|\xi(b+1)-\xi(b)\right|}. $$
(12)

Although the absolute value signs in Eqs. 11 and 12 are not required in the case of a maximization objective such as vaf, they are included to avoid any sign confusion that might arise in a minimization context, such as g(P, Q) in Eq. 8. Ceulemans and Van Mechelen (2005) found that RatioCH outperformed DiffCH in their simulation study. Nevertheless, some caution regarding the use of RatioCH is advisable, because it can be extremely sensitive to very small changes in the criterion function. To illustrate this, suppose that the vaf(b) values for four possible solutions B – 3, B – 2, B – 1, and B are .6, .9, .92, and .921, respectively. Assuming that ξ(b) = b for all 1 ≤ b ≤ B, solution B – 2 has DiffCH = (.9 – .6) – (.92 – .9) = .28. In contrast, solution B – 1 has DiffCH = (.92 – .9) – (.921 – .92) = .019. Solution B – 2 would be preferred to B – 1 on the basis of DiffCH. However, B – 1 is preferred to B – 2 on the basis of RatioCH, because (.92 – .9)/(.921 – .92) = 20 exceeds (.9 – .6)/(.92 – .9) = 15. Arguably, the B – 2 solution is the proper choice. Yet the very small change when moving from B – 1 to B makes B – 1 preferred according to the RatioCH measure. For this reason, we recommend consideration of both measures, along with visual inspection, to select a model. Examples of doing this are provided in our empirical examples.
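
The small numerical illustration above can be checked directly; the following MATLAB snippet computes the hull-segment slopes and the two measures for the four hypothetical vaf values, assuming (as in the text) that ξ(b) = b.

```matlab
% DiffCH and RatioCH for the four illustrative solutions B-3, B-2, B-1, B.
vaf = [0.600 0.900 0.920 0.921];               % hypothetical vaf values
xi  = 1:4;                                     % xi(b) = b, as assumed in the text
slope   = abs(diff(vaf)) ./ abs(diff(xi));     % slopes of the hull segments
DiffCH  = slope(1:end-1) - slope(2:end);       % Eq. 11, for solutions B-2 and B-1
RatioCH = slope(1:end-1) ./ slope(2:end);      % Eq. 12, for solutions B-2 and B-1
% DiffCH  = [0.280 0.019] -> B-2 preferred on the difference measure
% RatioCH = [15 20]       -> B-1 preferred on the ratio measure
```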

In addition to K and L, NMF also requires the selection of the dimensionality of the factorization. We recommend decoupling the selection of D from the selection of K and L. More specifically, a choice for D is made first on the basis of the analysis of PRE at different values of D. We recommend choosing D by using visual inspection of the plot of PRE measures at different values of D together with the DiffCH and RatioCH measures. Once D is selected, the numbers of clusters (K and L) for NMF can be chosen in a variety of ways. One approach would be to obtain a set of K-means clustering solutions for G using different Ks and to select K on the basis of any of the host of different indices (see, e.g., Steinley & Brusco, 2011). This process would then be repeated for H to choose the best value for L. An alternative approach, adopted here, is to select K and L on the basis of the results obtained using TMKLMP (TMBP could also be used). Although this approach does make NMF dependent on TMKLMP, it has the advantage of enabling the joint (or simultaneous) selection of K and L.

Example 1

Lipread consonant data

The first example comes from a study of confusion among N = 21 lipread consonants (Manning & Shofner, 1991, p. 596). The elements (x ij ) of the data matrix correspond to the proportions of responses given as letter j when letter i was the presented stimulus. All elements of the lipread consonant data are nonnegative real-valued numbers. Accordingly, two-mode KL-means clustering and nonnegative matrix factorization can be applied for analyzing these data. Since their original publication, these lipread consonant data have been reanalyzed using clustering or seriation methods (Brusco & Stahl, 2005b; Brusco & Steinley, 2010). However, the latter studies analyzed the data subsequent to transforming the asymmetric confusion proportions to a symmetric matrix. In contrast, our analysis herein preserves the asymmetry in these data.

The main diagonal of the confusion matrix represents the proportions of correct responses for the corresponding stimulus objects. For most consonants, these proportions are, by far, the largest numbers in their rows. At the extreme, the letters {y} and {w} have main diagonal entries of .974 and .968, respectively. However, a few of the less frequently used letters are highly confusable with more common letters, and accordingly, their main diagonal entries are much smaller. Examples include the letters {x} and {z}, which have main diagonal entries of .091 and .151, respectively. Because the focus of this type of study is usually on the patterns of confusion, and because large diagonal elements are apt to artificially inflate the variation in the matrix, we used tmklmp_nodiag.m and nmf_nodiag.m for our analyses. However, our presentation of the results culminates with a brief discussion of the effects of including the main diagonal in the analysis, via the use of tmklmp.m.

Results

We began our analysis by applying tmklmp_nodiag.m to the confusion matrix for all combinations of 2 ≤ K ≤ 9 and 2 ≤ L ≤ 9. The vaf for each combination is displayed in Table 1, and the deviance plot of the vaf values is provided in Fig. 1. An inspection of the upper boundary of the convex hull reveals that three levels of complexity (ξ = K + L = 5, ξ = K + L = 7, and ξ = K + L = 9) do not produce a solution on the boundary. Moreover, visual inspection of the plot shows the sharpest elbow occurring at ξ = K + L = 8. The selection of ξ = 8 is also strongly supported by its DiffCH measure of .0743, which is far larger than the corresponding measures for the other values of ξ. The ξ = 8 solution on the upper boundary has the second largest RatioCH measure (2.65), which is appreciably larger than the measures at most other values of ξ. The lone exception is ξ = 16 (RatioCH = 4.29). However, its large measure arises because of the issue noted in the Model Selection section: the very small increase in vaf when moving from ξ = 16 to ξ = 17 inflates the RatioCH measure for ξ = 16. Considering this caveat with respect to the RatioCH measure, along with the visual inspection of Fig. 1 and the DiffCH measure, ξ = 8 is a more appropriate level of model complexity. The particular solution on the upper boundary at ξ = 8 corresponds to K = L = 4, and this solution was selected for interpretation. The computation time for tmklmp_nodiag.m with K = L = 4 was approximately 3.3 s on a 2.2-GHz Pentium 4 PC, which is a circa 2000–2002 hardware platform. The best-found vaf was identified on 97 % of the 500 restarts of the algorithm, which is a high attraction rate, suggesting (but not guaranteeing) that the global optimum has been found.

Table 1 Two-mode KL-means partitioning results for the Manning and Shofner (1991, p. 596) lipread consonant confusion data: Variance accounted for at different combinations of K and L
Fig. 1 Convex hull for the two-mode KL-means partitioning (TMKLMP) application to the lipread consonant data

Next, we applied nmf_nodiag.m to the lipread consonant confusion matrix using 1 ≤ D ≤ 6. A plot of the PRE values is displayed in Fig. 2. Visual inspection of the plot reveals a sharp elbow at D = 3, with a lesser elbow at D = 4. On the basis of the DiffCH measure, D = 3 (DiffCH = 0.17) would be preferred to D = 4 (DiffCH = 0.08). However, on the basis of the RatioCH measure, D = 4 (RatioCH = 4.83) is preferred to D = 3 (RatioCH = 2.70). Once again, we recommend caution when using the RatioCH measure, because of its sensitivity to very small improvements at the next level of complexity. In this instance, PRE = 95.63 % at D = 4, so there is very little room for further improvement by increasing the dimensionality to D = 5: moving to D = 5 improves the PRE only to 97.77 %. This small improvement drives the inflated RatioCH measure for D = 4. Considering the visual inspection of Fig. 2 together with both slope measures, D = 3 is the more reasonable dimensionality. We applied K-means clustering, independently, to the G (using K = 4) and H (using L = 4) matrices obtained from the D = 3 factorization. The selection of K = L = 4 was made for these K-means analyses to facilitate a comparison with the TMKLMP results. The total computation time for nmf_nodiag.m using the settings of D = 3, K = L = 4, and 500 restarts of the K-means algorithm to obtain the row and column partitions was 21.9 s on the 2.2-GHz Pentium 4 PC.

Fig. 2 Convex hull for the nonnegative matrix factorization (NMF) application to the lipread consonant data. The plot represents the proportions of reductions in error (PRE) as a function of the number of dimensions in the factorization (D)

The K = L = 4 partitions obtained using tmklmp_nodiag.m and nmf_nodiag.m (D = 3) were identical. This solution is displayed in Fig. 3. Two features of the solution are immediately apparent: (i) There is considerable symmetry in the partitions—that is, the partition of the consonants as stimuli is strikingly similar to the partition of the same consonants as responses—and (ii) the partitions each consist of three small clusters and one large cluster. The consonants {b, p} form both a row (stimulus) cluster and column (response) cluster. These two letters were highly confused with one another but were not confused strongly with any of the other consonants. The consonant “p” was frequently (.440) a mistaken response for the stimulus “b,” and similarly, “b” was frequently (.424) a mistaken response for the stimulus “p.” The cluster {c, d, t, z} also emerges in both the row and column partitions, because of the generally modest levels of (symmetric) confusion among these four letters. Unlike “b” and “p,” which exhibited high symmetry in their confusion, the consonants “s” and “x” show strong asymmetry in confusion. The consonant “s” was frequently (.582) a mistaken response for stimulus “x,” but “x” was seldom (.074) a mistaken response for stimulus “s.” For this reason, {s, x} emerges as a cluster in the row (stimulus) objects, and {s} is a singleton cluster in the column (response) objects.

Fig. 3 The (K = 4, L = 4) bipartition for the lipread consonant confusion study by Manning and Shofner (1991, p. 596), which was obtained using both two-mode KL-means partitioning and nonnegative matrix factorization (D = 3). The values in bold are the main diagonal elements, which are ignored in the computation of the variance accounted for (vaf)

One final analysis of the lipread consonant data was conducted by applying tmklmp.m to the confusion matrix. This program includes the main diagonal elements (i.e., the proportions of correct responses for each stimulus letter) in the computation of the submatrix means and the sum of squared deviations. For comparative purposes, we selected K = 4 and L = 4. The tmklmp.m algorithm produced a four-cluster partition of the stimulus (row) letters, whereby the three letters {f}, {w}, and {y} were each a singleton cluster and all other letters were placed in the fourth cluster. The partition of response (column) letters was the same as the row partition. The explanation for the extraction of the singleton clusters is that those three letters have the largest main diagonal elements and, therefore, their isolation reduces the total variation. Clearly, the tmklmp.m solution is far less useful and interesting than the tmklmp_nodiag.m solution. This highlights the importance of having available software that permits the exclusion of the main diagonal.

Example 2

Friendship ties among third-grade students

The second asymmetric one-mode matrix comes from a study of friendship ties and peer group acceptance among elementary schoolchildren (Parker & Asher, 1993). The particular classroom used in this example corresponded to 22 third-grade students (14 boys and eight girls). The data were collected using the roster method: The schoolchildren were provided with a list of their classmates and asked to identify their “very best friend,” three best friends, and as many other friends as they liked. Anderson et al. (1999) analyzed the resulting data using p* models. For simplicity, they ignored the information regarding the degree or strength of the friendship ties. Our analysis herein focuses on the binary matrix corresponding to the friendship ties as published in Anderson et al. (1999, p. 42), where the rows and columns of the data matrix correspond to the senders and receivers of friendship ties, respectively. A value of x ij = 1 indicates that student i identified student j as one of his or her friends, whereas x ij = 0 indicates the lack of a friendship tie. The main diagonal of the friendship matrix is arbitrary: Students did not identify themselves as friends. Accordingly, methods ignoring the main diagonal are appropriate.

Results

Given the binary nature of the friendship ties matrix, all three methods (TMKLMP, TMBP, and NMF) can be applied. These data were analyzed using tmklmp_nodiag.m, tmbp_nodiag.m, and nmf_nodiag.m. The tmklmp_nodiag.m program was implemented for all combinations of 2 ≤ K ≤ 5 and 2 ≤ L ≤ 5. The vaf results are reported in Table 2, and the deviance plot is displayed in Fig. 4. Visual inspection of the deviance plot reveals an elbow at ξ = 7, which is supported by the fact that the maximum values of both DiffCH (0.0219) and RatioCH (1.81) are achieved at ξ = 7. The solution on the upper boundary of the convex hull for ξ = 7 corresponds to K = 4 and L = 3. The computation time for tmklmp_nodiag.m using K = 4 and L = 3 was approximately 3.9 s on the 2.2-GHz Pentium 4 PC. Once again, the attraction rate was high, such that the best-found vaf was identified on 97 % of the 500 restarts of the algorithm.

Table 2 Two-mode KL-means partitioning results for the Parker and Asher (1993) third-grade classroom friendship data: Variance accounted for at different combinations of K and L
Fig. 4 Convex hull for the TMKLMP application to the third-grade friendship data

Next, we applied the tmbp_nodiag.m program for all combinations of 2 ≤ K ≤ 5 and 2 ≤ L ≤ 5. The g(P, Q) results and the numbers of equally well-fitting partitions (shown in parentheses) are reported in Table 3, and the deviance plot is displayed in Fig. 5. Unlike the vaf measure associated with TMKLMP, which we seek to maximize, the goal here is to minimize g(P, Q). Therefore, when examining the deviance plot, the lower boundary of the convex hull is of interest. Visual inspection of the deviance plot in Fig. 5 reveals elbows at both ξ = 6 and ξ = 7. There were two equally well-fitting partitions at both the ξ = 6 and ξ = 7 levels of model complexity, and both levels also produced a DiffCH measure of 2.0. Although the ξ = 6 solution is more parsimonious, given the competitiveness of ξ = 6 and ξ = 7, we opted for the ξ = 7 solution to provide consistency with the selected model for TMKLMP. The solution on the lower boundary of the convex hull for ξ = 7 corresponds to K = 4 and L = 3. The computation time for tmbp_nodiag.m using K = 4 and L = 3 was approximately 2 min on the 2.2-GHz Pentium 4 PC. The best-found solution was identified on only 0.1 % of the 5,000 restarts of the algorithm, which is an appreciably lower attraction rate than what was achieved by tmklmp_nodiag.m.

Table 3 Two-mode blockmodeling (TMBP) results for the Parker and Asher (1993) third-grade classroom friendship data: Numbers of inconsistencies relative to the ideal block structure for different combinations of K and L
Fig. 5 Convex hull for the two-mode blockmodeling (TMBP) application to the third-grade friendship data

Finally, we applied nmf_nodiag.m using 1 ≤ D ≤ 5. A plot of the PRE values is displayed in Fig. 6. Visual inspection of the plot reveals a clear and sharp elbow at D = 2—indeed, the only elbow. Moreover, D = 2 is preferred to all other dimensionalities for the factorization based on both the DiffCH (DiffCH = 0.1458) and RatioCH (RatioCH = 2.94) measures. Considering the visual inspection of Fig. 6 and the slope measures, we selected D = 2 as the dimensionality and obtained a (K = 4, L = 3) partition for comparison with the TMKLMP and TMBP results. As we noted previously, the selection of K and L for NMF was based on the TMKLMP solution. The computation time for nmf_nodiag.m using D = 2, K = 4, and L = 3 was approximately 17.8 s on the 2.2-GHz Pentium 4 PC.

Fig. 6 Convex hull for the NMF application to the third-grade friendship data. The plot represents the proportions of reductions in error (PRE) as a function of the number of dimensions in the factorization (D)

The partition obtained by tmklmp_nodiag.m is displayed in Fig. 7. This partition is identical to one of the two equally well-fitting partitions produced by tmbp_nodiag.m (the other equally well-fitting partition differed only by the relocation of one student in the partition of columns). The eight girls form one of the three clusters for the column objects (friendship tie receivers), and the 14 boys are split into two clusters of approximately equal size. The row clusters (friendship tie senders) are a bit more complex: There is one cluster of seven boys, one cluster of four girls, one singleton cluster consisting of one girl, and one cluster consisting of seven boys and three girls.

Fig. 7 The (K = 4, L = 3) biclustering solution obtained using TMKLMP and TMBP for the friendship ties among 22 third-grade students (see Anderson et al., 1999; Parker & Asher, 1993). The students are labeled using the numbering scheme used in the source, with “b” and “g” being used to label boys and girls, respectively. The solid lines distinguish the row and column clusters

The friendship tie-sending cluster of four girls exhibits a strong linkage to the tie-receiving cluster consisting of all eight girls, because the resulting submatrix contains mostly 1 s. However, the same cluster of girls exhibits no friendship ties to any of the 14 boys, as is shown by the two null submatrices. Similarly, the friendship tie-sending cluster consisting only of boys (b2, b5, b11, b8, b20, b16, and b19) identifies most of the other boys in the class but very few of the girls. However, the boys in this tie-sending cluster are more strongly linked to the first cluster of tie-receiving boys (b2, b5, b11, b8, b13, and b21) than to the second. The largest friendship tie-sending cluster, consisting of boys and girls (b1, b3, b4, b7, b10, b13, b21, g14, g6, and g22) also has strong linkages to the first cluster of tie-receiving boys (b2, b5, b11, b8, b13, and b21), but not to the other two tie-receiving clusters. Accordingly, the first cluster of tie-receiving boys might be characterized as the “popular boys,” since they are much more heavily linked to the two largest clusters of tie senders than to the other cluster of tie-receiving boys. Finally, we note that g12 emerges as a singleton cluster of tie senders, because she is unique in her identification of almost everyone in the class as friends.

Although interpretable, the partition obtained from using nmf_nodiag.m, shown in Fig. 8, exhibited some marked differences from the TMKLMP and TMBP partition in Fig. 7. The tie-receiving cluster of girls remains intact, as does the tie-sending cluster of four girls. However, the sending and receiving clusters of boys are carved up somewhat differently. Moreover, g12 is folded into a tie-sending cluster with two other girls, despite being appreciably different from those two girls with respect to the pattern of ties. The net result is that the submatrices associated with the NMF solution in Fig. 8 are appreciably less homogeneous than those in the TMKLMP/TMBP solution in Fig. 7: Whereas the number of inconsistencies [g(P, Q)] in Fig. 7 is 77, there are 104 in Fig. 8. This substantial difference in submatrix homogeneity raises concerns regarding the effectiveness of NMF in this application.

Fig. 8 The (K = 4, L = 3, D = 2) biclustering solution obtained using NMF for the friendship ties among 22 third-grade students (see Anderson et al., 1999; Parker & Asher, 1993). The students are labeled using the numbering scheme used in the source, with “b” and “g” used to label boys and girls, respectively. The solid lines distinguish the row and column clusters

Implementation

Acquiring the software

The software programs associated with this article are accessible in the folder ASYM, which can be downloaded from the following website: http://myweb.fsu.edu/mbrusco. Six primary m-file scripts are included in the folder: (i) tmklmp.m, (ii) tmklmp_nodiag.m, (iii) tmbp.m, (iv) tmbp_nodiag.m, (v) nmf.m, and (vi) nmf_nodiag.m. In addition, the NMF programs call two m-script subroutines, hkmeans.m and ssquares.m. There are also two data files, lipread.prn and 3rdGrade.prn, which contain the input matrices for Examples 1 and 2, respectively. Succinct nontechnical pseudocodes for TMKLMP, NMF, and TMBP are provided in Fig. 9. More rigorous descriptions of the respective algorithms are provided by Brusco and Doreian (2015a), Brusco (2011), and Doreian et al. (2004).

Fig. 9 Nontechnical pseudocode for TMKLMP, NMF, and TMBP

A Word document in the ASYM folder also contains the source code for all eight m-file scripts. Moreover, for each of the six primary m-scripts, the document describes how to perform the function calls in MATLAB, as well as the primary inputs and outputs of the program. The parameter settings that can be adjusted by users familiar with MATLAB are also identified. Finally, the document contains small numerical examples illustrating some of the procedures. As we noted previously, these examples were solved using a 2.2-GHz Pentium 4 PC (circa 2000–2002), and accordingly, the displayed computation times should be considered conservative.
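
As a hypothetical usage sketch (the .prn files are assumed to be plain whitespace-delimited numeric matrices, and the function call shown is only our guess at the signature implied by the inputs and outputs described above; the documentation in the ASYM folder gives the actual calls), an analysis of the lipread consonant data might begin as follows.

```matlab
% Hypothetical usage sketch; see the ASYM documentation for the actual calls.
X = load('lipread.prn');                       % assumed: 21 x 21 ASCII matrix
K = 4;  L = 4;                                 % numbers of row and column clusters
[rowPart, colPart] = tmklmp_nodiag(X, K, L);   % assumed output arguments
```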

Choosing among the methods

Two questions concerning the properties of the available asymmetric one-mode data can guide the selection of a method: (i) Should the main diagonal be considered? and (ii) Are the elements of the matrix binary or nonnegative real-valued? In our experience, the most common answer to question (i) is “no.” Consider, for example, asymmetric matrices that stem from social network analyses, in which it is generally illogical for someone to identify him- or herself as a friend, someone they trust, or someone from whom they seek advice. Accordingly, in these circumstances, the “nodiag” versions of the programs are more appropriate. In other situations the decision might be less clear. For example, in a confusion matrix or a brand-switching matrix, the main diagonal represents correct responses or repeat purchases of the same brand, respectively. These measures do have a logical interpretation, so an argument can be made for retaining them; however, they often tend to be rather large. Therefore, if the goal of the study is to analyze patterns of confusion (or switching), it still might be preferable to use the “nodiag” programs to avert undue influence from the large diagonal terms. This issue also dovetails with the decision of whether to normalize the input matrix in some manner, a thorny matter that is beyond the scope of this article.

If the input matrix consists more generally of nonnegative real-valued elements, then the choice of method is restricted to the TMKLMP and NMF programs, because TMBP is limited to binary data. Our experience is that both TMKLMP and NMF perform well in most instances for nonnegative real-valued data. However, whereas both methods require the specification of the numbers of row and column clusters (i.e., K and L), NMF has the additional requirement of selecting the dimensionality (D) of the factorization. Moreover, because we recommend the use of TMKLMP to select K and L for the NMF program, the former procedure would have to be used anyway. A reasonable strategy in the case of a nonnegative real-valued asymmetric matrix is therefore to run TMKLMP first, identify appropriate values for K and L based on CHull, and store the solution corresponding to those values. Subsequently, NMF can be applied using the same K and L, but evaluating different values of D. Upon selection of an appropriate D, the NMF solution can be compared directly to the TMKLMP solution. This basic approach was applied profitably to the lipread consonant confusion data (Example 1), for which NMF produced a readily interpretable partition that comported well with the TMKLMP solution.

In the case of a binary input matrix, the computational evidence provided herein, as well as in other sources (Brusco, Doreian, Lloyd, & Steinley, 2013; Brusco, Doreian, Mrvar, & Steinley, 2013; Brusco & Steinley, 2007, 2011; Doreian et al., 2004, 2005), reveals the efficacy of TMBP. Accordingly, our general recommendation is that TMBP should typically be evaluated in the case of binary data, even though its attraction to the best-found objective criterion value across multiple restarts is commonly lower than that of TMKLMP. However, the computational evidence reported by Brusco and Steinley (2007) and Brusco and Doreian (2015a) also revealed good performance of TMKLMP for binary data, as well as for nonnegative real-valued data. Although NMF can also be applied to either binary or nonnegative real-valued input matrices, our experience is that it tends to produce more easily interpretable results in the latter case. When NMF was applied to the binary friendship network data among third graders, the solution was less interpretable; therefore, we do not recommend NMF highly for binary data.

To summarize, TMKLMP appears to produce good results for both nonnegative real-valued and binary matrices. Nevertheless, it would seem prudent to augment a TMKLMP analysis with either NMF or TMBP in the cases of nonnegative real-valued or binary matrices, respectively. In the examples provided herein, the TMKLMP and NMF methods both produced interpretable results for the lipread consonant confusion data, whereas TMKLMP and TMBP both produced interpretable results for the binary social network data.

Conclusions

Summary

Examples of one-mode asymmetric proximity data abound in the psychological sciences, including data obtained from free association tasks, stimulus recognition tasks, brand selection, and social network analyses. When approaching the problem of partitioning objects in these applications, we suggest that it is generally advisable to adopt a biclustering perspective. More specifically, when the data are arranged in a one-mode asymmetric matrix, two distinct partitions of the objects must be obtained: one partition based on their role in the context of the rows of the matrix, and one partition based on their role in the context of the columns of the matrix. Effective methods for simultaneously establishing these partitions are not readily available in most commercial software packages. Accordingly, our goals were (i) to present three alternative methods for biclustering one-mode asymmetric matrices, (ii) to make available a suite of MATLAB m-files that implement these methods, and (iii) to demonstrate these methods and software using psychologically oriented examples from the literature. Furthermore, the methods are applicable to many areas of behavioral research.

The MATLAB m-file software programs included in the Web supplement associated with this article fall into three categories of methods: (i) two-mode KL-means partitioning (TMKLMP), (ii) nonnegative matrix factorization (NMF), and (iii) two-mode blockmodel partitioning (TMBP). Within each category, two m-file programs are provided, differentiated by the inclusion or exclusion of the main diagonal in the analysis. For example, in the case of TMKLMP, the program tmklmp.m includes the main diagonal in the analysis, whereas tmklmp_nodiag.m ignores the main diagonal. The former program can also be used more generally for any two-mode matrix.

Limitations and extensions

The limitations of the methods presented herein can be characterized along three dimensions: (i) scalability, (ii) suboptimality, and (iii) model selection. Regarding scalability, it is important to recognize that we have made the programs available as MATLAB m-files. Although MATLAB is a user-friendly environment, m-files are not compiled, and therefore they run much slower than comparable codes written in Fortran or C. It also appears that the TMKLMP programs are appreciably more efficient than the TMBP and NMF programs. Nevertheless, most of the m-files should scale for object set sizes of N ≤ 200. Larger matrices can also be tackled; however, it might be necessary to scale back on the number of restarts.

All of the m-files use heuristic procedures to produce solutions to their respective optimization problems. For this reason, a globally optimal solution is not guaranteed. However, as we noted previously, computational results reported in the literature suggest that each of the methods performs well and produces solutions competitive with those obtained by more sophisticated metaheuristics, such as simulated annealing, tabu search, genetic algorithms, and variable neighborhood search. Despite this evidence, we deemed it useful to allow the TMKLMP and TMBP programs to count the numbers of restarts for which the best-found solution was obtained. If the best-found solution was obtained only once or twice out of 500 or 5,000 restarts, it is quite possible that the global optimum was not located. Greater confidence (although no guarantee) of global optimality would be afforded by a larger number of discoveries of the best-found objective function across multiple restarts. Our limited analyses suggest that TMKLMP has a greater attraction to the best-found solution across multiple restarts than does TMBP. A related issue is the agreement among different locally optimal partitions. A capability for the measurement of agreement has not been integrated into the programs at the present time, because it is not entirely clear how users should best interpret such information.

Perhaps the most challenging aspect of all of the procedures described in this article is model selection. For TMKLMP and TMBP, model selection requires the choices of K and L and the decision to include or exclude the main diagonal. The NMF method also requires these decisions, in addition to a choice of D. Furthermore, for real-valued X matrices, transformations of the asymmetric proximity matrix prior to application of the method must be a concern. For example, Brusco and Doreian (2015b) considered applications to journal citation and brand-switching matrices in which a transformation was employed to adjust for scale differences prior to the implementation of TMKLMP.