A common data structure is an N × N matrix, X = [x_ij] ∈ ℜ^(N×N), where x_ij is some measure of proximity (similarity or dissimilarity) between objects i and j, for all 1 ≤ i ≤ N and 1 ≤ j ≤ N. In some instances, X is a symmetric matrix, with x_ij = x_ji for all 1 ≤ i ≤ N and 1 ≤ j ≤ N. For example, when the N objects are measured on a collection of metric variables, a nonnegative N × N symmetric dissimilarity version of X can be established on the basis of the pairwise Euclidean distances between pairs of objects. Symmetry is present because the Euclidean distance between objects i and j is the same as the distance between j and i. The dissimilarity interpretation is appropriate because larger matrix elements (i.e., distances) imply greater dissimilarity (rather than greater similarity) among the pairs of objects.

Although symmetric matrices are common, there are also many psychological applications for which X is an asymmetric matrix. For example, in experimental psychology, when studying confusion among a set of stimulus objects, x_ij (for i ≠ j) is commonly a measure of the number of instances (or the percentage of instances) for which subjects mistakenly responded with stimulus object j when stimulus object i was actually presented (see Brusco & Steinley, 2006). For any given pair of objects (i, j), asymmetry can potentially occur because a presented stimulus i could be mistaken more frequently for j than stimulus j would be mistaken for i, or vice versa. Likewise, for brand-switching applications in consumer psychology, where x_ij (for i ≠ j) is a measure reflecting the degree to which the consumers in a sample switch from brand i to brand j, asymmetry occurs because switches from i to j might be more or less frequent than those from j to i. Asymmetry is also apt to be present in matrices associated with social network ties among schoolchildren (in developmental psychological studies—e.g., Anderson, Wasserman, & Crouch, 1999; Parker & Asher, 1993), social groups (in social psychological studies—e.g., Gibbons & Olk, 2003), and organizational members (in studies from industrial/applied psychology—e.g., Totterdell, Wall, Holman, Diamond, & Epitropaki, 2004). Regardless of whether these network examples concern friendship, trust, advice seeking, information sharing, or some other type of relational tie, the potential for asymmetry arises because ties need not be reciprocated: Actor i could identify actor j as a friend, someone they trust, or someone from whom they seek advice, but actor j might not reciprocate such ties.

Although the rows and columns of the matrix pertain to the same set of objects, the asymmetry in X generally stems from the objects having two different roles. In confusion matrices, the role of the row objects is the “presented stimulus,” whereas the role of the column objects is the “response to a presented stimulus.” Similarly, for brand-switching applications, the role of the row objects is “previously purchased brand” (or “switched from”), with the role of the column objects being “most recently purchased brand” (or “switched to”). In social network situations, the roles of the row and column actors are commonly “tie senders” and “tie receivers,” respectively. For journal citation networks, the row journals could be “cited journals” (or “producers”), with the column journals being “citing journals” (or “consumers”).

There are a variety of methods for analyzing asymmetric proximity data (a recent review of some of these approaches has been provided by Vicari, 2014). Moreover, regardless of the type of method selected, transformation or decomposition processes are sometimes applied prior to analysis. For example, one popular approach is to decompose X into its symmetric (Ψ) and skew-symmetric (Λ) components and to pursue analysis of these components independently or jointly (Brusco & Stahl, 2005a; Hubert, 1987, chap. 4; Hubert, Arabie, & Meulman, 2001, chap. 4; Zielman & Heiser, 1996). The elements of the symmetric component are ψ_ij = ψ_ji = (x_ij + x_ji)/2, whereas the elements of the skew-symmetric component are λ_ij = (x_ij − x_ji)/2 for all 1 ≤ i, j ≤ N. The decomposition of X is then provided by X = Ψ + Λ.
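
As a small illustration of this decomposition, the following MATLAB sketch computes Ψ and Λ for an arbitrary made-up matrix and verifies that they reconstruct X.

```matlab
% Decompose an asymmetric matrix X into its symmetric (Psi) and
% skew-symmetric (Lambda) components, X = Psi + Lambda.
X = [0 5 1;
     2 0 4;
     7 3 0];                        % illustrative asymmetric matrix (made up)
Psi    = (X + X') / 2;              % psi_ij    = (x_ij + x_ji)/2
Lambda = (X - X') / 2;              % lambda_ij = (x_ij - x_ji)/2
assert(norm(X - (Psi + Lambda), 'fro') < 1e-12)   % X is recovered exactly
```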

The methods for analyzing asymmetric data include graphical approaches, seriation, unidimensional scaling, multidimensional scaling, hierarchical clustering, and nonhierarchical clustering. Graphical representation procedures have been described by Constantine and Gower (1978) and Chino (1978). Seriation methods, which seek to develop orderings of the objects associated with an asymmetric matrix, have also been proposed by numerous authors (Baker & Hubert, 1977; Brusco, 2001; Brusco & Stahl, 2005b; DeCani, 1972; Flueck & Korsh, 1974; Hubert, 1976; Hubert et al., 2001, chap. 4; Ranyard, 1976). Multidimensional-scaling approaches for asymmetric matrices have been developed by Harshman, Green, Wind, and Lundy (1982), Okada and Imaizumi (1987), and Zielman and Heiser (1996). Here, however, we restrict attention to methods for clustering or partitioning the objects associated with the one-mode proximity data. Hubert (1973) provided one of the earliest investigations of this topic in his discussion of the application of hierarchical clustering methods to asymmetric data. A more recent review of hierarchical methods was provided by Takeuchi, Saito, and Yadohisa (2007), and Brusco and Steinley (2006) discussed approaches within the framework of partitioning.

The presence of asymmetry in a one-mode proximity matrix presents a challenge when one is seeking to identify a partition of the N objects. One approach is to collapse X into a symmetric proximity matrix using some type of transformation procedure, and then to apply traditional one-mode partitioning methods to obtain a solution (see, e.g., Brusco & Steinley, 2006, for a discussion of this approach within the context of confusion data). Unfortunately, this approach discards information associated with the asymmetry. For example, two journals might have very similar roles as producers for the journals that cite them, but markedly different roles as consumers of the journals they cite (see, e.g., Brusco, 2011; Brusco & Doreian, 2015b). A second approach is to sequentially establish partitions based on the row objects and column objects. In the first stage, a symmetric nonnegative matrix of Euclidean distances between the row objects could be established by treating the columns as variables, and a partition of the row objects could then be obtained using hierarchical or nonhierarchical clustering. This process would then be repeated for the column objects by treating the rows as variables. A caveat associated with this approach is that the main diagonal elements are often arbitrary and, therefore, not appropriate for use when computing the distance matrices.
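
For readers who want to see the sequential approach concretely, a minimal MATLAB sketch is given below. It treats the columns (and then the rows) as variables and applies Ward's hierarchical clustering to the resulting Euclidean distances; the data matrix, the cluster numbers, and the use of the Statistics and Machine Learning Toolbox functions pdist, linkage, and cluster are all assumptions made for the illustration, and the sketch does not address the caveat about the main diagonal.

```matlab
% Sequential (two-stage) approach: cluster the row objects treating columns
% as variables, then the column objects treating rows as variables.
% Illustrative only; the arbitrary main diagonal enters both distance matrices.
X = rand(10);                                  % hypothetical 10 x 10 asymmetric matrix
K = 3;  L = 3;                                 % assumed numbers of clusters
rowTree = linkage(pdist(X),  'ward');          % Euclidean distances between rows
rowPart = cluster(rowTree, 'maxclust', K);     % K-cluster partition of row objects
colTree = linkage(pdist(X'), 'ward');          % Euclidean distances between columns
colPart = cluster(colTree, 'maxclust', L);     % L-cluster partition of column objects
```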

A preferable approach is to adopt a biclustering (Madeira & Oliveira, 2004; Prelić et al., 2006; Van Uitert, Meuleman, & Wessels, 2008; Wilderjans, Depril, & Van Mechelen, 2013) perspective that seeks to simultaneously establish two distinct partitions of the objects: (i) one based on their role as row objects, and (ii) one based on their role as column objects. The term biclustering is used broadly here, and it is noted that alternative terminology, such as two-mode clustering or two-mode partitioning, is also used in some instances. As with the sequential method, biclustering creates flexibility by allowing different numbers of clusters for the row and column objects. However, unlike the sequential approach, the entire asymmetric proximity matrix is used when establishing the row-object and column-object partitions.

Succinctly, the approach adopted herein is to treat the one-mode asymmetric matrix X as though it is two-mode. Strictly, a two-mode matrix has data in which the N row objects are completely distinct from the M column objects. Psychological examples of two-mode matrices include memberships of CEOs on boards of directors and the scores of examinees on test items. A formidable research effort has been devoted to two-mode partitioning problems (Brusco & Doreian, 2015a, b; Brusco, Doreian, Lloyd, & Steinley, 2013; Brusco, Doreian, Mrvar, & Steinley, 2013; Brusco & Steinley, 2007, 2009, 2011; Doreian, Batagelj, & Ferligoj, 2004, 2005; Doreian, Lloyd, & Mrvar, 2013; Schepers, Ceulemans, & Van Mechelen, 2008; Schepers & Van Mechelen, 2011; Schepers, Van Mechelen, & Ceulemans, 2011; Van Mechelen, Bock, & DeBoeck, 2004; van Rosmalen, Groenen, Trejos, & Castillo, 2009; Vichi, 2001; Wilderjans, Depril, & Van Mechelen, 2013). An especially important aspect of applying a biclustering approach to a one-mode matrix concerns the handling of the main diagonal. A two-mode matrix has no main diagonal, but a one-mode matrix clearly does. In some situations, having main diagonal elements is nonsensical, as in the case of social network ties (e.g., are people friends with themselves?). However, in brand-switching applications, the main diagonal can reflect the degree of retention of customers. Similarly, in confusion data, the main diagonal exemplifies correct responses to the presented stimulus.

Ideally, psychological researchers would have easy access to biclustering procedures that could be used for one-mode data and that would have the flexibility to include or exclude the main diagonal, depending on the goals of their studies. Unfortunately, such methods are generally unavailable in commercial software packages. Accordingly, in the spirit of other attempts to identify and make accessible the clustering models and methods that are important for psychological applications (Brusco & Steinley, 2006; Köhn, Steinley, & Brusco, 2010; Schepers & Hofmans, 2009; Schepers & Van Mechelen, 2011), we seek to achieve three interrelated objectives in this article: (i) to briefly describe several methods that can be used for biclustering one-mode matrices, (ii) to present a suite of MATLAB programs that implement these biclustering methods, and (iii) to provide two psychologically oriented examples to demonstrate and compare the procedures and programs. The next section presents three different useful methods for biclustering and describes MATLAB m-files for their implementation. These m-files are available in the Web supplement associated with this article. This is followed by a description of a general process for model selection with biclustering based on ideas gleaned from Schepers et al. (2008), Wilderjans, Ceulemans, and Meers (2013), and Brusco and Steinley (2014). Subsequent sections provide illustrations of the software programs for a real-valued asymmetric matrix and a binary asymmetric matrix, respectively. The article concludes with a brief summary.

Biclustering methods

Two-mode KL-means partitioning

Two-mode KL-means partitioning (TMKLMP) is a generalization of K-means partitioning. Whereas a K-means method (see Steinley, 2006, for a review) partitions a single set of objects into K clusters on the basis of minimization of the sum of squared errors, TMKLMP simultaneously establishes distinct partitions of the row and column objects using an analogous criterion. Although similar approaches have been discussed using different names (Brusco & Steinley, 2007; Hartigan, 1972; van Rosmalen et al., 2009), TMKLMP was recently adopted by Brusco and Doreian (2015a, b), such that the “KL” component of the name reflects the fact that the number of row clusters (K) need not be the same as the number of column clusters (L). The goal is to find row-object and column-object partitions that minimize the within-submatrix sum of squared deviations from the within-cluster means. Formally, we denote P = {S_1, . . . , S_K} as a K-cluster partition of the N objects associated with the rows of a data matrix, X, where S_k is the set of objects assigned to cluster k, for all 1 ≤ k ≤ K. The standard conditions of a partition apply: the clusters are nonempty (S_k ≠ ∅ for all 1 ≤ k ≤ K), mutually exclusive (S_k ∩ S_l = ∅ for all 1 ≤ k < l ≤ K), and exhaustive (S_1 ∪ . . . ∪ S_K contains all N objects). Similarly, we define Q = {T_1, . . . , T_L} as an L-cluster partition of the N column objects of X, where T_l is the set of objects assigned to cluster l for all 1 ≤ l ≤ L. The standard conditions of a partition also hold for Q.

Denoting Π as the set of all partitions of N objects into K clusters and Ω as the set of all partitions of N objects into L clusters, the optimization problem associated with TMKLMP can be specified as follows:

$$ \underset{P\in \Pi,\; Q\in \Omega}{\mathrm{Minimize}}:\; f\left(P,Q\right)=\sum_{k=1}^{K}\sum_{l=1}^{L}\sum_{i\in S_k}\sum_{j\in T_l}\left(x_{ij}-\bar{x}_{kl}\right)^2, $$
(1)

where

$$ \bar{x}_{kl}=\frac{\sum_{i\in S_k}\sum_{j\in T_l}x_{ij}}{\left|S_k\right|\left|T_l\right|}, $$
(2)

and |S_k| (1 ≤ k ≤ K) and |T_l| (1 ≤ l ≤ L) are the numbers of row objects and column objects in clusters k and l, respectively. Together, the row objects (S_k) in cluster k and the column objects (T_l) in cluster l define a submatrix of X. The value of x̄_kl is the mean of the elements in that submatrix, and the homogeneity of the submatrix is measured by the sum of the squared deviations of its elements from that mean, (x_ij − x̄_kl)². Perfect homogeneity of a submatrix is achieved when all of its elements are the same (e.g., all 0s or all 1s, in the case in which X is defined by the presence or absence of network ties). From an interpretive standpoint, the row objects (or column objects) in the same cluster can be perceived as being similar, in the sense that they tend to have comparable measures within the submatrices.
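
To make the criterion concrete, the following hypothetical MATLAB helper (not part of the authors' suite; the function name and label conventions are assumptions) evaluates f(P, Q) from Eqs. 1 and 2 for given row and column cluster labels, with the main diagonal included.

```matlab
function f = tmklmp_objective(X, rowLab, colLab, K, L)
% Evaluate the TMKLMP criterion of Eq. 1 for given partitions.
% rowLab (N x 1) and colLab (N x 1) are assumed to hold labels 1..K and 1..L.
f = 0;
for k = 1:K
    for l = 1:L
        B    = X(rowLab == k, colLab == l);    % submatrix defined by (S_k, T_l)
        xbar = mean(B(:));                     % submatrix mean (Eq. 2)
        f    = f + sum((B(:) - xbar).^2);      % within-submatrix sum of squares
    end
end
end
```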

Brusco and Doreian (2015b) recently developed an exact algorithm for the TMKLMP that can be successfully applied to problems of size N = 20. A number of heuristic algorithms have been designed for TMKLMP, including simulated annealing (Trejos & Castillo, 2000), genetic algorithms (Brusco & Doreian, 2015a; Hansohm, 2002), and variable neighborhood search (Brusco & Steinley, 2007). However, a two-mode generalization of the K-means algorithm designed by Baier, Gaul, and Schader (1997) has been shown to be competitive with these more sophisticated procedures (van Rosmalen et al., 2009). Accordingly, we make available the MATLAB m-file tmklmp.m, which implements this procedure. The key inputs to tmklmp.m are X, K, and L. The default setting is for 500 restarts of the algorithm from different initial partitions, which is based on the implementations in previous studies (Brusco & Steinley, 2007; van Rosmalen et al., 2009). The principal outputs are the partitions of the row and column objects. Also, the raw objective function value corresponding to Eq. 1 is reported, as well as a normalized measure representing the total variance accounted for (vaf) by the partition, computed as follows:

$$ vaf=1-\frac{f\left(P,Q\right)}{\sum_{i=1}^{N}\sum_{j=1}^{N}\left(x_{ij}-\bar{x}\right)^2}, $$
(3)

where

$$ \bar{x}=\frac{\sum_{i=1}^{N}\sum_{j=1}^{N}x_{ij}}{N^2}. $$
(4)

The program tmklmp.m is applicable directly to either two-mode matrices or asymmetric one-mode matrices. However, when it is applied to the latter data structures, it is important to recognize that the main diagonal elements of X are included in the computation of the submatrix means in Eq. 2, as well as in the sum-of-squares objective function in Eq. 1. As we noted previously, a compelling argument for ignoring the main diagonal elements can be made for many applications. Therefore, we also developed an alternative version, tmklmp_nodiag.m, which ignores the main diagonal elements when computing the submatrix means in Eq. 2 and the sum of squares in Eq. 1. The main diagonal is also ignored when computing the overall mean in Eq. 4 and the denominator term in Eq. 3. The inputs and outputs of this program are identical to those for tmklmp.m.
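
For readers who want to see the mechanics of such a two-mode K-means heuristic, a minimal sketch is given below. It is not the authors' tmklmp.m: it performs a single run from a random start, omits the multiple-restart strategy and empty-cluster safeguards, includes the main diagonal, and reuses the tmklmp_objective helper sketched earlier to track the criterion. A nodiag variant would simply exclude the diagonal cells when computing the submatrix means and sums of squares.

```matlab
function [rowLab, colLab, f] = two_mode_kmeans_sketch(X, K, L)
% One run of an alternating two-mode K-means heuristic (in the spirit of
% Baier, Gaul, & Schader, 1997).  Illustrative sketch only.
[N, M] = size(X);
rowLab = randi(K, N, 1);                       % random initial row partition
colLab = randi(L, M, 1);                       % random initial column partition
fOld = inf;
while true
    Mu = zeros(K, L);                          % submatrix means (Eq. 2)
    for k = 1:K
        for l = 1:L
            B = X(rowLab == k, colLab == l);
            Mu(k, l) = mean(B(:));
        end
    end
    for i = 1:N                                % reassign rows, columns fixed
        cost = zeros(K, 1);
        for k = 1:K
            cost(k) = sum((X(i, :) - Mu(k, colLab)).^2);
        end
        [~, rowLab(i)] = min(cost);
    end
    for j = 1:M                                % reassign columns, rows fixed
        cost = zeros(L, 1);
        for l = 1:L
            cost(l) = sum((X(:, j) - Mu(rowLab, l)).^2);
        end
        [~, colLab(j)] = min(cost);
    end
    f = tmklmp_objective(X, rowLab, colLab, K, L);   % Eq. 1
    if f >= fOld - 1e-10, break; end           % stop when no further improvement
    fOld = f;
end
end
```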

Nonnegative matrix factorization

The second approach for biclustering of one-mode asymmetric matrices has its roots in pioneering work on matrix decomposition (Eckart & Young, 1936; Young & Householder, 1938). More specifically, low-dimensional least-squares approximations of a two-mode matrix can be obtained using singular value decomposition (SVD) or principal component analysis (PCA). Given that the one-mode asymmetric matrix can be treated as two-mode data, these methods are applicable here, as well. However, two limitations are readily apparent: (i) the potential requirement of ignoring the main diagonal and (ii) the possibility of negative elements in the factors, which can hinder interpretability (Lee & Seung, 1999).

As with the SVD and PCA approaches, nonnegative matrix factorization (NMF) seeks a minimum-sum-of-squares, low-dimensionality approximation of X. However, in contrast to the eigenvectors obtained by SVD and PCA, which can assume both positive and negative values, the components in NMF are constrained to be nonnegative. The rationale for this constraint, as described by Lee and Seung (1999), is to allow only additive combinations, so that the whole is represented as the sum of its parts. The arbitrary sign of the elements of a PCA factorization presents the following difficulty, noted by Lee and Seung (1999, p. 789): “As the eigenfaces are used in linear combinations that generally involve complex cancellations between positive and negative numbers, many individual eigenfaces lack intuitive meaning.” Expanding on this point, Fogel, Hawkins, Beecher, Luta, and Young (2013, p. 207) found that NMF “often leads to substantial improvements in the interpretability of the factors,” relative to attempts to transform PCA scores of mixed signs into meaningful factors.

To formalize NMF, we denote D as the desired dimensionality of the factorization. The problem requires the determination of two nonnegative matrices, G and H, the product of which will generate a D-dimensional least-squares approximation of X. Matrix G = [g_id] is an N × D matrix of nonnegative coefficients for the row objects (1 ≤ i ≤ N) on each of the D dimensions (1 ≤ d ≤ D). Similarly, H = [h_dj] is a D × N matrix of nonnegative coefficients for the column objects (1 ≤ j ≤ N) on each of the D dimensions (1 ≤ d ≤ D). The optimization problem for NMF is

$$ \mathrm{Minimize}:\; Z=\left\Vert \mathbf{X}-\mathbf{G}\mathbf{H}\right\Vert^2=\sum_{i=1}^{N}\sum_{j=1}^{N}\left(x_{ij}-\sum_{d=1}^{D}g_{id}h_{dj}\right)^2, $$
(5)
$$ \mathrm{Subject\ to}:\; g_{id}\ge 0, \quad \mathrm{for}\ 1\le i\le N\ \mathrm{and}\ 1\le d\le D, $$
(6)
$$ h_{dj}\ge 0, \quad \mathrm{for}\ 1\le d\le D\ \mathrm{and}\ 1\le j\le N. $$
(7)

Essentially, the optimization problem is to obtain G and H while minimizing the least-squares loss function in Eq. 5, subject to the nonnegativity constraints in Eqs. 6 and 7. Lee and Seung (2001) described a rescaled gradient descent algorithm for solving the optimization problem stated in Eqs. 5–7 (see also Brusco, 2011, for a description and extension to the case of an asymmetric one-mode matrix with an arbitrary main diagonal). Once the optimization problem has been solved by obtaining G and H, there are several possibilities for constructing partitions of the row and column objects. Perhaps the simplest approach is to assign each row object i to the cluster d for which g_id is maximum, and each column object j to the cluster d for which h_dj is maximum. Of course, this approach imposes the restriction that K = L = D. An alternative approach, which is employed herein, is to apply K-means partitioning (with, say, 5,000 restarts) to the G and H matrices in order to obtain K-cluster and L-cluster partitions of the row and column objects, respectively.
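
To make the estimation step concrete, the following MATLAB sketch implements Lee and Seung's (2001) multiplicative update rules (the rescaled-gradient scheme mentioned above) for the least-squares loss in Eq. 5 and then clusters the rows of G and the columns of H with K-means. It is not the authors' nmf.m: the function name, the fixed number of update sweeps, the small constant guarding against division by zero, and the use of the Statistics Toolbox kmeans function are assumptions made for the illustration, and the main diagonal is not treated specially.

```matlab
function [rowLab, colLab, G, H] = nmf_bicluster_sketch(X, D, K, L)
% D-dimensional NMF of a nonnegative N x N matrix X via multiplicative
% updates (Lee & Seung, 2001), followed by K-means on the factor matrices.
N  = size(X, 1);
ep = 1e-9;                                     % guard against division by zero
G  = rand(N, D);                               % random nonnegative start
H  = rand(D, N);
for iter = 1:500                               % fixed number of update sweeps
    H = H .* (G' * X) ./ (G' * G * H + ep);    % update H; loss does not increase
    G = G .* (X * H') ./ (G * (H * H') + ep);  % update G; loss does not increase
end
rowLab = kmeans(G,  K, 'Replicates', 100);     % partition row objects from G
colLab = kmeans(H', L, 'Replicates', 100);     % partition column objects from H
end
```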

We developed a MATLAB m-file implementation of the NMF algorithm, nmf.m. The nmf.m program is similar to the module that is available in the MATLAB system, yet it is differentiated by distinct parameter settings for the descent algorithm and, more importantly, by the inclusion of a K-means clustering subroutine to cluster the rows and columns on the basis of G and H, respectively. The inputs to the program are X, K, L, and D. The outputs of the nmf.m program are the percentage reduction of error (PRE) provided by the D-dimensional factorization, the partition of the row objects, and the partition of the column objects. As was the case for TMKLMP, a second version of the NMF program, nmf_nodiag.m, was prepared that ignores the main diagonal elements in the estimation process. The inputs and outputs of this program are identical to those of nmf.m.

Two-mode blockmodeling

Although the previously described methods can be applied to binary data matrices (i.e., x_ij ∈ {0, 1} for all i and j), some analysts might prefer a method that is explicitly designed for such data. Two-mode blockmodeling methods are especially well-suited for binary data (Brusco & Steinley, 2007, 2011; Brusco, Doreian, Lloyd, & Steinley, 2013; Brusco, Doreian, Mrvar, & Steinley, 2013; Doreian et al., 2004, 2005). These methods seek partitions of the row and column objects such that the submatrices formed by the row and column clusters are, to the greatest extent possible, either complete (all 1s) or null (all 0s). Accordingly, the typical objective is to obtain partitions minimizing the number of inconsistencies with this ideal structure, as measured by a count of the total number of violations. More formally, the objective of the two-mode blockmodeling problem (TMBP) that we consider is to minimize the sum of the 1s in submatrices that are mostly 0s plus the sum of the 0s in submatrices that are mostly 1s. The optimization problem is:

$$ \underset{P\in \Pi,\; Q\in \Omega}{\mathrm{Minimize}}:\; g\left(P,Q\right)=\sum_{k=1}^{K}\sum_{l=1}^{L}\min\left\{\eta_{kl},\,\rho_{kl}\right\}, $$
(8)

where

$$ \eta_{kl}=\sum_{i\in S_k}\sum_{j\in T_l}x_{ij}, \qquad \forall\ 1\le k\le K\ \mathrm{and}\ 1\le l\le L, $$
(9)

and

$$ \rho_{kl}=\sum_{i\in S_k}\sum_{j\in T_l}\left(1-x_{ij}\right), \qquad \forall\ 1\le k\le K\ \mathrm{and}\ 1\le l\le L. $$
(10)

Brusco, Doreian, Lloyd, and Steinley (2013) designed an exact algorithm for TMBP that is scalable for applications in which N ≤ 20. Heuristic methods for TMBP include tabu search (Brusco & Steinley, 2011) and variable neighborhood search (Brusco, Doreian, Mrvar, & Steinley, 2013). However, recent computational results reported by Brusco, Doreian, Mrvar, and Steinley have shown that a two-mode relocation heuristic designed by Doreian et al. (2004, 2005) often produces solutions that are as good as those obtained by these more sophisticated procedures. For this reason, we prepared a MATLAB m-file, tmbp.m, implementing the relocation heuristic procedure. The key inputs to tmbp.m are X, K, and L. The default setting is for 5,000 restarts of the algorithm from different initial partitions. The principal outputs are the objective function value (Eq. 8) and the partitions of the row and column objects.
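
The block-inconsistency count itself is simple to compute. The following hypothetical helper (not the authors' tmbp.m; the function name and label conventions are assumptions) evaluates g(P, Q) from Eqs. 8–10 for a binary matrix and given row and column cluster labels, with the main diagonal included.

```matlab
function g = tmbp_objective(X, rowLab, colLab, K, L)
% Evaluate the TMBP criterion of Eq. 8 for a binary matrix X and given
% partitions; rowLab and colLab are assumed to hold labels 1..K and 1..L.
g = 0;
for k = 1:K
    for l = 1:L
        B   = X(rowLab == k, colLab == l);     % submatrix for block (k, l)
        eta = sum(B(:));                       % number of 1s in the block (Eq. 9)
        rho = numel(B) - eta;                  % number of 0s in the block (Eq. 10)
        g   = g + min(eta, rho);               % inconsistencies contributed
    end
end
end
```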

Model selection

All three of the methods described in the previous section require the selection of a model. In the case of TMKLMP and TMBP, model selection is generally limited to the selection of K and L. We recommend running these algorithms for all possible combinations of K and L obtained from the intervals K_1 ≤ K ≤ K_2 and L_1 ≤ L ≤ L_2. The total number of clusters for any given combination is ξ = K + L, which represents the level of complexity of the model. To measure the improvement in the objective function with respect to an increase in complexity, we employ the convex hull (CHull) approach, which has been widely adopted in the context of multimode clustering (Ceulemans & Van Mechelen, 2005; Schepers et al., 2008; Schepers & Van Mechelen, 2011; Wilderjans, Ceulemans, & Meers, 2013).
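
A grid search of this kind is straightforward to script. The sketch below loops over all (K, L) combinations, runs a biclustering heuristic for each, and records the vaf values needed for the deviance plot; it reuses the two_mode_kmeans_sketch function given earlier in place of tmklmp.m, and the data matrix and cluster ranges are assumptions made for the illustration.

```matlab
% Collect vaf over a grid of (K, L) combinations for the deviance plot.
X  = rand(20);                                 % hypothetical asymmetric matrix
K1 = 2; K2 = 5; L1 = 2; L2 = 5;                % assumed search ranges
totSS = sum((X(:) - mean(X(:))).^2);           % denominator of Eq. 3
vafGrid = nan(K2, L2);
for K = K1:K2
    for L = L1:L2
        [~, ~, f]     = two_mode_kmeans_sketch(X, K, L);   % one heuristic run
        vafGrid(K, L) = 1 - f / totSS;                     % Eq. 3
    end
end
```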

To illustrate the CHull approach for TMKLMP, we consider the vaf as the objective criterion of interest. The process begins with a deviance plot of the vaf values obtained from all combinations of K and L. The vertical axis of this plot is vaf, and the horizontal axis is the total number of clusters (ξ = K + L). The second step retains only those solutions falling on the upper boundary of the convex hull and places them in an ordered list based on complexity. We use B to denote the total number of solutions falling on the boundary, vaf(b) to denote the vaf for solution b (1 ≤ b ≤ B), and ξ(b) to denote the complexity of solution b (1 ≤ b ≤ B). The upper boundary is used because the goal is to maximize vaf. For a minimization objective function, the lower boundary would be used. The third step is to select one of the B solutions on the basis of visual inspection of the deviance plot and/or the use of measures based on the slopes of segments of the convex hull. A visually based selection of a solution from the boundary is made via a search for an “elbow” in the plot, in a manner similar to the use of a scree plot in factor analysis. Ceulemans and Van Mechelen (2005) have provided two slope-based measures to augment visual inspections. The difference measure, DiffCH, is used to choose the solution b that maximizes

$$ \frac{\left|vaf(b)-vaf(b-1)\right|}{\left|\xi(b)-\xi(b-1)\right|}-\frac{\left|vaf(b+1)-vaf(b)\right|}{\left|\xi(b+1)-\xi(b)\right|}. $$
(11)

Similarly, the ratio measure, RatioCH, leads to a choice of the solution b maximizing

$$ \frac{\left|vaf(b)-vaf(b-1)\right|\,/\,\left|\xi(b)-\xi(b-1)\right|}{\left|vaf(b+1)-vaf(b)\right|\,/\,\left|\xi(b+1)-\xi(b)\right|}. $$
(12)

Although the absolute value signs in Eqs. 11 and 12 are not required in the case of a maximization objective such as vaf, they are included to avoid any sign confusion that might arise in a minimization context, such as g(P, Q) in Eq. 8. Ceulemans and Van Mechelen (2005) found that RatioCH outperformed DiffCH in their simulation study. Nevertheless, some caution regarding the use of RatioCH is advisable, because it can be extremely sensitive to very small changes in the criterion function. To illustrate this, suppose that the vaf(b) values for four possible solutions B – 3, B – 2, B – 1, and B are .6, .9, .92, and .921, respectively. Assuming that ξ(b) = b for all 1 ≤ b ≤ B, solution B – 2 has DiffCH = (.9 – .6) – (.92 – .9) = .28. In contrast, solution B – 1 has DiffCH = (.92 – .9) – (.921 – .92) = .019. Solution B – 2 would be preferred to B – 1 on the basis of DiffCH. However, B – 1 is preferred to B – 2 on the basis of RatioCH, because (.92 – .9)/(.921 – .92) = 20 exceeds (.9 – .6)/(.92 – .9) = 15. Arguably, the B – 2 solution is the proper choice. Yet the very small change when moving from B – 1 to B makes B – 1 preferred according to the RatioCH measure. For this reason, we recommend consideration of both measures, along with visual inspection, to select a model. Examples of doing this are provided in our empirical examples.
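
The small numerical illustration above can be checked directly; the following MATLAB snippet computes the hull-segment slopes and the two measures for the four hypothetical vaf values, assuming (as in the text) that ξ(b) = b.

```matlab
% DiffCH and RatioCH for the four illustrative solutions B-3, B-2, B-1, B.
vaf = [0.600 0.900 0.920 0.921];               % hypothetical vaf values
xi  = 1:4;                                     % xi(b) = b, as assumed in the text
slope   = abs(diff(vaf)) ./ abs(diff(xi));     % slopes of the hull segments
DiffCH  = slope(1:end-1) - slope(2:end);       % Eq. 11, for solutions B-2 and B-1
RatioCH = slope(1:end-1) ./ slope(2:end);      % Eq. 12, for solutions B-2 and B-1
% DiffCH  = [0.280 0.019] -> B-2 preferred on the difference measure
% RatioCH = [15 20]       -> B-1 preferred on the ratio measure
```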

In addition to K and L, NMF also requires the selection of the dimensionality of the factorization. We recommend decoupling the selection of D from the selection of K and L. More specifically, a choice for D is made first on the basis of the analysis of PRE at different values of D. We recommend choosing D by using visual inspection of the plot of PRE measures at different values of D together with the DiffCH and RatioCH measures. Once D is selected, the numbers of clusters (K and L) for NMF can be chosen in a variety of ways. One approach would be to obtain a set of K-means clustering solutions for G using different Ks and to select K on the basis of any of the host of different indices (see, e.g., Steinley & Brusco, 2011). This process would then be repeated for H to choose the best value for L. An alternative approach, adopted here, is to select K and L on the basis of the results obtained using TMKLMP (TMBP could also be used). Although this approach does make NMF dependent on TMKLMP, it has the advantage of enabling the joint (or simultaneous) selection of K and L.

Example 1

Lipread consonant data

The first example comes from a study of confusion among N = 21 lipread consonants (Manning & Shofner, 1991, p. 596). The elements (x ij ) of the data matrix correspond to the proportions of responses given as letter j when letter i was the presented stimulus. All elements of the lipread consonant data are nonnegative real-valued numbers. Accordingly, two-mode KL-means clustering and nonnegative matrix factorization can be applied for analyzing these data. Since their original publication, these lipread consonant data have been reanalyzed using clustering or seriation methods (Brusco & Stahl, 2005b; Brusco & Steinley, 2010). However, the latter studies analyzed the data subsequent to transforming the asymmetric confusion proportions to a symmetric matrix. In contrast, our analysis herein preserves the asymmetry in these data.

The main diagonal of the confusion matrix represents the proportions of correct responses for the corresponding stimulus objects. For most consonants, these proportions are, by far, the largest numbers in their rows. At the extreme, the letters {y} and {w} have main diagonal entries of .974 and .968, respectively. However, a few of the less frequently used letters are highly confusable with more common letters, and accordingly, their main diagonal entries are much smaller. Examples include the letters {x} and {z}, which have main diagonal entries of .091 and .151, respectively. Because the focus of this type of study is usually on the patterns of confusion, and because large diagonal elements are apt to artificially inflate the variation in the matrix, we used tmklmp_nodiag.m and nmf_nodiag.m for our analyses. However, our presentation of the results culminates with a brief discussion of the effects of including the main diagonal in the analysis, via the use of tmklmp.m.

Results

We began our analysis by applying tmklmp_nodiag.m to the confusion matrix for all combinations of 2 ≤ K ≤ 9 and 2 ≤ L ≤ 9. The vaf for each combination is displayed in Table 1, and the deviance plot of the vaf values is provided in Fig. 1. An inspection of the upper boundary of the convex hull reveals that three levels of complexity (ξ = K + L = 5, ξ = K + L = 7, and ξ = K + L = 9) do not produce a solution on the boundary. Moreover, visual inspection of the plot shows the sharpest elbow occurring at ξ = K + L = 8. The selection of ξ = 8 is also strongly supported by its DiffCH measure of .0743, which is far larger than the corresponding measures for the other values of ξ. The ξ = 8 solution on the upper boundary has the second largest RatioCH measure (2.65), which is appreciably larger than the measures at most other values of ξ. The lone exception is ξ = 16 (RatioCH = 4.29). However, its large measure arises because of the issue noted in the Model Selection section: the very small increase in vaf when moving from ξ = 16 to ξ = 17 inflates the RatioCH measure for ξ = 16. Considering this caveat with respect to the RatioCH measure, along with the visual inspection of Fig. 1 and the DiffCH measure, ξ = 8 is a more appropriate level of model complexity. The particular solution on the upper boundary at ξ = 8 corresponds to K = L = 4, and this solution was selected for interpretation. The computation time for tmklmp_nodiag.m with K = L = 4 was approximately 3.3 s on a 2.2-GHz Pentium 4 PC, which is a circa 2000–2002 hardware platform. The best-found vaf was identified on 97 % of the 500 restarts of the algorithm, which is a high attraction rate, suggesting (but not guaranteeing) that the global optimum has been found.

Table 1 Two-mode KL-means partitioning results for the Manning and Shofner (1991, p. 596) lipread consonant confusion data: Variance accounted for at different combinations of K and L
Fig. 1 Convex hull for the two-mode KL-means partitioning (TMKLMP) application to the lipread consonant data

Next, we applied nmf_nodiag.m to the lipread consonant confusion matrix using 1 ≤ D ≤ 6. A plot of the PRE values is displayed in Fig. 2. Visual inspection of the plot reveals a sharp elbow at D = 3, with a lesser elbow at D = 4. On the basis of the DiffCH measure, D = 3 (DiffCH = 0.17) would be preferred to D = 4 (DiffCH = 0.08). However, on the basis of the RatioCH measure, D = 4 (RatioCH = 4.83) is preferred to D = 3 (RatioCH = 2.70). Once again, we recommend caution when using the RatioCH measure, because of its sensitivity to very small improvements at the next level of complexity. In this instance, PRE = 95.63 % at D = 4, so there is very little room for further improvement by increasing the dimensionality to D = 5: moving to D = 5 improves the PRE only to 97.77 %. This small improvement drives the inflated RatioCH measure for D = 4. Considering the visual inspection of Fig. 2 together with both slope measures, D = 3 is the more reasonable dimensionality. We applied K-means clustering, independently, to the G (using K = 4) and H (using L = 4) matrices obtained from the D = 3 factorization. The selection of K = L = 4 was made for these K-means analyses to facilitate a comparison with the TMKLMP results. The total computation time for nmf_nodiag.m using the settings of D = 3, K = L = 4, and 500 restarts of the K-means algorithm to obtain the row and column partitions was 21.9 s on the 2.2-GHz Pentium 4 PC.

Fig. 2 Convex hull for the nonnegative matrix factorization (NMF) application to the lipread consonant data. The plot represents the proportions of reductions in error (PRE) as a function of the number of dimensions in the factorization (D)

The K = L = 4 partitions obtained using tmklmp_nodiag.m and nmf_nodiag.m (D = 3) were identical. This solution is displayed in Fig. 3. Two features of the solution are immediately apparent: (i) There is considerable symmetry in the partitions—that is, the partition of the consonants as stimuli is strikingly similar to the partition of the same consonants as responses—and (ii) the partitions each consist of three small clusters and one large cluster. The consonants {b, p} form both a row (stimulus) cluster and column (response) cluster. These two letters were highly confused with one another but were not confused strongly with any of the other consonants. The consonant “p” was frequently (.440) a mistaken response for the stimulus “b,” and similarly, “b” was frequently (.424) a mistaken response for the stimulus “p.” The cluster {c, d, t, z} also emerges in both the row and column partitions, because of the generally modest levels of (symmetric) confusion among these four letters. Unlike “b” and “p,” which exhibited high symmetry in their confusion, the consonants “s” and “x” show strong asymmetry in confusion. The consonant “s” was frequently (.582) a mistaken response for stimulus “x,” but “x” was seldom (.074) a mistaken response for stimulus “s.” For this reason, {s, x} emerges as a cluster in the row (stimulus) objects, and {s} is a singleton cluster in the column (response) objects.

Fig. 3 The (K = 4, L = 4) bipartition for the lipread consonant confusion study by Manning and Shofner (1991, p. 596), which was obtained using both two-mode KL-means partitioning and nonnegative matrix factorization (D = 3). The values in bold are the main diagonal elements, which are ignored in the computation of the variance accounted for (vaf)

One final analysis of the lipread consonant data was conducted by applying tmklmp.m to the confusion matrix. This program includes the main diagonal elements (i.e., the proportions of correct responses for each stimulus letter) in the computation of the submatrix means and the sum of squared deviations. For comparative purposes, we selected K = 4 and L = 4. The tmklmp.m algorithm produced a four-cluster partition of the stimulus (row) letters, whereby the three letters {f}, {w}, and {y} were each a singleton cluster and all other letters were placed in the fourth cluster. The partition of response (column) letters was the same as the row partition. The explanation for the extraction of the singleton clusters is that those three letters have the largest main diagonal elements and, therefore, their isolation reduces the total variation. Clearly, the tmklmp.m solution is far less useful and interesting than the tmklmp_nodiag.m solution. This highlights the importance of having available software that permits the exclusion of the main diagonal.

Example 2

Friendship ties among third-grade students

The second asymmetric one-mode matrix comes from a study of friendship ties and peer group acceptance among elementary schoolchildren (Parker & Asher, 1993). The particular classroom used in this example corresponded to 22 third-grade students (14 boys and eight girls). The data were collected using the roster method: The schoolchildren were provided with a list of their classmates and asked to identify their “very best friend,” three best friends, and as many other friends as they liked. Anderson et al. (1999) analyzed the resulting data using p* models. For simplicity, they ignored the information regarding the degree or strength of the friendship ties. Our analysis herein focuses on the binary matrix corresponding to the friendship ties as published in Anderson et al. (1999, p. 42), where the rows and columns of the data matrix correspond to the senders and receivers of friendship ties, respectively. A value of x ij = 1 indicates that student i identified student j as one of his or her friends, whereas x ij = 0 indicates the lack of a friendship tie. The main diagonal of the friendship matrix is arbitrary: Students did not identify themselves as friends. Accordingly, methods ignoring the main diagonal are appropriate.

Results

Given the binary nature of the friendship ties matrix, all three methods (TMKLMP, TMBP, and NMF) can be applied. These data were analyzed using tmklmp_nodiag.m, tmbp_nodiag.m, and nmf_nodiag.m. The tmklmp_nodiag.m program was implemented for all combinations of 2 ≤ K ≤ 5 and 2 ≤ L ≤ 5. The vaf results are reported in Table 2, and the deviance plot is displayed in Fig. 4. Visual inspection of the deviance plot reveals an elbow at ξ = 7, which is supported by the fact that the maximum values of both DiffCH (0.0219) and RatioCH (1.81) are achieved at ξ = 7. The solution on the upper boundary of the convex hull for ξ = 7 corresponds to K = 4 and L = 3. The computation time for tmklmp_nodiag.m using K = 4 and L = 3 was approximately 3.9 s on the 2.2-GHz Pentium 4 PC. Once again, the attraction rate was high, such that the best-found vaf was identified on 97 % of the 500 restarts of the algorithm.

Table 2 Two-mode KL-means partitioning results for the Parker and Asher (1993) third-grade classroom friendship data: Variance accounted for at different combinations of K and L
Fig. 4 Convex hull for the TMKLMP application to the third-grade friendship data

Next, we applied the tmbp_nodiag.m program for all combinations of 2 ≤ K ≤ 5 and 2 ≤ L ≤ 5. The g(P, Q) results and the numbers of equally well-fitting partitions (shown in parentheses) are reported in Table 3, and the deviance plot is displayed in Fig. 5. Unlike the vaf measure associated with TMKLMP, which we seek to maximize, the goal here is to minimize g(P, Q). Therefore, when examining the deviance plot, the lower boundary of the convex hull is of interest. Visual inspection of the deviance plot in Fig. 5 reveals elbows at both ξ = 6 and ξ = 7. There were two equally well-fitting partitions at both the ξ = 6 and ξ = 7 levels of model complexity, and both levels also produced a DiffCH measure of 2.0. Although the ξ = 6 solution is more parsimonious, given the competitiveness of ξ = 6 and ξ = 7, we opted for the ξ = 7 solution to provide consistency with the selected model for TMKLMP. The solution on the lower boundary of the convex hull for ξ = 7 corresponds to K = 4 and L = 3. The computation time for tmbp_nodiag.m using K = 4 and L = 3 was approximately 2 min on the 2.2-GHz Pentium 4 PC. The best-found solution was identified on only 0.1 % of the 5,000 restarts of the algorithm, which is an appreciably lower attraction rate than what was achieved by tmklmp_nodiag.m.

Table 3 Two-mode blockmodeling (TMBP) results for the Parker and Asher (1993) third-grade classroom friendship data: Numbers of inconsistencies relative to the ideal block structure for different combinations of K and L
Fig. 5 Convex hull for the two-mode blockmodeling (TMBP) application to the third-grade friendship data

Finally, we applied nmf_nodiag.m using 1 ≤ D ≤ 5. A plot of the PRE values is displayed in Fig. 6. Visual inspection of the plot reveals a clear and sharp elbow at D = 2—indeed, the only elbow. Moreover, D = 2 is preferred to all other dimensionalities for the factorization based on both the DiffCH (DiffCH = 0.1458) and RatioCH (RatioCH = 2.94) measures. Considering the visual inspection of Fig. 6 and the slope measures, we selected D = 2 as the dimensionality and obtained a (K = 4, L = 3) partition for comparison with the TMKLMP and TMBP results. As we noted previously, the selection of K and L for NMF was based on the TMKLMP solution. The computation time for nmf_nodiag.m using D = 2, K = 4, and L = 3 was approximately 17.8 s on the 2.2-GHz Pentium 4 PC.

Fig. 6 Convex hull for the NMF application to the third-grade friendship data. The plot represents the proportions of reductions in error (PRE) as a function of the number of dimensions in the factorization (D)

The partition obtained by tmklmp_nodiag.m is displayed in Fig. 7. This partition is identical to one of the two equally well-fitting partitions produced by tmbp_nodiag.m (the other equally well-fitting partition differed only by the relocation of one student in the partition of columns). The eight girls form one of the three clusters for the column objects (friendship tie receivers), and the 14 boys are split into two clusters of approximately equal size. The row clusters (friendship tie senders) are a bit more complex: There is one cluster of seven boys, one cluster of four girls, one singleton cluster consisting of one girl, and one cluster consisting of seven boys and three girls.

Fig. 7 The (K = 4, L = 3) biclustering solution obtained using TMKLMP and TMBP for the friendship ties among 22 third-grade students (see Anderson et al., 1999; Parker & Asher, 1993). The students are labeled using the numbering scheme used in the source, with “b” and “g” being used to label boys and girls, respectively. The solid lines distinguish the row and column clusters

The friendship tie-sending cluster of four girls exhibits a strong linkage to the tie-receiving cluster consisting of all eight girls, because the resulting submatrix contains mostly 1 s. However, the same cluster of girls exhibits no friendship ties to any of the 14 boys, as is shown by the two null submatrices. Similarly, the friendship tie-sending cluster consisting only of boys (b2, b5, b11, b8, b20, b16, and b19) identifies most of the other boys in the class but very few of the girls. However, the boys in this tie-sending cluster are more strongly linked to the first cluster of tie-receiving boys (b2, b5, b11, b8, b13, and b21) than to the second. The largest friendship tie-sending cluster, consisting of boys and girls (b1, b3, b4, b7, b10, b13, b21, g14, g6, and g22) also has strong linkages to the first cluster of tie-receiving boys (b2, b5, b11, b8, b13, and b21), but not to the other two tie-receiving clusters. Accordingly, the first cluster of tie-receiving boys might be characterized as the “popular boys,” since they are much more heavily linked to the two largest clusters of tie senders than to the other cluster of tie-receiving boys. Finally, we note that g12 emerges as a singleton cluster of tie senders, because she is unique in her identification of almost everyone in the class as friends.

Although interpretable, the partition obtained from using nmf_nodiag.m, shown in Fig. 8, exhibited some marked differences from the TMKLMP and TMBP partition in Fig. 7. The tie-receiving cluster of girls remains intact, as does the tie-sending cluster of four girls. However, the sending and receiving clusters of boys are carved up somewhat differently. Moreover, g12 is folded into a tie-sending cluster with two other girls, despite being appreciably different from those two girls with respect to the pattern of ties. The net result is that the submatrices associated with the NMF solution in Fig. 8 are appreciably less homogeneous than those in the TMKLMP/TMBP solution in Fig. 7: Whereas the number of inconsistencies [g(P, Q)] in Fig. 7 is 77, there are 104 in Fig. 8. This substantial difference in submatrix homogeneity raises concerns regarding the effectiveness of NMF in this application.

Fig. 8 The (K = 4, L = 3, D = 2) biclustering solution obtained using NMF for the friendship ties among 22 third-grade students (see Anderson et al., 1999; Parker & Asher, 1993). The students are labeled using the numbering scheme used in the source, with “b” and “g” used to label boys and girls, respectively. The solid lines distinguish the row and column clusters

Implementation

Acquiring the software

The software programs associated with this article are accessible in the folder ASYM, which can be downloaded from the following website: http://myweb.fsu.edu/mbrusco. Six primary m-file scripts are included in the folder: (i) tmklmp.m, (ii) tmklmp_nodiag.m, (iii) tmbp.m, (iv) tmbp_nodiag.m, (v) nmf.m, and (vi) nmf_nodiag.m. In addition, the NMF programs call two m-script subroutines, hkmeans.m and ssquares.m. There are also two data files, lipread.prn and 3rdGrade.prn, which contain the input matrices for Examples 1 and 2, respectively. Succinct nontechnical pseudocodes for TMKLMP, NMF, and TMBP are provided in Fig. 9. More rigorous descriptions of the respective algorithms are provided by Brusco and Doreian (2015a), Brusco (2011), and Doreian et al. (2004).

Fig. 9 Nontechnical pseudocode for TMKLMP, NMF, and TMBP

A Word document in the ASYM folder also contains the source code for all eight m-file scripts. Moreover, for each of the six primary m-scripts, the document describes how to perform the function calls in MATLAB, as well as the primary inputs and outputs of the program. The parameter settings that can be adjusted by users familiar with MATLAB are also identified. Finally, the document contains small numerical examples illustrating some of the procedures. As we noted previously, these examples were solved using a 2.2-GHz Pentium 4 PC (circa 2000–2002), and accordingly, the displayed computation times should be considered conservative.
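
As a hypothetical usage sketch (the .prn files are assumed to be plain whitespace-delimited numeric matrices, and the function call shown is only our guess at the signature implied by the inputs and outputs described above; the documentation in the ASYM folder gives the actual calls), an analysis of the lipread consonant data might begin as follows.

```matlab
% Hypothetical usage sketch; see the ASYM documentation for the actual calls.
X = load('lipread.prn');                       % assumed: 21 x 21 ASCII matrix
K = 4;  L = 4;                                 % numbers of row and column clusters
[rowPart, colPart] = tmklmp_nodiag(X, K, L);   % assumed output arguments
```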

Choosing among the methods

Two questions concerning the properties of the available asymmetric one-mode data can guide the selection of a method: (i) Should the main diagonal be considered? and (ii) Are the elements of the matrix binary or nonnegative real-valued? In our experience, the most common answer to question (i) is “no.” Consider, for example, asymmetric matrices that stem from social network analyses, in which it is generally illogical for someone to identify him- or herself as a friend, someone they trust, or someone from whom they seek advice. Accordingly, in these circumstances, the “nodiag” versions of the programs are more appropriate. In other situations the decision might be less clear. For example, in a confusion matrix or a brand-switching matrix, the main diagonal represents correct responses or repeat purchases of the same brand, respectively. These measures do have a logical interpretation, so an argument can be made for retaining them; however, they often tend to be rather large. Therefore, if the goal of the study is to analyze patterns of confusion (or switching), it still might be preferable to use the “nodiag” programs to avert undue influence from the large diagonal terms. This issue also dovetails with the decision of whether to normalize the input matrix in some manner, a thorny matter that is beyond the scope of this article.

If the input matrix consists more generally of nonnegative real-valued elements, then the choice of method is restricted to the TMKLMP and NMF programs, because TMBP is limited to binary data. Our experience is that both TMKLMP and NMF perform well in most instances for nonnegative real-valued data. However, whereas both methods require the specification of the numbers of row and column clusters (i.e., K and L), NMF has the additional requirement of selecting the dimensionality (D) of the factorization. Moreover, because we recommend the use of TMKLMP to select K and L for the NMF program, the former procedure would have to be used anyway. A reasonable strategy in the case of a nonnegative real-valued asymmetric matrix is therefore to run TMKLMP first, identify appropriate values for K and L based on CHull, and store the solution corresponding to those values. Subsequently, NMF can be applied using the same K and L, but evaluating different values of D. Upon selection of an appropriate D, the NMF solution can be compared directly to the TMKLMP solution. This basic approach was applied profitably to the lipread consonant confusion data (Example 1), for which NMF produced a readily interpretable partition that comported well with the TMKLMP solution.

In the case of a binary input matrix, the computational evidence provided herein, as well as in other sources (Brusco, Doreian, Lloyd, & Steinley, 2013; Brusco, Doreian, Mrvar, & Steinley, 2013; Brusco & Steinley, 2007, 2011; Doreian et al., 2004, 2005), reveals the efficacy of TMBP. Accordingly, our general recommendation is that TMBP should typically be evaluated in the case of binary data, even though its attraction to the best-found objective criterion value across multiple restarts is commonly lower than that of TMKLMP. However, the computational evidence reported by Brusco and Steinley (2007) and Brusco and Doreian (2015a) also revealed good performance of TMKLMP for binary data, as well as for nonnegative real-valued data. Although NMF can also be applied to either binary or nonnegative real-valued input matrices, our experience is that it tends to produce more easily interpretable results in the latter case. When NMF was applied to the binary friendship network data among third graders, the solution was less interpretable; therefore, we do not recommend NMF highly for binary data.

To summarize, TMKLMP appears to produce good results for both nonnegative real-valued and binary matrices. Nevertheless, it would seem prudent to augment a TMKLMP analysis with either NMF or TMBP in the cases of nonnegative real-valued or binary matrices, respectively. In the examples provided herein, the TMKLMP and NMF methods both produced interpretable results for the lipread consonant confusion data, whereas TMKLMP and TMBP both produced interpretable results for the binary social network data.

Conclusions

Summary

Examples of one-mode asymmetric proximity data abound in the psychological sciences, including data obtained from free association tasks, stimulus recognition tasks, brand selection, and social network analyses. When approaching the problem of partitioning objects in these applications, we suggest that it is generally advisable to adopt a biclustering perspective. More specifically, when the data are arranged in a one-mode asymmetric matrix, two distinct partitions of the objects must be obtained: one partition based on their role in the context of the rows of the matrix, and one partition based on their role in the context of the columns of the matrix. Effective methods for simultaneously establishing these partitions are not readily available in most commercial software packages. Accordingly, our goals were (i) to present three alternative methods for biclustering one-mode asymmetric matrices, (ii) to make available a suite of MATLAB m-files that implement these methods, and (iii) to demonstrate these methods and software using psychologically oriented examples from the literature. Furthermore, the methods are applicable to many areas of behavioral research.

The MATLAB m-file software programs included in the Web supplement associated with this article fall into three categories of methods: (i) two-mode KL-means partitioning (TMKLMP), (ii) nonnegative matrix factorization (NMF), and (iii) two-mode blockmodel partitioning (TMBP). Within each category, two m-file programs are provided, differentiated by the inclusion or exclusion of the main diagonal in the analysis. For example, in the case of TMKLMP, the program tmklmp.m includes the main diagonal in the analysis, whereas tmklmp_nodiag.m ignores the main diagonal. The former program can also be used more generally for any two-mode matrix.

Limitations and extensions

The limitations of the methods presented herein can be characterized along three dimensions: (i) scalability, (ii) suboptimality, and (iii) model selection. Regarding scalability, it is important to recognize that we have made the programs available as MATLAB m-files. Although MATLAB is a user-friendly environment, m-files are not compiled, and therefore they run much slower than comparable codes written in Fortran or C. It also appears that the TMKLMP programs are appreciably more efficient than the TMBP and NMF programs. Nevertheless, most of the m-files should scale for object set sizes of N ≤ 200. Larger matrices can also be tackled; however, it might be necessary to scale back on the number of restarts.

All of the m-files use heuristic procedures to produce solutions to their respective optimization problems. For this reason, a globally optimal solution is not guaranteed. However, as we noted previously, computational results reported in the literature suggest that each of the methods performs well and produces solutions competitive with those obtained by more sophisticated metaheuristics, such as simulated annealing, tabu search, genetic algorithms, and variable neighborhood search. Despite this evidence, we deemed it useful to allow the TMKLMP and TMBP programs to count the numbers of restarts for which the best-found solution was obtained. If the best-found solution was obtained only once or twice out of 500 or 5,000 restarts, it is quite possible that the global optimum was not located. Greater confidence (although no guarantee) of global optimality would be afforded by a larger number of discoveries of the best-found objective function across multiple restarts. Our limited analyses suggest that TMKLMP has a greater attraction to the best-found solution across multiple restarts than does TMBP. A related issue is the agreement among different locally optimal partitions. A capability for the measurement of agreement has not been integrated into the programs at the present time, because it is not entirely clear how users should best interpret such information.

Perhaps the most challenging aspect of all of the procedures described in this article is model selection. For TMKLMP and TMBP, model selection requires the choices of K and L and the decision to include or exclude the main diagonal. The NMF method also requires these decisions, in addition to a choice of D. Furthermore, for real-valued X matrices, transformations of the asymmetric proximity matrix prior to application of the method must be a concern. For example, Brusco and Doreian (2015b) considered applications to journal citation and brand-switching matrices in which a transformation was employed to adjust for scale differences prior to the implementation of TMKLMP.