Background

Complex biological systems and processes such as tissue homeostasis [1, 2], neurotransmission [3, 4], immune response [5], ontogenesis [6], and stem cell niches [7, 8] are composed of cell–cell interactions (CCIs). Many molecular biology studies have decomposed such systems into constituent parts (e.g., genes, proteins, and metabolites) to clarify their functions. Nevertheless, more sophisticated methodologies are required because CCIs essentially differentiate whole systems from functioning merely as the sum of their parts. Accordingly, micro-level measurements of such parts cannot always explain macro-level biological functions.

Previous studies have investigated CCIs using technologies such as fluorescence microscopy [9,10,11,12,13], microdevice-based methods such as microwells, micropatterns, single-cell traps, droplet microfluidics, and micropillars [14,15,16,17,18,19,20,21,22], and transcriptome-based methods [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52]. In particular, the recent single-cell RNA-sequencing (scRNA-seq) studies have focused on CCIs based on ligand–receptor (L–R) gene co-expression. By investigating the detected cell types through scRNA-seq and the L–R pairs specifically expressed in the cell types, CCIs can potentially be understood at high resolution.

Despite their wide usage, the analytical methods based on L–R pairs are still not mature; such methods implicitly assume that CCIs consist of one-to-one relationships between two cell types and that the corresponding L–R co-expression is observed in a cell-type-specific manner. One study even removed ligand and receptor genes expressed in multiple cell types from their data matrix, assuming one-to-one CCIs [53]. In real empirical data, however, each ligand and receptor gene can be expressed across multiple cell types, and some studies have actually focused on many-to-many CCIs [25, 33, 36, 48, 54]. Such a difference between actual CCI patterns composed of real data and the hypothesis assumed by a model will cause severe bias in the detection of CCIs.

For the above reason, we propose scTensor, which is a novel CCI prediction method based on a tensor decomposition algorithm. Our method regards CCIs as hypergraphs and extracts some representative triadic relationships consisting of ligand-expression, receptor-expression, and related L–R pairs. The main contributions of this article are summarized as follows.

  • We developed a novel simulator to model the CCIs as hypergraphs and quantitatively evaluate the performance of scTensor and other L–R detection methods.

  • We re-implemented several L–R detection methods from scratch so that all of them could be run against the same L–R database, allowing us to compare only the performance of the L–R detection methods rather than slight differences in data pre-processing or in the L–R database used.

  • We show that scTensor is superior to the other L–R detection methods in its accuracy of many-to-many CCI detection, computation time, and memory usage.

  • We describe the implementation of scTensor as an R/Bioconductor package to enable the reproducibility of data analyses as well as continuous maintenance and improvements. We provide some original visualization functions and a function to generate an HTML report in scTensor to enable detailed interpretation of the results. We have extended our framework to work with 125 species.

Results

CCI as a hypergraph

One of the simplest CCI representations is a directed graph, where each node represents a cell type and each edge represents the co-expression of all L–R pairs (Fig. 1a, left). The direction of each edge is set as the ligand-expressing cell type \(\rightarrow\) the receptor-expressing cell type. Such a data structure corresponds to an asymmetric adjacency matrix, in which each row and column represents a ligand-expressing cell type and receptor-expressing cell type, respectively. If some combinations of cell types are regarded as interacting, the corresponding elements of the matrix are filled with 1 and otherwise 0. If the degree of CCI is not a binary relationship, weighted graphs and corresponding weighted adjacency matrices may also be used. The previous analytical methods are categorized within this approach [23, 24, 26,27,28,29,30,31,32,33,34, 36, 40, 43,44,45,46, 48, 49, 51, 52, 55].

The drawback of using an adjacency matrix to describe CCIs is that multiple L–R co-expression scores are collapsed into a single value by summation or averaging. Because the average is simply a constant multiple of the sum, here we discuss only the sum. The summed value carries no information about which L–R pairs are related to the CCI, and therefore CCIs and the related L–R pair lists cannot be detected simultaneously.

In contrast to an adjacency matrix (i.e., graph), the triadic relationship of CCIs also can be described as directed hypergraphs (i.e., CCI as hypergraph; CaH), where each node is a cell type but the edges are distinguished from each other by the different related L–R pair sets (Fig. 1a, right). Such a context-aware edge is called a “hyperedge” and is described as multiple different adjacency matrices. The set of matrices corresponds to a “tensor”, which is a generalization of a matrix to expand its order.
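The contrast between the two representations can be illustrated with a toy example. The sketch below uses invented scores for three cell types and two hypothetical L–R pairs (`pairA` and `pairB`); the names and values are illustrative assumptions, not data from this study.

```python
# Toy CCI-as-hypergraph: one adjacency matrix (ligand-expressing cell type x
# receptor-expressing cell type) per L-R pair. All values are invented.
cci_tensor = {
    "pairA": [[0, 1, 0],
              [0, 0, 0],
              [0, 0, 0]],   # cell type 0 -> cell type 1
    "pairB": [[0, 0, 0],
              [0, 0, 0],
              [1, 1, 0]],   # cell type 2 -> cell types 0 and 1
}

def collapse(tensor):
    """Graph view: sum the per-pair matrices into one adjacency matrix."""
    n = len(next(iter(tensor.values())))
    total = [[0] * n for _ in range(n)]
    for matrix in tensor.values():
        for i in range(n):
            for j in range(n):
                total[i][j] += matrix[i][j]
    return total

def pairs_for_edge(tensor, i, j):
    """Hypergraph view: which L-R pairs support the edge i -> j?"""
    return sorted(name for name, m in tensor.items() if m[i][j] > 0)

adjacency = collapse(cci_tensor)
```

After collapsing, the edges 0 → 1 and 2 → 1 both carry the value 1 and can no longer be told apart by their L–R pairs, whereas `pairs_for_edge` still recovers that information from the tensor.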

Overview of scTensor

Here we introduce the procedure of scTensor. First, the tensor data are constructed through the following steps (Fig. 1b). A scRNA-seq matrix and the cell labels specifying cell types are assumed to be provided by the user. The gene expression values of each cell are normalized by count per median of library size (CPMED [56,57,58]), and a logarithmic transformation is applied to the data matrix for variance stabilization [i.e., \(\log _{10}{(\textrm{CPMED} + 1)}\)].
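As a minimal sketch of this normalization, the function below assumes that CPMED rescales each cell's counts so that its library size equals the median library size across cells. The function name `cpmed_log10` is hypothetical, and cells with a library size of zero are not handled.

```python
import math
from statistics import median

def cpmed_log10(counts):
    """counts: list of genes, each a list of per-cell raw counts.
    Returns log10(CPMED + 1) with the same shape."""
    n_cells = len(counts[0])
    # Library size = total count per cell (column sums).
    lib_sizes = [sum(gene[c] for gene in counts) for c in range(n_cells)]
    med = median(lib_sizes)
    return [
        [math.log10(gene[c] / lib_sizes[c] * med + 1) for c in range(n_cells)]
        for gene in counts
    ]

# Two genes x three cells; library sizes are 10, 20, and 40 (median 20).
normalized = cpmed_log10([[5, 10, 20],
                          [5, 10, 20]])
```

In this toy input every cell has the same relative composition, so all normalized values coincide even though the raw counts differ fourfold between cells.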

Next, the data matrix is converted to a cell-type-level average matrix according to the cell type labels. Combined with an L–R database, two corresponding row-vectors of an L–R pair are extracted from the matrix. The outer product (direct product) of the two vectors is calculated, and a matrix is generated. The matrix can be considered as the similarity matrix of all possible cell-type combinations for each L–R pair. Finally, for each L–R pair, the matrix is calculated, and the tensor \(\mathcal {\chi } \in \mathbb {R}^{J \times J \times K}\), where J is the number of cell types and K is the number of L–R pairs, is generated as the merged matrices. In this work, this tensor is called the “CCI-tensor”.
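The construction above can be sketched in a few lines. The gene names (`LigA`, `RecA`, `LigB`, `RecB`) and expression values below are hypothetical, and the tensor is stored simply as a list of K matrices of size J × J; this is an illustration of the outer-product step, not the package's internal implementation.

```python
def outer(u, v):
    """Outer product of two J-length vectors -> J x J matrix."""
    return [[a * b for b in v] for a in u]

def build_cci_tensor(avg_expr, lr_pairs):
    """avg_expr: dict gene -> J-length list of cell-type averages.
    lr_pairs: list of (ligand, receptor) gene names.
    Returns a list of J x J matrices, one slice per L-R pair found."""
    slices = []
    for ligand, receptor in lr_pairs:
        # Keep a pair only when both genes are present in the matrix.
        if ligand in avg_expr and receptor in avg_expr:
            slices.append(outer(avg_expr[ligand], avg_expr[receptor]))
    return slices

# Hypothetical averages for J = 3 cell types.
avg_expr = {"LigA": [2.0, 0.0, 0.0], "RecA": [0.0, 3.0, 1.0]}
tensor = build_cci_tensor(avg_expr, [("LigA", "RecA"), ("LigB", "RecB")])
```

Rows index the ligand-expressing cell type and columns the receptor-expressing cell type, so the only non-zero row of the single slice is the one for the first cell type.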

After the construction of the CCI-tensor, we use the non-negative Tucker2 decomposition (NTD-2) algorithm [59, 60]. NTD-2 decomposes the CCI-tensor as a core tensor \(\mathcal {G} \in \mathbb {R}^{R1 \times R2 \times K}\), and two factor matrices \(\varvec{A}^{\left( 1\right) } \in \mathbb {R}^{J \times R1}\) and \(\varvec{A}^{\left( 2\right) } \in \mathbb {R}^{J \times R2}\), where R1 and R2 are the NTD-2 rank parameters. The factor matrix \(\varvec{A}^{\left( 1\right) }\) describes R1 ligand gene expression patterns across the cell types, the factor matrix \(\varvec{A}^{\left( 2\right) }\) describes R2 receptor gene expression patterns across the cell types, and the core tensor \(\mathcal {G}\) describes the degree of association of every combination (\(R1 \times R2\)) of the ligand and receptor expression patterns for each L–R pair.

The result of NTD-2 is considered the sum of some representative triadic relationships. In this work, each triadic relationship is termed \(\texttt {CaH}{}\left( r1,r2\right)\), which refers to the outer product of three vectors, \(\varvec{A}_{:r1}^{\left( 1\right) }\), \(\varvec{A}_{:r2}^{\left( 2\right) }\), and \(\mathcal {G}_{r1,r2,:}\), where r1 (\(1 \le r1 \le R1\)) and r2 (\(1 \le r2 \le R2\)) are the indices of the columns of the two factor matrices (Fig. 1c). The CaHs are extracted in a data-driven way without the assumption of one-to-one CCIs. Therefore, this approach can also detect many-to-many CCIs according to the data complexity.
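Written element-wise, this sum of triadic relationships takes the standard Tucker2 form. The following is a sketch in the notation of this section, where the index names i (ligand-expressing cell type), j (receptor-expressing cell type), and k (L–R pair) are our assumption:

$$\begin{aligned} \mathcal {\chi }_{ijk} \approx \sum _{r1=1}^{R1} \sum _{r2=1}^{R2} \varvec{A}^{\left( 1\right) }_{i,r1} \, \varvec{A}^{\left( 2\right) }_{j,r2} \, \mathcal {G}_{r1,r2,k}, \qquad \varvec{A}^{\left( 1\right) } \ge 0, \; \varvec{A}^{\left( 2\right) } \ge 0, \; \mathcal {G} \ge 0. \end{aligned}$$

Each term of the double sum corresponds to one \(\texttt {CaH}{}\left( r1,r2\right)\), so at most \(R1 \times R2\) such relationships are extracted.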

Evaluation of many-to-many CCIs detection

To examine the performance of the CCI methods in terms of detecting CaH, we validated the CCI methods using both simulated and empirical datasets (Fig. 2).

We first prepared 90 simulated datasets considering five numbers of cell types (3, 5, 10, 20, or 30), two CCI styles (one-to-one or many-to-many including one-to-many and many-to-one), three numbers of CCI types (1, 3, or 5), and three threshold values (E2, E5, or E10) for recognition of differentially expressed genes (DEGs). According to these conditions, ground truth CCIs were determined (Additional file 1).

Next, we prepared five real empirical datasets (FetalKidney [36], GermlineFemale [25], HeadandNeckCancer [54], Uterus [33], and VisualCortex [48]), each of which focused on many-to-many CCIs in their respective original papers.

There are many L–R scoring methods to quantify the degree of co-expression of ligand and receptor genes. We re-implemented four scores used in many CCI prediction methods to evaluate performance independent of software implementation (Table 1 and Additional file 1). Here, we selected as the four methods sum score (CellPhoneDB [37], Giotto [61], CrossTalkR [62], and Squidpy [63]), product score (NATME [64], FunRes [65], ICELLNET [66], and TraSig [67]), Halpern’s score [68], and Cabello-Aguilar’s score (SingleCellSignalR [69] and CellTalkDB [70]), each of which is widely used in many studies.

To differentiate significant CCIs from non-significant CCIs, many CCI methods introduce a label permutation test with a random permutation of cell-type labels to simulate the null distribution of CCIs. This process is considered a kind of binarization (1 for significant CCIs, 0 for non-significant CCIs). For scTensor, binarization was realized by median absolute deviation (MAD) thresholding against each column vector in factor matrices calculated by tensor decomposition.
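The MAD-based binarization of one factor-matrix column can be sketched as follows. The exact cutoff rule is not specified in this section, so the rule used here (median plus a multiple of the MAD) and the parameter `n_mads` are illustrative assumptions.

```python
from statistics import median

def mad_binarize(column, n_mads=3.0):
    """Return 1 for entries above median + n_mads * MAD, else 0.
    The cutoff rule and n_mads are assumptions made for illustration."""
    med = median(column)
    mad = median(abs(x - med) for x in column)
    cutoff = med + n_mads * mad
    return [1 if x > cutoff else 0 for x in column]

# One column of a factor matrix: loadings of five cell types on one pattern.
labels = mad_binarize([1.0, 2.0, 2.0, 9.0, 2.5])
```

Here the median is 2.0 and the MAD is 0.5, giving a cutoff of 3.5, so only the fourth cell type is marked as participating in the pattern. Unlike a permutation test, this step needs no resampling of cell-type labels.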

To quantitatively evaluate how selectively each CCI method was able to detect the ground truth CCIs before and after binarization, nine evaluation measures were introduced. Four of them were applied both before and after binarization, and the remaining five were applied to the results only after binarization.

scTensor could selectively detect many-to-many CCIs in simulated datasets

The area under the curve of precision–recall (AUCPR) and Matthews correlation coefficient (MCC) values of 30 datasets with an E10 threshold value are shown in Fig. 3. For the details of all the evaluation results for all the conditions, see Additional files 3–12. Figure 3 shows that the AUCPR values can vary among the CCI methods. When the CCI-style was one-to-one (Fig. 3a, left), scTensor (NTD-2) achieved the highest AUCPR scores, and Halpern’s score obtained the second-highest AUCPR values on average. For Halpern’s score, however, binarization greatly reduced the number of significant CCIs. This may be explained by Halpern’s score having the lowest FPR (Additional file 10) and the highest FNR (Additional file 11), and it suggests that Halpern’s score is a quite conservative method to detect one-to-one CCIs. When the CCI-style was set as many-to-many, both the previous and current versions of scTensor (NTD-3 and NTD-2, respectively) achieved higher AUCPR values on average (Additional file 4), compared with the other methods (Fig. 3a, right).

Figure 3b shows that the MCC values also varied among the CCI methods. When the CCI-style was one-to-one (Fig. 3b, left), Halpern’s score achieved the highest MCC values and scTensor (NTD-2) obtained the second-highest values on average (Additional file 8). When the CCI-style was many-to-many, scTensor (NTD-2) and sum score obtained the highest MCC values, compared with the other methods (Fig. 3b, right).
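For reference, the MCC is computed from the four confusion-matrix counts of binarized predictions against the ground truth. The sketch below follows the standard definition; returning 0 when the denominator vanishes is a common convention that we assume here, not something stated in this article.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column is empty
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every call wrong) through 0 (no better than chance) to 1 (perfect agreement). Because it uses all four counts, it remains informative when true CCIs are far rarer than non-interacting pairs.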

Characteristics of scTensor (NTD-2), Halpern’s score, and sum score

The comprehensive validation showed that the three methods (scTensor (NTD-2), Halpern’s score, and sum score) performed better than the others under certain conditions. To further examine the characteristics and trends of each method, we aggregated the number of CCIs detected in three datasets in which each of the three methods excelled (Fig. 4).

scTensor (NTD-2): this method performed well when the CCI style was many-to-many. For example, in Fig. 4a, most many-to-many CCIs could be detected. Although there were some false negative (FN) CCIs that were not detected, there were fewer false positive (FP) CCIs. In contrast, Halpern’s score was too conservative against this dataset and failed to detect most of the CCIs by the label permutation test. At a first glance of Fig. 4a, the sum score appears to work well with these data, but under scrutiny at the level of individual L–R pairs, sum score results contain many FP and FN CCIs (Additional file 13).

Sum score: This method performed well when the number of cell types was small and the style of CCI was restricted to one-to-one (Fig. 4b). Even though Halpern’s score and scTensor (NTD-2) were able to detect similar CCIs, Halpern’s score was quite conservative and contained many FN CCIs because it considered many CCIs to not be significant. For sum score, there seemed to be a bias toward FP CCIs. If the degree of co-expression of an L–R pair is high between two cell types, this method seems to detect FP pairs in which only one of the L–R is highly expressed. In such cases, cross-shaped patterns were observed in the heatmap in Fig. 4. In our simulated datasets, this cross-shaped pattern of FP CCIs was observed more frequently in the sum score.

Halpern’s score: In most data sets, Halpern’s score was found to be too conservative, with many FN CCIs, but in a very specific situation, that is, when the CCI-style was one-to-all (or all-to-one), it outperformed the other methods (Fig. 4c). In contrast, scTensor (NTD-2) inferred many FN CCIs among these data, while the sum score identified many FP CCIs (Additional file 13).

scTensor could selectively detect many-to-many CCIs in real datasets

Next, we applied these CCI methods to real empirical datasets (Table 2 and Additional files 3–12). As expected from the results of simulated datasets, scTensor (NTD-2) outperformed the other methods on these real datasets, which contain many-to-many CCIs. Regarding AUCPR (Fig. 5a) and MCC (Fig. 5b) values, scTensor (NTD-2) achieved higher values compared with the other methods, although the difficulty of detecting the CCIs was highly dependent on the dataset (Additional files 4, 8). We further investigated the real empirical datasets and found that the known CCIs reported by the original papers were reproduced by scTensor (Table 3). Additionally, some predicted many-to-many CCIs can be considered biologically plausible because the CCIs are related to the same signaling pathways of known CCIs, although the original papers did not refer to the CCIs. These results can be interactively investigated using the HTML report generated by scTensor (Additional files 13–17).

Computational complexity and memory usage

We also assessed the orders of computational complexity and memory usage of all the CCI methods (Table 4). All the L–R score methods require \(\mathcal {O}(N^{2}L)\) order both in the computation and in memory usage, where N is the number of cell types and L is the number of L–R pairs. The label permutation tests combining any L–R scores require \(\mathcal {O}(N^{2}LP)\) in computation, where P is the number of random shuffles of cell-type labels. In many cases, P is typically set as a large value greater than 1000 [37, 69], making this a very time-consuming calculation.

In contrast, scTensor (NTD-2) does not perform the label permutation; instead, it simply utilizes the factor matrices after the decomposition of the CCI-tensor. Hence, the order of computational complexity is reduced to \(\mathcal {O}(N^{2}L(R1+R2))\), where R1 and R2 are the number of columns or “rank” parameters for the first- and second-factor matrices, respectively. Because the rank parameters are typically set as small numbers (e.g., 10), this leads to a substantial computational advantage compared with the label permutation test. The computation time and memory usage when analyzing the simulated and real empirical datasets show that scTensor has an advantage in computational complexity compared with the label permutation test (Additional files 5, 6).
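To make the difference between the two orders concrete, the following sketch plugs illustrative sizes into the leading terms. The chosen values (N = 30, L = 1000, P = 1000, R1 = R2 = 10) are assumptions for the sake of the example, and constant factors and the number of decomposition iterations are ignored.

```python
def permutation_cost(n_cell_types, n_lr_pairs, n_permutations):
    """Leading-order operation count, O(N^2 * L * P)."""
    return n_cell_types ** 2 * n_lr_pairs * n_permutations

def ntd2_cost(n_cell_types, n_lr_pairs, rank1, rank2):
    """Leading-order operation count, O(N^2 * L * (R1 + R2))."""
    return n_cell_types ** 2 * n_lr_pairs * (rank1 + rank2)

perm = permutation_cost(30, 1000, 1000)   # 900,000,000
ntd2 = ntd2_cost(30, 1000, 10, 10)        # 18,000,000
ratio = perm / ntd2                       # P / (R1 + R2) = 50.0
```

The ratio depends only on P / (R1 + R2), so the gap widens as more permutations are used.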

Method comparisons

The first method similar to scTensor is CellChat. This method uses the communication probabilities (3rd-order tensor), which is a CCI tensor constructed with the authors’ original score and is normalized so that the sum in the second mode is 1. In addition to the label permutation test on each L–R pair in the 3rd-order tensor, NMF on the matrix data summarized by the second mode of the 3rd-order tensor is performed to detect global CCI patterns. However, this summarization reduces the order of the tensor (i.e., 3rd to 2nd) and loses information on which L–R pairs contributed to the CCI. In particular, as this study has shown, the label permutation test tends to detect one-to-one CCIs, whereas NMF may also detect many-to-many CCIs, making it difficult to consider many-to-many CCIs and the L–R pairs that contribute to them simultaneously, even when the two methods are combined. Therefore, models like scTensor that can handle higher-order data as is are preferable.

Inspired by our method, another method, Tensor-cell2cell [71], extended our approach to higher-order CCI tensors (e.g., 4th-order tensors) to consider CCIs and the CCI contexts (e.g., disease state, organismal life stage, and tissue microenvironment) simultaneously. Other than its effectiveness for such a higher-order CCI tensor, the main differences between Tensor-cell2cell and scTensor may include the following. First, Tensor-cell2cell is implemented in Python, whereas scTensor is implemented in R. Python offers a wide range of machine learning/deep learning packages, while R offers data preprocessing and visualization with TidyVerse and bioinformatics-related packages with Bioconductor.

Second, there are differences in tensor decomposition models; Tensor-cell2cell performs CANDECOMP/PARAFAC-type non-negative tensor decomposition but scTensor’s model is NTD-2. The difference between these models is the number of rank parameters. Tensor-cell2cell has only one rank parameter, while scTensor has rank parameters for the number of tensor orders (i.e. 3). This difference can be an advantage or a disadvantage; a small number of ranks reduces the computational time required to estimate the optimal ranks but it might make the model too simple.

Third, there are differences in the related tools. Tensor-cell2cell takes its own text file format as input, supports only major species such as mouse and human, and appears to be intended for use with LIANA [72, 73], another CCI tool by the same authors. On the other hand, scTensor supports 124 species (September 5, 2024) in the Bioconductor package LRBase, and can be combined with various other single-cell packages via the SingleCellExperiment object and Seurat (see Implementations and Fig. 6). With an understanding of these differences, users should choose the tool that best fits their purpose.

Implementations

scTensor is implemented as an R/Bioconductor package that is freely available. Both a scRNA-seq dataset and L–R database are required for scTensor execution. The default format for a scRNA-seq dataset is SingleCellExperiment, in which the gene IDs correspond to NCBI’s Gene database to allow links with other databases (Fig. 6a). A scRNA-seq dataset can also be converted from a Seurat object. We provided instructions for this data conversion (https://bioconductor.org/packages/release/bioc/vignettes/scTensor/inst/doc/scTensor_1_Data_format_ID_Conversion.html#case-iii-umi-count).

LRBase, which is the L–R database for scTensor, is stored on a remote server called AnnotationHub and is downloaded to the user’s machine on demand, only when called by the user (Fig. 6a). To extend our method to a wide range of organisms, in this work, we originally constructed and are providing the L–R lists for 125 organisms (https://github.com/rikenbit/lrbase-workflow/blob/master/sample_sheet/sample_sheet.csv). The details of the data processing pipeline are summarized in the README.md of lrbase-workflow (https://github.com/rikenbit/lrbase-workflow), which is a workflow for constructing the LRBase for each of the species. For data sustainability, we offer the data files, including older versions, on the AnnotationHub server. The data files are bi-annually updated in conjunction with Bioconductor updates and are provided using lrbase-workflow. Users can specify which version of the data is used for analysis, thus ensuring data reproducibility.

NTD-2 was implemented as a function within the nnTensor [74] R/CRAN package and is internally imported into scTensor. scTensor constructs the CCI-tensor, decomposes the tensor by the NTD-2 algorithm, and generates an HTML report.

To enhance the biological interpretation of CaH results, we implemented some visualization functions (Fig. 6a) and these plots can be interactively investigated via web browser. A wide variety of gene-wise information is included in the report and can be linked to the L–R lists through the use of other R/Bioconductor packages; the gene annotation is assigned by biomaRt [75] (Gene Name, Description, Gene Ontology [GO], STRING, and UniProtKB), reactome.db [76] (Reactome) and MeSH.XXX.eg.db [77] (Medical Subject Headings [MeSH]), while the enrichment analysis (also known as over-representation analysis [ORA]) is performed by GOstats [78] (GO-ORA), meshr [77] (MeSH-ORA), ReactomePA [79] (Reactome-ORA), and DOSE [80] (Disease Ontology (DO)-ORA, Network of Cancer Genes (NCG)-ORA, DisGeNET-ORA).

To validate that the detected co-expression of L–R gene pairs is also consistently detected in other data, including tissue- or cell-type-level transcriptome data, the hyperlinks to RefEx [81], Expression Atlas [82], SingleCell Expression Atlas [83], scRNASeqDB [84], and PanglaoDB [85] are embedded in the HTML report, facilitating comparisons of the L–R expression results with the data from large-scale genomics projects such as GTEx [86], FANTOM5 [87], the NIH Epigenomics Roadmap [88], ENCODE [89], and the Human Protein Atlas [90]. Additionally, in consideration of users who might want to experimentally investigate detected CCIs, we embedded hyperlinks to Connectivity Map (CMap [91]), which provides relationships between perturbations by the addition of particular chemical compounds/genetic reagents and the resulting gene expression change.

Discussion

In this work, we regarded CCIs as CaHs, which represent the triadic relationships of ligand-expressing cell types, receptor-expressing cell types, and the related L–R pairs. We implemented a novel algorithm scTensor based on a tensor decomposition algorithm for detecting such CaHs. Our evaluations using both simulated and real empirical datasets suggest that scTensor can detect many-to-many CCIs more accurately than the other conventional CCI methods. Additionally, the calculation time and memory usage performances of scTensor are also superior to those of the other CCI methods.

To extend the use of scTensor to a wide range of organisms, we also created multiple L–R datasets for 125 organisms. scTensor has been published as an R/Bioconductor package, facilitating the reproducibility of data analysis and the maintainability of datasets. We also implemented an HTML report function that simplifies checking the analysis results of scTensor. Like many CCI tools, scTensor can import an external L–R database.

In the development of many CCI tools, the authors also develop their own L–R databases and investigate the differences among various L–R databases, particularly when comparing their method with other conventional methods [72]. This makes it difficult to distinguish whether the performance of a method is caused by differences in algorithms or databases. Although the primary CCI resources used for existing L–R tools are highly duplicated, even slight differences can influence the detection of CCIs [72]. Therefore, to separate these two comparisons and to focus only on the algorithmic differences, in this work, we compared several existing CCI algorithms by re-implementing them and anchoring them to a common L–R database.

We were also able to examine several strengths and weaknesses of the methods other than scTensor. For example, Halpern’s score was found to be too conservative with many FN CCIs, but it was superior to the other methods with respect to the detection of one-to-all (or all-to-one) CCIs. A possible reason for this is that the formula for this score includes the square root of the chi-square distribution with two degrees of freedom (or an exponential distribution with an expected value of 2), and these distributions are known to be heavy-tailed to some extent, thus potentially inflating the number of significant L–R pairs.

The permutation test implicitly assumes that the interactions occur between very few cell types because the larger the observed L–R score is than the empirical distribution computed by label permutation, the more significant the test result is. However, if the expression levels of ligand and receptor genes are high in any cell of any cell type, the L–R scores calculated by the label permutation will also be high, and thus the observed value of the L–R score will not be regarded as particularly high in the empirical distribution; consequently, such a test result will not be significant. Therefore, detection of many-to-many CCIs by label permutation test is difficult in principle. In the extreme case of all-to-all CCIs, the current approaches (including scTensor) cannot avoid FN CCIs.
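This argument can be reproduced with a toy label permutation test. The sketch below is a simplified illustration, not any tool's implementation: it assumes a product-type L–R score (mean ligand expression times mean receptor expression), 12 invented cells in three cell types, and 200 shuffles; all names and values are hypothetical.

```python
import random

def lr_score(ligand, receptor, labels, src, dst):
    """Product of mean ligand expression in src and mean receptor in dst."""
    lig = [x for x, lab in zip(ligand, labels) if lab == src]
    rec = [x for x, lab in zip(receptor, labels) if lab == dst]
    return (sum(lig) / len(lig)) * (sum(rec) / len(rec))

def permutation_p(ligand, receptor, labels, src, dst, n_perm=200, seed=0):
    """Empirical P value: share of label shuffles scoring >= observed."""
    rng = random.Random(seed)
    observed = lr_score(ligand, receptor, labels, src, dst)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lr_score(ligand, receptor, shuffled, src, dst) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
# One-to-one: ligand only in A, receptor only in B.
p_specific = permutation_p([10.0] * 4 + [0.0] * 8,
                           [0.0] * 4 + [10.0] * 4 + [0.0] * 4,
                           labels, "A", "B")
# All-to-all: both genes uniformly high in every cell.
p_ubiquitous = permutation_p([10.0] * 12, [10.0] * 12, labels, "A", "B")
```

In the all-to-all case every shuffle reproduces the observed score exactly, so the empirical P value is 1 and the interaction is never called significant, whereas the cell-type-specific pair yields a small P value.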

There are still some plans to improve scTensor to build on the advantages of this current framework. For example, the algorithm can be improved by utilizing acceleration techniques such as randomized algorithm/sketching methods [92], incremental algorithm/stochastic optimization [93, 94], or distributed computing on large-scale memory machines [95] for tensor decomposition, as is now available.

To reduce the memory usage of scTensor, we are developing DelayedTensor [96], which is an R/Bioconductor package to perform various tensor arithmetic and tensor decomposition algorithms based on DelayedArray [97], another R/Bioconductor package for handling out-of-core multidimensional arrays in R. We intend to reduce the memory usage of scTensor by supporting this data format.

Tensor data formats are very flexible ways to represent heterogeneous biological data structures [98], because they easily integrate supplemental information about genes or cell types in a semi-supervised manner. Such information could extend the scope of the data and thus improve the accuracy of inferences. For example, there are some attempts to use the following additional information for CCI detection as well (for more details, see Additional file 1).

  • CCI inference via receptor-receptor and extracellular matrix data [51, 99].

  • Consideration of multi-subunit complexes [37].

  • Comparison of CCIs across multiple conditions [62, 71, 100,101,102,103,104,105,106].

  • Consideration of downstream transcriptional factors, target genes, and signaling pathways [69, 107,108,109].

  • Integration with bulk RNA-Seq or other type of omics datasets [37, 64, 65, 107,108,109,110].

  • Integration with pseudo-time [67, 111, 112].

  • Integration with spatial transcriptome data [113].

In particular, in a recent benchmark study [113], the proximity of spatial coordinates on tissue sections measured by spatial transcriptome technology and the CCI detected by L–R data were correlated, and some studies have attempted to integrate these two kinds of datasets in a single model ([113] and Additional file 1). Auxiliary information such as the proximity in spatial coordinates can be incorporated as a regularization term to extend the tensor decomposition model [114,115,116].

Although it is beyond the scope of the present paper to cover all of the above-mentioned topics, considering these in the framework of tensor decomposition is a promising research direction, so we aim to continuously work on these through the development of updates and releases of scTensor for Bioconductor.

Conclusion

In this work, we present and evaluate scTensor, a new method for detecting CCIs based on L–R co-expression in scRNA-seq datasets. We also revealed that the widely used label permutation test has a bias that impedes the detection of many-to-many CCIs and demonstrated that the proposed method is a viable alternative.

Materials and methods

Simulated datasets

The simulated single-cell gene expression data were sampled from the negative binomial distribution \(NB\left( f_{gc}m_{g},\phi _{g}\right)\), where \(f_{gc}\) is the fold-change (FC) for gene g and cell type c, and \(m_{g}\) and \(\phi _{g}\) are the average gene expression and the dispersion parameter of the expression of gene g, respectively.

The \(m_{g}\) value and gene-wise variance \(v_{g}\) were calculated from a real scRNA-seq dataset of mouse embryonic stem cells (mESCs) measured by Quartz-Seq [117], and the gene-wise dispersion parameter \(\phi _{g}\) was estimated as \(\phi _{g} = \left( v_{g} - m_{g}\right) /m_{g}^2\).
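This estimator is the method-of-moments solution of the negative binomial mean-variance relationship, assuming the parameterization in which the dispersion enters the variance quadratically (our assumption, chosen because it is consistent with the formula above):

$$\begin{aligned} v_{g} = m_{g} + \phi _{g} m_{g}^{2} \quad \Longrightarrow \quad \phi _{g} = \frac{v_{g} - m_{g}}{m_{g}^{2}}. \end{aligned}$$

For example, a gene with \(m_{g} = 10\) and \(v_{g} = 60\) gives \(\phi _{g} = (60 - 10)/100 = 0.5\). A gene with \(v_{g} \le m_{g}\) yields a non-positive estimate and would need separate handling, which is not described in this section.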

For the determination of differentially expressed genes (DEGs) and non-DEGs, \(f_{gc}\) values were calculated based on the non-linear relationship of FC and the gene expression level \(\log _{10}{f_{gc}} = a \exp (-b \log _{10}{\left( m_{g}+1\right) })\). To estimate the parameters a and b, we detected the DEGs using edgeR. By setting the threshold values (i.e., false discovery rate) of edgeR as \(10^{-2}\) (E2), \(10^{-5}\) (E5), and \(10^{-10}\) (E10) and using the resulting DEGs, a and b values for each threshold were estimated as (0.701, 0.363), (1.907, 0.666), and (4.429, 0.814), respectively.
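The fitted curve can be evaluated directly from the reported parameters. The sketch below hard-codes the (a, b) estimates given above; the function and dictionary names are hypothetical.

```python
import math

# (a, b) estimates reported in the text for each edgeR threshold.
FC_PARAMS = {"E2": (0.701, 0.363), "E5": (1.907, 0.666), "E10": (4.429, 0.814)}

def fold_change(mean_expr, threshold="E10"):
    """f_gc for a DEG: log10(f) = a * exp(-b * log10(m + 1))."""
    a, b = FC_PARAMS[threshold]
    return 10 ** (a * math.exp(-b * math.log10(mean_expr + 1)))
```

Under this relationship the fold-change is largest for lowly expressed genes (it equals \(10^{a}\) at \(m_{g} = 0\)) and decays toward 1 as the average expression grows.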

For genes identified as DEGs based on a threshold according to the non-linear relationship above, the estimated \(f_{gc}\) value was used; otherwise, \(f_{gc}\) was set to 1. If a ligand gene of a cell type and a receptor gene of a cell type were both DEGs, we defined the relationship between these cell types as the ground truth CCIs and used them for quantitative evaluation.

To simulate the “dropout” phenomenon of scRNA-seq experiments, we also introduced the dropout probability \(p_{dropout}^{gc} = \exp (-c f_{gc}m_{g}^2)\), which is used in ZIFA [118] (default, c=1), and the expression values were randomly converted to 0 according to the dropout probability.
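A minimal sketch of this dropout step is shown below. It implements the probability exactly as written above; the function names, the fixed random seed, and the use of one probability for a whole vector of values are illustrative assumptions.

```python
import math
import random

def dropout_probability(fold_change, mean_expr, c=1.0):
    """p_dropout = exp(-c * f_gc * m_g^2), as written in the text."""
    return math.exp(-c * fold_change * mean_expr ** 2)

def apply_dropout(values, fold_change, mean_expr, c=1.0, seed=0):
    """Set each value to 0 independently with the dropout probability."""
    rng = random.Random(seed)
    p = dropout_probability(fold_change, mean_expr, c)
    return [0 if rng.random() < p else v for v in values]
```

The probability is 1 when the average expression is 0 and falls off rapidly as expression increases, so highly expressed genes are almost never zeroed.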

To simulate various situations, we set many different CCI tensors, considering the number of cell types (3, 5, 10, 20, or 30), the style of CCIs (one-to-one or many-to-many including one-to-many and many-to-one), the number of types of CCIs (1, 3, or 5), and the DEG threshold value (E2, E5, or E10); in total, 90 synthetic CCI tensors were generated.

scRNA-seq real datasets

The gene expression matrix of human FetalKidney data was retrieved from the GEO database (GSE109205), and only highly variable genes (HVGs: http://pklab.med.harvard.edu/scw2014/subpop_tutorial.html) with low \(P\) values (\(\le\) 1E−1) were extracted. The cell-type label data were provided by the authors upon our request.

The gene expression matrix of human GermlineFemale data was retrieved from the GEO database (GSE86146), and only HVGs with low \(P\) values (\(\le\) 1E−7) were extracted.

The gene expression matrix and the cell-type labels of human HeadandNeckCancer data were retrieved from the GEO database (GSE103322), and only HVGs with low \(P\) values (\(\le\) 1E−1) were extracted.

The gene expression matrix of mouse Uterus data was retrieved from the GEO database (GSE118180), and only HVGs with low \(P\) values (\(\le\) 1E−1) were extracted. The cell-type labels were provided by the authors upon our request.

The gene expression matrix and the cell-type labels of mouse VisualCortex data were retrieved from the GEO database (GSE102827), and only HVGs with low \(P\) values (\(\le\) 1E−1) were extracted.

The gene expression values of each cell were normalized by CPMED [56,57,58], and a logarithmic transformation was applied to the data matrix for variance stabilization. To analyze these real datasets, known L–R pairs in DLRP [119], IUPHAR [120], and HPMR [121] were searched for in the data matrix. We defined a CCI between two cell types as ground truth if it was reported by the original study. The L–R pairs associated with these CCIs were used for quantitative evaluation.
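A rough sketch of this preprocessing is given below. It rests on two assumptions that the text does not state explicitly: that CPMED (counts per median) rescales each cell so that its total count equals the median total count across cells, and that the logarithmic transformation is \(\log _{10}(x + 1)\). The function name `cpmed_log` is hypothetical.

```python
import math
from statistics import median


def cpmed_log(counts):
    """Return log10(CPMED + 1) for a genes x cells nested list of counts.

    Assumes every cell has a non-zero total count.
    """
    n_cells = len(counts[0])
    # Total count (library size) of each cell, i.e., each column.
    totals = [sum(row[h] for row in counts) for h in range(n_cells)]
    med = median(totals)
    return [
        [math.log10(row[h] / totals[h] * med + 1.0) for h in range(n_cells)]
        for row in counts
    ]
```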

scTensor algorithm

CCI-tensor construction

Here, data matrix \(\varvec{Y} \in \mathbb {R}^{I \times H}\) is the gene expression matrix of scRNA-seq data, where I is the number of genes and H is the number of cells. Matrix \(\varvec{Y}\) is converted into the cell-type-wise average matrix \(\varvec{X} \in \mathbb {R}^{I \times J}\), where J is the number of cell types. The cell-type labels are assumed to be specified by the user’s prior analysis, such as clustering or confirmation of marker gene expression. The relationship between \(\varvec{X}\) and \(\varvec{Y}\) is described below:

$$\begin{aligned} \varvec{X} = \varvec{Y} \varvec{A}, \end{aligned}$$
(1)

where the matrix \(\varvec{A} \in \mathbb {R}^{H \times J}\) converts the cellular-level matrix \(\varvec{Y}\) to the cell-type-level matrix \(\varvec{X}\), and each element of \({\varvec{A}}\) is

$$\begin{aligned} {\varvec{A}}_{hj} = {\left\{ \begin{array}{ll} 1/n_{j} &{} \left( h\mathrm{th\,\, cell\,\, belongs \,\,to \,\,}j \mathrm{th \,\,cell \,\,type}\right) \\ 0 &{} \left( \text {otherwise}\right) , \end{array}\right. } \end{aligned}$$
(2)

where \(n_{j}\) is the number of cells belonging to the jth cell type.

Next, we check whether both genes of each L–R pair stored in the L–R database are present in the row names of matrix \(\varvec{X}\); if both IDs are found, the corresponding J-length row vectors of the ligand and receptor genes (\(\varvec{x}_{L}\) and \(\varvec{x}_{R}\)) are extracted.

Finally, a \({J \times J}\) matrix is calculated as the outer product of \(\varvec{x}_{L}\) and \(\varvec{x}_{R}\) and incrementally stored. The stacked \({J \times J}\) matrices can be considered as a three-dimensional array, which is also known as a third-order tensor. The outer product for the kth L–R pair found (\(L\left( k\right)\) and \(R\left( k\right)\)) is stored as the kth frontal slice (sub-tensor) of the CCI-tensor \(\mathcal {\chi } \in \mathbb {R}^{J \times J \times K}\):

$$\begin{aligned} \mathcal {\chi }_{::k} = \varvec{x}_{L\left( k\right) } \circ \varvec{x}_{R\left( k\right) } \end{aligned}$$
(3)
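The construction in Eqs. (1) to (3) can be sketched in plain Python as follows. This is an illustration rather than the scTensor code: the helper names are hypothetical, the tensor is stored as a list of frontal slices so that `slices[k][s][t]` corresponds to \(\mathcal {\chi }_{stk}\), and every cell type is assumed to contain at least one cell.

```python
def celltype_means(Y, labels, cell_types):
    """X = Y A: average each gene (row of Y) over the cells of each cell type."""
    X = []
    for row in Y:
        means = []
        for ct in cell_types:
            vals = [v for v, lab in zip(row, labels) if lab == ct]
            means.append(sum(vals) / len(vals))
        X.append(means)
    return X


def cci_tensor(X, gene_names, lr_pairs):
    """Stack one J x J outer product per L-R pair whose genes are both in X.

    Returns (slices, kept_pairs), where slices[k][s][t] = x_L(k)[s] * x_R(k)[t].
    """
    index = {g: i for i, g in enumerate(gene_names)}
    slices, kept = [], []
    for lig, rec in lr_pairs:
        if lig in index and rec in index:
            x_l, x_r = X[index[lig]], X[index[rec]]
            slices.append([[a * b for b in x_r] for a in x_l])
            kept.append((lig, rec))
    return slices, kept
```

Pairs for which either gene is missing from the expression matrix are skipped, so K is the number of L–R pairs actually found rather than the size of the database.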

Non-negative Tucker3 decomposition (NTD-3)

To extract the CaHs from the CCI-tensor \(\mathcal {\chi } \in \mathbb {R}^{J \times J \times K}\), we utilize NTD-3 and NTD-2, which are generalizations of non-negative matrix factorization (NMF) to tensor data [59, 60]. NMF approximates a non-negative data matrix as the product of two lower-rank non-negative matrices (also known as factor matrices). Similarly, NTD-3 and NTD-2 approximate a non-negative data tensor as the product of several factor matrices and a core tensor.

To extend NMF to NTD-3, we consider iterative updating of \(\varvec{A}^{(n)} \mathcal {G}_{(n)} \varvec{A}^{(-n)T} = \varvec{A}^{(n)} \mathcal {G}_{A}^{(n)} \left( n = 1,2,3 \right)\), which is the matricized expression of tensor decomposition. Here, \(\varvec{A}^{(-n)}\) is the Kronecker product of the factor matrices other than \(\varvec{A}^{(n)}\), and \(\mathcal {G}_{(n)}\) is the mode-n matricization of the core tensor \(\mathcal {G}\). For example, if \(n=1\), these become \(\varvec{A}^{(2)} \otimes \varvec{A}^{(3)}\) and \(\mathcal {G}_{(1)}\), respectively. By replacing X in the multiplicative update rule [60] with \(\varvec{A}^{(n)} \mathcal {G}_{(n)} \varvec{A}^{(-n)T} \left( n = 1,2,3 \right)\), we can obtain the update rule for \(\varvec{A}^{(n)}\) as follows:

$$\begin{aligned} \begin{aligned} \varvec{A}^{(1)}&\leftarrow \varvec{A}^{(1)}\ *\frac{ \varvec{X}_{(1)} \mathcal {G}_{A}^{(1)T}}{\varvec{A}^{(1)} \mathcal {G}_{A}^{(1)} \mathcal {G}_{A}^{(1)T}} \\ \varvec{A}^{(2)}&\leftarrow \varvec{A}^{(2)}\ *\frac{ \varvec{X}_{(2)} \mathcal {G}_{A}^{(2)T}}{\varvec{A}^{(2)} \mathcal {G}_{A}^{(2)} \mathcal {G}_{A}^{(2)T}} \\ \varvec{A}^{(3)}&\leftarrow \varvec{A}^{(3)}\ *\frac{ \varvec{X}_{(3)} \mathcal {G}_{A}^{(3)T}}{\varvec{A}^{(3)} \mathcal {G}_{A}^{(3)} \mathcal {G}_{A}^{(3)T}}. \end{aligned} \end{aligned}$$
(4)

Similarly, the updating rule for core tensor \(\mathcal {G}\) is:

$$\begin{aligned} \begin{aligned} \mathcal {G}&\leftarrow \max \{\mathcal {\chi } \times _{1} \varvec{A}^{(1)T} \times _{2} \varvec{A}^{(2)T} \times _{3} \varvec{A}^{(3)T}, \epsilon \} \\ \mathcal {G}&\leftarrow \frac{\mathcal {\chi } \times _{1} \varvec{A}^{(1)T} \times _{2} \varvec{A}^{(2)T} \times _{3} \varvec{A}^{(3)T}}{\mathcal {G} \times _{1} \varvec{A}^{(1)T} \varvec{A}^{(1)} \times _{2} \varvec{A}^{(2)T} \varvec{A}^{(2)} \times _{3} \varvec{A}^{(3)T} \varvec{A}^{(3)}}, \end{aligned} \end{aligned}$$
(5)

where \(\epsilon\) is a small value included to avoid generating negative values (default value 1E−10).

Non-negative Tucker2 decomposition (NTD-2)

NTD-3 has three rank parameters to be estimated, and it requires a huge search space (\(R1 \times R2 \times R3\)). Additionally, the fewer the factor matrices, the more interpretable the results are. For these reasons, we further extended NTD-3 into a model called NTD-2 [60], which has been available since v1.4.0 of scTensor.

In NTD-2, the third factor matrix \(\varvec{A}^{(3)}\), which is related to L–R pairs, is replaced by an identity matrix \(I_{K}\), where the K diagonal elements are all 1 and the iteration step of \(\varvec{A}^{(3)}\) is skipped as follows:

$$\begin{aligned} \begin{aligned} \varvec{A}^{(1)}&\leftarrow \varvec{A}^{(1)}\ *\frac{ \varvec{X}_{(1)} \mathcal {G}_{A}^{(1)T}}{\varvec{A}^{(1)} \mathcal {G}_{A}^{(1)} \mathcal {G}_{A}^{(1)T}} \\ \varvec{A}^{(2)}&\leftarrow \varvec{A}^{(2)}\ *\frac{ \varvec{X}_{(2)} \mathcal {G}_{A}^{(2)T}}{\varvec{A}^{(2)} \mathcal {G}_{A}^{(2)} \mathcal {G}_{A}^{(2)T}}. \end{aligned} \end{aligned}$$
(6)

Here, \(\mathcal {G}_{A}^{(1)} = \mathcal {G}_{1} [\varvec{A}^{(2)} I_{K}]\) and \(\mathcal {G}_{A}^{(2)} = \mathcal {G}_{2} [\varvec{A}^{(1)} I_{K}]\).

The updating rule for core tensor \(\mathcal {G}\) is

$$\begin{aligned} \begin{aligned} \mathcal {G}&\leftarrow \max \{\mathcal {\chi } \times _{1} \varvec{A}^{(1)T} \times _{2} \varvec{A}^{(2)T}, \epsilon \} \\ \mathcal {G}&\leftarrow \frac{\mathcal {\chi } \times _{1} \varvec{A}^{(1)T} \times _{2} \varvec{A}^{(2)T}}{\mathcal {G} \times _{1} \varvec{A}^{(1)T} \varvec{A}^{(1)} \times _{2} \varvec{A}^{(2)T} \varvec{A}^{(2)}}. \end{aligned} \end{aligned}$$
(7)
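For illustration, the model fitted by Eqs. (6) and (7) can also be written element-wise. The form below is the standard Tucker2 expression with the third factor matrix fixed to \(I_{K}\); the indices s and t for the ligand-expressing and receptor-expressing cell types are our notation, borrowed from the L–R scoring section:

$$\begin{aligned} \mathcal {\chi }_{stk} \approx \sum _{r1=1}^{R1} \sum _{r2=1}^{R2} \mathcal {G}_{r1,r2,k} \, \varvec{A}^{(1)}_{s,r1} \, \varvec{A}^{(2)}_{t,r2}. \end{aligned}$$

Each (r1, r2) term is thus the outer product of the ligand pattern \(A_{:r1}^{(1)}\), the receptor pattern \(A_{:r2}^{(2)}\), and the L–R pattern \(G_{r1,r2,:}\), which corresponds to one CaH(r1,r2) as illustrated in Fig. 1c.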

Rank estimation of NTD-2

To extract the CaHs, scTensor estimates the NTD-2 ranks for each matricized CCI-tensor (\(X^{(n)}\), \(n=1\) or 2). To focus only on dimensions that are informative rather than noisy, we used an ad hoc approach for NTD-2 rank estimation.

Because NMF is performed in each matricized CCI-tensor in scTensor, we estimated each rank of NMF based on the residual sum of squares (RSS) [122] as

$$\begin{aligned} \frac{\textrm{RSS}_{\textrm{max}} - \textrm{RSS}_{k}}{\textrm{RSS}_{\textrm{max}} - \textrm{RSS}_{\textrm{min}}} > \textrm{thr}_{\textrm{rank}}, \end{aligned}$$
(8)

where \(\textrm{RSS}_{\textrm{max}}\) is the RSS by rank-1 NMF, \(\textrm{RSS}_{\textrm{min}}\) is the RSS by full-rank NMF, \(\textrm{RSS}_{k}\) is the RSS by rank-k NMF, and \(\textrm{thr}_{\textrm{rank}}\) is the threshold value, ranging from 0 to 1 (the default value is 0.8). The RSS by rank-k NMF is calculated between a matricized CCI-tensor \(X^{(n)}\) and the matrix reconstructed from \(W_{k}\) and \(H_{k}\), which are calculated by the multiplicative updating rule [60], as follows:

$$\begin{aligned} \textrm{RSS}_{k} = \Vert X^{(n)} - W_{k} H_{k}\Vert _{F}^{2}. \end{aligned}$$
(9)

The RSS values by full-rank and rank-1 NMF are calculated by setting k to J and 1, respectively. With the estimated ranks \((\hat{R1}, \hat{R2})\), NTD-2 is performed, and only the pairs (r1,r2) with large core tensor values are selected as CaHs. In its default mode, scTensor selects the CaHs corresponding to the top 20 pairs ranked by core tensor value.
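The thresholding in Eq. (8) can be sketched as below, given RSS values that have already been computed for each candidate rank. Two points are assumptions of this sketch rather than statements from the text: \(\textrm{RSS}_{\textrm{max}}\) and \(\textrm{RSS}_{\textrm{min}}\) are taken as the largest and smallest RSS among the candidates, and the smallest rank satisfying the inequality is returned. The function name `estimate_rank` is hypothetical.

```python
def estimate_rank(rss_by_rank, thr_rank=0.8):
    """Return the smallest rank k whose normalized RSS reduction exceeds thr_rank.

    rss_by_rank maps each candidate rank k to RSS_k.
    """
    rss_max = max(rss_by_rank.values())
    rss_min = min(rss_by_rank.values())
    if rss_max == rss_min:
        # No rank improves the fit; fall back to the smallest candidate.
        return min(rss_by_rank)
    for k in sorted(rss_by_rank):
        reduction = (rss_max - rss_by_rank[k]) / (rss_max - rss_min)
        if reduction > thr_rank:
            return k
    # Only reached when thr_rank >= 1.
    return max(rss_by_rank)
```

For example, with RSS values of 100, 40, 15, 5, and 0 for ranks 1 to 5, the normalized reductions are 0, 0.6, 0.85, 0.95, and 1, so the default threshold of 0.8 selects rank 3.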

Binarization

To binarize each column vector of the factor matrices obtained by NTD-2, we applied the median absolute deviation (MAD), a median-based analog of the standard deviation (SD). Because we are interested only in positive outliers among the elements of each vector, not negative ones, we focused only on the elements that deviate from the median in the positive direction as follows:

$$\begin{aligned} x_{i} = {\left\{ \begin{array}{ll} 1 &{} \left( x_{i} \ge \textrm{median}\left( x \right) + \textrm{thr}_{\textrm{bin}} \times \textrm{MAD}\left( x \right) \right) \\ 0 &{} \left( otherwise \right) . \end{array}\right. } \end{aligned}$$
(10)

Here, \(\textrm{MAD}\left( x\right)\) is \(\textrm{median}\left( \left| x - \textrm{median}\left( x\right) \right| \right)\), where the absolute value is taken element-wise, and \(\textrm{thr}_{\textrm{bin}}\) is the threshold value (the default value is 1.0).
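A minimal sketch of Eq. (10) follows, using the unscaled MAD as defined above (no consistency constant is applied). The function name `binarize_mad` is hypothetical.

```python
from statistics import median


def binarize_mad(x, thr_bin=1.0):
    """Return 1 where x_i >= median(x) + thr_bin * MAD(x), else 0 (Eq. 10)."""
    med = median(x)
    # MAD: median of the element-wise absolute deviations from the median.
    mad = median(abs(v - med) for v in x)
    cutoff = med + thr_bin * mad
    return [1 if v >= cutoff else 0 for v in x]


print(binarize_mad([1, 2, 1, 3, 50]))  # median 2, MAD 1, cutoff 3
```

Because only the upper tail is tested, elements far below the median are always assigned 0.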

L–R scoring

Several methods have been proposed to score the degree of co-expression of a given L–R pair between two cell types, as described below.

Sum score

The gene expression of a ligand gene l can be averaged over cells belonging to the sth cell type within J cell types as follows:

$$\begin{aligned} x_{l}^{C_{s}} = \frac{1}{N_{C_{s}}}\sum _{c \in C_{s}}{x_{lc}}. \end{aligned}$$
(11)

Here, \(C_{s} \in \left( C_{1}, C_{2}, \ldots , C_{J} \right)\) and \(N_{C_{s}}\) is the number of cells belonging to cell type \(C_{s}\).

Likewise, the gene expression of a receptor gene r is averaged over cells belonging to the tth cell type within J cell types as follows:

$$\begin{aligned} x_{r}^{C_{t}} = \frac{1}{N_{C_{t}}}\sum _{c \in C_{t}}{x_{rc}}. \end{aligned}$$
(12)

Here, \(C_{t} \in \left( C_{1}, C_{2}, \ldots , C_{J} \right)\) and \(N_{C_{t}}\) is the number of cells belonging to cell type \(C_{t}\).

Using these values, the sum score is calculated as follows:

$$\begin{aligned} Score_{sum}^{l,C_{s},r,C_{t}} = x_{l}^{C_{s}} + x_{r}^{C_{t}}. \end{aligned}$$
(13)

For example, methods such as CellPhoneDB [37], Giotto [61], CrossTalkR [62], and Squidpy [63] essentially use this type of scoring (Table 1 and Additional file 1).

Product score

In some studies, the degree of co-expression is expressed as a product instead of a summation.

$$\begin{aligned} Score_{prod}^{l,C_{s},r,C_{t}} = x_{l}^{C_{s}} \times x_{r}^{C_{t}} \end{aligned}$$
(14)

For example, methods such as NATME [64], FunRes [65], ICELLNET [66], and TraSig [67] essentially use this type of scoring (Table 1 and Additional file 1).
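The two basic scores in Eqs. (13) and (14) can be compared with a small sketch. The vectors `x_l` and `x_r` hold hypothetical cell-type means of one ligand and one receptor gene (J = 3), and the function names are illustrative.

```python
def sum_score(x_l, x_r, s, t):
    """Eq. 13: ligand mean in cell type s plus receptor mean in cell type t."""
    return x_l[s] + x_r[t]


def product_score(x_l, x_r, s, t):
    """Eq. 14: ligand mean in cell type s times receptor mean in cell type t."""
    return x_l[s] * x_r[t]


# Hypothetical cell-type means for one ligand and one receptor gene.
x_l = [4.0, 0.0, 1.0]
x_r = [0.0, 3.0, 1.0]

print(sum_score(x_l, x_r, 0, 0), product_score(x_l, x_r, 0, 0))
```

In this toy case the receptor is not expressed in the first cell type, so the product score for the pair (0, 0) is zero while the sum score remains positive.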

Halpern’s score

Derived from the sum score, Halpern et al. proposed a score described below (Table 1 and Additional file 1).

In this score, Z-scaling is first applied to both \(x_{l}^{C_{s}}\) and \(x_{r}^{C_{t}}\) over J cell types as follows:

$$\begin{aligned} Z_{l}^{C_{s}}= & {} \frac{x_{l}^{C_{s}} - \textrm{mean}\left( x_{L} \right) }{\textrm{std}\left( x_{L} \right) } \end{aligned}$$
(15)
$$\begin{aligned} Z_{r}^{C_{t}}= & {} \frac{x_{r}^{C_{t}} - \textrm{mean}\left( x_{R} \right) }{\textrm{std}\left( x_{R} \right) }. \end{aligned}$$
(16)

Here, \(x_{L} = \left( x_{l}^{C_{1}}, x_{l}^{C_{2}}, \ldots , x_{l}^{C_{J}} \right)\) and \(x_{R} = \left( x_{r}^{C_{1}}, x_{r}^{C_{2}}, \ldots , x_{r}^{C_{J}} \right)\). Then, the square root of the sum of squares of these values is used as the degree of co-expression as follows:

$$\begin{aligned} Score_{\texttt {Halpern}{}}^{l,C_{s},r,C_{t}} = \sqrt{\left( Z_{l}^{C_{s}} \right) ^2 + \left( Z_{r}^{C_{t}} \right) ^2}. \end{aligned}$$
(17)
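Equations (15) to (17) can be sketched as follows. The text does not specify whether the population or the sample SD is used; this sketch assumes the population SD (`statistics.pstdev`), and the function name is hypothetical.

```python
import math
from statistics import mean, pstdev


def halpern_score(x_l, x_r, s, t):
    """Eq. 17: Euclidean norm of the ligand and receptor Z-scores.

    x_l and x_r are the J-length vectors of cell-type means; a gene with
    identical means in all cell types has SD 0 and cannot be Z-scaled.
    """
    z_l = (x_l[s] - mean(x_l)) / pstdev(x_l)
    z_r = (x_r[t] - mean(x_r)) / pstdev(x_r)
    return math.sqrt(z_l ** 2 + z_r ** 2)
```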

Cabello-Aguilar’s score

Derived from the product score, Cabello-Aguilar et al. proposed a score described as follows (Table 1 and Additional file 1):

$$\begin{aligned} Score_{{\texttt {Cabello}\mathrm {-}{} \texttt {Aguilar}}{}}^{l,C_{s},r,C_{t}} = \frac{\sqrt{Score_{prod}}}{\mu + \sqrt{Score_{prod}}}. \end{aligned}$$
(18)

Here, \(\mu\) is the averaged value of the normalized read count matrix and is added as a scaling factor to avoid division by zero. This score is used in SingleCellSignalR [69] and CellTalkDB [70].
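Equation (18) can be sketched in a few lines. The function takes the two cell-type means and \(\mu\) as plain numbers; the function name and argument names are hypothetical, and \(\mu\) is assumed to be positive.

```python
import math


def cabello_aguilar_score(x_l_s, x_r_t, mu):
    """Eq. 18: sqrt(product score) / (mu + sqrt(product score))."""
    root = math.sqrt(x_l_s * x_r_t)
    return root / (mu + root)


print(cabello_aguilar_score(4.0, 9.0, 6.0))  # sqrt(36) / (6 + 6) = 0.5
```

For positive \(\mu\) and non-negative expression values, the score lies between 0 and 1 and equals 0.5 when the square root of the product score equals \(\mu\).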

Label permutation method

To quantify the deviation of the observed scores obtained from real data, many studies employ \(P\) values in a statistical hypothesis testing framework. The label permutation method is widely used to calculate these \(P\) values. In principle, this method can be used in combination with any of the L–R scores described above.

Here, we consider assigning a \(P\) value to any type of \(Score^{l,C_{s},r,C_{t}}\) above. In this method, the cluster labels of all the cells are randomly shuffled, and a synthetic score value is calculated. Repeating this process 1000 times yields 1000 synthetic score values, which are used to generate the null distribution; for a combination of cell types, the proportion of the means which are “as or more extreme” than the observed mean is calculated as the \(P\) value. The label permutation test is performed as a one-tailed test; it focuses on L–R scores that are significantly higher than the null distribution, not on those that are significantly lower. Because separating significant CCIs from non-significant CCIs by hypothesis testing can be regarded as a binarization process, label permutation results were compared with the results of scTensor binarization.
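The procedure can be sketched for a single L–R pair and a single pair of cell types as follows. This is a simplified illustration, not the implementation of any specific tool: the function names, the `score` callback, and the fixed `seed` are hypothetical, and the \(P\) value is the plain proportion described above without any pseudo-count.

```python
import random


def celltype_mean(values, labels, cell_type):
    """Mean expression of one gene over the cells labeled cell_type."""
    vals = [v for v, lab in zip(values, labels) if lab == cell_type]
    return sum(vals) / len(vals)


def permutation_p_value(lig, rec, labels, s, t, score, n_perm=1000, seed=0):
    """One-tailed label-permutation P value for a given L-R score function.

    lig, rec: per-cell expression of the ligand and receptor genes.
    score: function of (ligand mean in s, receptor mean in t).
    """
    rng = random.Random(seed)
    observed = score(celltype_mean(lig, labels, s), celltype_mean(rec, labels, t))
    shuffled = list(labels)
    n_extreme = 0
    for _ in range(n_perm):
        # Shuffling keeps the number of cells per cell type unchanged.
        rng.shuffle(shuffled)
        null = score(celltype_mean(lig, shuffled, s), celltype_mean(rec, shuffled, t))
        if null >= observed:
            n_extreme += 1
    return n_extreme / n_perm
```

Passing `lambda a, b: a + b` or `lambda a, b: a * b` as `score` gives the permutation test for the sum score or the product score, respectively.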

Quantitative evaluation of CCIs

The CCIs detected by the various methods tested in this paper were compared with ground truth CCIs to quantitatively evaluate the performance of each method. To evaluate the results, we used the metrics below.

Evaluation of the scoring before and after binarization

Each CCI method uses its own L–R score to quantify the degree of co-expression of a given L–R pair between two cell types. To quantitatively evaluate the performance of each score, the area under the receiver operating characteristic curve (AUCROC) and the area under the precision–recall curve (AUCPR) were used.

A receiver operating characteristic (ROC) curve is a plot of the true positive rate (TPR, or sensitivity = \(\frac{TP}{TP + FN}\)) versus the false positive rate (FPR, or 1 - specificity, where specificity = \(\frac{TN}{TN + FP}\)). Here, TP is the number of true positive CCIs, FP is the number of false positive CCIs, TN is the number of true negative CCIs, and FN is the number of false negative CCIs. The AUCROC value is the area under the ROC curve. AUCROC values range from 0 to 1, and the closer the value is to 1, the more the score indicates enrichment of the ground truth CCIs among the inferred CCIs.

A precision–recall curve is a plot of recall (i.e., sensitivity) versus precision (i.e., positive predictive value = \(\frac{TP}{TP + FP}\)). The AUCPR value is the area under the precision–recall curve. AUCPR ranges from 0 to 1, and the closer the value is to 1, the more the score indicates enrichment of the ground truth CCIs among the inferred CCIs. AUCPR is known for its robustness against class imbalance, compared with AUCROC [123,124,125]. Hence, AUCPR appears more appropriate than AUCROC here because the number of significant CCIs is assumed to be smaller than that of non-significant CCIs in both simulated and real empirical data.
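Both metrics can be computed from a list of scores and binary ground truth labels, as in the sketch below. The AUCROC uses the rank-based (Mann–Whitney) formulation, and the AUCPR is approximated by average precision, which is one of several common estimators and is an assumption of this sketch; the text does not state which estimator was used. The function names are hypothetical, and both classes are assumed to be present.

```python
def auc_roc(scores, truth):
    """AUCROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, truth) if y == 1]
    neg = [s for s, y in zip(scores, truth) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_pr(scores, truth):
    """AUCPR approximated by average precision (assumes no tied scores)."""
    ranked = sorted(zip(scores, truth), key=lambda pair: -pair[0])
    n_pos = sum(truth)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            # Precision at the rank where this positive is recovered.
            total += tp / rank
    return total / n_pos
```

When applied to label permutation results, `scores` would hold 1 − \(P\) values rather than raw L–R scores.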

To evaluate whether the binarization was properly performed, we also applied these metrics to assess label permutation. As the label permutation test outputs \(P\) values, we utilized 1 − \(P\) value to quantify the degree of co-expression of elements of L–R pairs in the test. Because tensor decomposition is an unsupervised learning method, we cannot know in advance which CaHs are enriched for the ground truth CCIs. Additionally, we expected that scTensor could separate different styles of CCIs as multiple CaHs. Hence, we used the combination of CaHs from scTensor and ground truth CCIs with the maximum metric values.

The calculation time and memory usage were evaluated by using the benchmark rules of Snakemake (https://snakemake.readthedocs.io/en/latest/snakefiles/rules.html?highlight=benchmark#benchmark-rules).

Evaluation of the scoring after binarization

Each CCI method uses a threshold value (e.g., \(P\) value, or MAD for scTensor) to differentiate significant CCIs from non-significant CCIs. This process is considered a kind of binarization (1 for significant CCIs, 0 for non-significant CCIs), so we evaluated how well each thresholding strategy could selectively detect the ground truth CCIs by comparing the metrics below.

Fig. 1

Cell–cell interaction (CCI) as a hypergraph (CaH). a Previous scRNA-seq studies have regarded CCIs as graphs, and the corresponding data structure can be expressed as an adjacency matrix (left). In this work, CCIs are regarded as context-aware edges (hypergraphs), and the corresponding data structure is a tensor (right). b The CCI-tensor is generated by users’ scRNA-Seq matrices, cell-type labels, and ligand–receptor (L–R) databases. NTD-2 is used to extract CaHs from the CCI-tensor. c Each CaH(r1,r2) is equal to the outer product of three vectors. \(A_{:r1}^{(1)}\) represents the ligand expression pattern, \(A_{:r2}^{(2)}\) represents the receptor expression pattern, and \(G_{r1,r2,:}\) represents the patterns of related L–R pairs

Fig. 2

Evaluation scheme. To evaluate CCI methods, 90 simulated datasets and six real empirical datasets were prepared. Four ligand–receptor (L–R) scoring methods and six binarization methods were then evaluated. For the evaluation of these methods, area under the receiver operating characteristic curve (AUCROC), area under the precision–recall curve (AUCPR), memory usage, and computational time were determined. For the evaluation of binarization methods, F-measure, Matthews correlation coefficient (MCC), positive ratio (PR), false positive ratio (FPR), and false negative ratio (FNR) were determined

Fig. 3

Results of simulated datasets. a Area under the curve of precision–recall (AUCPR) of all the methods. b Matthews correlation coefficient (MCC) of the binarization methods

Fig. 4

Analyses of three datasets in which each of the three methods excelled. Summary of the number of significant cell–cell interactions (CCIs) with a three cell types, one CCI type, one-to-one CCI style, and 1st-CCI type; b 20 cell types, five CCI types, many-to-many CCI style, 2nd-CCI type; and c 30 cell types, five CCI types, many-to-many CCI style, 5th-CCI type. The y-axis (L) and x-axis (R) indicate the ligand-expressing cell types and the receptor-expressing cell types, respectively. FN and FP indicate false negative and false positive CCIs, respectively

Fig. 5

Cell–cell interaction (CCI) identification results from real empirical datasets. a Area under the curve of precision–recall (AUCPR) of all the methods. b Matthews correlation coefficient (MCC) of the binarization methods

Fig. 6

Implementation of the scTensor package. a scTensor is an R package that requires the input of both an scRNA-Seq expression matrix (SingleCellExperiment or Seurat) and a ligand–receptor (L–R) database (LRBase). The LRBase is retrieved from the AnnotationHub remote server, after which a LRBase object is created. b Using these objects, scTensor generates an HTML report file, and the results of cell–cell interaction (CCI) analysis can be visualized with a wide variety of plots

Table 1 Correspondence between L–R scoring used in this study and previous tools
Table 2 Empirical datasets subjected to cell–cell interaction (CCI) identification
Table 3 Many-to-many cell–cell interactions (CCIs) detected by only scTensor
Table 4 Order of calculation time and memory space for cell–cell interaction (CCI) identification

F-measure is the harmonic mean of precision and recall and is defined as follows:

$$\begin{aligned} F\mathrm {-}measure = \frac{2\ \textrm{precision} \times \textrm{recall}}{\textrm{precision} + \textrm{recall}}. \end{aligned}$$
(19)

The Matthews correlation coefficient (MCC) is a special case of the Pearson correlation coefficient in which both variables are binary vectors. MCC is defined as follows:

$$\begin{aligned} MCC = \frac{\textrm{TP} \times \textrm{TN} - \textrm{FP} \times \textrm{FN}}{\sqrt{ \left( \textrm{TP} + \textrm{FP} \right) \left( \textrm{TP} + \textrm{FN} \right) \left( \textrm{TN} + \textrm{FP} \right) \left( \textrm{TN} + \textrm{FN} \right) }}. \end{aligned}$$
(20)

MCC is widely used for binary classification evaluation and is especially known for its robustness against class imbalance, compared with other metrics such as accuracy, balanced accuracy, bookmaker informedness, markedness, and F-measure [126,127,128]. Hence, MCC appears more appropriate than F-measure here because the number of significant CCIs is assumed to be smaller than that of non-significant CCIs in both simulated and real empirical data.
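Equations (19) and (20) can be evaluated directly from the four confusion-matrix counts, as in the sketch below. The function names are hypothetical, and the counts are assumed to be such that no denominator is zero.

```python
import math


def f_measure(tp, fp, fn):
    """Eq. 19: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Eq. 20: Matthews correlation coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced case: 10 true CCIs among 100 candidate pairs.
print(f_measure(tp=5, fp=5, fn=5), mcc(tp=5, tn=85, fp=5, fn=5))
```

In this toy case the F-measure is 0.5 and the MCC is 400/900 (approximately 0.44); unlike the F-measure, the MCC also depends on the number of true negatives.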

To distinguish whether the F-measure and MCC values reflect the number of detected CCIs or the selectivity in focusing only on the ground truth CCIs, we also compared the positive rate (PR), false positive rate (FPR), and false negative rate (FNR) values of all the methods.

Availability and requirements

R packages

Snakemake workflows

  • scTensor-experiments (for the analyses conducted in this study): https://github.com/rikenbit/scTensor-experiments

  • lrbase-workflow (for the bi-annual updates of LRBase): https://github.com/rikenbit/lrbase-workflow

  • Operating system: Linux, Mac OS X, Windows

  • Programming language: Python (v3.7.8 or higher), Snakemake (v6.0.5 or higher), Singularity (v3.8.0 or higher)

  • License: MIT

  • Any restrictions to use by non-academics: For non-profit use only