Background

Despite ongoing advancements in the field of oncology, cancer remains a significant threat to human health, primarily due to its inherent complexity. There is a broad consensus that cancer cells are not solitary entities; rather, they engage in intricate interactions with adjacent immune and stromal cells. This complex interplay contributes to the formation of an advanced tumor microenvironment that is fundamentally involved in both the onset and progression of cancer [1]. The advent of molecular targeted therapy and immunotherapy has brought hope for improved cancer treatment. However, the heterogeneous nature of tumors results in only a minority of patients responding effectively to these treatments [2]. Consequently, there is an urgent need for cutting-edge research techniques and innovative approaches in current cancer-related studies.

The spatial architecture of a tumor plays a critical role in its development and progression. This structural trait is markedly pronounced within certain biological niches, where tumor cells and the encompassing microenvironment engage in dynamic reciprocal interactions. These interactions often foster an environment conducive to immune suppression, thereby facilitating tumor immune evasion and posing a significant hurdle to effective cancer therapy [3]. Furthermore, the spatial structure of tumors not only influences their growth but also drives their metastasis [4]. The unique arrangement of specific tissues can significantly influence the metastasis and colonization abilities of tumor cells. Thus, a comprehensive exploration of tumor biology through a spatial lens is not merely beneficial but essential. It provides deeper insights into the intricate and inherent mechanisms that drive tumor development and progression. This understanding is fundamental in devising more targeted and effective therapeutic strategies, which aim to disrupt these spatially-influenced processes and provide potential for improved cancer treatment outcomes.

Since the emergence of next-generation sequencing (NGS) technology in 2005 [5], there has been ongoing innovation in the life sciences research. This technology has enabled scientists to explore previously uncharted areas of biomedical research. By 2009, single-cell resolution transcriptomic technology was first proposed [6], offering a more sensitive approach to transcriptome sequencing. This technique has uncovered previously unknown cell types, provided new insights into cellular heterogeneity and the tissue microenvironment [7, 8]. However, a notable limitation of single-cell sequencing is the loss of spatial information for cells due to tissue dissociation, impeding the study of cellular interactions between tumors and their surrounding components. To address this limitation, spatially resolved transcriptomics was developed, which is capable of preserving the spatial positioning of transcript molecules. This advancement allows for a more comprehensive understanding of complex tissues, such as the tumor microenvironment (TME), facilitating a deeper exploration of disease mechanisms and potential therapeutic targets.

In this review, we present a comprehensive overview of the developmental history of spatial transcriptomics (ST) technology up to early 2024, along with its latest applications in the field of cancer research. To aid clinical practitioners and biomedical professionals in mastering the ST data analysis, we comprehensively describe various aspects of ST data analysis and have created an online analysis guide for reference at https://github.com/SiyuanHuang1/ST_Analysis_Handbook. Finally, we synthesize current trends in this field to provide insights into the future directions of ST.

Developmental history of ST

ST technologies have been generally divided into two main categories according to their detection strategies: imaging-based methods and sequencing-based methods. Imaging-based ST methods encompass in situ hybridization (ISH) and in situ sequencing (ISS). The sequencing-based ST methods include laser capture microdissection (LCM)-based methods and methods based on spatial barcode. Main technologies in the developmental history of ST are given in Fig. 1.

Fig. 1
figure 1

Development history of ST

Imaging-based ST technologies

Imaging-based ST technologies, specifically those utilizing immunofluorescence imaging to detect spatial gene expression, can be categorized into two main categories: in situ hybridization (ISH) and in situ sequencing (ISS). ISH technology has emphasized in-situ hybridization of mRNA with fluorescently labeled gene-specific probes, while ISS technology has employed sequencing by ligation (SBL) to directly sequence amplified mRNA within tissue blocks or slices.

The foundation of ISH technology was traced back to the 1970s [9]. In 1969, Gall et al. utilized radiolabeled ribosomal RNA probes to visualize ribosomal RNA in African clawed frog egg cells [10]. A significant advancement came in 1998 with the introduction of single-molecule fluorescence in situ hybridization (smFISH) method by Femino et al., enabling the detection of individual mRNA molecules and providing quantitative molecular information at the single-cell level [11]. However, smFISH was limited in terms of spatial exploration of the cellular transcriptome due to the spectral overlap of fluorophores, which restricted the detection of only a limited number of genes simultaneously. Despite its high sensitivity, smFISH, considered a gold standard in RNA quantification methods, faced challenges in analyzing a broad range of transcripts. To overcome these limitations, new ISH methods emerged, such as sequential fluorescence in situ hybridization (seqFISH) [12], multiplexed error-robust fluorescence in situ hybridization (MERFISH) [13], and seqFISH + [14]. These methods used cyclic hybridization for combinatorial labeling, significantly expanding the detection range compared to smFISH. SeqFISH utilized the same probes in multiple hybridization rounds, each round using different combinations of four fluorophores to label corresponding target genes. However, the use of only 4–5 fluorescent dyes in seqFISH led to optical congestion, thus restricting the analysis of numerous transcripts [12]. It is important to note that as the number of imaging cycles increases, the detectable RNA species grew exponentially, but so did the potential for detection errors. To address this issue, MERFISH introduced a new encoding strategy that used binary-coded secondary probes with or without fluorescent groups rather than fluorescent dye sequences to mark RNA [13]. MERFISH evolved to the whole transcriptome level, demonstrating its applicability to expansive research areas [15]. SeqFISH + was an upgraded version of seqFISH that employs 60 pseudo-colors distributed across three fluorescence channels, allowing for the analysis of the transcriptome of 10,000 genes in a single cell, representing a significant advancement in ISH methods by achieving transcriptome-level detection for the first time [14]. ISH methods have shown remarkable sensitivity and subcellular resolution in transcript coverage. However, the considerable imaging time required by these methods necessitates the use of extremely thin tissue slices, often resulting in incomplete cell representation [16].

ISS methods, on the other hand, identify spatial transcriptome information through high-throughput sequencing on clusters generated in situ. A common procedure in these methods involves using rolling circle amplification (RCA) to generate DNA nanoballs, which not only preserve spatial information but also enhance detection signals. The first example of ISS was proposed in 2013 [17]. In the targeted in situ sequencing program of ISS, padlock probes were employed to specifically bind to cDNA reverse-transcribed from mRNA, followed by RCA for amplification. In 2014, fluorescence in situ sequencing (FISSEQ) was introduced alongside ISS [18]. In contrast to ISS, FISSEQ was a non-targeted method that initially reversed transcribing mRNA with labeled random hexamers. After circularization and RCA, cDNA formed single-stranded DNA nanoballs with diameters ranging from 200 to 400 nm, and SBL sequencing read the concatenated cDNA sequences. This technology specifically read the very short 3' end and, due to the low efficiency of in situ reverse transcription and cDNA circularization, was only capable of detecting transcripts at levels well below 1% of total cellular transcripts [19]. To mitigate background noise caused by single-probe errors, spatially-resolved transcript amplicon readout mapping (STARmap) utilized pairs of DNA probes to label cellular RNA [20]. BaristaSeq, in contrast to other methods that employed sequencing by ligation, made use of improved gap-filling padlock probes and employed Illumina synthesis sequencing, offering improved signal-to-noise ratio detection [21]. This technique was commonly used for sequencing a single barcode per cell [19]. ExSeq combined ISS or FISSEQ with expansion microscopy, allowing targeted or non-targeted detection of RNA [22].

Both ISH and ISS methods utilize pre-designed probes and high-resolution microscopy for detection, and enable the analysis of mRNA at subcellular resolution. RCA plays a critical role in enhancing the signal-to-noise ratio, enabling ISS to achieve fewer accumulated errors with fewer hybridization rounds and detect larger tissue areas with lower magnification. However, due to the lower efficiency of RCA amplification and its stronger selectivity [9], ISS can detect fewer genes than ISH. Single-molecule microscopy imaging technology is highly complex, requiring precise instruments and procedures. Additionally, the image data generated using this technology often reach terabytes. The incorporation of multiple imaging cycles and multiple hybridization rounds further consume longer experimental periods and numerous sample handling steps. These factors contribute to the complexity and generally higher experimental costs of ISH and ISS technologies.

Laser capture microdissection-based sequencing methods

Laser capture microdissection (LCM) transfer and tissue analysis were proposed in 1996, initially utilized for polymerase chain reaction amplification and enzyme recovery in specific tissue regions [23]. Subsequently, LCM was combined with comprehensive transcriptome analysis to facilitate the detailed examination of specific cell groups. Although this approach generally required a substantial number of cells, it presents a notable advantage by eliminating the need for tissue dissociation. The Tomo-seq technique was developed to attain spatially resolved, genome-wide expression profiles [24]. By combining LCM with full-length mRNA-sequencing, a robust and highly efficient strategy (LCM-seq) was developed for single-cell transcriptomics. This strategy provided biological insights into the characteristics and functions of similar neuronal populations [25]. Geographical position sequencing (Geo-seq) was a technique that fused laser capture microdissection with single-cell RNA-seq technology to investigate cellular diversity and spatial variability simultaneously [26]. Spatial-histopathological examination-linked epitranscriptomics converged to transcriptomics with sequencing (Select-seq) enabled the acquisition of both transcriptomic and epitranscriptomic data [27].

Several ST technologies employed light-controlled methods to select the region of interest (ROI) instead of involving tissue sectioning. These techniques could be categorized as LCM-like ST technologies. NICHE-seq enabled the isolation and examination of cells from visually chosen specific areas in transgenic mice expressing a photoactivatable green fluorescent protein [28]. This approach was utilized to pinpoint distinct niches for T and B cells within the lymph nodes and spleens of mice following viral infection. ZipSeq used patterned illumination and photocaged oligonucleotides to serially print barcodes onto live cells in intact tissues, allowing real-time and on-the-fly selection of patterns [29]. Digital Spatial Profiling (DSP) was an advanced technique well-suited for formalin-fixed, paraffin-embedded (FFPE) samples, providing detailed spatial analysis of proteins or RNAs [30]. Light-Seq was designed for the in situ spatial labeling of target molecules within specified ROIs, enabling ex situ next-generation sequencing without damaging the sample [31]. Although LCM methods had notable advantages, they typically offered limited spatial resolution and throughput compared to other ST techniques.

ST technology based on spatial barcode

The concept of ST technology based on spatial barcode was first proposed in 2016 [32]. This method identified messenger RNA transcripts by transferring RNA molecules onto a glass slide. The slide was coated with reverse-transcription oligo(dT) primers that contained a unique molecular identifier (UMI) and a spatial barcode, enabling the retrieval of original transcript locations. Although this technique has innovated the approach to transcriptome research, it faced challenges in terms of low RNA capturing efficiency and limited spatial resolution, with a spot diameter of 100 μm and a spot-to-spot distance of 200 μm. To address this limitation, 10 × Genomics reformed and commercialized the method. achieving a higher resolution with a spot diameter of about 55 μm and a spot-to-spot distance of approximately 100 μm. Another approach, Slide-seq, utilized polystyrene beads randomly distributed on a slide, with each bead carrying a unique barcode and attains a spatial resolution of 10 μm [33, 34]. High-definition spatial transcriptomics (HDST) employed a silicon wafer as the base, further improving the spatial resolution to 2 μm [35]. However, the advanced spatial resolution achieved by both Slide-seq and HDST hinges on the random allocation of spatially barcoded beads. Consequently, pinpointing their precise spatial arrangement necessitated labor-intensive imaging-based in situ sequencing techniques. The DBiT-seq (deterministic barcoding in tissue for spatial omics sequencing) method enabled the simultaneous mapping of mRNAs and proteins in formaldehyde-fixed tissue sections by combining microfluidic barcoding with next-generation sequencing [36]. Seq-Scope was a two-phase sequencing technology that first created a spatially barcoded oligonucleotide array for mRNA capture and then used Illumina NGS to sequence captured mRNAs, linking each to precise array coordinates for high-resolution spatial transcriptomics.[37]. Although ST techniques offered both multicellular and subcellular resolution, they did not directly provide spatial transcriptomic data at a true single-cell resolution. Sci-Space captured single-cell spatial transcriptomes by recording cells' original locations and adding spatial barcodes with poly(A) tails, allowing for the retrieval of single-cell transcriptomes while approximating true spatial coordinates [38]. Stereo-seq technology utilized a chip with densely packed active spots for high-definition mRNA capture, featuring a proprietary 25 nucleotide sequence for spatial coordinate identification [39]. This enabled the generation of detailed spatial transcriptomic maps with nanometer-scale resolution and extensive capture coverage. Pixel-seq achieved single-cell spatial transcriptomics by utilizing ultrafine clonal DNA clusters on Polony gels, which significantly enhanced spatial resolution by limiting template diffusion and preventing transcript spread between cells [40].

In recent developments in the field of ST based on spatial barcode, two significant breakthroughs emerged. Russell et al. developed Slide-tags, a novel method enabling single-cell (nuclei) resolution [41]. This technique involved 'tagging' cellular nuclei from tissue sections with spatial barcode oligonucleotides, which were derived from DNA-barcoded beads, under ultraviolet exposure. Slide-tags also offered the advantage of enabling the profiling using existing single-cell methods with the addition of spatial positions. Another significant advancement in ST technology based on spatial barcode was the introduction of the Visium HD spatial gene expression assay by 10 × Genomics. This technique employed a whole transcriptome probe panel and achieves single-cell scale resolution within intact tissue sections. The HD array was composed of approximately 12 million 2 µm × 2 µm spatially-barcoded areas without gaps. This significantly enhances spatial resolution compared to the previous Visium platform.

Spatial barcode-based spatial transcriptomics technologies stand out due to their innovative approach to mapping mRNA transcripts by translocating them onto slides with reverse-transcription primers that include UMIs and spatial barcodes. This enables precise mapping of each transcript's original location, providing a high-resolution spatial context to transcriptome data. Advancements in this domain, such as the development of Slide-seq and HDST, have dramatically enhanced spatial resolution down to 2 µm, enabling the detailed visualization of transcript distribution at near single-cell resolution. The deterministic nature of DBiT-seq and the single-cell spatial resolution of Slide-tags further underscore the rapid evolution of this field, offering unparalleled insights into the cellular composition of tissues while maintaining the integrity of spatial information. These methods surpass other spatial transcriptomics techniques by allowing for extensive multiplexing and the ability to handle whole transcriptomes, which is a significant leap forward in capturing comprehensive cellular behavior within intact tissue environments. Despite challenges such as lower RNA capture efficiency and the need for complex imaging to pinpoint bead arrangements, the benefits of these techniques, particularly their enhanced resolution and capacity for detailed spatial mapping, position them as powerful tools for advancing our understanding of complex biological systems.

Application of ST in cancer research

Understanding the TME and tumor heterogeneity is crucial for effective cancer treatment and prognosis. Traditional single-cell techniques allow researchers to examine and compare genetic and functional features at the single-cell level, unraveling the identities of distinct cell types within complex tissues. This is vital for revealing the intricacies of tumor heterogeneity and the multifaceted complexity of the TME. In order to overcome the limitations of spatial resolution in single-cell RNA sequencing, ST has emerged as a valuable tool for obtaining spatially distributed transcriptomic data from tissue slices. By generating a cell atlas with spatial resolution for solid tumors, this technique facilitates the inference of cell interactions based on co-localization patterns [42], the exploration of tumor infiltration regions and boundaries of the stroma [43], the analysis of spatial immune niches [44], and the identification of specific cell subpopulations with unique distributions within the tissue structure [45]. These insights are crucial for understanding spatial patterns of cell interactions, as well as the distribution of cell types and tumor heterogeneity within the TME. A summary of the applications of ST technology in cancer research is provided in Table 1.

Table 1 Application of ST technology in cancer research

Applications of ST technology in TME

The TME is primarily composed of a diverse range of cell types, including malignant cells, immune cells, stromal cells, neurons, smooth muscle cells, lymphatic vessels, and blood vessels. These cell types can be further classified into various subtypes based on their specific gene expression patterns. Within the TME, these diverse cell types interact spatially, occupying ecological niches to form a complex ecosystem [87]. ST has emerged as a powerful technique for elucidating the ecological structure of the TME by revealing cell types, cell states, and the interactions between different cell types.

ST elucidates the complex interactions within the immune microenvironment by revealing cell-type-specific interactions and structures that are critical for the immune response to cancer. For instance, in a study on HER2-positive breast cancer (HER2 + BC), Andersson et al. explored the spatial gene expression patterns and discovered specific interactions between macrophages and T cell subtypes during the type I interferon response. Furthermore, they identified co-localization between B cells and T cells, unveiling tertiary lymphoid structures (TLS) in a spatial context [46]. Liu et al. combined ST with single-cell RNA sequencing and multi-immunofluorescence to uncover the tumor immune barrier (TIB) structure—a spatial niche composed of SPP + macrophages and cancer-associated fibroblasts (CAFs) located near the tumor boundary. The study demonstrated that the impact of spatial structures on the immunotherapeutic efficacy in hepatocellular carcinoma (HCC) patients receiving anti-PD-1 treatment [53]. Spatial organizational information provided by ST has assisted researchers to propose TLS-50 features for precise spatial localization of TLS, highlighting their unique composition determined by their proximity to tumor cells [51].

The application of ST in cancer research has provided insights into the dynamic remodeling of the TME in response to cancer progression and metastasis, highlighting the evolving landscape of cellular populations and the formation of specialized niches. Hwang et al. conducted single-nucleus RNA sequencing and Digital Spatial Profiling (DSP) analysis on 43 primary pancreatic ductal adenocarcinoma (PDAC) tumor specimens. This approach enabled the construction of a high-resolution spatial map of cell community distribution within the PDAC microenvironment. Through their analysis, they identified expression programs prevalent in both PDAC malignant cells and fibroblasts, unveiling three multicellular communities comprising various cell subtypes within the TME [52]. Qi et al. explored the spatial landscape of primary and metastatic tumors in non-small cell lung cancer (NSCLC) brain metastases (BrMs), and revealed extensive remodeling of the TME in the brain, leading to the formation of an immunosuppressive and fibrotic niche for BrMs [49]. In another study of invasive TME in lung adenocarcinoma (LUAD), Zhu et al. conducted an integrated analysis using single-cell RNA-sequencing (scRNA-seq) and ST to characterize the cellular atlas of LUAD invasion trajectories. They observed a continuous increase in the UBE2C + cancer cell subpopulation during LUAD invasion, while mast cells, monocytes, and lymphatic endothelial cells decreased [50].

ST technology enables the identification of spatial genomic markers and their relation with treatment responses, offering potential predictive markers for therapeutic efficacy and mechanisms of resistance. In glioblastoma (GBM) research, Ravi et al. utilized ST and other omics data to decipher shared transcriptional programs in regions among patients. Their findings indicated that the tissue structure of GBM exhibits spatial separation in lineage states, which adapt to inflammation and/or metabolic stimuli [54]. In colorectal cancer (CRC) research, Ozato et al. combined patient ST data with public scRNA-seq data. They discovered that CRC cells located at the invasive front secreted human leukocyte antigen G (HLA-G), inducing the formation of SPP1 + macrophages specific to the invasive front, thereby endowing CRC cells with anti-tumor immunity and proliferative drive [55]. Wu et al. conducted scRNA-seq and ST sequencing on 97 matched samples from colorectal cancer liver metastasis (CRLM) patients, revealing the presence of highly metabolically activated MRC1 + CCL18 + M2-like macrophages and other immunosuppressive cells in the metastatic TME. Additionally, extensive cell-to-cell interactions were detected between metastatic tumor cells and these immunosuppressive cells [56].

Applications of ST technology in tumor heterogeneity

Tumor heterogeneity is a critical factor that impacts the development of cancer at both intertumoral and intratumoral levels. Intertumoral heterogeneity refers to variations observed among multiple tumors within a single patient or among patients with the same tumor type. On the other hand, intratumoral heterogeneity involves genetic, phenotypic, and metabolic variations among homogeneous or heterogeneous tumor cells within a patient [88]. Intratumoral heterogeneity can arise from intrinsic factors such as genomic instability and epigenetic genetic variations, as well as extrinsic factors which are primarily stemmed from the influence of the TME [89]. Tumor heterogeneity holds significant clinical relevance, as it affects crucial aspects of tumor initiation, disease progression, and treatment outcomes [90]. Precious studies have demonstrated the utility of single-cell transcriptomic techniques in elucidating tumor heterogeneity.

With the advancement of ST technology, researchers can further delineate regions with similar molecular profiles in spatial dimensions and elucidate high-resolution spatial features of both intertumoral and intratumoral heterogeneity. For intertumoral heterogeneity, Wang et al. conducted a study on multiple primary lung cancers (MPLC), utilizing scRNA-seq and ST to describe the spatial heterogeneity distribution among MPLC lesions as well as the spatial and cellular characteristics of MPLC-lung adenocarcinoma (LUAD) histopathological patterns [57]. Moreover, they applied ST and multiplexed immunohistochemistry (mIHC) in LUAD, identifying macrophage phenotypic heterogeneity as a key feature and revealing subtype-specific heterogeneity in the dedifferentiated state of LUAD [58]. Numerous studies have concentrated their investigative efforts on exploring intratumoral heterogeneity, leveraging ST to uncover the complexity within tumors. For example, Andersson et al. employed ST in the study of HER2 + BC to discover clusters enriched in immune response-related pathways, apoptosis, and regulatory pathways, all marked by the ERBB2 gene. These clusters were spatially separated and displayed varying degrees of immune infiltration, highlighting the differences in immune response across different lesions within the same patient. [46]. In a study on triple-negative breast cancer (TNBC), Bassiouni et al. combined biological annotation with spatial analysis to reveal the intrinsic tumor heterogeneity within TNBC, identifying distinct transcriptional clusters that suggest a conserved spatio-transcriptional framework across a racially diverse cohort [47]. In GBM research, Ravi et al. utilized ST to identify five spatially distinct transcriptional programs. They further explored the relationship between these programs and the subclonal structure of tumors, unveiling the dynamic interaction between the TME and spatiotemporal changes in transcriptional heterogeneity [54].

ST has also been employed to assist in modeling the spatial structure of tumor cells. Karras et al. combined scRNA-seq and ST to establish a spatially stratified model for proliferative and invasive states of melanoma cells, demonstrating that the ability to grow and metastasize is confined to distinct cellular communities [59]. Another study by Lin et al. reconstructed CRC in three dimensions (3D) by combining cyclic immunofluorescence (CyCIF), Hematoxylin and Eosin (H&E) images, scRNA-seq, and ST from CRC tissue samples [60].

Furthermore, ST has been used to decipher the impact of the tumor microorganism community on tumor heterogeneity. Galeano et al. employed ST to identify the types and locations of intratumoral microbial communities in patient, revealing highly organized spatial distribution and their role in promoting cancer progression [91].

Analyzing ST data

Currently, spatial barcode-based ST platforms have achieved notable commercial success, with examples like Visium and Stereo-seq leading the way. However, it's worth noting that imaging-based ST methods are less commonly employed in tumor research. Therefore, in the following section, we will exclusively focus on illustrating the common workflow for analyzing spatial barcode-based data, using Visium as a prime example. Common analysis modules and tools for ST data are summarized in Fig. 2 and Table 2, respectively.

Fig. 2
figure 2

Common analysis modules of ST data

Table 2 ST data analyses tools

Preprocessing and expression enhancement

In recent barcode-based ST studies, each sequenced spot is tagged with a UMI, designed to capture RNA sequences and measure the specific gene expression of the respective locations. However, empirical evidence has revealed a notable phenomenon known as ‘spot swapping’, referring to cross-contamination among adjacent spots, which may present challenges for ST analysis. Therefore, a probabilistic framework named SpotClean has been developed to mitigate the impact of spot swapping in ST experiments and further enhance downstream analysis [92]. This probabilistic model provides improved estimates of UMI counts for every gene at each spot by removing contamination from spot swapping. Furthermore, SpotClean has demonstrated significant improvements in marker gene identification and spatial domain detection.

The shallow depth of ST sequencing is associated with lower capture efficiency, consequently leading to a higher rate of dropouts. Recently, some computational tools have been developed to impute the missing expression levels due to dropouts by aggregating the expression data of the spatially neighbor spots. Furthermore, some of these methods consider the similarity of histological structures observed in histology images when selecting the spatially neighbor spots. One such method, Sprod, employs latent graph learning techniques to integrate gene expression data and imaging data, allowing for the precise imputation and denoising of ST gene expression [93]. Another method, MIST, is capable of region detection, boundary detection, and region-based imputation of missing expression values [94]. Based on the pre-determined regions, MIST utilizes a region-based mini-batch of low-rank approximation algorithm to estimate the final denoised expression values by averaging the outcomes from all mini-batches. GNTD, on the other hand, is a graph-guided neural tensor decomposition model designed for reconstructing whole spatial transcriptomes in tissues [95]. GNTD employs a hierarchical tensor structure and formulation to explicitly model high-order spatial gene expression data with a hierarchical nonlinear decomposition in a three-layer neural network. This model is enhanced by incorporating spatial neighborhood relations and prior gene functional modules, ensuring accurate reconstruction of ST expression data. The imputed spatial transcriptomes provide a more complete gene expression landscape, facilitating downstream analyses such as spatial domain detection, functional interpretation of spatially co-expressed gene clusters, differential expression analyses, pathway enrichment, and cell-to-cell interaction inferences.

Enhancing the resolution of ST data is also a form of expression enhancement, which aims to improve the resolution level of expression of sequenced area and recover the high-resolution expression of unsequenced area. The related algorithms can be mainly categorized into two types: (1) those leveraging spatial smoothness assumption for expression extrapolation; and (2) those utilizing high-resolution images to predict spot-level expression and further achieve high-resolution expression recovery. The first category of methods includes DIST [96] and BayesSpace [97]. DIST, a deep learning method, combines self-supervised and transfer learning techniques to refine gene expression data for both the originally measured spots and the unmeasured locations. BayesSpace dissects ST data at sub-spot resolution by adopting a Bayesian statistical method with a Markov random field which utilizes spatial neighborhood information. The second category leverages image information to obtain ST data with pixel-level resolution. XFuse integrates spatial transcriptome data with concurrent histological tissue images to infer gene expression profiles in areas that are not covered by the measured spots [98]. ImSpiRE employs an optimal transport approach to reassign expression profiles from existing spots, thereby creating refined transcriptional profiles with increased resolution [99]. ImSpiRE also has the ability to impute gene expression profiles in regions that have not been directly measured. TESLA generates super pixels and predicts the expression level of super pixels by aggregating the expression of spatially neighbor spots with similar histological image [100]. Subsequently, TESLA can annotate different tumor/TME cell types on the histology image with super pixel-level resolution.

Normalization

The normalization of ST data is essential for facilitating the comparison of expression values between spots or genes. This processing step can approximate the normalization used in single-cell transcriptomics data. One common approach is the library size normalization method in single-cell transcriptomic data analysis, which involves dividing the feature counts for each spot by the total counts for that spot and then multiplying by a scale factor. However, this method is not rigorous because the total read counts per spot in ST data inherently reflect relevant features of tissue morphology and local cell density [142]. The nature of data derived from single-cell or ST sequencing introduces substantial variability among individual cells or spots due to technical influences, thereby intertwining technical variations with biological diversity. To mitigate this, scTransform has been developed to normalize and stabilize variance in molecular count data [101]. This approach involves the utilization of Pearson residuals obtained through a regularized negative binomial regression, wherein the depth of sequencing is incorporated as a covariate within a generalized linear model. Technical biases in the data are effectively filtered out by applying this method, thereby preserving the true biological variability and enabling more accurate downstream analysis.

Spatially variable genes identification

Accurate identification of spatially variable genes (SVGs) is pivotal in locating different cell types and facilitating the clustering process. Methods for SVG detection can be broadly categorized into three types based on their utilization of spatial coordinates: spatial kernel methods, marked point process methods, and correlation-based methods. Spatial kernel methods, such as SpatialDE [102] and SPARK [103], predominantly utilize Gaussian kernels and Gaussian process regression models to define spatial relationships between cells or spots. Methods based on marked point processes include trendsceek [104] and scGCO [105]. For example, scGCO models gene expression as a marked point process, discretizing expression values into different bins to represent observed gene expression states. SPARK-X, the representative of the third category of methods, defines a class of correlations between gene expression values and spatial coordinates for effective detection of spatially expressed genes [106]. A recent addition to these methods is STAMarker, which embodies a three-stage ensemble framework by integrating graph-attention autoencoders, multilayer perceptron (MLP) classifiers, and saliency map computation through backpropagated gradient [107]. This method processes all genes simultaneously, and is adept at pinpointing SVGs that are specific to spatial domains.

Spatial domain detection

Compared to clustering analysis in single-cell transcriptomic data, ST clustering incorporates not only the expression values of each spot but also the spatial location information and histology image information. Methods to address this problem can generally be categorized into three types. The first category, exemplified by stLearn [108], involves spatial smoothing of gene expression based on morphological similarity and spatial location. Subsequently, it employs a standard Louvain clustering procedure to detect spatial domains. The second category of methods is developed based on the standard Hidden Markov Random Field (HMRF) framework, allowing latent states representing domain categories to be spatially continuous. Representative tools include BayesSpace [97], Giotto [109], SC-MEB [110], DR-SC [111] and BASS [112]. Taking Giotto as an example, it first selects spatially differentially expressed genes and then utilizes HMRF clustering to detect spatial domains. The third category employs graph neural networks (GNNs) to process graphs derived from spatial locations, expression data, and potentially histology image information. This approach enhances spot embeddings through learning the complex relationships between spots, ultimately leading to improved detection of spatial domains and gene expression patterns. Representative algorithms include SpaGCN [113], STAGATE [114], CCST [115], and GraphST [116]. For example, GraphST integrates graph neural networks with self-supervised contrastive learning to improve the representations of spots. It optimizes the embedding distances among spatially neighboring or distant spots to acquire informative and discriminative spot representations, finally enabling spatially informed clustering, integration, and deconvolution.

Deconvolution

Deconvolution, a widely employed method in bulk RNA-seq to differentiate mixed expression signals into identifiable cell types, becomes exceedingly useful in the realm of spatial transcriptomics, such as with Visium platforms, where each spot aggregates transcripts from several cells. This method typically uses a cell-type-annotated single-cell sequencing data as a reference, facilitating accurate estimation of cell type proportions within each spot, thereby illuminating the intricacies of tissue microenvironments and the underlying cellular architecture. Several tools in this field have been developed to perform deconvolution effectively, such as SPOTlight [117], SpatialDWLS [118], DSTG [130], cell2location [119], CARD [120], RCTD [123], destVI [121] and STRIDE [122]. SPOTlight utilizes seeded non-negative matrix factorization (NMF) regression to deconvolute the ST information with scRNA-seq data. DSTG generates pseudo-ST data by simulating cell mixtures from scRNA-seq data and utilizes graph-based convolutional networks for deconvolution. Cell2location, a Bayesian model, demonstrates the ability to resolve fine-grained cell types in ST data and integrate single-cell and ST with high sensitivity and resolution. To assess and compare the performance of these algorithms, benchmarking studies have been conducted, providing valuable insights for selecting appropriate methods for cell type deconvolution of spots [143, 144].

Another category of deconvolution methods, known as reference-free deconvolution methods, does not require single-cell transcriptomic data as a reference. Representative methods in this category include CARDfree [120], STdeconvolve [124], and SMART [125]. In spatial transcriptomic samples of tumors, the estimation of cancer cell abundance can be particularly challenging due to the heterogeneity of tumor cells. SpaCET [126] addresses this challenge by integrating a gene pattern dictionary of copy number alterations and expression changes to estimate cancer cell abundance. Furthermore, it employs a constrained regression model to determine the proportions of immune and stromal cell lineages. However, these methods have a limitation in that they can only estimate the proportions of cell types in a given spot without providing single-cell level deconvolution.

An alternative approach is to enhance ST data to achieve single-cell resolution. Tools like CellTrek [127] and CytoSPACE [128] assign the most probable spatial location in ST data to each single cell and subsequently estimate the cellular composition within the tissue. SpatialScope [129] utilizes the deep generative model to accurately decompose the observed gene expression at each spot into the single-cell level gene expression.

Cell-cell communications

Cell interactions govern a variety of biological activities, essential for preserving the functionality and microenvironmental homeostasis of cells in organs and tissues. To analyze cell interactions within single-cell transcriptomic data, several outstanding analytical software tools are available, such as CellPhoneDB [131] and CellChat [132]. These algorithms typically involve randomizing cell labels and use permutation tests to estimate the significance of cell interactions. Notably, these software tools have been updated to incorporate spatial information into the analysis.

CellChat employs a mass action-based model to calculate the likelihood of signal transmission between two groups of cells. This model considers the fundamental interactions between multi-subunit structured ligands and receptors, as well as the influence of cofactors. An enhanced version, CellChat v2, integrates the actual spatial positioning of cells to deduce communication between cells in close spatial proximity [133]. On the other hand, CellPhoneDB directly utilizes spatial information by restricting the interacting pairs of clusters or cell types to those sharing a microenvironment [131, 145]. Consequently, it only tests combinations of clusters if they coexist within the same microenvironment.

SpaOTsc uses optimal transport mapping between scRNA-seq data and spatial data to construct a spatial metric for cells in scRNA-seq data, and reconstruct space-constrained cell–cell communication networks [134]. The main author of SpaOTsc later proposed COMMOT, which considers the spatial distances between cells and aids in deducing the competitive dynamics between different ligands and receptors [135]. This approach provides a new perspective for examining the spatial interactions between immune cells and tumor cells in the tumor ecosystem.

Alignment and integration of multiple slices

The alignment and integration of multi-slice ST data represents a current challenge in spatial data analysis, which can be broadly categorized into three types: reconstructing three-dimensional (3D) structures, identifying spatial shared domain, and revealing dynamics in biological processes.

For the reconstruction of 3D structures, several methods such as PASTE [138], PASTE2 [139], SLAT [140], and SPACEL [136] can be utilized. These methods typically involve aligning spots between different slices or performing coordinate transformation and rotation of different slices to achieve a 3D structure composed of multiple slices. For example, PASTE utilizes an optimal transport formulation to compute pairwise alignments of slices, which are then combined to construct a stacked 3D alignment of a tissue [138]. PASTE2 allows for partial alignment and 3D reconstruction, accommodating partial overlap between aligned slices and/or slice-specific cell types [139]. SLAT, based on the minimum-cost bipartite matching problem, aligns spots with similar embeddings from two slices [140]. For multi-slice datasets with pre-labeled spatial domains, the Scube module of SPACEL facilitates the transformation of spatial coordinate systems across consecutive slices, allowing for the construction of a 3D architecture [136].

STAGATE [114], SPACEL [136], SpaGCN [113], STAligner [137], and STitch3D [141] are tools commonly used for the identification of spatial shared domain across multiple slices. STAGATE achieves the alleviation of batch effects between consecutive sections and 3D spatial domain identification by simultaneously considering the 2D spatial neighbor network (SNN) within each section and neighboring spots between adjacent Sects. [114]. The Splane module of SPACEL employs a graph convolutional network and an adversarial learning algorithm to identify spatial shared domains [136]. SpaGCN constructs a large graph by manually stitching and calibrating two adjacent slices, enabling joint dimensionality reduction and clustering using graph convolutional networks (GCN) [113]. STAligner, employing a graph attention neural network, enables spatially aware data integration and simultaneous spatial domain identification [137]. STitch3D enhances spot representations and estimates of cell-type proportions by jointly modelling multiple slices, facilitating the recognition of 3D spatial regions and the revelation of 3D cell-type distributions.

The third type of task is revealing dynamic biological processes in development-related spatial transcriptomic data, which can be achieved by algorithms such as SLAT [140] and STAligner [137]. For example, Zhou et al. reconstructed the developmental trajectory of each tissue structure during mouse organogenesis, by building connections between the same spatial domains in four successive stages and generating an acyclic directed graph [137]. Currently, the application is somewhat limited in the oncology field, owing to the challenges in acquiring tumor samples with temporal information.

Development directions of ST

In the past two decades, sequencing technologies have undergone revolutionary changes. The emergence and advancement of NGS and long-read sequencing have significantly enhanced the throughput and read length. Concurrently, the development of single-cell and spatial omics has provided unique perspectives for studying biological landscapes at various scales. The progress in sequencing methods has greatly propelled research and comprehension in critical domains, including tumor heterogeneity, TME, and tumor spatial architecture. ST is a burgeoning technology currently in its early stages of development. It exhibits intrinsic limitations that require careful consideration. Nonetheless, it also represents a field with substantial potential for improvement. By focusing on refining its capabilities and addressing current constraints, ST provides valuable opportunities for advancing our understanding in the field of oncology.

Enhancing resolution and throughput

Spatial barcode-based technologies enable transcriptomic characterization at the whole transcriptome level. Nonetheless, these approaches frequently encounter challenges in resolution, which impedes their capacity for discrimination at the single-cell level. This limitation poses a substantial impediment to the precise examination of tumor spatial heterogeneity. While certain sophisticated deconvolution tools can partially mitigate this challenge, they fall short of attaining the level of superiority inherent in single-cell ST data. Moreover, the depth of exploration in this type of data is quite limited, inhibiting in-depth downstream analyses like pseudo-time analysis often used in single-cell transcriptomics. These analyses are crucial for investigating cell state transitions during a biological process and identifying key transitional factors. The use of multicellular resolution data compromises the significance of these analyses, rendering the biological interpretation less meaningful. Additionally, although many imaging-based ST methods can achieve single-cell resolution, their reliance on pre-designed capture probes restricts the number of genes they can capture. Consequently, they cannot be deemed genuine spatial omics techniques and are incapable of probing intricate biological regulatory networks. A recent breakthrough is the launch of the Visium HD platform by 10 × Genomics, marking a significant advancement in ST. The platform achieves the reconstruction of complete transcriptomic spatial architecture at the single-cell scale, enabling comprehensive coverage across the entire tissue. The introduction of this cutting-edge technology marks a transformative milestone, ushering in a new era characterized by enhanced precision and depth in ST research.

Enhancing the capture efficiency of sequencing

In current single-cell and spatial omics technologies, a prevalent phenomenon known as 'drop-out' is observed, where certain transcripts are not sequenced due to low capture efficiency. Transcripts with low expression levels pose an even greater challenge for capture. Moreover, factors such as tissue differences, experimental procedures, and reagent choices may introduce biases in transcript capture during the sequencing process. For example, deviations from standard experimental procedures may lead to an increased proportion of mitochondrial or ribosomal transcripts captured in cells or spots. Therefore, it is crucial to significantly enhance the capture efficiency in ST sequencing. Such enhancement holds the potential to significantly augment the data's intrinsic value, thereby facilitating researchers in achieving a more comprehensive and profound understanding of tumor spatial heterogeneity.

Compatibility with FFPE samples

FFPE is a common method for preserving tissue samples. FFPE samples can be stored at room temperature for many years, facilitating long-term cohort and retrospective studies. The sample preparation process preserves the integrity of cellular and tissue structures, which is advantageous for subsequent morphological analysis, molecular experiments, and pathological diagnosis. However, the fixation process may cause degradation of nucleic acids and denaturation of proteins, hindering the application of subsequent spatial omics technologies. 10 × Genomics has improved the original Visium process to enable ST sequencing of FFPE samples. In the improved process, two specific probes are designed and bind to the mRNA target site. After ligation of the matched probes, the original mRNA template is degraded. Following Tissue Permeabilization, the ligated probes are captured by oligos on the slide, and then library preparation and sequencing analysis proceed. This workflow has been applied in some tumor studies [78, 146, 147]. Recently, stereo-seq has employed random probes for in situ capture of RNA molecules within FFPE tissues. This approach not only reconstructs the spatial localization of RNA molecules but also facilitates the study of both the whole transcriptome and the spatial expression of microbial transcripts [86]. However, some ST technologies, such as Slide-seq and HDST, are not yet applicable to FFPE samples [148]. In the future, spatial omics technologies adapted for FFPE samples will accelerate research related to tumors, especially in longitudinal studies on tumor progression or disease relapse.

Spatial multi-omics and full-length technology

Following the pioneering development of ST, other spatial omics technologies have increasingly garnered attention. Spatial proteomics technologies have emerged as powerful tools capable of detecting the levels of dozens of proteins in situ. Spatial proteomics primarily fall into two categories: mass spectrometry-based [149,150,151] and antibody-based approaches [152, 153]. Additionally, spatial metabolomics technologies enable non-targeted detection of metabolites and lipids on tissue sections using imaging mass spectrometry [154]. In recent years, several other types of spatial omics technologies have been developed. Spatial ATAC-seq enables in situ sequencing of chromatin to understand chromatin accessibility related to tissue structure and spatial position [155, 156]. Spatial TCR/BCR sequencing analyzes lymphocyte clonality from a spatial perspective, linking specific antigen receptors with tumor-associated antigens. This aids in identifying and leveraging antigen-specific clones to develop therapeutic approaches [157]. Slide-DNA-seq enables the capture of DNA sequences from complete tissue sections while accurately preserving the local tumor architecture, allowing for the detection and analysis of different tumor clones and their copy number variations in spatial distribution within the tissue [158]. To systematically study regulatory mechanisms from a spatial perspective, multiple spatial omics technologies are often used in conjunction. One strategy involves performing different spatial omics sequencing on adjacent consecutive sections, followed by a combined analysis and interpretation [86]. This approach may introduce heterogeneity among the sections and pose challenges in data analysis, particularly in aligning multiple sections. Another strategy involves performing multiple omics sequencing on the same section. Compared to the first approach, this method conserves samples and eliminates the necessity for aligning sections. However, it presents greater technical challenges, which is why such technologies are still relatively rare. For instance, spatial CITE-seq [159] and STARmap PLUS [160] add a protein dimension to the capture of spatial transcripts. Slide-TCR-seq enables simultaneous sequencing of TCR and RNA on a single tissue Sect. [161]. In the future, the integration of spatial multi-omics will revolutionize our understanding of how cellular programs are coordinated and the intricate mechanisms underlying tumor biology.

It is worth noting that the current ST approach employed by Visium can only detect information at the 3’ end of transcripts, thus neglecting full-length exploration. This limitation considerably restricts the investigation of immune cell receptor repertoire and alternative splicing events. To improve the accuracy and continuity of sequencing reads, long-read sequencing technologies from Pacific Biosciences (PacBio) [162] and Oxford Nanopore Technologies (ONT) [163] emerge as pivotal tools. Innovative sequencing strategies that integrate long-read sequencing with either bulk or single-cell sequencing [164, 165], along with computational methods [166, 167], have been developed. Similarly, ST requires the development of comparable technologies and algorithms to expand the capabilities of spatial omics studies.

Comprehensive benchmarking of ST data analysis methods

As the number of methods for analyzing ST data expands, researchers increasingly require guidance on selecting the most appropriate approaches. Consequently, a comprehensive comparison and evaluation of these analytical methods is critical. The preceding text details various aspects of ST data analysis. For cellular deconvolution in ST, there is abundant benchmarking work which entails a comprehensive comparison of existing methods [143, 144, 168, 169]. This comparative analysis employs both real and simulated datasets to evaluate metrics such as accuracy, robustness, and usability across diverse resolutions, technologies, spot numbers, and gene numbers. Despite variations in benchmark results among different research groups, Cell2location and RCTD consistently exhibit superior performance in cellular deconvolution tasks [143, 168, 169]. For spatial domain detection methods, a recent benchmarking study evaluated 13 computational techniques based on criteria such as accuracy, spatial continuity, marker gene detection, scalability, and robustness [170]. The results of this study indicate that no single method is universally effective across all datasets, and the optimal method depends on the data characteristics. Researchers may explore multiple methods to select the one that best aligns with the characteristics of their data. Besides these benchmarking studies, similar efforts for other analytical tasks remain sparse. Previous comparisons of methods for detecting SVGs have revealed significant variability in detection results, highlighting the need for further benchmarking to identify the most effective methods [171]. We advocate for increased benchmarking across various analytical dimensions of ST. A thorough assessment and comparison of these tools are essential for providing researchers with guidance on tool selection, thereby optimizing their research outcomes.

Computational challenges in ST

In the realm of computational tools, there are several promising avenues and challenges awaiting exploration to enhance ST capabilities. These include the development of algorithms for the precise identification of lesions, which promise to significantly improve the detection and classification of pathological changes within tissues. Algorithms for detecting mutations such as point mutations and gene fusions have been developed in single-cell transcriptomics [172, 173]. Looking ahead, developing algorithms for mutation detection in ST data is becoming increasingly essential. This development will significantly contribute to our understanding of the spatial variation in mutations. Furthermore, the integration of spatial multi-omics data stands as a pivotal development, aiming to offer a more comprehensive understanding of tumor biology by amalgamating gene expression with additional omics insights. The establishment of comprehensive analysis pipelines is also paramount, which is designed to provide seamless end-to-end processing, analysis, and visualization of spatial datasets. Additionally, merging ST with clinical and imaging data heralds a new era of tools capable of linking molecular signatures to phenotypic and clinical outcomes, thus advancing the precision medicine approach. Overall, these future developments are anticipated to substantially augment the utility of ST, rendering it an even more insightful tool for probing the complex underpinnings of tumor biology and expanding its clinical relevance. As these computational tools and methods continue to evolve, they are poised to unlock new perspectives and approaches in cancer research, thereby facilitating the advent of tailored therapeutic interventions.

Analysis platforms suitable for clinical practitioners

As mentioned previously, the analysis of ST data remains complex due to the diversity of data types, lacking a unified paradigm similar to single-cell transcriptomic analysis. This poses significant challenges for biologists and clinical practitioners. To alleviate this hurdle, several online platforms have emerged to streamline the showcase and exploration of spatial transcriptomic data from various sequencing platforms. Two noteworthy databases, Spatial Omics DataBase (SODB) [174] and Spatial TranscriptOmics DataBase (STOmicsDB) [175] stand out in providing comprehensive resources. In the realm of spatial omics, SODB is a pivotal online platform meticulously designed to provide researchers with an extensive collection of data resources and diverse interactive modules for sophisticated data analysis. Encompassing over 2,400 experiments from more than 25 different spatial omics technologies, SODB maintains data in a standardized format that is compatible with numerous computational tools, ensuring versatility and accessibility. Notably, SODB features a range of interactive modules for data analysis. Another key resource is STOmicsDB, a comprehensive database designed as a central resource for ST research. STOmicsDB contains 218 carefully curated datasets from 17 species, with detailed annotations of cell types, spatial regions, genes, and analyses of cell-to-cell interactions. Its user-friendly interface enables quick visualization of data involving millions of cells.

In summary, ST technology presents several areas for improvement and refinement in the future. As these advancements unfold, it is anticipated to emerge as a powerful tool for biologists and clinical scientists, enabling in-depth exploration of the intricate mechanisms underlying tumors.

Conclusions and perspectives

In spite of the clinical implementation and notable successes of targeted and immunotherapy, cancer remains one of the major health challenges faced by humanity. The effectiveness of cancer treatment is primarily hindered by the heterogeneity of tumors and the intricate nature of the TME. Factors such as tumor metastasis, the development of drug resistance, and the presence of an immunosuppressive microenvironment often contribute to treatment failures.

ST, an emerging technology in recent years, offers resolution comparable to single-cell transcriptomics while addressing the lack of spatial context in single-cell sequencing. ST has gained increasing attention from researchers as a powerful tool for studying tumor heterogeneity, the TME, and the spatial structure of tumors, thereby laying an important foundation for research in tumor biology. This review aims to provide a comprehensive overview of the developmental trajectory and recent applications of ST in cancer research. To facilitate comprehension among clinical practitioners and biomedical professionals, a thorough description of the ST data analysis process is included, supplemented by an online guide.

With ST advancing towards enhanced resolution, higher sequencing throughput, multimodality, and reduced costs, its significance in cancer biology research is anticipated to grow substantially. We believe that ST will establish a theoretical foundation for personalized precision medicine, making significant contributions to the development of targeted and individualized therapeutic approaches in the field of oncology.