Introduction

Gene fusion is the joining of two or more distinct gene coding regions at the DNA or RNA level, typically as a result of chromosomal rearrangements such as translocations, deletions, inversions, or tandem duplications [1]. The resulting hybrid (chimeric) gene can come under the control of a single set of regulatory elements, including promoters, enhancers, ribosome-binding sequences, and terminators. In oncology, gene fusion is a crucial event that can initiate unregulated cell proliferation and is a key factor in the development and progression of many types of malignancies [2]. The history of gene fusion research in cancer begins in 1960, when Peter Nowell and David Hungerford [3] identified an abnormally small chromosome, later named the Philadelphia chromosome, in chronic myeloid leukemia (CML). This landmark discovery was the first recurrent chromosomal abnormality identified in cancer. In 1973, Janet Rowley’s groundbreaking research revealed that the Philadelphia chromosome results from a translocation between chromosomes 9 and 22, a rearrangement later shown to create a gene fusion [4]. In 1985, Shtivelman et al. [5] demonstrated that this rearrangement produces a fused BCR-ABL transcript, and the BCR-ABL fusion is now recognized as the primary oncogenic driver of CML.

Relationship between gene fusion and cancer

Types of gene fusions

Gene fusions may occur at both the DNA and RNA levels. They typically promote cancer through two mechanisms: deregulated expression of a partner gene and the formation of chimeric proteins with altered function. Gene fusions can also truncate a gene, removing part of its protein-coding region; when the truncated gene is a tumor suppressor, its function is lost, which can ultimately contribute to cancer (Fig. 1a) [6].

Fig. 1

Mechanism and types of gene fusions. a Gene fusion is caused by structural chromosomal rearrangements. Translocations include reciprocal translocations, which create two derived chromosomes, either of which may carry the causative gene [left], and nonreciprocal translocations, which are accompanied by loss of short chromosomal arms and a reduction in chromosome number [right]. Inversions include paracentric inversions, in which both breakpoints lie on one arm and the centromere is not involved [left], and pericentric inversions, with one breakpoint on each arm so that the inverted segment includes the centromere [right]. Tandem duplication involves the duplication of genes or gene segments within a specific region of a chromosome. In deletions, genes or gene segments are lost from the chromosome. A and B denote the breakpoint locations, marked by small blue arrows, while large black arrows indicate the resulting chromosomal rearrangements. Reproduced with permission [7]. Copyright 2018, Elsevier. b Distribution of fusion types (gene-gene, gene-intergenic, intergenic-intergenic) [8]. Copyright 2018, Springer Nature. c Schematic diagram of gene fusion breakpoints. The purple symbol indicates the 5’ partner gene, and the green symbol the 3’ partner gene. Breakpoints can occur in the following genomic regions: 5’ UTR (first column from left), CDS (second column), 3’ UTR (third column), and noncoding region (fourth column). Dotted lines connect the two breakpoints of each fusion, and circle size and number indicate how often the corresponding fusion type occurs [9]. Copyright 2018, Elsevier.

A comprehensive analysis of whole-exome sequencing (WES) data from 268 tumor samples focused on regions containing breakpoints and annotated them genomically. The observed gene fusion events were grouped into three main types: gene-gene, gene-intergenic, and intergenic-intergenic fusions. Of the 13,698 gene fusion events analyzed, approximately 38% were gene-gene fusions, 28% were gene-intergenic fusions, and the remaining 34% were intergenic-intergenic fusions. These findings offer deeper insight into the complexity of tumor genomes (Fig. 1b) [8]. In an analysis of 25,664 gene fusion breakpoints, Gao et al. [9] found that most fell within the coding sequences (CDS) of the two partner genes (Fig. 1c).
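The three-way classification above depends only on whether each breakpoint falls inside an annotated gene. The sketch below illustrates that rule under simplifying assumptions: gene annotation is a plain list of intervals, and the gene names, coordinates, and events are hypothetical toy values, not data from the cited study.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def annotate_breakpoint(chrom: str, pos: int, genes: list[Gene]) -> str | None:
    """Return the gene containing the breakpoint, or None if it is intergenic."""
    for gene in genes:
        if gene.chrom == chrom and gene.start <= pos <= gene.end:
            return gene.name
    return None


def classify_fusion(bp1: tuple[str, int], bp2: tuple[str, int], genes: list[Gene]) -> str:
    """Assign one of the three fusion categories from the two breakpoint annotations."""
    genic = sum(annotate_breakpoint(c, p, genes) is not None for c, p in (bp1, bp2))
    return {2: "gene-gene", 1: "gene-intergenic", 0: "intergenic-intergenic"}[genic]


# Toy annotation and events; names and coordinates are hypothetical.
GENES = [Gene("GENE_A", "chr1", 100, 200), Gene("GENE_B", "chr2", 500, 900)]
EVENTS = [
    (("chr1", 150), ("chr2", 600)),  # both ends genic
    (("chr1", 150), ("chr3", 10)),   # one end genic
    (("chr5", 1), ("chr3", 10)),     # neither end genic
]

counts = Counter(classify_fusion(a, b, GENES) for a, b in EVENTS)
for category, n in counts.items():
    print(f"{category}: {100 * n / len(EVENTS):.0f}%")
```

A real pipeline would use an interval index over a full annotation rather than a linear scan, but the classification rule stays the same.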

Tumor-associated gene fusion

Overview

Gene fusion is a significant class of somatic alteration in cancer [9] and plays a pivotal role in the development of various cancer types. Many gene fusions are highly specific to tumor cells, and often to particular tumor types, which makes them key targets in molecularly targeted cancer therapy [10].

In cancer research, many ongoing studies have concentrated on identifying single nucleotide variants (SNVs) and small insertions, deletions, or duplications within the cancer genome [11]. Thorough investigation of these changes has greatly enhanced our understanding of the fundamental processes driving tumor development. Gene fusions also play a considerable role in tumorigenesis. Fusion events occur in many tumor types and serve as important biomarkers for specific cancers; fusions involving anaplastic lymphoma kinase (ALK), c-ros oncogene 1 (ROS1), and the rearranged during transfection (RET) proto-oncogene are examples. These tyrosine kinase fusions have become key targets for tyrosine kinase inhibitors (TKIs) in non-small cell lung cancer (NSCLC) [12]. A pan-cancer analysis estimated that gene fusions act as drivers in 16.5% of cancer cases and as the sole driver in more than 1% of cases [9]. Picco et al. [10] used CRISPR-Cas9 screening to reveal the essential roles of several classes of gene fusions in cancer cell proliferation. Their investigation also identified a YAP1-MAML2 fusion, which offers a potential therapeutic target in a wide range of malignancies, including brain and ovarian cancers.

Identification and treatment progress of ALK gene fusions in NSCLC

Significant strides have been made in NSCLC research over the past few decades, particularly in the identification of ALK fusion events and their targeted treatment. The ALK gene, located in chromosomal region 2p23, encodes a transmembrane receptor tyrosine kinase of the insulin receptor superfamily (Fig. 2a). ALK fusion has been a key focus of NSCLC research since Lamant et al. [13] documented the TPM3-ALK fusion in a subtype of anaplastic large cell lymphoma in 1999 and Soda et al. [14] reported the presence and oncogenic role of ALK rearrangements in patients with NSCLC in 2007. Table 1 summarizes the many known ALK fusion forms in NSCLC, highlighting the complexity and diversity of this research area.

Fig. 2

ALK gene fusion types in tumors. a ALK reference sequence as displayed in the UCSC Genome Browser (https://genome.ucsc.edu/). b Rearrangement of ALK with EML4 in NSCLC. c ALK fusions in patients with NSCLC [15]. Copyright 2020, De Gruyter. d Fusion positions in ALK are indicated by vertical bars. Five EML4-ALK variants (V1 to V5) have been identified, in which exons 13, 20, 6, 14, and 2 of EML4 cDNA, respectively, are fused to exon 20 (e20) of ALK cDNA. Reproduced with permission [16]. Copyright 2008, John Wiley and Sons.

Table 1 ALK rearrangements in NSCLC

Echinoderm microtubule-associated protein-like 4 (EML4) is the most prevalent fusion partner of ALK and contributes the N-terminal portion of the fusion protein. The EML4 breakpoint lies approximately 3.6 kilobases downstream of exon 13 and is joined to a position 297 base pairs upstream of exon 21 of ALK [14]. EML4-ALK fusions are detected in an estimated 3% to 7% of patients with NSCLC. Different EML4 breakpoints give rise to more than ten EML4-ALK variants, of which variants 1 and 3 are the most common (Fig. 2c) [24]. Overall, approximately 5% of patients with NSCLC test positive for an ALK fusion, mostly with EML4 as the partner. Two EML4-ALK variants, variant 1 and variant 2, were identified initially [14], and further research uncovered variants 3a and 3b (Fig. 2b and c) [25, 26]. Takeuchi et al. [16] developed a single-tube multiplex reverse transcription polymerase chain reaction (RT-PCR) method [25] that detected all known EML4-ALK variants as well as new ones, including variants 4 and 5 (Fig. 2d).
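EML4-ALK variants are usually reported by the exon junction they carry (for example "E13;A20" for variant 1). As a small illustration, the lookup below encodes the five junctions listed in the Fig. 2d caption and maps an observed junction back to a variant name. It is a sketch only: it does not distinguish sub-variants such as 3a and 3b, and the helper names are hypothetical.

```python
from __future__ import annotations

# EML4 exon joined to ALK exon 20 in each variant, as listed in the Fig. 2d caption.
ALK_EXON = 20
EML4_EXON_BY_VARIANT = {"V1": 13, "V2": 20, "V3": 6, "V4": 14, "V5": 2}


def junction_label(eml4_exon: int, alk_exon: int = ALK_EXON) -> str:
    """Format a junction in the common 'E<exon>;A<exon>' shorthand."""
    return f"E{eml4_exon};A{alk_exon}"


def variant_for_junction(eml4_exon: int, alk_exon: int) -> str | None:
    """Map an observed EML4-ALK exon junction to a variant name, or None if not listed."""
    if alk_exon != ALK_EXON:
        return None
    for variant, exon in EML4_EXON_BY_VARIANT.items():
        if exon == eml4_exon:
            return variant
    return None


print(junction_label(13), variant_for_junction(13, 20))  # E13;A20 V1
```

Returning None for unlisted junctions keeps unknown or rare variants visible for manual review instead of forcing them into a known category.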

Recent years have seen significant breakthroughs in the treatment of ALK fusion-positive NSCLC [27]. Crizotinib, the first FDA-approved small-molecule ALK inhibitor (first generation), marked a new era in this field. As resistance mechanisms became better understood, second-generation (e.g., ceritinib, alectinib, and brigatinib) [28] and third-generation (e.g., lorlatinib) ALK inhibitors were developed, significantly improving treatment outcomes and offering new hope to patients with ALK fusion-positive NSCLC [29,30,31,32,33]. Table 2 provides an overview of several FDA-approved ALK inhibitors currently used in clinical practice. These advances have not only improved patient survival but also propelled a shift toward personalized medicine and precision treatment strategies.

Table 2 FDA-approved ALK inhibitors used in clinical settings

Breakthroughs in targeted therapies for cholangiocarcinoma (CCA): focusing on fibroblast growth factor receptor (FGFR) fusions

Cholangiocarcinoma (CCA), a biliary tract cancer that in its intrahepatic form is classified as a primary liver cancer, has seen significant advances in targeted therapy, particularly therapies directed at fibroblast growth factor receptor (FGFR) fusions. The FGFR family of receptor tyrosine kinases comprises four members (FGFR1, FGFR2, FGFR3, and FGFR4) and has become a key target in developing treatments for CCA. These receptors interact with 18 distinct fibroblast growth factors (FGFs) [42] and are crucial for cell survival, proliferation, differentiation, and motility, as well as playing vital roles in embryogenesis, wound healing, angiogenesis, and carcinogenesis [43,44,45]. FGFR2 fusions, which occur in about 10% to 15% of patients with intrahepatic cholangiocarcinoma (iCCA) [46,47,48,49], have been identified as a significant biomarker and therapeutic target. These fusions typically join FGFR2 to a partner gene, producing a chimeric protein in which the partner promotes dimerization or oligomerization and thereby activates the kinase [42, 46]; BICC1 is among the most common partners [50, 51].

In recent years, FDA approval of four targeted drugs for CCA has brought new hope to patients. Pemigatinib, an oral FGFR inhibitor, became the first targeted drug approved for CCA in the United States in April 2020 [52] and was approved in China in April 2022. Infigratinib (Truseltiq), an ATP-competitive inhibitor of the FGFR tyrosine kinase family, followed in May 2021 for patients with FGFR2 alterations [53]. Ivosidenib, the first isocitrate dehydrogenase 1 (IDH1) inhibitor, was approved in August 2021, introducing a precision treatment option for patients with advanced CCA harboring IDH1 mutations [54]. In September 2022, futibatinib was approved for adults with FGFR2 gene fusions or other rearrangements, further expanding the arsenal of targeted therapies for CCA. These advances signify a leap forward in personalized CCA treatment and highlight the growing importance of FGFR fusions as biomarkers and therapeutic targets, offering new avenues for research and patient care [55]. Table 3 summarizes the FDA-approved targeted therapies for CCA.

Table 3 FDA-approved targeted drugs for the treatment of CCA

Gene fusions identified in other malignancies

Gene fusions are common not only in NSCLC and CCA but also in a variety of other cancers. Prominent examples include HNRPA2B1-ETV1 [60] and TMPRSS2-ERG in prostate cancer and breakpoint cluster region (BCR)-Abelson murine leukemia viral oncogene homolog 1 (ABL) in CML. The TMPRSS2-ERG fusion is detected in approximately 50% of patients with prostate cancer [61], and the BCR-ABL fusion, a hallmark of CML, is found in the vast majority of CML cases. In colorectal cancer, fusion of the DNA repair gene RAD51C with ATXN7 has been reported at a rate of 36% [62]. These examples highlight the importance of gene fusions across cancers, and as the list of identified fusions continues to grow, their link to cancer development is becoming more firmly established.

Comparison of RNA and DNA level gene fusion assays and their advantages

Overview

DNA- and RNA-level gene fusions are distinct molecular events at the genomic and gene expression levels, respectively. At the DNA level, gene fusion refers to the joining of DNA segments from different genomic loci or chromosomes, producing novel genomic structures through rearrangements such as translocations, inversions, insertions, or deletions. At the RNA level, a fusion transcript joins sequences from two distinct genes; it is usually transcribed from a genomic fusion but can also arise during transcription and splicing without a DNA rearrangement, for example through transcriptional read-through or trans-splicing. Both kinds of events contribute to the diversity of genome structure and gene expression.

DNA-based gene fusion detection offers a more comprehensive view of the genome and yields extensive sequencing data. It can identify gene fusion events across the entire genome and uncover complex structural changes, providing a deeper understanding of the genomic landscape. However, fusion breakpoints fall predominantly within introns, which tend to be much longer than exons [63]. This makes it difficult to design probes that capture breakpoints for DNA-based next-generation sequencing (NGS), because long intronic regions contain GC-rich and highly repetitive sequences as well as complex rearrangements. This complexity increases detection cost and reduces both probe capture efficiency and sequence alignment accuracy.

In contrast, because introns are removed during splicing, RNA-based NGS can directly detect the exon-exon junctions that indicate fusion events. This simplifies probe or primer design and avoids having to capture GC-rich and repetitive intronic sequence, improving detection accuracy. In addition, when tumor purity is low, the variant allele frequency (VAF) of a genomic alteration may fall below the detection limit. At the transcript level, however, fusion transcripts may be highly expressed, which can in principle compensate for a low VAF. RNA-based methods can therefore overcome many limitations of DNA-based fusion detection, offering significant advantages for identifying gene fusions (Fig. 3a) [64, 65].
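The purity argument can be made concrete with a toy model. The sketch below assumes a clonal, heterozygous fusion at a copy-neutral diploid locus, so the expected DNA VAF is simply purity/2; for RNA it assumes each wild-type partner allele contributes one unit of expression and the fusion allele contributes `fold` units, where `fold` is a hypothetical parameter. Both functions are illustrations of the reasoning, not a published model.

```python
def dna_fusion_vaf(purity: float) -> float:
    """Expected fraction of reads supporting a clonal, heterozygous fusion.

    Assumes a copy-neutral diploid locus: each tumor cell has one rearranged
    and one normal allele, and each normal cell has two normal alleles.
    """
    return purity / 2


def rna_fusion_fraction(purity: float, fold: float) -> float:
    """Expected fraction of partner-gene transcripts that are fusion transcripts.

    fold: expression of the fusion allele relative to one wild-type allele
    (a hypothetical parameter). Wild-type alleles are assumed to express 1 unit each.
    """
    fusion = purity * fold
    wild_type = purity * 1 + (1 - purity) * 2  # tumor: 1 normal allele; normal cells: 2
    return fusion / (fusion + wild_type)


for p in (0.5, 0.2, 0.1):
    print(f"purity={p:.0%}  DNA VAF={dna_fusion_vaf(p):.1%}  "
          f"RNA fraction (10x)={rna_fusion_fraction(p, 10):.1%}")
```

With `fold=1` the two fractions coincide, which is a useful sanity check; any overexpression of the fusion raises the RNA-level fraction above the DNA VAF. When the wild-type partner is barely expressed in the tissue, the RNA-level advantage would be larger still.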

Fig. 3

Comparison of RNA- and DNA-based gene fusion detection. a Schematic diagram of genomic complexity that can lead to false-negative gene fusion results in DNA-based NGS analysis. In some cases, RNA-based approaches may overcome the limitations of DNA-based detection [66]. Copyright 2019, American Association for Cancer Research. b Schematic diagram of the main NGS target-enrichment methods used in gene fusion assays. Hybridization capture: gene-specific enrichment is achieved by hybridizing biotinylated DNA or RNA probes to the regions of interest. Classical amplification methods: target enrichment is accomplished by multiplex PCR with fusion variant-specific primers. Anchored multiplex PCR (AMP): only one fusion partner needs to be targeted. Gene-specific primers bind the known partner, while the other end of each fragment is ligated to a universal adapter, so AMP enriches the region of interest using sequence information from one end only [67]. Copyright 2020, Diagnostics.

Advantages of RNA-based gene fusion detection

As noted above, probe design for RNA sequencing (RNA-seq) is more straightforward than for DNA-based assays, and RNA-seq offers an unbiased assessment of transcripts across a broad dynamic range [68, 69]. In terms of accuracy, sensitivity, and cost-effectiveness, RNA-level detection is generally reported to outperform DNA-level methods.

In a 2012 study, Seo et al. [70] observed marked overexpression of fusion transcripts involving ALK, ROS1, and RET. Beaubier et al. [71] reported that RNA-seq outperforms DNA sequencing (DNA-seq) in detecting gene fusions across pan-cancer cohorts. Sheikine et al. [72] identified the potential for false-negative results in DNA-based NGS, particularly when fusion breakpoints lie within large intronic regions containing numerous repetitive elements. Li et al. [34] performed both DNA-NGS and RNA-NGS in a study of 3,787 patients with NSCLC. Notably, among 140 patients initially classified as negative by DNA-NGS, RNA-NGS using the Hengte gene identified gene fusions or rearrangements in 10%, highlighting the limitations of DNA-NGS in detecting certain fusion variants.

In summary, RNA-level detection addresses the challenges posed by large intronic regions and by fusions that can join many different exon combinations [73]. Furthermore, RNA-based tests have proven highly effective for selecting the most suitable targeted therapy [34].

Taken together, these studies indicate that RNA-level detection identifies a more diverse array of rare gene fusions than DNA-level methods, underscoring the importance of RNA-based techniques for accurate and comprehensive gene fusion detection in routine testing.

Challenges and considerations in RNA-based gene fusion detection

Although RNA-level gene fusion testing offers heightened sensitivity and accuracy compared to DNA-level testing, it presents its own set of challenges. RNA-based targeted methods require direct analysis and quantification of fusion transcripts, yet their effectiveness can be limited by the availability and quality of the RNA [67].

RNA is less stable than DNA: it is single-stranded, its ribose 2'-hydroxyl group makes the backbone prone to hydrolysis, and it is readily degraded by ubiquitous RNases. This vulnerability is particularly pronounced for RNA extracted from formalin-fixed, paraffin-embedded (FFPE) specimens, which is often fragmented and chemically modified. Comprehensive quality assessment before NGS analysis is therefore crucial for accurate and reliable results [74].
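One widely used quality metric for degraded FFPE RNA is DV200, the percentage of RNA fragments longer than 200 nucleotides. The sketch below computes it from a list of fragment lengths with optional signal weights (for example, binned electropherogram intensities). The 30% pass threshold is an assumed, illustrative default; actual cutoffs depend on the assay and are set by each laboratory or kit.

```python
from __future__ import annotations


def dv200(lengths: list[int], signal: list[float] | None = None) -> float:
    """Percentage of RNA signal in fragments longer than 200 nucleotides.

    lengths: fragment sizes (nt) from electropherogram bins or a fragment list.
    signal: signal (or mass) for each entry; equal weights if omitted.
    """
    if signal is None:
        signal = [1.0] * len(lengths)
    if not lengths or len(signal) != len(lengths):
        raise ValueError("lengths and signal must be non-empty and the same length")
    total = sum(signal)
    if total <= 0:
        raise ValueError("total signal must be positive")
    above = sum(s for n, s in zip(lengths, signal) if n > 200)
    return 100.0 * above / total


def passes_rna_qc(lengths: list[int], signal: list[float] | None = None,
                  min_dv200: float = 30.0) -> bool:
    """Hypothetical QC gate; the 30% default is an assumed, lab-specific cutoff."""
    return dv200(lengths, signal) >= min_dv200


print(dv200([100, 150, 250, 400]))            # 50.0
print(passes_rna_qc([100, 150, 180, 250]))    # False (DV200 = 25%)
```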

Both very small and very large samples present challenges during RNA extraction: small samples may yield too little RNA, whereas large ones can cause incomplete lysis and low yields. Clinical tumor samples, especially FFPE specimens that have undergone pathological examination, often contain partially degraded, lower-quality RNA, which can lead to uneven gene coverage and higher false-positive rates [75]. Low RNA input and compromised RNA quality therefore remain the main ongoing challenges in RNA-based gene fusion detection.

Methods for gene fusion detection

Choosing an appropriate detection method is crucial for predicting which patients will benefit from treatment, because effective targeted therapy depends on accurate gene fusion detection. Commonly used methods include fluorescence in situ hybridization (FISH), RT-PCR, immunohistochemistry (IHC), electrochemiluminescence (ECL) [76], NGS, and several NGS-based approaches [77]. These methods differ in technique and application, offering different ways to detect and analyze gene fusions.

FISH uses fluorescently labeled DNA probes that hybridize to target DNA sequences within the cell nucleus, revealing the status of the corresponding chromosome or gene [78, 79]. IHC detects antigens in tissue through specific antigen-antibody binding followed by a chromogenic reaction, enabling localization and qualitative or semi-quantitative analysis of fusion-associated protein expression [80]. PCR-based fusion analysis uses oligonucleotide primers designed on either side of the fusion junction, so that the PCR product contains the tumor-specific fusion sequence [81, 82]. NGS detects gene fusions through high-throughput sequencing followed by alignment, identifying reads that span the boundary between the two partner genes [83].
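The NGS principle of finding reads that span the boundary between two partner genes can be illustrated with a deliberately simple sketch: given the sequences on either side of a known junction, count reads (on either strand) that cover at least `min_anchor` bases on each side. Real fusion callers use alignment with mismatch tolerance; exact matching, the anchor length, and the toy sequences here are all assumptions for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_junction_reads(reads: list[str], five_prime: str, three_prime: str,
                         min_anchor: int = 10) -> int:
    """Count reads spanning a known fusion junction with >= min_anchor bases on each side.

    five_prime: sequence ending at the 5' partner's breakpoint.
    three_prime: sequence starting at the 3' partner's breakpoint.
    Reads are checked on both strands; matching is exact (no mismatches).
    """
    if len(five_prime) < min_anchor or len(three_prime) < min_anchor:
        raise ValueError("partner sequences must be at least min_anchor long")
    junction = five_prime[-min_anchor:] + three_prime[:min_anchor]
    return sum(junction in r or junction in revcomp(r) for r in reads)


# Toy partner sequences (hypothetical).
FIVE = "CCCCCAAAAAGGGGGTTTTT"
THREE = "ACGTTGCAACGTTGCAACGT"
reads = [
    FIVE[5:] + THREE[:15],           # spans the junction with 15 bases on each side
    FIVE[-5:] + THREE,               # only 5 bases of the 5' partner: too short an anchor
    revcomp(FIVE[5:] + THREE[:15]),  # same molecule read from the other strand
    FIVE,                            # 5' partner only
]
print(count_junction_reads(reads, FIVE, THREE))  # 2
```

Requiring a minimum anchor on both sides is what separates genuine junction evidence from reads that merely touch one partner.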

FISH: the gold standard for gene fusion testing

FISH has long been considered the gold standard for detecting ALK and ROS1 rearrangements in patients with NSCLC because it detects rearrangements directly at the DNA level (Fig. 4a). Break-apart probes, which do not require prior knowledge of the fusion partner, can distinguish rearrangements from polyploidy and amplification but cannot identify the specific fusion variant. In lung cancer, FISH therefore detects both known and unknown fusion variants. Erin E. and colleagues assessed gene fusion detection in two lung cancer biopsies diagnosed by FISH. The reliability and stability of FISH make it essential for validating other detection methods [84, 85].
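Break-apart FISH results are reported by scoring signal patterns nucleus by nucleus. The sketch below shows the general logic under stated assumptions: each nucleus is summarized as a list of hypothetical pattern labels, and a sample is called positive when the fraction of nuclei with a split or isolated 3' signal exceeds a cutoff. The defaults (more than 15% of at least 50 nuclei) mirror the criterion commonly used for ALK break-apart FISH, but probe kits and laboratory validations define the actual rules, including which color marks the 3' signal.

```python
from __future__ import annotations

from enum import Enum


class Signal(Enum):
    FUSED = "fused"              # 5' and 3' probe signals together: intact locus
    SPLIT = "split"              # 5' and 3' signals separated: break-apart
    ISOLATED_3P = "isolated_3p"  # 3' signal without a 5' partner (color depends on the probe kit)


def score_break_apart(nuclei: list[list[Signal]], cutoff: float = 0.15,
                      min_nuclei: int = 50) -> tuple[float, bool]:
    """Fraction of nuclei with a rearranged pattern, and whether it exceeds the cutoff."""
    if len(nuclei) < min_nuclei:
        raise ValueError(f"need at least {min_nuclei} scorable nuclei")
    rearranged = sum(
        any(s in (Signal.SPLIT, Signal.ISOLATED_3P) for s in signals) for signals in nuclei
    )
    fraction = rearranged / len(nuclei)
    return fraction, fraction > cutoff


# Hypothetical count: 10 of 50 nuclei show a split signal alongside a fused one.
nuclei = [[Signal.FUSED, Signal.SPLIT]] * 10 + [[Signal.FUSED, Signal.FUSED]] * 40
print(score_break_apart(nuclei))  # (0.2, True)
```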

Fig. 4

Common methods for gene fusion testing. a Gold standard for gene fusion detection. In the lung cancer sample MO-16–000393, FISH revealed rearrangement of the ROS1 gene. Positive signals consisted of a fused set of red and green dots in each cell, along with at least one isolated green dot. Fused dots are indicated by white arrows, while isolated green dots are indicated by gray arrows [78]. Copyright 2019, Springer Nature. b RT-PCR detection of gene fusions. c Hybrid-capture approach [78]. Copyright 2019, Springer Nature. d Conventional MPCR utilizes gene-specific primers (GSPs) to selectively amplify known fusion junctions [86]. Copyright 2020, John Wiley and Sons. e Design of a single-end duplex unique molecular identifier (UMI) adapter [87]. Copyright 2019, Springer Nature

Polymerase chain reaction (PCR) techniques for gene fusion detection

PCR is a molecular biology technique employed to amplify specific DNA fragments, significantly increasing the quantity of DNA. PCR technology is frequently applied in the detection of gene fusions.

RT-PCR is also widely used to detect gene fusions because of its short cycling times and high sensitivity. However, because its primers target known junctions, it cannot identify new fusion partners or complex structural rearrangements. Figure 4b illustrates the detection principle of RT-PCR [88]. Digital PCR (dPCR) is emerging as an important method for detecting gene fusions, but, like other PCR techniques used alone, it struggles to detect new fusions. The development of the GeneChip exon array made comprehensive genome-wide exon expression analysis feasible. Wada et al. [89] introduced an approach that uses exon arrays to detect abnormal gene structures and, by combining it with RT-PCR, discovered three new gene fusions in breast and pancreatic cancer cell lines.
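Why junction-targeted RT-PCR misses new partners can be shown with a toy in silico PCR on cDNA. In this sketch the forward primer sits in a hypothetical 5' partner and the reverse primer in a hypothetical 3' partner; a product forms only when both primer sites occur in the expected order, so a fusion with a different 5' partner yields nothing. Exact primer matching and the sequences are simplifying assumptions.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 1000) -> str | None:
    """Return the amplicon if both primers match the template exactly, else None.

    fwd has the same sequence as the template (sense) strand; rev matches the
    opposite strand, so its reverse complement must occur downstream of fwd.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(revcomp(rev), start + len(fwd))
    if end == -1:
        return None
    amplicon = template[start:end + len(rev)]
    return amplicon if len(amplicon) <= max_len else None


# Hypothetical partner sequences and primers.
PARTNER_5 = "ATGGCTAGCTTGACCGATCG"
PARTNER_3 = "TTCAGGCATCGGATCCAAGT"
OTHER_5 = "GGATTCCAGTACGATGCAAC"  # an unexpected 5' partner
FWD = PARTNER_5[:12]
REV = revcomp(PARTNER_3[-12:])

print(len(in_silico_pcr(PARTNER_5 + PARTNER_3, FWD, REV)))  # 40: expected fusion amplifies
print(in_silico_pcr(PARTNER_5, FWD, REV))                   # None: unfused transcript
print(in_silico_pcr(OTHER_5 + PARTNER_3, FWD, REV))         # None: novel partner is missed
```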

Multiplex polymerase chain reaction (MPCR) is a variant of PCR that amplifies multiple nucleic acid fragments simultaneously, using several primer pairs in a single reaction. Its principles, reagents, and procedures are similar to those of conventional PCR. MPCR increases throughput, conserves sample, and reduces cost, although designing and optimizing primer pools requires additional time and effort. In return, it yields more comprehensive data from a single reaction. Figure 4d shows a schematic of the conventional MPCR principle [86].
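Much of the extra design effort in MPCR goes into keeping primers in one pool from annealing to each other. A crude first-pass screen flags any pair (including a primer with itself) where the last `k` bases of one primer are complementary to a stretch of the other. This sketch uses that simple k-mer rule with hypothetical primers; dedicated design tools instead estimate duplex stability thermodynamically.

```python
from __future__ import annotations

from itertools import combinations_with_replacement

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def three_prime_can_anneal(p1: str, p2: str, k: int = 4) -> bool:
    """True if the last k bases of p1 are complementary (antiparallel) to some k-mer of p2."""
    return revcomp(p1[-k:]) in p2


def dimer_risks(primers: dict[str, str], k: int = 4) -> list[tuple[str, str]]:
    """List primer pairs (including self-pairs) with 3'-end complementarity."""
    risks = []
    for (n1, s1), (n2, s2) in combinations_with_replacement(primers.items(), 2):
        if three_prime_can_anneal(s1, s2, k) or three_prime_can_anneal(s2, s1, k):
            risks.append((n1, n2))
    return risks


POOL = {"P1": "GACTGACTGAGC", "P2": "CCACGCTCACCA", "P3": "AAGTCCAGTAAC"}  # hypothetical primers
print(dimer_risks(POOL))  # [('P1', 'P2')]
```

Checking the 3' end specifically matters because a polymerase can extend only an annealed 3' end, which is what turns a transient primer-primer contact into a primer-dimer product.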

As research continues and detection technologies mature, a growing array of innovative PCR methods has emerged. Two of these, anchored multiplex PCR (AMP) and single-primer extension (SPE), are discussed in detail below, with a focus on their use in combination with NGS. These methods represent significant progress in the precision, sensitivity, and efficiency of PCR-based assays.

NGS for comprehensive gene fusion detection

NGS, also known as high-throughput sequencing, sequences very large numbers of DNA fragments in parallel, offering ultra-high throughput, high resolution, excellent scalability, and rapid data generation. NGS can be used for rapid whole-genome sequencing (WGS) and for deep sequencing of targeted regions.

NGS-based gene fusion detection methods identify fusion events at both DNA and RNA levels. Library preparation is crucial for successful NGS, achievable through hybrid capture or amplicon methods. Hybrid capture uses sequence-specific probes to target specific genomic regions, while amplicon methods rely on multiplex PCR to enrich target sequences. Although hybrid capture provides detailed information about adjacent regions, it is complex, time-consuming, and prone to errors. In contrast, amplicon methods are simpler and faster.

Currently, RNA-based NGS is the most widely used method for detecting multi-target fusions. Commonly used fusion detection platforms include Illumina’s AmpliSeq, ArcherDX’s AMP technology, and QIAGEN’s QIAseq. All of these platforms use amplicon-based methods for library preparation.

AmpliSeq technology supports multiplex amplification of 12 to more than 24,000 amplicons, enabling simultaneous capture of multiple targets in a single reaction. This makes comprehensive transcriptome assessment possible and represents an innovative targeted whole-transcriptome RNA-seq approach for gene expression analysis [90]. AmpliSeq is highly sensitive and cost-effective, particularly for large-scale gene expression analysis and precise screening of mRNA markers [15]. ArcherDX’s patented AMP technology not only enhances the specificity of PCR amplification but also captures the unknown end of a gene fusion (using universal primer sites), enabling the discovery of previously unknown gene fusions in cancer samples [67]. When used with NGS, QIAGEN’s SPE technology can amplify and analyze DNA sequences associated with potential gene fusion events. Unlike traditional PCR amplicon methods, SPE allows up to 20,000-plex amplification in a single tube while maintaining over 95% specificity and uniformity [87].

Hybrid-capture approach in NGS-based gene fusion detection

As discussed above, although RNA-seq has many advantages for gene fusion identification, sequencing depth limitations mean that conventional RNA-seq may miss lowly expressed transcripts, including fusion transcripts in samples with low tumor content. To overcome these limitations in resolution and yield, Mercer et al. [91, 92] developed a targeted RNA-seq method based on hybrid capture. This approach uses biotinylated oligonucleotide probes to enrich RNA transcripts of interest, relying on the same nucleic acid hybridization principle as the gold-standard FISH technique (Fig. 4c). By targeting and capturing hundreds of genes in a single assay, it increases sequencing coverage and enables sensitive detection of rare or low-abundance transcripts. Compared with traditional FISH and RT-PCR methods, targeted RNA-seq improved the diagnostic rate from 63% to 76%.

AMP technology: advancements in NGS-based gene fusion detection

As the field has developed, various PCR-based technologies have been introduced. AMP, a single-primer PCR technology reported in 2014, offers a rapid target enrichment method for NGS (Fig. 3b). The sample nucleic acid is first fragmented, and adapters carrying molecular barcodes and universal primer sites are ligated to the fragments; the barcodes help reduce the effects of PCR amplification bias. Two nested gene-specific primers (GSP1 and GSP2) are then designed for the target region and used together with the universal primer site in nested PCR. Because primers are needed on one end only, AMP can detect all fusion events involving the targeted gene, whether the fusions are known or novel (Fig. 4d). AMP has proven adaptable to FFPE specimens with low nucleic acid content, cost-effective, and suitable for large-scale sequencing approaches such as WGS or comprehensive transcriptome sequencing [93].

Haas et al. [94] describe AMP as a rapid NGS enrichment method for both RNA and DNA; in their study, the library remained functional after only two rounds of PCR.
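Because AMP primes only from the known gene, each read carries whatever sequence is joined to that gene, which is how novel partners become visible. The sketch below illustrates the downstream idea on adapter-trimmed reads: locate the known gene segment, take the sequence immediately upstream of it, and compare it with candidate partner sequences, reporting "novel" when nothing matches. The read orientation, anchor length, and all sequences are hypothetical simplifications, not ArcherDX's actual analysis.

```python
from __future__ import annotations


def classify_amp_read(read: str, known: str, partners: dict[str, str],
                      anchor: int = 12) -> tuple[str, str] | None:
    """Identify the sequence joined upstream of the known (targeted) gene in one read.

    read: adapter-trimmed read oriented so the known gene segment is at the 3' end.
    known: the known gene segment covered by the gene-specific primers.
    partners: reference sequences of candidate partners (hypothetical here).
    Returns (partner name or "novel", upstream sequence), or None if unusable.
    """
    idx = read.find(known)
    if idx < anchor:  # known segment absent (-1) or too little upstream sequence
        return None
    upstream = read[:idx]
    probe = upstream[-anchor:]  # sequence immediately next to the junction
    for name, seq in partners.items():
        if probe in seq:
            return name, upstream
    return "novel", upstream


KNOWN = "GGCATCGGATCCAAGT"  # hypothetical targeted-gene segment
PARTNERS = {"PARTNER_X": "TTGACCGATCGAATGCCATA", "PARTNER_Y": "CCGGTTAACCGGTTAACCAA"}
reads = ["CGATCGAATGCCATA" + KNOWN, "G" * 14 + KNOWN, "ACGT" + KNOWN]
for r in reads:
    print(classify_amp_read(r, KNOWN, PARTNERS))
```

Reads classified as "novel" still carry the partner sequence, which could then be aligned genome-wide to identify the unexpected partner.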

Advancing targeted NGS analysis with unique molecular identifiers (UMIs) and SPE technology

Amplicon-based NGS offers advantages for targeted enrichment. However, as the number of PCR cycles and the degree of primer multiplexing increase, so does the risk of bias and errors. To address these issues, QIAseq NGS panels incorporate UMIs to correct PCR bias and use SPE technology to increase design flexibility and enrich target regions precisely. Notably, SPE accommodates variable amplicon sizes without predefined limits, providing an efficient solution for specific enrichment needs [87].

UMIs label each original molecule in the sample library before amplification, so that reads derived from the same molecule can be grouped and PCR or sequencing errors corrected, greatly improving accuracy. Peng et al. [87] combined duplex sequencing with enhanced MPCR using a single-end duplex UMI adapter. This method enables accurate detection of SNVs at allele fractions as low as 0.1% to 0.2%. Compared with existing targeted duplex sequencing methods, the approach not only uses duplex UMIs to eliminate NGS artifacts but also streamlines the workflow (Fig. 4e).
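The core of UMI-based error correction is grouping reads by UMI and taking a per-position consensus. The sketch below shows single-strand consensus calling under simplifying assumptions: reads are already aligned to the same start, UMIs are read without errors, a base needs a strict majority or it becomes "N", and single-read families are discarded. Duplex methods go a step further by pairing the consensus sequences from the two original strands, which this sketch does not do.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def umi_consensus(reads: list[tuple[str, str]], min_family: int = 2) -> dict[str, str]:
    """Collapse reads sharing a UMI into one consensus sequence per UMI.

    reads: (umi, sequence) pairs, assumed already aligned to the same start position.
    Positions without a strict majority base are reported as 'N'.
    Families smaller than min_family are dropped because they cannot be error-corrected.
    """
    families: dict[str, list[str]] = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)
    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family:
            continue
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            bases.append(base if count / len(seqs) > 0.5 else "N")
        consensus[umi] = "".join(bases)
    return consensus


reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),  # one PCR error in the third copy
         ("GGC", "TTTT")]                                     # singleton family
print(umi_consensus(reads))  # {'AAT': 'ACGT'}
```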

Despite the development of numerous RNA-based sequencing methods, RNA-seq remains relatively costly, at approximately 980 euros per case in one report [95]. Reducing this cost is an important goal, and researchers are working to develop faster, more precise, and more affordable alternatives.

Electrochemiluminescence (ECL) for high-sensitivity gene fusion detection: exploring new frontiers

Although ECL is not yet commonly used for gene fusion detection, it is an emerging technology that offers a novel approach. ECL uses luminescent signals generated by electrochemical reactions to identify specific biomolecules: emitters are driven into excited states by reactions at an electrode surface and emit light as they relax, enabling highly sensitive detection of a diverse array of biomarkers [96,97,98]. These properties make ECL a valuable bioanalytical tool for identifying gene fusions and other biomolecular interactions. An ECL assay requires designing and labeling bespoke probes that recognize the gene fusion sequence and hybridize specifically with the corresponding target. Applying an electrical potential then excites the luminescent reporter molecules attached to the probes, producing a detectable light signal that is captured by a detector and analyzed by software to confirm the presence of the gene fusion.

Cheng et al. [76] developed an ECL biosensor for detecting the BCR-ABL gene fusion that uses a CeO2/MXene heterojunction and a dual-toehold strand displacement reaction to amplify the signal (Fig. 5a). Prepared by a one-step hydrothermal method, the CeO2/MXene heterojunction enhances ECL emission and serves as an effective electrode material. The presence of the BCR-ABL fusion sequence initiates a cascade of strand displacements, culminating in an ‘on–off’ ECL signal controlled by quenching labels attached to Pt nanoparticle-functionalized polydopamine composites. This biosensor shows a wide detection range from 1 fM to 100 pM and a detection limit of 0.27 fM, offering a promising approach for the molecular diagnosis of CML.

Lv et al. [97] reported another ECL biosensor that sensitively detects the BCR-ABL fusion using an efficient DNA walking strategy on the electrode surface (Fig. 5b). This method builds regulated DNA tracks using a super-sandwich hybridization chain reaction to generate linear double-stranded DNA. The tracks are assembled on an electrode modified with an Au nanoparticle-enhanced g-C3N4 nanohybrid and guide the movement of bipedal DNA walkers. Triggered by the BCR-ABL fusion, the walkers replace quenched folic acid-labeled strands, thereby enhancing the ECL signal for amplified detection. This precise control of the DNA walking process substantially improves signal amplification efficiency, achieving a detection limit of 0.18 fM for the BCR-ABL fusion and showing substantial promise for clinical molecular diagnostics.

Fig. 5

Electrochemiluminescence methods for gene fusion testing. a Schematic illustration of an ‘on–off’ ECL biosensor based on a configuration-entropy-driven dual-toehold strand displacement reaction (DT-SDR) and a CeO2/MXene heterojunction for highly sensitive detection of the BCR-ABL gene fusion [76]. Copyright 2022, Elsevier. b Schematic diagram of ultrasensitive ‘on–off’ ECL biosensing for BCR-ABL gene fusion determination based on a well-regulated track-based DNA walker and Au@g-C3N4 nanohybrids (NHs) [97]. Copyright 2020, ACS Publications

ECL bioanalysis combines the advantages of electrochemical and luminescence techniques; its high sensitivity and low background noise make it well suited to clinical diagnostics, biomarker discovery, and environmental monitoring. Despite its promise for gene fusion detection, ECL faces several significant challenges, including the need for highly specific probes, complex sample preparation, and heavy reliance on bioinformatics. Overcoming these hurdles will require further innovation and strategic adaptation, and the full potential of ECL for gene fusion detection will likely depend on tailored assay designs and integration with complementary technologies.

Enhanced fusion transcript identification: bioinformatics tools for high-throughput gene fusion detection

Gene fusions are detectable via both DNA-seq and RNA-seq, although RNA-seq is preferred because of its simplicity and cost-effectiveness. As a result, several tools have been developed for detecting fusion transcripts from RNA-seq data, and many studies have compared these RNA-seq techniques [99,100,101,102,103]. However, these studies often focus on specific RNA-seq analysis steps, and workflow analysis is usually limited to only a few stages [104, 105].

Accurate identification of fusion transcripts is essential for a comprehensive understanding of the cancer transcriptome. Numerous bioinformatics tools, such as FusionSeq, deFuse, FusionHunter, TopHat-Fusion, and STAR-Fusion, can be used to detect gene fusions. Table 4 summarizes frequently used bioinformatics software for this purpose.

Table 4 RNA-seq-based fusion transcript predictors evaluated [94]

In gene fusion analysis, high-throughput sequencing is frequently used to detect potential gene rearrangements. STAR-Fusion, Arriba, and FusionCatcher belong to the class of programs designed to discover gene fusions from transcriptome data [113]. These tools typically begin by aligning reads to a reference genome with an integrated aligner. They then identify soft-clipped reads and compare the clipped sequences with the reference genome to locate potential fusion partners.
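The soft-clip step can be illustrated on SAM CIGAR strings, where an `S` operation marks read bases left unaligned at either end. The sketch below extracts those clipped segments, which a fusion caller would then realign to find a partner locus. The 20-base minimum and the function names are hypothetical; real tools also use discordant read pairs, annotation, and extensive artifact filtering.

```python
import re

# SAM CIGAR operations: M I D N S H P = X
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str):
    """Split a CIGAR string such as '60M40S' into [(60, 'M'), (40, 'S')]."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed or missing CIGAR: {cigar!r}")
    return ops


def soft_clips(cigar: str):
    """Return (leading, trailing) soft-clip lengths, skipping hard clips."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def clipped_parts(seq: str, cigar: str):
    """Return the unaligned prefix and suffix that may come from a fusion partner."""
    lead, trail = soft_clips(cigar)
    return seq[:lead], seq[len(seq) - trail:]


def is_fusion_candidate(cigar: str, min_clip: int = 20) -> bool:
    """Hypothetical filter: keep reads with at least `min_clip` soft-clipped bases."""
    return max(soft_clips(cigar)) >= min_clip


read = "A" * 60 + "C" * 40  # toy read: 60 aligned bases, 40 clipped bases
print(soft_clips("60M40S"), is_fusion_candidate("60M40S"), clipped_parts(read, "60M40S"))
```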

FuSeq: a computational breakthrough for high-throughput gene fusion detection

Paired-end RNA-seq has simplified the identification of gene fusions. However, current methods are often computationally demanding, which limits their utility for routine analysis of large sample sets. To address this challenge, Vu et al. [114] introduced FuSeq, a mapping-based approach for the fast and precise detection of gene fusions. FuSeq rapidly maps reads, extracts initial fusion candidates from groups of split reads, and then applies multiple filters and statistical tests to select the final candidates. Its computational efficiency and speed make high-throughput gene fusion studies feasible.
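The candidate-then-filter strategy can be sketched generically: count read pairs whose mates map to two different genes, then keep gene pairs whose support is implausible under a background noise model. The Poisson background rate, the support and significance thresholds, and the input format are hypothetical, and this is not FuSeq's actual set of filters or statistical tests.

```python
import math
from collections import Counter


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)  # adequate for a sketch; imprecise far in the tail


def call_fusions(read_pairs, background_rate=0.5, min_support=3, alpha=1e-3):
    """Count inter-gene read pairs per gene pair and keep well-supported pairs.

    read_pairs: iterable of (gene_of_mate1, gene_of_mate2) assignments.
    Gene pairs are stored unordered, so 5'/3' orientation is ignored here.
    """
    support = Counter(
        tuple(sorted(pair)) for pair in read_pairs if pair[0] != pair[1]
    )
    calls = {}
    for pair, n in support.items():
        p_value = poisson_sf(n, background_rate)
        if n >= min_support and p_value < alpha:
            calls[pair] = (n, p_value)
    return calls


pairs = [("BCR", "ABL1")] * 6 + [("GAPDH", "ACTB")] + [("TP53", "TP53")] * 10
calls = call_fusions(pairs)
print(calls)
```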

STAR-Fusion: fast fusion transcript detection

STAR-Fusion, designed for the rapid and accurate identification of fusion transcripts from RNA-seq data, uses the chimeric alignment output of the STAR aligner [110, 115]. It offers accurate and sensitive gene fusion detection, including low-frequency or rare fusion events, which makes it valuable in cancer genomics and other disease-related studies. The approach capitalizes on the computational efficiency of STAR, which aligns approximately 550 million paired-end Illumina reads per hour using 12 threads. However, this speed, reported to be 450 times faster than that of its nearest competitor, TopHat2 [110], comes with high memory demands: aligning a single human sample requires approximately 30 GB of memory, rising to approximately 40 GB for gene fusion detection. Despite its efficiency, STAR-Fusion has limitations, particularly in identifying breakpoints between genes and within introns, where tools such as Arriba may give more precise results.
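For readers setting up STAR-Fusion, the sketch below assembles a typical invocation as an argument list, together with back-of-the-envelope runtime and memory checks. The options `--genome_lib_dir`, `--left_fq`, `--right_fq`, `--output_dir`, and `--CPU` follow the STAR-Fusion documentation but should be confirmed against the installed version; the file paths are hypothetical, and the runtime helper simply assumes STAR's throughput of roughly 550 million reads per hour on 12 threads.

```python
import shlex


def star_fusion_command(genome_lib_dir, left_fq, right_fq, output_dir, threads=12):
    """Assemble a STAR-Fusion call as an argument list; nothing is executed here."""
    return [
        "STAR-Fusion",
        "--genome_lib_dir", genome_lib_dir,
        "--left_fq", left_fq,
        "--right_fq", right_fq,
        "--output_dir", output_dir,
        "--CPU", str(threads),
    ]


def alignment_hours(n_reads, reads_per_hour=550e6):
    """Rough wall-clock estimate assuming ~550 million reads per hour on 12 threads."""
    return n_reads / reads_per_hour


def fits_in_memory(available_gb, required_gb=40.0):
    """Compare a node's RAM with the ~40 GB quoted for fusion detection."""
    return available_gb >= required_gb


cmd = star_fusion_command("ctat_genome_lib/", "sample_R1.fastq.gz",
                          "sample_R2.fastq.gz", "star_fusion_out/")
print(shlex.join(cmd))
print(alignment_hours(1.1e9), fits_in_memory(32))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the job; building it as a list keeps paths containing spaces safe and makes the exact command easy to log.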

FusionCatcher: detecting somatic gene fusions in RNA-seq data

FusionCatcher is a versatile tool that identifies both known and previously unknown gene fusion events. It is highly sensitive, particularly for low-frequency gene fusions, which makes it suitable for a broad spectrum of gene fusion testing applications in cancer research and other disease contexts [111]. An integrated quality control step automatically filters out artifacts and low-quality data, improving the accuracy of its results. FusionCatcher performs its analysis at the RNA level, focusing on the transcriptome. It uses the Ensembl genome annotation [116] and the Bowtie aligner [107] to process sequencing reads, treating them as single reads within the transcriptome context. Fusion events that FusionCatcher detected in RNA-seq data from tumor cells have been validated by real-time PCR. Notably, it identified the DNAJB1-PRKACA fusion in 100% of the fibrolamellar hepatocellular carcinoma patients examined, underscoring its reliability [117].

Software tools such as STAR-Fusion and FusionCatcher cannot recognize breakpoints that occur between genes or within introns, nor can they detect exon duplications or inverted gene fusions. Moreover, because these tools rely exclusively on RNA data, they cannot detect gene fusions that are present in DNA but not expressed in RNA.

Arriba: rapid and accurate gene fusion detection in RNA-Seq data

Arriba is a software package specialized for detecting gene fusions in RNA-seq data. Its efficient algorithms and parallel computing capabilities allow it to process large datasets swiftly. This design improves the efficiency and fault tolerance of the analysis, allowing it to handle data from diverse samples and experimental conditions and making it versatile across research scenarios [118]. Unlike many other fusion detection pipelines, Arriba can use existing STAR alignments, eliminating the need for a separate alignment dedicated to fusion analysis. Arriba performs well in detecting intragenic inversions, duplications, and translocations with breakpoints in intronic and intergenic regions, even when only a small number of fusion transcripts are present [117]. Arriba also includes visualization tools that greatly aid the interpretation of gene fusion events.
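Because Arriba can reuse STAR alignments, a run amounts to pointing it at the STAR output plus reference files. The sketch below builds such a command. The options `-x`, `-o`, `-O`, `-a`, `-g`, `-b`, and `-k` are taken from the Arriba documentation, the STAR run must have been configured to report chimeric alignments, and all file names are hypothetical placeholders.

```python
import shlex


def arriba_command(star_bam, assembly_fa, annotation_gtf, blacklist,
                   known_fusions=None, out_tsv="fusions.tsv",
                   discarded_tsv="fusions.discarded.tsv"):
    """Build an Arriba call that reuses an existing STAR alignment (not executed)."""
    cmd = [
        "arriba",
        "-x", star_bam,        # STAR alignments, including chimeric reads
        "-o", out_tsv,         # fusion candidates that pass the filters
        "-O", discarded_tsv,   # candidates removed by the filters
        "-a", assembly_fa,     # reference genome (FASTA)
        "-g", annotation_gtf,  # gene annotation (GTF)
        "-b", blacklist,       # recurrent artifacts to suppress
    ]
    if known_fusions is not None:
        cmd += ["-k", known_fusions]  # optional list of known fusions
    return cmd


cmd = arriba_command("Aligned.out.bam", "GRCh38.fa", "annotation.gtf",
                     "blacklist.tsv.gz", known_fusions="known_fusions.tsv.gz")
print(shlex.join(cmd))
```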

Arriba achieved high sensitivity across four types of benchmark datasets and maintained high specificity even at low fusion transcript concentrations, approaching optimal enrichment of true positives. Like FusionCatcher, Arriba can make use of lists of known fusions. Its workflow resembles that of STAR-Fusion but is more efficient, reducing the time needed to filter fusion candidates. Running Arriba sequentially rather than in parallel with alignment minimizes peak memory usage at the cost of a slightly longer runtime [118]. In summary, Arriba is a high-performance gene fusion detection tool that offers high sensitivity and specificity, detects multiple fusion types, and provides intuitive visualization tools for rapid and accurate interpretation of fusion events.
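Benchmark figures such as sensitivity and specificity are computed by matching calls against a truth set. Because true negatives are ill-defined for fusion calling, benchmarks often report precision in place of specificity, and the sketch below does so. It matches fusions as unordered gene pairs and ignores breakpoint coordinates, which is a simplification, and the truth and prediction lists are toy examples.

```python
def as_pair(fusion):
    """Represent a fusion as an unordered gene pair (breakpoints ignored)."""
    return tuple(sorted(fusion))


def benchmark(predicted, truth):
    """Sensitivity (recall) and precision of predicted fusions against a truth set."""
    pred = {as_pair(f) for f in predicted}
    true = {as_pair(f) for f in truth}
    tp = len(pred & true)
    sensitivity = tp / len(true) if true else 0.0
    precision = tp / len(pred) if pred else 0.0
    return sensitivity, precision


truth = [("BCR", "ABL1"), ("EML4", "ALK"), ("TMPRSS2", "ERG"), ("ETV6", "NTRK3")]
predicted = [("BCR", "ABL1"), ("ALK", "EML4"), ("GAPDH", "ACTB")]
sens, prec = benchmark(predicted, truth)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
```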

Advancing gene fusion research through artificial intelligence (AI)

Artificial intelligence (AI) enables computers to perform tasks that typically require human intelligence, such as speech recognition, natural language understanding, image recognition, problem-solving, and learning. AI aims to create intelligent systems that can adapt to new situations by mimicking human reasoning and learning. AI technologies include rule-based systems, machine learning (ML) [119], deep learning (DL) [120], natural language processing (NLP) [121], and computer vision [122], among others. Rule-based systems rely on explicit logical rules, whereas ML and DL learn patterns from data. NLP allows machines to understand human language, and computer vision enables them to extract information from images.

As noted above, despite its potential for gene fusion detection, ECL faces hurdles such as the need for specific probes, complex sample preparation, reliance on bioinformatics, and competition with established methods such as NGS and RT-PCR. AI could help realize the potential of ECL for gene fusion detection by addressing the challenges of complex probe design and bioinformatics analysis. Its strengths, including efficient automated design, optimization of specificity and sensitivity, rapid processing of large-scale data, and prediction of gene functions and disease associations, may help overcome existing bottlenecks in ECL-based gene fusion detection.

The complexity and diversity of gene fusions, coupled with the vast amount of genetic data, call for advanced computational methods to identify, characterize, and functionally analyze them. DL models, particularly convolutional neural networks (CNNs), recurrent neural networks (RNNs), and the RNN variant long short-term memory (LSTM) networks, have shown considerable potential in gene fusion research [123]. Because they excel at recognizing complex patterns, these models have been applied to identify gene fusions from genomic sequences and transcriptomic data. By learning features directly from the data rather than relying on explicitly programmed genetic characteristics, they aim to capture the sequential structure and positional dependencies within DNA and RNA sequences, providing powerful tools for detecting and analyzing gene fusions [124]. Beyond detection, DL is also advancing our understanding of fusion protein function. By analyzing the sequences of fusion proteins, AI models can predict their oncogenic potential, their interactions within cellular pathways, and their possible effects on tumor behavior, which may reveal new therapeutic targets and support personalized, precise cancer treatment.
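The first layer of such a CNN can be made concrete in a few lines. The sketch below one-hot encodes DNA and slides a single filter along it; with hand-set weights for the hypothetical motif GGT and a bias of -2, the ReLU output is nonzero only where the motif occurs exactly. In a trained network, many filters are learned from data rather than set by hand, and a framework such as PyTorch or TensorFlow would replace these loops.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as [A, C, G, T] indicator vectors; N and other symbols map to zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation, as in CNN layers) of one filter.

    kernel: list of `width` rows, each holding 4 weights (A, C, G, T).
    """
    width = len(kernel)
    return [
        bias + sum(encoded[i + j][c] * kernel[j][c]
                   for j in range(width) for c in range(4))
        for i in range(len(encoded) - width + 1)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


# Hand-set filter for the hypothetical motif "GGT"; a trained CNN learns these weights.
ggt_filter = [[0.0, 0.0, 1.0, 0.0],   # G
              [0.0, 0.0, 1.0, 0.0],   # G
              [0.0, 0.0, 0.0, 1.0]]   # T
activation = relu(conv1d(one_hot("AAGGTCAGGTA"), ggt_filter, bias=-2.0))
print(activation)
```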

Lovino et al. [125] used CNNs to annotate chimeric transcripts directly from raw fusion sequence data, aiming to avoid potential biases introduced by protein domain analysis. The amino acid sequence of each fused protein is fed into the CNN, which learns classification features directly, eliminating the need for handcrafted descriptors. The network outputs a score between 0 and 1 reflecting the chimeric transcript's potential oncogenic involvement, which is used to classify each gene fusion as oncogenic or not, with an associated confidence level. This approach illustrates the versatility of CNNs in pattern recognition and classification and their potential in bioinformatics without relying on additional protein domain data.
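The input encoding and 0–1 output described above can be sketched as follows. Each residue becomes a 20-dimensional indicator vector, sequences are padded or truncated to a fixed length, and a scoring head pools each filter's activations and applies a sigmoid. The per-filter feature maps would come from convolutional layers over this encoding; here they, the weights, and the 0.5 threshold are placeholders, and this is not the architecture or encoding used by Lovino et al.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_protein(seq, max_len):
    """One-hot encode a fused protein, truncating or zero-padding to `max_len` rows."""
    rows = []
    for aa in seq.upper()[:max_len]:
        row = [0.0] * len(AMINO_ACIDS)
        if aa in INDEX:  # nonstandard residues (e.g. X) stay all-zero
            row[INDEX[aa]] = 1.0
        rows.append(row)
    rows.extend([0.0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows)))
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fusion_score(feature_maps, weights, bias):
    """Global-max-pool each filter's activations, then squash to a 0-1 score."""
    pooled = [max(fm) for fm in feature_maps]
    return sigmoid(bias + sum(w * p for w, p in zip(weights, pooled)))


def label(score, threshold=0.5):
    return "oncogenic" if score >= threshold else "not oncogenic"


x = encode_protein("MKV", max_len=5)
score = fusion_score([[0.0, 2.0, 1.0], [0.5, 0.1]], weights=[1.0, -2.0], bias=0.0)
print(len(x), round(score, 3), label(score))
```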

Kim et al. [126] developed FusionAI, a deep residual neural network model that predicts potential gene fusion breakpoints. FusionAI uses only raw DNA sequences as input and attempts to learn the characteristics enriched in genomic breakpoint regions. The model was trained on gene fusion data identified from RNA-seq in The Cancer Genome Atlas (TCGA) to distinguish fusion-positive from fusion-negative breakpoints. Through the analysis of approximately 26,000 fusion breakpoints and application to multiple external validation datasets, FusionAI has demonstrated its potential to improve the specificity of gene fusion prediction. The model was also used to explore genomic features associated with fusion events, including specific DNA sequence motifs enriched in fusion-positive sequences. Overall, FusionAI exemplifies the application of DL to the study of human genomic breakage and the associated genomic regions, showcasing the potential of DL for complex genomic data analysis.
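What makes a network “residual” is the skip connection: each block learns a correction F(x) that is added to its input, which eases the training of deep models. The single-channel sketch below shows the idea with zero-padded convolutions. FusionAI's actual depth, channel counts, input encoding, and sequence window are not reproduced, and the kernels are toy values.

```python
def conv1d_same(xs, kernel):
    """Zero-padded 1D convolution whose output length equals the input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("use an odd kernel width so padding is symmetric")
    half = len(kernel) // 2
    padded = [0.0] * half + list(xs) + [0.0] * half
    return [sum(padded[i + j] * w for j, w in enumerate(kernel))
            for i in range(len(xs))]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(xs, k1, k2):
    """y = ReLU(x + F(x)), where F is conv -> ReLU -> conv (single channel)."""
    fx = conv1d_same(relu(conv1d_same(xs, k1)), k2)
    return relu([x + f for x, f in zip(xs, fx)])


signal = [1.0, -2.0, 3.0]
print(residual_block(signal, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))  # F(x) = 0: skip path only
print(residual_block(signal, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]))
```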

Dadhania et al. [127] developed a DL-based algorithm to identify ERG rearrangement status in prostatic adenocarcinoma using only digitized H&E-stained slides. The study used whole slide images from 392 cases annotated in QuPath and exported image patches of 224 × 224 pixels at 10×, 20×, and 40× magnification as input to a MobileNetV2 convolutional neural network. Separate models were trained for each magnification using 261 cases for training and 131 for testing. For each input patch, the model predicted ERG-positive (ERG rearranged) or ERG-negative (ERG not rearranged) status. The models showed similar ROC curves, with AUCs ranging from 0.82 to 0.85, and the 20× model achieved a sensitivity of 75.0% and a specificity of 83.1%. This DL model correctly predicted ERG rearrangement status in the majority of prostatic adenocarcinomas, potentially eliminating the need for ancillary studies to assess ERG gene rearrangement.
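Two computational pieces of this workflow are easy to sketch: cutting an annotated region into 224 × 224 patches and computing sensitivity and specificity from patch-level counts. The region size is hypothetical, tiles are non-overlapping, and background or tissue detection is omitted; QuPath's own export settings may differ.

```python
def tile_coordinates(width, height, patch=224, stride=224):
    """Top-left (x, y) corners of full patch x patch tiles inside a width x height region."""
    if width < patch or height < patch:
        return []
    return [(x, y)
            for y in range(0, height - patch + 1, stride)
            for x in range(0, width - patch + 1, stride)]


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


tiles = tile_coordinates(1000, 500)  # hypothetical annotated region, in pixels
print(len(tiles), tiles[:3])
```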

While AI offers numerous advantages in gene fusion detection and analysis, it also has notable limitations. A significant challenge is a high false-negative rate, as AI models may miss certain gene fusions. The need for specific probes and complex sample preparation can reduce the efficiency and practicality of AI applications. In addition, the reliance on bioinformatics for data processing and analysis adds complexity and requires specialized knowledge and resources. AI models are also vulnerable to biases in training data, which can affect the fairness and accuracy of their predictions. Addressing these limitations will be crucial to fully realizing AI's potential in gene fusion detection and analysis.

Future prospects of AI in enhancing gene fusion research and personalized medicine

The future of AI in gene fusion research is promising, with advances anticipated in several key areas. As DL technology evolves, AI models are expected to become more accurate and sensitive, capable of detecting subtle fusion events in complex genomic data, even in early-stage cancers or low-abundance samples. AI is also expected to play a significant role in predicting the functions and disease relevance of gene fusion products, shedding light on their involvement in cellular signaling pathways, metabolic routes, and specific disease states. Such insight could advance personalized medicine by using individual genomic information, including gene fusion characteristics, to tailor treatment plans, match patients with the most effective drugs and therapies, and predict responses to specific treatments. Furthermore, AI will increasingly integrate data from multiple biological levels (genomics, transcriptomics, proteomics, and metabolomics), offering a more comprehensive biological perspective and deepening our understanding of complex biological processes. Advances in wearable devices and biomarker detection technology may also enable AI to support real-time monitoring of gene fusion expression and activity, aiding early diagnosis, treatment monitoring, and tracking of disease progression. Overall, future AI applications in gene fusion research are likely to focus on precision medicine, multi-omics data integration, and real-time disease monitoring, driving new advances in biomedical research and clinical treatment.

Summary and outlook

Various techniques are available for assessing gene fusions. They can be categorized as single-gene or multiplex detection, traditional or next-generation methods, and DNA-, RNA-, or protein-based methods. Single-gene detection methods include FISH, IHC, and RT-PCR [74]. With the rapid development of NGS technology, NGS has also been applied to gene fusion detection; its applications include WGS, WES, and RNA-seq, each offering a different scope of analysis [127]. Table 5 summarizes commonly used gene fusion detection methods and compares their specificity, sensitivity, and cost, based on recent reviews and articles in the field.

Table 5 Commonly used gene fusion detection methods

The advent of molecular diagnostic technology in tumor molecular pathology studies has been revolutionary, broadening the range of pathological research. It enables the examination of tumor source, structure, and biological characteristics at the molecular level. Many molecular diagnostic techniques have become increasingly mature.

Accurate gene fusion detection is essential to help clinicians with cancer diagnosis, classification, and treatment planning. Current diagnostic methods such as FISH, IHC, and RT-PCR, although sensitive, often lack the resolution and throughput required for comprehensive analysis. These traditional methods rely heavily on operator expertise and may fail to identify novel fusion partners or resolve complex structural rearrangements. Because each assay typically detects a single gene fusion, testing is time-consuming and costly, and fusions outside the assay's design can be missed, producing false-negative results.

Readouts based on fluorescence, hybridization, and other detection techniques reduce the reliance on subjective assessment by technicians, thereby improving the credibility and accuracy of detection. Nevertheless, despite its convenience and speed, the NGS-based anchored amplicon method is prone to false positives. To improve its precision, bioinformatics filtering should be incorporated to exclude low-confidence gene fusion calls.

Owing to its cost-effectiveness, high sensitivity, broad applicability, and capacity to identify previously unknown gene fusions, DNA-targeted sequencing has been widely adopted. However, identifying gene fusions by DNA-targeted sequencing alone has specific limitations. The sensitivity of NGS depends strongly on coverage, so covering entire introns substantially increases the cost of detection. In addition, some intronic regions are difficult to design probes for. Consequently, if a fusion breakpoint lies in an untargeted intronic region that the probes cannot cover, detection sensitivity may be compromised.
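The cost argument can be made concrete with a simple sequencing-output estimate: raw output ≈ target size × mean depth ÷ on-target fraction. The panel sizes, the 1,000× depth, and the 60% on-target fraction below are hypothetical round numbers chosen only to show how adding intronic territory scales the requirement.

```python
def raw_gigabases(target_bp, mean_depth, on_target_fraction=0.6):
    """Raw sequencing output (Gb) needed to reach `mean_depth` over a capture target.

    Only on-target bases add coverage, so output is scaled by 1 / on_target_fraction.
    Duplicates and uneven coverage, which raise the requirement further, are ignored.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_bp * mean_depth / on_target_fraction / 1e9


# Hypothetical panel: 50 kb of exons versus the same genes with 500 kb of introns added.
exons_only = raw_gigabases(50_000, 1_000)
with_introns = raw_gigabases(550_000, 1_000)
print(f"{exons_only:.3f} Gb vs {with_introns:.3f} Gb ({with_introns / exons_only:.0f}x)")
```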

In contrast to DNA-seq, RNA-targeted sequencing technology offers solutions to several of the aforementioned challenges in gene fusion detection. First, the design of probes for transcripts is not hindered by intron regions, thereby significantly enhancing sensitivity and cost-effectiveness. Second, RNA-Seq provides the additional benefit of assessing the function and expression of gene fusions while simultaneously revealing them, allowing for the secondary validation of gene fusions.

Software tools are commonly used to aid gene fusion analysis, but different tools may yield different false-positive rates. Several studies have therefore run multiple fusion analysis packages in parallel to improve specificity [78]. No single technology is inherently superior; the choice is driven primarily by the specific requirements of the detection task. For certain clinical needs, such as the rapid identification of known fusions or low-throughput, frequent testing, quantitative reverse transcription PCR (RT-qPCR) is particularly relevant, especially in blood diagnostics. FISH, especially when integrated with imaging data, is also recognized as a convenient and effective method for gene fusion detection.
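A multi-caller strategy can be as simple as majority voting over the fusions reported by each tool. The sketch below keeps fusions supported by at least two callers. The tool outputs are toy lists, parsing each tool's native output format into (5′ gene, 3′ gene) tuples is omitted, and the two-tool threshold is an assumption; real pipelines often also require breakpoints to agree within a tolerance.

```python
from collections import Counter


def consensus_calls(calls_by_tool, min_tools=2):
    """Keep (5' gene, 3' gene) fusions reported by at least `min_tools` callers.

    Orientation is kept, so ("BCR", "ABL1") and ("ABL1", "BCR") are distinct events.
    Gene symbols are assumed to be harmonized across tools beforehand.
    """
    support = Counter()
    for calls in calls_by_tool.values():
        support.update(set(calls))  # each tool votes at most once per fusion
    return {fusion for fusion, n in support.items() if n >= min_tools}


calls = {
    "STAR-Fusion": [("BCR", "ABL1"), ("EML4", "ALK")],
    "Arriba": [("BCR", "ABL1"), ("KANSL1", "ARL17A")],
    "FusionCatcher": [("BCR", "ABL1"), ("EML4", "ALK"), ("GAPDH", "ACTB")],
}
print(sorted(consensus_calls(calls)))
```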

In contrast, NGS is suited to high-throughput testing and the discovery of unknown fusions, but it has relatively lower sensitivity, higher costs, and longer turnaround times. Technology selection should therefore weigh these trade-offs against specific needs.

AI holds considerable promise for gene fusion research, with advances in DL expected to improve the precision with which subtle fusion events are detected in complex genomic data. AI is also increasingly used to predict the functions and clinical implications of gene fusion products, supporting a personalized medicine in which treatments are tailored to the unique genomic signatures of individuals. Moreover, AI's capacity to integrate genomic, transcriptomic, proteomic, and metabolomic data offers a more nuanced and comprehensive understanding of biological mechanisms. Innovative wearable technologies and advances in biomarker detection may enable real-time tracking of gene fusion activity, improving early diagnosis, treatment monitoring, and the management of disease progression. In short, AI has the potential to drive advances in precision medicine, data integration, and continuous health assessment, transforming both gene fusion research and its clinical application.

Continuous technological advances are significantly enhancing our ability to detect and analyze gene fusions. They have given us increasingly comprehensive insight into the origins and functions of gene fusions and their connections with various diseases. This progress not only improves the accuracy of gene fusion detection but also increases its clinical significance. Ongoing technological innovation is therefore opening new avenues for understanding gene fusions and broadening their potential clinical applications.