History of RNA biology

In 1958, Francis Crick proposed the central dogma of molecular biology, which describes the flow of genetic information from DNA through RNA to the proteins that carry out biological processes. However, with the development of new technologies and robust next-generation sequencing, large international consortia such as the Functional Annotation of the Mammalian Genome (FANTOM) and the Encyclopedia of DNA Elements (ENCODE) have described pervasive transcription: about 80% of the DNA is transcribed into RNA, but only about 1.5% of that RNA is translated into protein (Carninci et al. 2005; Hangauer et al. 2013). Deep sequencing has thus shown that the bulk of the genome is transcribed into RNA. The universe of RNA is divided into two classes: (1) RNAs with coding potential and (2) RNAs without coding potential, known as non-coding RNAs (ncRNAs). Because only 1 to 2% of the human genome codes for proteins (The ENCODE Project Consortium 2012), most RNAs are ncRNAs, although mRNAs have been studied in far greater depth. Even though ncRNAs were formerly regarded as “evolutionary junk,” new research shows that they substantially impact several molecular pathways. According to the “RNA world” hypothesis, RNA was the earliest molecule of life, and as the more stable DNA took over the storage of genetic information, RNA was left with its role as a messenger. However, it was eventually recognized that RNA is a promising focus for research on disease, epigenetics, and still-unknown regulatory functions, since it has a wide range of latent catalytic capabilities and can store genetic information (Bhatti et al. 2021). During evolution, RNA is thought to have evolved alongside proteins and DNA (Robertson and Joyce 2010). Understanding the intricate relevance of ncRNAs in numerous biological processes, including homeostasis and development, is critical (Amaral et al. 2013). Figure 1 shows a timeline of the molecular discoveries related to non-coding RNA (Li et al. 2021a, b; Chhabra 2021).

Fig. 1 Timeline of molecular discoveries of non-coding RNA

A relatively broad size criterion is used to classify ncRNAs into two subclasses. Small or short non-coding RNAs are shorter than 200 nucleotides (nt), while long non-coding RNAs (lncRNAs) are longer than 200 nt. These two groups are quite different from one another: lncRNAs can be as long as several kilobases, whereas small ncRNAs range from a few nucleotides to 200 nt. The best-known class of small ncRNAs, microRNAs (miRNAs), are approximately 22 nt long and have been researched extensively (Kim et al. 2009). Other small ncRNAs include siRNA and piRNA. As the physiology, characteristics, and development of animals grow more complex, from lower non-chordates to humans, the proportion of introns and intergenic sequences increases; these sequences are transcribed and processed by alternative splicing, so the protein-coding portion of the genome becomes relatively smaller (Mattick 2001). In addition, eukaryotes have more sophisticated and complex systems for RNA processing, trans induction, DNA methylation, imprinting, RNA interference (RNAi), post-transcriptional gene silencing, chromatin modification, gene editing, splicing, dosage compensation, gene regulation, and transcriptional gene silencing (Mattick 2004). Non-coding RNAs act as regulatory signal messengers for the stimuli received at sensory genetic elements (Guttman et al. 2011). The evolutionary history of prokaryotes reflects their continued reliance on a protein-based regulatory architecture. Eukaryotes, in contrast, have evolved new regulatory features and mechanisms that use a variety of ncRNAs to control the expression of phenotypic traits, the penetrance and expressivity of disease, and developmental programming. Therefore, research on ncRNAs in these linked pathways is essential to comprehend their function in health and disease (GAGEN 2005).
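The size criterion described above can be written as a short rule. The following is a minimal sketch, not an established tool: the function name `classify_ncrna` and the constant `SIZE_CUTOFF_NT` are hypothetical, and a transcript of exactly 200 nt, which the text leaves unspecified, is assigned to the lncRNA class here by assumption.

```python
# Conventional boundary between small and long ncRNAs (from the text).
SIZE_CUTOFF_NT = 200


def classify_ncrna(sequence: str) -> str:
    """Classify a non-coding transcript by length alone.

    Assumption: a transcript of exactly 200 nt is treated as a lncRNA;
    the text only says "less than" and "more than" 200 nt.
    """
    length = len(sequence)
    if length == 0:
        raise ValueError("sequence must not be empty")
    return "small ncRNA" if length < SIZE_CUTOFF_NT else "lncRNA"


# Usage: a 22-nt miRNA-sized transcript and a 1.5-kb transcript.
print(classify_ncrna("U" * 22))    # small ncRNA
print(classify_ncrna("A" * 1500))  # lncRNA
```

Length alone says nothing about function or coding potential, so a rule like this would only be a first sorting step.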

Distribution and types of ncRNA

RNA comes in a variety of forms in living cells. ncRNAs are typically split into two groups based on their transcript length: short ncRNAs (under 200 nucleotides) and long ncRNAs (over 200 nucleotides). ncRNAs are important in several processes, including RNA maturation, RNA processing, signaling, gene expression, and protein synthesis (Kung et al. 2013; Morris and Mattick 2014). The amount of ncRNA and the degree of species conservation are remarkably correlated. According to estimates, each cell has about 10^7 ncRNA molecules, most of which are snRNA, snoRNA, miRNA, rRNA, and lncRNA. Although about 53,000 distinct human lncRNAs have been identified, only about 1000 are present in quantities adequate to support a functional role (Djebali et al. 2012). Other types of RNA and their specific features are described by Bhatti et al. (2021). Table 1 gives an overview of non-coding RNAs and their functions, and Fig. 2 shows the different types of RNA.

Table 1 Overview of non-coding RNA and its functions
Fig. 2 Different types of RNA and major non-coding RNAs

Biogenesis and functions of different types of ncRNA

RNA molecules are much more than just a blueprint for protein production. Non-coding transcripts are expected to function much as proteins do and can regulate most cellular functions: RNA can fold into three-dimensional (3D) structures and interact with DNA, proteins, and other RNA molecules. The two main regulatory RNA groups, small and long ncRNAs, are partly defined by their length. Functional ncRNAs, with lengths ranging from about 20 to thousands of nucleotides, have grown significantly in number and classification over the past ten years. This review focuses on major ncRNAs such as miRNA, lncRNA, and circRNA; a few others, such as piRNA, snRNA, snoRNA, and siRNA, are mentioned briefly. These ncRNAs play significant roles in developmental processes and disease conditions. Numerous genes across the whole human genome are involved in the production of ncRNAs, and there may be distinct transcriptional units that function independently. Biogenesis proceeds through transcription, nuclear maturation, export to the cytoplasm for processing, and production of the functional RNA. The detailed mechanism of non-coding RNA biogenesis is described by Bhatti et al. (2021). Table 2 describes specific ncRNAs and their biogenesis. Non-coding RNA is an integral part of genomics and proteomics. According to the “RNA world” hypothesis, RNA may have played a role in the emergence of life, which must be able to carry and duplicate its genetic material (Joyce 1989). In contemporary organisms, which have evolved more effective methods to copy and express their genetic material along the central axis from DNA to RNA to protein, ncRNAs seem to have retained the majority, if not all, of their original characteristics and functions. During evolution, many RNA functions were transferred to proteins while others were retained, reflecting the respective selective advantages of proteins and RNA. To grasp ncRNA function and mechanism, it may therefore be instructive to compare ncRNA function with that of proteins.

Table 2 Description of ncRNA and its biogenesis

Comparison of miRNA, lncRNA, and circRNA in RNA biology

The mechanistic characterization of lncRNAs is far less thorough than that of miRNAs. This is partly because lncRNAs can control gene expression through intricate biochemical pathways at various levels inside the cell. Although lncRNAs are found in a wide range of organisms (Guttman and Rinn 2012), including plants (Swiezewski et al. 2009), yeast (Houseley et al. 2008), prokaryotes (Bernstein et al. 1993), and viruses (Reeves et al. 2007), their nucleotide sequences are not as well conserved as those of miRNAs. Even though lncRNAs with different nucleotide compositions can adopt the same 3D structure and, consequently, the same molecular function, this poor sequence conservation restricts the choice of cellular and animal models for researching lncRNA functions (Derrien et al. 2012). It is becoming increasingly clear that lncRNAs play a role in virtually every cellular process and that the expression of these non-coding molecules is carefully regulated both under normal conditions and in several human diseases, including cancer (Tano and Akimitsu 2012).

Unlike coding genes, lncRNAs can be produced in a wide range of ways from practically any location in the human genome. Sense lncRNAs are made from segments that overlap one or more exons of another coding transcript on the same strand, in contrast to antisense lncRNAs, which overlap coding genes on the antisense strand. Other lncRNAs are produced from regulatory components like enhancers or from non-coding DNA sequences like introns. Some are expressed from intergenic regions, with their own promoters and regulatory elements, and do not overlap other known coding genes (Thum and Condorelli 2015). Only a tiny portion of the very large number of lncRNAs that could exist has been studied thus far. However, those studied have demonstrated the capacity to control the transcriptional and post-transcriptional stages of gene expression by interacting with nucleic acids and proteins in a manner that is specific to both sequence and structure (Mercer et al. 2009; Wilusz et al. 2009). The categorization and annotation of putative lncRNAs must be carefully examined to remove protein-coding RNAs. Although categorized as non-coding molecules, some lncRNAs have recently been shown to code for micropeptides (Anderson et al. 2015): a skeletal muscle-specific RNA that was previously thought to be a lncRNA was found to encode a functional micropeptide. It is therefore essential to rule out coding potential before concluding that a lncRNA has a regulatory role. Evidence from recent studies has also revealed that ncRNA production is not limited to conventional processing. Circular RNAs (circRNAs) are produced by a splicing variation known as back-splicing. Since circRNAs form a covalently closed continuous loop, they lack a 5′ cap and a 3′ tail. This RNA species is more tissue-specific, moderately stable, and highly conserved (Jeck et al. 2012). The functions of each of these ncRNAs are described by Beermann et al. (2016). The discovery of associations between non-coding RNAs and diseases has created new therapeutic and diagnostic possibilities. Numerous miRNAs have already been shown to serve as diagnostic markers or therapeutic targets for various diseases, and there is some evidence that circRNAs and lncRNAs behave similarly.
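One common first check when separating putative lncRNAs from protein-coding transcripts is to look for long open reading frames (ORFs). The sketch below is an illustration under stated assumptions rather than a published pipeline: the function names are hypothetical, only the three forward reading frames are scanned, and the 100-codon threshold is an assumed cutoff. As the micropeptide example in the text shows, a short ORF does not prove that a transcript is non-coding.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (start codon included, stop excluded) of the
    longest ATG-to-stop ORF in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def may_be_coding(seq: str, min_codons: int = 100) -> bool:
    """Flag a transcript for closer inspection.

    Assumption: 100 codons is an illustrative threshold only;
    micropeptide-encoding transcripts fall below it.
    """
    return longest_orf_codons(seq) >= min_codons


# Usage: ATG AAA TTT TAA contains a 3-codon ORF, far below the threshold.
print(longest_orf_codons("ATGAAATTTTAA"))  # 3
print(may_be_coding("ATGAAATTTTAA"))       # False
```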

Non-coding RNA and human diseases

Non-coding RNAs (ncRNAs) are functional RNA molecules that are not translated into proteins (Djebali et al. 2012). Initially, only a few ncRNAs were found and studied. With later technological advancements, many types of ncRNA were classified, each with specific functions, opening the way to biomarkers and novel therapeutic approaches. Although not all of their functions are understood, several ncRNA species play crucial roles in controlling the transcription and translation of genes and the transcription of other ncRNAs. Therefore, it is no surprise that ncRNAs are crucial in normal physiologic functions, complex human traits, and human diseases (Li et al. 2018a, b). Table 3 lists different diseases together with the ncRNAs that serve as potential biomarkers and their interactions.

Table 3 Non-coding RNA and its biomarkers

Transposons: unexpected players in different diseases with different ncRNA

Transposable elements (TEs) are considered essential factors in the plasticity and evolution of the genome. Since TEs are so prevalent in the human genome, particularly the Alu and Long Interspersed Nuclear Element-1 (LINE-1) repeats, they are thought to be the molecular cause of several diseases. The molecular processes involved include insertional mutation, DNA recombination, chromosomal rearrangements, changes in gene expression, and changes to epigenetic controls. Well-known and recent cases of human disorders in which TEs play a role are reviewed by Chénais (2022). TEs are frequently linked to the genesis of human malignancies, whether through the insertion of LINE-1 or Alu elements that results in chromosomal rearrangements or through epigenetic alterations. Numerous other clinical disorders may have their molecular roots in TE-induced changes in gene structure and/or expression or in chromosomal recombination. Hemoglobinopathies and metabolic, neurological, and joint disorders are among the many conditions in this group.

Additionally, TEs may influence aging. The epigenetic derepression and mobility of TEs, which can result in disease development, appear to be significantly affected by the stresses and environmental toxins to which people are exposed. As a result, a greater understanding of TEs may lead to the development of novel diagnostic markers for disease (Pradhan and Ramakrishna 2022).

Differences between exosomal and non-exosomal non-coding RNAs in human health and diseases

The transfer of circulating ncRNAs via exosomes is an intriguing mechanism. As mediators of intercellular communication, ncRNAs can be enclosed in extracellular vesicles (EVs), such as exosomes, microvesicles, and apoptotic bodies, and secreted from cells to influence various diseases depending on the target cells (Li et al. 2021a). In addition to cells, ncRNAs have been shown to exist in various bodily fluids, including serum, plasma, urine, and saliva. The ncRNAs seen in biofluids are frequently called circulating or extracellular ncRNAs. The fact that extracellular ncRNAs are reasonably stable in plasma, even though extracellular RNase activity is considerable in that environment, suggests that circulating ncRNAs are shielded from degradation. Li et al. (2021b) examine how exosomal and non-exosomal ncRNAs regulate physiological homeostasis and pathological events in health and disease.

Tools and methods

Investigating miRNA, lncRNA, circRNA, and other RNAs

This section discusses the methods used to investigate ncRNAs. Methods for miRNAs have already been thoroughly described elsewhere. Deep sequencing and microarrays are the most widely used methods for miRNA detection. Deep sequencing is more sensitive than microarray-based techniques and can identify novel RNA sequences, whereas microarrays rely on a fixed set of probes for detection (van Rooij 2011). However, analysis of sequencing output is more difficult because of the enormous volume of data and the critical requirement for bioinformatics expertise. Quantitative real-time PCR (qRT-PCR) allows comparatively inexpensive and low-effort validation of screening results. Because miRNA transcripts are so short, designing primers for reverse transcription was initially difficult; target-specific stem-loop reverse transcription primers are now offered on many platforms. Northern blotting and in situ hybridization are other techniques for detecting known miRNAs. Bioinformatics platforms are commonly used to find a miRNA’s targets; miRNA-related tools and databases are listed in Table 4. Luciferase reporter assays are frequently used to verify the targets predicted by bioinformatics. Small RNAs play a critical role in transcriptional regulation, so they must be taken into account to understand the complete transcriptional regulatory landscape. Their abnormal expression profiles are believed to be linked to cellular dysfunction and disease. Numerous studies therefore concentrate on detecting, predicting, or quantifying small RNA expression, particularly that of miRNAs, to better understand human health and disease.
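Many target-prediction platforms start from seed complementarity, in which nucleotides 2 to 8 of the miRNA pair with a site in the 3′ UTR of the target mRNA. The sketch below illustrates only that first step; the function names are hypothetical, the UTR is an invented example, and real platforms add further criteria such as site conservation and pairing context.

```python
# Map each RNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the mRNA sequence complementary to the miRNA seed
    (nucleotides 2-8, counted from the 5' end)."""
    mirna = mirna.upper().replace("T", "U")
    seed = mirna[1:8]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """Return 0-based start positions of perfect seed matches in a UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Usage with an illustrative 22-nt miRNA and a made-up UTR fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCGGGCUACCUCAA"
print(seed_site(mirna))               # CUACCUC
print(find_seed_matches(mirna, utr))  # [3, 13]
```

A predicted site of this kind is only a candidate; as the text notes, a luciferase reporter assay is the usual experimental check.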

Table 4 miRNA based tools and databases

Efficient and reasonably affordable next-generation sequencing allows the collection of large data sets with excellent accuracy. Appropriate bioinformatic procedures must then be applied to analyze the collected data for lncRNAs. Commercial arrays are also available for examining the deregulation of a defined set of lncRNAs (e.g., Arraystar, Qiagen, Biocat). Another way to investigate the effect of lncRNAs is to use a genome-wide shRNA library that targets a specific subset of lncRNAs. Such a library, combined with additional studies, can be used to ascertain how lncRNA inhibition influences signaling pathways or cell behavior. For instance, the lncRNA TUNA was discovered in Oct4-GFP mouse embryonic stem cells using an shRNA library targeting 1280 lincRNAs (Lin et al. 2014). The pros and cons of RNAi approaches are summarized in a review by Mohr et al. (2014).

Designing primers that detect only the ncRNA transcript is crucial for validating the results of a lncRNA screen, because the design must distinguish coding from non-coding regions. Several difficulties arise. A lncRNA often has a low expression level. In addition, lncRNA annotation is continuously evolving and may not be consistent across databases (such as RefSeq, UCSC, and Ensembl). Because lncRNAs are often transcribed from pseudogenes, the same primers may detect both the parent gene and the long non-coding transcript. Another difficulty arises when a lncRNA is expressed in the sense or antisense direction relative to a known protein-coding gene. LncRNAs are primarily found in cell nuclei. Pulling down lncRNA/protein complexes is also challenging, since it may produce false-positive results; a highly reproducible RNA antisense purification (RAP) method is described by McHugh et al. (2015). In vitro, lncRNAs can be suppressed using a variety of compounds. It is also critical to confirm the length of the annotated sequence of a newly discovered lncRNA. The rapid amplification of cDNA ends (RACE) method can amplify the region between a specific point inside the lncRNA and the 3′ or 5′ end of the sequence. The actual sequence can then be determined or verified by cloning and sequencing this amplicon (Beermann et al. 2016). Detailed loss- or gain-of-function studies are essential to understand a lncRNA’s activity in vivo (Bassett et al. 2014). lncRNA-related tools and databases are listed in Table 5.

Table 5 lncRNA based tools and databases

By searching existing RNA-sequencing data for circular RNAs, a new set of candidate circRNAs can be predicted (Salzman et al. 2012). Data from long-read RNA sequencing can also be used to look for possible circRNAs. This class of molecules requires a dedicated algorithm because they are produced by back-splicing. Published work demonstrates how to build a computational pipeline to identify new circRNAs (Guo et al. 2014). Applying these techniques to RNA-sequencing data yields candidate circRNAs that then need validation. Because the back-splice junction has a different orientation from the gene from which the circRNA is derived, the validation of these ncRNAs is distinctive. Exonic circRNAs must be distinguished from other RNA molecules that have undergone back-splicing. Divergent primers can be used in qPCR to assess the expression of the predicted circRNAs.
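The reason back-spliced molecules need a dedicated algorithm can be illustrated with a toy example: a read that covers the back-splice junction contains the 3′ end of the exon followed by its 5′ start, an order that never occurs in the linear transcript. The sketch below is a deliberately simplified illustration with hypothetical function names and an invented exon sequence. Published pipelines work from split-read alignments against a genome rather than exact substring matching.

```python
def backsplice_junction(exon: str, flank: int = 10) -> str:
    """Sequence spanning the back-splice junction of a single
    circularised exon: its last `flank` nt joined to its first."""
    if len(exon) < 2 * flank:
        raise ValueError("exon is shorter than twice the flank length")
    return exon[-flank:] + exon[:flank]


def spans_backsplice(read: str, exon: str, flank: int = 10) -> bool:
    """True if the read contains the exact junction sequence.

    Assumption: error-free reads and exact matching, for illustration.
    """
    return backsplice_junction(exon, flank) in read


# Invented 40-nt exon used only for this example.
exon = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCATGCAAT"
linear_read = exon[5:35]                 # lies entirely within the exon
circular_read = exon[-15:] + exon[:15]   # crosses the junction

print(spans_backsplice(linear_read, exon))    # False
print(spans_backsplice(circular_read, exon))  # True
```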

With respect to the genomic sequence, these primers point away from one another rather than toward one another, so the circular transcript can be amplified without amplifying the linear genomic region (Jeck and Sharpless 2014). The function of a circRNA can be investigated with established RNA methods, although these are still evolving, and new approaches specific to circRNAs are needed. Tools and approaches for small ncRNAs and circRNAs are listed in Tables 6 and 7.
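The behaviour of divergent primers can be checked with a small in silico PCR model. This is a sketch under simplifying assumptions, with hypothetical function names and an invented exon: primers match exactly and uniquely, neither primer spans the junction itself, and the circular template is written as a string whose end is joined to its start.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMP)[::-1]


def amplicon(template: str, fwd: str, rev: str, circular: bool):
    """Return the predicted product, or None if nothing is amplified.

    fwd matches the template strand; rev is given 5'->3' and binds
    the opposite strand, so its reverse complement is looked up.
    """
    f = template.find(fwd)
    r = template.find(revcomp(rev))
    if f == -1 or r == -1:
        return None
    r_end = r + len(rev)
    if f <= r:                       # convergent: primers face each other
        return template[f:r_end]
    if circular:                     # divergent: product crosses the junction
        return template[f:] + template[:r_end]
    return None                      # divergent on a linear template


exon = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCATGCAAT"
fwd = exon[25:35]                    # forward primer near the 3' end
rev = revcomp(exon[5:15])            # reverse primer near the 5' end

print(amplicon(exon, fwd, rev, circular=False))      # None
print(len(amplicon(exon, fwd, rev, circular=True)))  # 30
```

The product appears only on the circular template, which is why a band or qPCR signal from divergent primers is taken as evidence of back-splicing.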

Table 6 Small ncRNA-based tools and databases
Table 7 circRNA-based tools and databases

Identifying non-coding RNAs (ncRNAs), which play significant roles in the cell, is a crucial topic in biological research. The discovery of ncRNAs has become feasible thanks to recent developments in computational prediction and bioinformatics. Three key computational methods for ncRNA identification are homology search, de novo prediction, and deep sequencing data mining. These methods draw on two kinds of evidence: homology information and machine learning on common features. The aforementioned computational detection techniques are mostly intended for short non-coding RNAs such as miRNAs, tRNAs, siRNAs, and piRNAs, and they perform poorly on long non-coding RNAs (lncRNAs). Conventional experimental methods such as RT-PCR and Northern blotting, however, are expensive. To current knowledge, the primary lncRNA detection methods are RT-PCR and ChIP-seq (Wang et al. 2013). The primary software tools and ncRNA discovery tools are listed in Table 8, and the techniques used for ncRNA discovery are listed in Table 9.
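Machine learning approaches need each sequence converted into a fixed-length vector of common features. One simple and widely used choice is the k-mer frequency profile, sketched below. The function name and the choice of k are assumptions made for illustration; the feature sets used by the tools in Table 8 are not specified in the text and may differ.

```python
from collections import Counter
from itertools import product


def kmer_features(seq: str, k: int = 3) -> list[float]:
    """Return the frequency of every possible k-mer (4**k values,
    in alphabetical order), normalised so that the values sum to 1."""
    seq = seq.upper().replace("U", "T")
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


# Usage: dinucleotide profile of a short sequence (16 features).
features = kmer_features("ACGTACGT", k=2)
print(len(features))  # 16
```

Vectors of this kind can be passed to any standard classifier; which classifier suits a given ncRNA class is an empirical question that this sketch does not address.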

Table 8 Common techniques, databases, and tools used in ncRNA
Table 9 Different techniques used for ncRNA discovery

Applications of CRISPR/Cas9-mediated non-coding RNA editing in the targeted therapy of human diseases

Genome editing, also known as gene editing, refers to a range of scientific techniques that enable the modification of an organism’s DNA. These techniques make it possible to add, remove, or modify genetic material at specific genomic regions. There are several genome editing methods, including zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and CRISPR/Cas9. A comparison of these three approaches, together with their detailed structures and mechanisms, is given by Li et al. (2021a, b).

CRISPR/Cas9, which stands for clustered regularly interspaced short palindromic repeats and CRISPR-associated protein 9, is the best-known example. In just a few years, the CRISPR/Cas9 system has developed quickly into a reliable, practical, user-friendly, and widely applied gene editing tool. CRISPR/Cas9 has significantly impacted a wide range of fields, including agriculture, biotechnology, and healthcare. However, no field has been affected by the technology more profoundly than cancer research, as indicated by the rapidly expanding body of publications. Challenges remain. First, off-target activity must be controlled; proposed strategies include the discovery and application of more specific Cas9 variants, limiting the duration of CRISPR/Cas9 activity, the use of inducible Cas9 variants, and the application of anti-CRISPR proteins (Zhang et al. 2021a, b). Further research is required to fully comprehend the principles governing CRISPR/Cas9 specificity and to increase the sensitivity of off-target identification. Second, on-target mutagenesis at the double-strand breaks introduced by single-guide RNA/Cas9 can lead to large deletions (over several kilobases) and complex genomic rearrangements at the targeted loci, which can have pathogenic effects (Zhang et al. 2021a, b).
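Designing a single-guide RNA begins with locating candidate target sites. For the commonly used Streptococcus pyogenes Cas9, a site consists of a 20-nt protospacer immediately followed by an NGG protospacer-adjacent motif (PAM). The sketch below lists such sites on one strand only. The function name is hypothetical, the example sequence is invented, and the sketch does not score specificity, which is the harder problem raised in the text.

```python
import re


def find_spcas9_sites(dna: str, guide_len: int = 20) -> list[tuple[int, str]]:
    """Return (start, protospacer) for every guide_len-nt stretch that
    is directly followed by an NGG PAM on the given strand.

    Assumption: only the supplied strand is scanned; a fuller search
    would also scan the reverse complement.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna)]


# Usage: one invented protospacer followed by a TGG PAM.
target = "TTTT" + "ACGTACGTACGTACGTACGT" + "TGG" + "TTTT"
print(find_spcas9_sites(target))
# [(4, 'ACGTACGTACGTACGTACGT')]
```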

The evidence accumulated to date shows that genome editing systems have contributed significantly to therapeutic approaches for various human diseases, with the CRISPR/Cas9 system being particularly successful, either by directly modifying target gene loci or by providing tools with multiple functions. Other diseases for which these approaches have yielded therapeutic or clinical-stage drugs are described by Li et al. (2021a, b). Gene editing technologies have also advanced cell imaging, gene expression regulation, epigenetic modification, therapeutic drug development, functional gene screening, and gene diagnosis. Innovative genome editing complexes and more targeted nanostructured vesicles have improved efficiency and reduced toxicity during delivery, bringing genome editing technology closer to the clinic. With further investigation, it is reasonable to expect that genome editing technology will help elucidate the biological mechanisms behind disease development and progression, provide novel therapies, and promote the development of the life sciences (Li et al. 2020; Li et al. 2021a, b).

Non-coding RNA therapeutics

Long non-coding RNAs (lncRNAs), microRNAs (miRNAs), and other types of non-coding RNAs (ncRNAs) are intriguing targets for therapeutic intervention in cancer and a variety of other diseases. Over the past ten years, many antisense oligonucleotides and small interfering RNAs have entered clinical use as RNA-based treatments, and several have acquired FDA approval. Trial findings, however, have been mixed so far, with some studies reporting strong effects and others showing minimal efficacy or toxicity. Clinical trials are being conducted on alternative entities such as antimiRNAs, and interest is growing in lncRNA-based therapies (Winkle et al. 2021). Existing RNA therapeutics and drugs in clinical trials are listed in Table 10.

Table 10 Existing ncRNA therapeutic drugs and clinical trial drugs

Challenges in using ncRNA as biomarkers and therapeutic targets

Non-coding RNAs may be potential biomarkers and therapeutic targets because mounting data suggest that they are critical regulators of the pathophysiological processes leading to many diseases. However, their clinical use has not yet been examined and may face numerous difficulties. First, non-coding RNAs are still at an early stage of development as biomarkers. Although RT-PCR, next-generation sequencing, and microarray analysis have been used in research examining the connection between non-coding RNAs and disease-specific traits, most of these investigations are still experimental. Moreover, no research has examined the viability of choosing lncRNAs or circRNAs as novel biomarkers (Zhang et al. 2017a, b). The discovery of tissue- or organ-specific biomarkers would be beneficial for the early diagnosis, treatment, and intervention of organ failure, perhaps increasing the chance of disease-specific survival.

RNA-based therapies are considered the next generation of therapeutics because they differ from conventional medications, such as small-molecule and protein medicines, which work primarily on protein targets. First, RNA aptamers can produce pharmacological effects by blocking the activity of a particular protein target. Second, antisense RNAs (asRNAs), miRNAs, and siRNAs can be designed to specifically target mRNAs or functional ncRNAs in order to control a specific disease. Third, guide RNAs (gRNAs) may be used to precisely alter the target sequences of a particular gene in order to cure a monogenic condition. Thus, RNA therapies can potentially increase the number of druggable targets. Non-coding RNAs are promising “next-generation” biomarkers, provided that the difficulties mentioned above can be resolved. With a deeper knowledge of the mechanisms underlying specific diseases, non-coding RNAs may one day also serve as innovative treatment targets.

Conclusion and future perspectives

The attractive new field of ncRNA research reveals a higher level of nature’s diversity. The complexity of ncRNA research reflects a role for ncRNAs in cellular biology that is far greater than previously assumed. Even though many ncRNAs have only recently been discovered, there have already been significant advances in clinical applications and diagnostic methods. This research will likely expand into more potent and specific medications and personalized medicine techniques, elevating patient care to a new level. Rapid developments in bioinformatics, sequencing technologies, proteomics, and microarrays have identified a wide variety of non-coding RNAs, which make up most of the regulators of cellular mechanisms and are principally linked to eukaryotic complexity. Because these varied ncRNAs form integrated, complicated networks and biological pathways, understanding the unique function of each one is difficult. The use of ncRNA therapies in formal drug development will increase.

Further information has to be obtained: the pharmacokinetics and pharmacodynamics of candidate ncRNA medicines need to be examined, and thorough toxicological studies are required. More tools are needed to advance the field, and more phase I/II clinical studies can be expected. This review aims to advance knowledge of the mechanisms and functions of ncRNAs in human health and disease and to pave the way for novel clinical diagnostic and therapeutic approaches. Machine learning (ML) is well suited to the enormous quantity of ncRNAs that need to be analyzed because it can process such data quickly. Current ML analyses of ncRNAs classify healthy and disease samples with reasonable accuracy, indicating that distinguishing patterns are present in those cases. Future research should therefore concentrate on improving the ability of ML models to recognize the distinctive pattern of each disease. Overall, the use of ncRNAs may rise significantly in the coming years, contributing to the development of successful precision medicine and more individualized therapies.