Introduction

The central dogma of molecular biology describes the basic principles by which genetic information is transferred between biological macromolecules in cells. Genetic information flows from genes to proteins, which form the material basis of life and are the main participants in life activities.1,2 Protein-coding genes make up <3% of the human genome, and only a small fraction of the remaining 97% (composed of noncoding genes, NCGs) has been characterized.3 Many NCGs were previously dismissed as junk DNA, but many are now known to be functional elements.4 The discovery of noncoding RNA brought NCGs back into the focus of life scientists and encouraged them to view NCGs from a new perspective. Noncoding RNA plays broad and important roles in regulating gene expression and various life activities, either by forming RNA–protein complexes5,6 or through base complementarity.7,8 Noncoding RNA is classified into many categories. Small nuclear RNA has long been recognized as noncoding RNA; its main function is to participate in the processing of mRNA precursors, and the RNA components of the spliceosome, such as U1, U2, U4, and U6, are small nuclear RNAs.9 MiRNAs are a class of single-stranded RNA molecules of ~22 nucleotides that are encoded by endogenous genes and regulate gene expression posttranscriptionally. Loaded into an RNA-induced silencing complex (RISC), whose core component is an Argonaute (AGO) protein, miRNAs bind mainly to the 3′ untranslated region (UTR) of target mRNAs and either block their translation or trigger their cleavage, thereby silencing endogenous gene expression.10,11 CircRNA was first discovered in viroids, whose genome is a single-stranded circular RNA molecule.12 CircRNAs can act as molecular sponges that counteract miRNAs, and they can also act as scaffolds for different molecular interactions.13 Long noncoding RNAs (lncRNAs) are considered noncoding because they lack obvious long protein-coding open-reading frames (ORFs), although new evidence shows that some lncRNAs are in fact translated into peptides. LncRNAs have been proposed to have diverse functions, including transcriptional regulation, organization of nuclear domains, and regulation of gene expression.14 Research on NCGs now extends across a wide range of living organisms.15,16
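The argument that a transcript is "noncoding" because it lacks a long ORF can be made concrete. The sketch below is a minimal illustration, assuming a plain ATG-to-stop ORF definition on the sense strand and the 100-codon minimum that has often served as a conventional annotation cutoff; the function names (`find_orfs`, `classify_transcript`) and the example sequence are hypothetical. It shows how such a cutoff labels a transcript as noncoding even though it carries an intact small ORF (sORF).

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class ORF:
    start: int     # 0-based index of the A in the ATG start codon
    end: int       # index just past the stop codon
    n_codons: int  # sense codons (start codon included, stop codon excluded)


def find_orfs(seq: str) -> list:
    """Find ATG-initiated ORFs in the three forward frames.

    Within a frame, the first ATG after the previous stop opens an ORF;
    later in-frame ATGs are treated as internal methionines.
    ORFs that never reach an in-frame stop codon are not reported.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append(ORF(start, i + 3, (i - start) // 3))
                start = None
    return orfs


def classify_transcript(seq: str, min_codons: int = 100) -> str:
    """Label a transcript 'coding' only if its longest ORF reaches min_codons."""
    longest = max((orf.n_codons for orf in find_orfs(seq)), default=0)
    return "coding" if longest >= min_codons else "noncoding"


# Hypothetical transcript carrying a 20-codon sORF (ATG + 19 GCT codons + TAA).
transcript = "GGC" + "ATG" + "GCT" * 19 + "TAA" + "CCC"
print(find_orfs(transcript))                           # one ORF of 20 codons
print(classify_transcript(transcript))                 # prints: noncoding
print(classify_transcript(transcript, min_codons=10))  # prints: coding
```

The same sORF is called noncoding or coding depending only on the cutoff, which is one reason peptides from "noncoding" transcripts were long overlooked.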

Moreover, with the development of technologies such as ribosome profiling and high-throughput sequencing, together with protein database searches for large-scale proteomic analysis, newly identified peptides have been found that do not match annotated protein-coding genes; instead, they correspond to genes for noncoding RNAs, pseudogenes, UTRs, and other sequences previously considered to be NCGs.17,18 Recently, an increasing number of experiments have indicated that NCGs can indeed be translated,19,20 and that the translation products are mainly polypeptides or micropeptides.21,22 NCG peptides can be directly verified by western blotting (WB) using specific antibodies. In addition, NCG peptides can be fused to epitope tags such as FLAG or human influenza hemagglutinin (HA), or to reporters such as green fluorescent protein (GFP), and the resulting fusion proteins can be detected by WB or fluorescence imaging. Mass spectrometry techniques, such as liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS), can also confirm the presence of NCG peptides by detecting peptide-specific spectra (Fig. 1). These peptides have a wide range of biological functions. Interestingly, some NCG peptides show marked tissue-specific distribution patterns and can carry out finely tuned local regulation in a tissue-specific manner.
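Tag-based detection only works if the tag stays in frame with the NCG ORF. The following is a minimal sketch, assuming a C-terminal FLAG tag inserted immediately before the stop codon; the helper names and the example sORF are hypothetical, and the FLAG coding sequence shown is only one of several synonymous encodings of DYKDDDDK. Translating the fused construct back to protein is a quick check that the tag is in frame and that no premature stop codon was introduced.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

FLAG_CDS = "GACTACAAAGACGATGACGACAAG"  # one possible coding sequence for DYKDDDDK


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks stop codons."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def add_c_terminal_flag(orf: str) -> str:
    """Insert the FLAG coding sequence in frame, just before the stop codon."""
    orf = orf.upper()
    if len(orf) % 3 != 0 or not orf.startswith("ATG"):
        raise ValueError("ORF must start with ATG and have a length divisible by 3")
    protein = translate(orf)
    if not protein.endswith("*") or "*" in protein[:-1]:
        raise ValueError("ORF must end with its only in-frame stop codon")
    return orf[:-3] + FLAG_CDS + orf[-3:]


sorf = "ATGGCTAAATGGTAA"  # hypothetical sORF encoding MAKW
fused = add_c_terminal_flag(sorf)
print(translate(sorf))   # MAKW*
print(translate(fused))  # MAKWDYKDDDDK*
```

FLAG is used here only because its 8-residue sequence is short; the same in-frame check applies to an HA tag or to a GFP fusion.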

Fig. 1: Many laboratory techniques support the idea that peptides derived from noncoding genes exist.

a Design of antibodies against NCG peptides and verification by western blot analysis.38 b Mass spectrometry is used to identify specific signals of noncoding gene peptides.39 c Immunofluorescence images of peptide-FLAG fusion protein (red, NCG-peptide NoBody; DIC, differential interference contrast).75 d Immunofluorescence images showing expression of the FLAG-tagged NCG peptide (green, NCG-peptide CASIMO1; red, actin filaments; blue, nucleus).98 e Images of the immunolocalization of the NCG peptides (red, NCG-peptide miPEP171b; green, autofluorescence).113

In this review, we summarize the structure, modes of action, and biological roles of peptides derived from NCGs (Fig. 2). The NCG-derived peptides (termed NCG peptides) discovered thus far are summarized in Table 1 and critically discussed in this review. The existence of these peptides suggests that the portion of the genome that encodes proteins or peptides is much larger than previously recognized. Finally, we address the biological and medical significance of NCG peptides and propose future directions to advance the field. We believe that a deeper exploration of this subject will clarify some mysteries of life in greater detail and may lead to new biomarkers for disease diagnosis and new therapeutics.

Fig. 2: The biological functions of NCG peptides.

Noncoding genes are expressed according to the central dogma. After transcription, alternative splicing produces a variety of transcripts, some of which are translated into peptides. These peptides play important roles in modulating muscle formation and performance, suppressing metabolic reprogramming, controlling epidermal morphogenesis, promoting pri-miRNA transcription, regulating mRNA translation, integrating aspects of the stress response, facilitating gastrulation, enhancing metabolic homeostasis, and inducing or suppressing tumorigenesis and/or tumor progression.

Table 1 NCG peptides summary

Action modes of NCG-derived peptides

NCG peptides differ from traditional proteins in hierarchical structure

Correct spatial folding is the basis of normal protein function.23 The spatial conformation of a protein is described at four hierarchical levels. The primary structure, i.e., the order of amino acid residues from the N-terminus to the C-terminus, is determined by the nucleotide sequence of the corresponding gene. On the basis of the primary structure, atoms of the peptide backbone form local substructures, known as the secondary structure. Several consecutive secondary structures can combine into a “supersecondary unit”, and several such units can further form a “structural domain”, which is part of the tertiary structure.24,25 Structural domains are self-stabilizing units that allow the host protein to maintain proper biological function.26,27 The tertiary structure is the spatial arrangement of all the atoms in one peptide chain, and in the traditional sense, a polypeptide becomes a protein once it forms a tertiary structure. The spatial arrangement and functional cooperation of multiple subunits constitute the quaternary structure.28 Most NCG peptides contain fewer than 100 amino acid residues (aa), and the shortest is only 9 aa long.29 Chain length is a prerequisite for the formation of complex protein structures: even the simplest transmembrane α-helix (TMH) structure requires 30 amino acids, and proteins also need unstructured spacer regions between different structural elements.30 Hence, in contrast to conventional proteins, NCG peptides usually do not form complicated structures and instead have different modes of action, as described below. Although some circRNA-derived NCG peptides are >100 aa, they are still much smaller than their full-length counterparts (for example, FBXW7-185aa and β-catenin-370aa). Because most circRNAs are derived from exons, more evidence is needed to determine whether some circRNAs should be classified as a distinct type of messenger RNA. The recently discovered circRNA-derived NCG peptides with clear mechanisms of action tend to function through interactions with other proteins; these mechanisms are also discussed below.
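Because a circRNA has no ends, an ORF that starts upstream of the back-splice junction can read straight through it, and an ORF with no in-frame stop codon can in principle continue around the circle repeatedly. The following is a minimal sketch of this reading-frame logic, assuming the circRNA is supplied as a linear string whose last and first bases are joined at the back-splice junction; `circular_orfs`, `max_laps`, and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(circ: str, max_laps: int = 3) -> list:
    """Report ATG-initiated ORFs on a circular RNA.

    Returns (start, n_codons, crosses_junction) tuples. n_codons is None when no
    in-frame stop codon is found within max_laps passes around the circle, the
    situation in which rolling-circle translation is possible.
    """
    circ = circ.upper().replace("U", "T")
    n = len(circ)
    unrolled = circ * (max_laps + 1)  # linear copy long enough to read max_laps laps
    results = []
    for start in range(n):
        if unrolled[start:start + 3] != "ATG":
            continue
        pos = start
        while pos + 3 <= start + max_laps * n:
            if unrolled[pos:pos + 3] in STOP_CODONS:
                # The ORF crosses the junction if it ends beyond the first lap.
                results.append((start, (pos - start) // 3, pos + 3 > n))
                break
            pos += 3
        else:
            results.append((start, None, True))
    return results


print(circular_orfs("CUAACCAUGGCCG"))  # [(6, 7, True)]
print(circular_orfs("CUAACCAUGGCC"))   # [(6, None, True)]
```

In the first hypothetical circle (13 nt, not a multiple of three), each lap shifts the frame, so the UAA upstream of the AUG is reached on the second pass after crossing the junction. In the second (12 nt), the frame never shifts and no stop codon is reached.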

NCG peptides function in a sequence-independent or sequence-dependent manner

Scanning by the 40S–Met-tRNAi complex (43S complex), which binds to the mRNA, is the major process preceding translation initiation.31,32 Some polypeptides are translated from upstream open-reading frames (uORFs) in the 5′UTR, and phylogenetic analysis shows that some of these uORFs are conserved among species.33 A class of regulatory peptides translated from uORFs sets a peptide-sequence-independent trap for the 43S complex as it searches for a downstream start codon (Fig. 3), and this trap blocks the scanning process. However, a sequence-dependent mode is more common: some NCG peptides act as competitive inhibitors because they share sequences with their homologous full-length proteins. Many circRNAs are generated by back-splicing of exons of their parent genes.34,35 Therefore, different RNA forms of the same gene share partially repeated sequences that encode polypeptides. For example, SNF2 histone linker PHD RING helicase (SHPRH)-146aa (Table 1) is a peptide translated from a circRNA. Full-length SHPRH, encoded by the parent gene of circ-SHPRH, is an E3 ligase that promotes the ubiquitination and proteasome-mediated degradation of proliferating cell nuclear antigen (PCNA), which inhibits cell proliferation.36,37 Another E3 ligase, denticleless E3 ubiquitin protein ligase (DTL), induces the ubiquitination of SHPRH. The two SHPRH lysines ubiquitinated by DTL (K1562 and K1572) are also present in SHPRH-146aa. Therefore, SHPRH-146aa acts as a competitive inhibitor that suppresses the ubiquitination of SHPRH, resulting in the accumulation of SHPRH and the subsequent degradation of PCNA.38 The peptide translated from the circRNA of FBXW7 was named FBXW7-185aa (Table 1). FBXW7-185aa induces the accumulation of FBXW7α and the degradation of c-Myc through the same mechanism as SHPRH-146aa.39 Circ-0004194 originates from the β-catenin gene locus and is also known as circβ-catenin. It produces a 370-aa β-catenin isoform, termed β-catenin-370aa, which serves as an effective competitor by binding GSK3β, thereby protecting full-length β-catenin from phosphorylation and subsequent degradation (Fig. 4).40
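The decoy logic for SHPRH-146aa can be framed with the textbook rate law for competitive inhibition, in which an inhibitor I competes with a substrate S for the enzyme's binding site: v = Vmax[S] / (Km(1 + [I]/Ki) + [S]). In the sketch below, DTL is treated as the enzyme, full-length SHPRH as the substrate, and SHPRH-146aa as the competitor. This mapping is an illustrative assumption rather than a model fitted in the cited studies, and all parameter values are hypothetical and in arbitrary units.

```python
def competitive_rate(s: float, i: float, vmax: float, km: float, ki: float) -> float:
    """Michaelis-Menten rate in the presence of a competitive inhibitor.

    s: substrate concentration (here, full-length SHPRH)
    i: competitor concentration (here, SHPRH-146aa)
    The competitor raises the apparent Km by the factor (1 + i / ki)
    without changing vmax.
    """
    return vmax * s / (km * (1.0 + i / ki) + s)


# Hypothetical parameters in arbitrary units.
VMAX, KM, KI, SHPRH = 1.0, 1.0, 1.0, 1.0
for decoy in (0.0, 1.0, 3.0):
    rate = competitive_rate(SHPRH, decoy, VMAX, KM, KI)
    print(f"SHPRH-146aa = {decoy}: relative ubiquitination rate = {rate:.3f}")
```

Because the inhibition in this model is competitive, raising the substrate concentration far enough would overcome it, so the predicted decoy effect depends on the relative abundance of the peptide and the full-length protein.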

Fig. 3

Scanning preinitiation complexes (PICs) that have translated a uORF can reinitiate at the main ORF in the coding region.

Fig. 4: Action mode of circRNA-derived peptides.

CircRNA-derived peptides act as competitive inhibitors that reduce the ubiquitination of the full-length protein encoded by the same parent gene, resulting in accumulation of the full-length protein and its downstream effects.

NCG peptides function by binding other proteins to change their conformation

Myoregulin (MLN) (Table 1) is translated from LINC00948, and the small open-reading frame (sORF) encoding MLN is located in exon 3 of LINC00948. MLN contains a C-terminal transmembrane α-helix. Computational molecular modeling suggests that this α-helix interacts directly with the groove formed jointly by the M2, M6, and M9 transmembrane helices of sarco-endoplasmic reticulum Ca2+-ATPase (SERCA) to modulate intracellular calcium handling.41 Beyond biochemical data, cryo-electron microscopy has revealed the mode of action of the fungal arginine attenuator peptide (AAP) (Table 1) directly from a structural perspective. AAP is encoded by a uORF and can stall translation.42 Cryo-electron microscopy has shown that AAP interacts directly with components of the ribosomal exit tunnel, including rRNA and ribosomal proteins, and is sandwiched between ribosomal proteins L4 and L17 of the large subunit.43,44 Mutations in the AAP residues that contact the ribosome abolish the stalling effect. In addition, the C-terminus of AAP forms a helix, which may contribute to a conformational change in the peptidyl transferase center (PTC). Through these direct interactions, AAP alters the conformation of the PTC and causes translational stalling. Besides inducing conformational changes in other proteins, NCG peptides can act as domain-specific adapters. The Drosophila MRE29 gene, also known as pri (polished rice), was considered an NCG.45 In fact, pri encodes peptides of 11–32 aa (Table 1).46 At embryonic stages 13–16, pri peptides are expressed and act as adapters that mediate the specific binding of the E3 ligase Ubr3 to the N-terminus of Shavenbaby (Svb). Consequently, the ubiquitinated N-terminus of Svb is truncated by the proteasome, while two folded regions in the C-terminus protect Svb from complete degradation.47 Pri peptides thus ensure proper Svb processing and convert Svb from its repressive form into its active form.
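Whether a short peptide such as MLN can form a transmembrane helix is often screened first from sequence alone. The following is a minimal sketch using the Kyte–Doolittle hydropathy scale with the 19-residue window and the 1.6 average-hydropathy threshold that Kyte and Doolittle suggested for membrane-spanning segments. The example peptide is hypothetical (it is not the MLN sequence), and a hydropathy hit only suggests, rather than demonstrates, a TMH.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(peptide: str, window: int = 19) -> list:
    """Mean hydropathy of every full-length window along the peptide."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in peptide[i:i + window]) / window
        for i in range(len(peptide) - window + 1)
    ]


def candidate_tmh(peptide: str, window: int = 19, threshold: float = 1.6) -> list:
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    scores = hydropathy_windows(peptide.upper(), window)
    return [i for i, score in enumerate(scores) if score > threshold]


# Hypothetical peptide: polar N-terminus, 19-residue hydrophobic core, basic C-terminus.
peptide = "MSDERKQ" + "L" * 19 + "KRE"
print(candidate_tmh(peptide))  # [2, 3, 4, 5, 6, 7, 8, 9, 10]
```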

NCG peptides act as signaling pathway molecules

In humans, the mitochondrial genome is a closed, circular genetic system that encodes 13 proteins, together with noncoding rRNA and tRNA genes.48,49 However, previously unknown nuclear RNA and small RNA transcripts have recently been discovered in mitochondria.20,50 Furthermore, mitochondrial 12S rRNA contains a sORF that can be translated into a 16-aa peptide named MOTS-c (Table 1). MOTS-c inhibits the folate cycle, leading to the accumulation of AICAR (5-aminoimidazole-4-carboxamide ribonucleotide), which activates the AMPK pathway. Through this signaling pathway, MOTS-c has an extensive impact on cellular and organismal metabolic homeostasis.51 Toddler RNA, also known as Apela/Elabela/Ende, was initially considered a noncoding RNA but encodes a peptide (Table 1). Toddler peptide activates APJ/Apelin signaling by driving the internalization of G protein-coupled Apelin receptors and promotes cell movement during zebrafish gastrulation.52

Rather than acting as primary inducers of biological activity, these structurally simple NCG-encoded peptides mostly fine-tune biological processes through many different mechanisms. Because of their particular origins, some NCG peptides have distinctive modes of action, such as competitive inhibition of full-length proteins encoded by the same gene. This finely tuned regulation enables the living body to perform various functions more accurately and stably.

Regulation of NCG-peptide expression

Peptides derived from NCGs are also regulated at multiple levels, from translation to protein modification. Because many NCGs are transcribed as noncoding RNAs, the regulation of their transcription is not discussed here. At the translational level, N6-methyladenosine (m6A) modifications, which are abundant in circRNAs, can enhance their translation. Under some conditions, m6A marks are enriched near the start codon of circRNAs; YTHDF3 recognizes these marks and promotes translation in a cap-independent manner associated with eIF4G2. In addition, circRNA translation is increased under heat-shock conditions.53 Similar mechanisms have been discovered in the regulation of mRNA translation, providing a model for selective mRNA translation during stress.54,55 Poly(A) or poly(T) sequences after a stop codon can inhibit circRNA translation, suggesting that the translation of NCG peptides is regulated differently from that of traditional proteins.56
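m6A is deposited mainly within the consensus motif DRACH (D = A, G, or U; R = A or G; H = A, C, or U), with the methylated adenosine at the central A. The following is a minimal sketch, assuming that motif matching alone is used as a proxy for candidate m6A sites (actual sites require experimental mapping), that lists DRACH adenosines within a chosen distance of a start codon; the function name, window size, and example sequence are hypothetical.

```python
import re

# DRACH written in the DNA alphabet (U -> T); the lookahead allows overlapping hits.
DRACH = re.compile(r"(?=[AGT][AG]AC[ACT])")


def drach_sites_near_start(seq: str, start_codon_pos: int, window: int = 50) -> list:
    """0-based positions of DRACH central adenosines within `window` nt of the start codon."""
    seq = seq.upper().replace("U", "T")
    sites = [m.start() + 2 for m in DRACH.finditer(seq)]  # +2 points at the 'A' in DRACH
    return [p for p in sites if abs(p - start_codon_pos) <= window]


seq = "GGACUAAUGGCC"  # hypothetical sequence; AUG starts at index 6
print(drach_sites_near_start(seq, 6))            # [2]
print(drach_sites_near_start(seq, 6, window=3))  # []
```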

At the level of protein modification, PLN and SLN, which are structurally very similar to MLN and have distinct tissue-specific distribution patterns, were originally discovered as micropeptides.57,58 PLN functions through physical association, both with itself and with SERCA, and its activity is regulated by phosphorylation and dephosphorylation in vivo. Dephosphorylated PLN exists mainly as a monomer that suppresses cardiac function by inhibiting SERCA, which is located in the sarcoplasmic reticulum (SR) membrane and pumps Ca2+ from the cytoplasm back into the SR during muscle relaxation. After phosphorylation, PLN forms pentamers, which reduces its inhibitory effect on SERCA.59 This dynamic balance plays a key role in the enhancement of myocardial function by β-adrenergic agonists (Fig. 5). In addition, a specific PLN mutant (R9C), in which arginine 9 is replaced by cysteine, inhibits the phosphorylation of wild-type PLN and therefore chronically inhibits SERCA; this chronic inhibition causes dilated cardiomyopathy and premature death.60 In another case, the R14del mutant PLN is mislocalized to the sarcolemma, where it interacts with Na/K-ATPase and causes cardiac remodeling despite enhanced contractility.61 Such orderly regulation indicates that NCG-derived polypeptides are inherent participants in life activities.
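The monomer–pentamer balance described above can be illustrated with a simple mass-action model, 5 M ⇌ P with [P] = K[M]^5, so that total PLN in monomer units is T = [M] + 5K[M]^5. The following is a minimal sketch, assuming that SERCA inhibition scales with the free monomer fraction and that phosphorylation acts only by raising the pentamerization constant K; both assumptions and all numbers are hypothetical simplifications, not measured values.

```python
def free_monomer(total: float, k_pent: float, iterations: int = 200) -> float:
    """Solve total = m + 5 * k_pent * m**5 for the free monomer concentration m.

    The left side increases monotonically with m, so bisection on [0, total]
    converges to the unique root.
    """
    lo, hi = 0.0, total
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid + 5 * k_pent * mid ** 5 < total:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


TOTAL_PLN = 1.0  # arbitrary units
for state, k in (("dephosphorylated (low K, hypothetical)", 1.0),
                 ("phosphorylated (high K, hypothetical)", 100.0)):
    m = free_monomer(TOTAL_PLN, k)
    print(f"{state}: inhibitory monomer fraction = {m / TOTAL_PLN:.2f}")
```

In this toy model, raising K shifts PLN into pentamers and lowers the monomer fraction, which mirrors the qualitative relief of SERCA inhibition after phosphorylation.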

Fig. 5: Regulation of NCG peptides.

β-adrenergic stimulation leads to phosphorylation of PLN monomers, which then form pentamers, relieving the inhibition of SERCA by PLN and promoting myocardial contractility.

Biological functions of NCG peptides

Although eukaryotes do not have proportionally more coding genes than prokaryotes, their physiological and pathological activities are far more complex. NCGs are thought to play a pivotal part in establishing this difference between eukaryotes and prokaryotes. In recent years, continuous research has demonstrated that NCG-derived peptides have considerable biological functions across various fields. How NCG peptides may contribute to the differences between eukaryotes and prokaryotes is discussed in greater detail below.

NCG peptides facilitate embryonic development

Embryonic development requires that genes be expressed in an orderly manner.62 This process, called genetic programming, involves multifaceted regulation.63 Peptides derived from NCGs can regulate this process temporally. For example, the above-mentioned pri peptides show tissue- and time-specific expression during embryogenesis, and pri knockout is lethal to the embryo.46 Svb remains in its inhibitory form until pri peptide expression begins.64 Therefore, pri peptides provide precise temporal control over epidermal morphogenesis. Similarly, the transcript of Gm7325 in mice is annotated as a long noncoding RNA (lncRNA) but can in fact be translated into an 84-aa polypeptide, Minion,65 also named Myomixer (Table 1).66 Myomixer/Minion expression is upregulated during the differentiation of C2C12 myoblasts and downregulated after myoblast fusion. Mechanistically, Minion together with Myomaker promotes the fusion of mononuclear myoblasts, which is essential for skeletal muscle formation during embryogenesis. Although Myomixer/Minion does not affect Myomaker expression levels, Myomaker cannot induce myocyte fusion in the absence of Myomixer. Combined with its time-specific expression, this indicates that Myomixer/Minion functions as a switch that acts synergistically with Myomaker at a specific time point.67,68 Another micropeptide, MPM (micropeptide in mitochondria), is encoded by lncRNA 1500011K16Rik (in mice) or LINC00116 (in humans). MPM, also known as mitoregulin (Mtln), promotes myogenic differentiation as well as muscle growth and regeneration. Mechanistically, ectopic expression of genes that enhance mitochondrial respiration can rescue the phenotype caused by MPM knockdown, suggesting that the role of MPM in muscle development and postinjury regeneration is related to its role in mitochondrial respiration.69 In addition, as a signaling pathway molecule, Toddler peptide (also known as Apela) (Table 1) activates APJ/Apelin signaling to promote gastrulation movements52 and regulates mesodermal cell migration downstream of Nodal signaling in zebrafish.70 Loss-of-function assays using CRISPR/Cas9 suggest that Apela also has an extensive impact on mouse embryonic development.71

NCG peptides regulate physiological activities

A group of NCG-derived polypeptides has been reported to fine-tune normal muscle activity. The transcript encoding the peptide DWORF (Table 1) is annotated as a lncRNA in both mice and humans. DWORF is mainly expressed in the heart and interacts with SERCA, as do the SLN, PLN, MLN, and SCL peptides. Notably, the MLN peptide is expressed in all skeletal muscles,72 and the SCL peptide is expressed in somatic muscles and the postembryonic heart.73 DWORF can alleviate the inhibitory effects of these four peptides on SERCA in vitro. In vivo, DWORF and PLN compete with each other to maintain dynamic regulation of cardiomyocyte contractility, thereby enhancing cardiac pumping function in response to changes in the external environment.74 This function exemplifies the finely tuned regulation exerted by small molecules, namely, NCG peptides. NCG peptides are also important at the level of cell biology. LINC01420/LOC550643 RNA was thought to be a noncoding RNA, but in fact it encodes a previously unannotated polypeptide referred to as P-body dissociating polypeptide (NoBody) (Table 1). NoBody levels are negatively correlated with the number of P-bodies. In addition, NoBody directly contacts the enhancer of decapping 4 protein (EDC4) to promote the degradation of nonsense-mediated decay (NMD) substrates.75 NCG peptides can also affect cellular metabolism. As described above, MOTS-c has a significant impact on the expression of metabolism- and inflammation-associated genes, and MOTS-c treatment prevents diet-induced obesity and age- or high-fat diet-associated insulin resistance in mice. MPM/Mtln extensively fine-tunes the mitochondrial membrane potential, Ca2+ handling capacity, and ROS levels, and as a molecular chaperone on the mitochondrial membrane it enhances the stability and assembly of functional complexes, thereby increasing respiratory efficiency.76 Mtln also cooperates with Cyb5r3 to affect lipid metabolism, and the weakening of complex I in the respiratory supercomplex of Mtln-knockout mice may also contribute to the Cyb5r3-related changes in lipid metabolism caused by Mtln loss.77 MOXI, the homologous peptide of MPM/Mtln in mice, regulates mitochondrial oxidation and energy homeostasis by enhancing fatty acid β-oxidation, thereby improving exercise tolerance.78 Immunoprecipitation (IP) assays have identified two proteins that interact directly with Mtln;77,78 however, these two interactions alone cannot explain the full scope of the phenotypic changes caused by altered Mtln expression. Further exploration of how MPM/Mtln/MOXI acts is likely to reveal additional mechanisms, which further illustrates the importance of NCG-peptide studies.

NCG peptides participate in the stress response and promote tissue repair

When cells are exposed to marked environmental changes or macromolecular damage, they undergo a series of adaptive changes in gene expression that enhance their damage resistance and viability under adverse conditions.79,80 A set of regulatory systems contributes to these changes in gene expression,81,82 and NCG peptides can now be added to this set. A sequence-conserved uORF in the 5′UTR of the mRNA of C/EBP-homologous protein (CHOP) can be translated into a peptide of 31 or 34 aa (Table 1), which inhibits the translation of the downstream CHOP ORF under stress-free conditions.83 Under stress conditions, however, phosphorylation of eIF2 reduces uORF translation, thereby relieving the inhibition and relatively increasing CHOP expression.84 A similar mechanism, involving two uORFs, regulates activating transcription factor 4 (ATF4). A ribosome scanning from the 5′ end of the mRNA first encounters uORF1 and then uORF2. Because the two uORFs are far apart, both can be translated. However, because uORF2 lies very close to the main ATF4 ORF, the ribosome cannot regain the ability to reinitiate in time; as a consequence, the ATF4 start codon is skipped and ATF4 is not translated. Under stress conditions, reinitiation becomes even less efficient: after translating uORF1, ribosomes often cannot reinitiate at the start codon of uORF2, so uORF2 is skipped. Instead, some ribosomes regain reinitiation competence before reaching the main ATF4 ORF, resulting in ATF4 expression.85 Therefore, when analyzing the effect of a uORF, its start site and its distance from the main ORF should be taken into consideration. In addition, inhibition of uORF translation abolishes UPF1-dependent nonsense-mediated mRNA decay (NMD), improving the stability of IFRD1 mRNA under stress conditions.86 Furthermore, a uORF in the 5′UTR of the mRNA of binding immunoglobulin protein (BiP) can be translated into a 9-aa peptide (Table 1) through leucine-initiated, eIF2A-dependent noncanonical translation during the stress response, promoting BiP translation under stress.29 In fact, many uORFs in 5′UTRs initiate at noncanonical sites, which may represent additional mechanisms by which uORFs act in the integrated stress response (ISR) (Fig. 6).87,88
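The delayed-reinitiation logic for ATF4 can be captured in a toy model. Assume, hypothetically, that after terminating at uORF1 a scanning ribosome regains the ability to reinitiate at a constant rate k per nucleotide, with k lowered under stress because eIF2 phosphorylation limits initiator tRNA delivery. A ribosome that becomes competent before uORF2 translates uORF2 and, as described above, then cannot reinitiate at ATF4; one that becomes competent between uORF2 and the ATF4 start codon translates ATF4. The function name, distances, and rates below are illustrative, not measured.

```python
import math


def reinitiation_outcomes(k: float, d_to_uorf2: float, d_to_atf4: float) -> tuple:
    """Fractions of post-uORF1 ribosomes that reinitiate at uORF2 or at ATF4.

    The probability of having regained competence after scanning d nucleotides
    is 1 - exp(-k * d) (a constant-rate, exponential waiting-distance assumption).
    Ribosomes still not competent at the ATF4 start codon translate neither ORF.
    """
    ready_by_uorf2 = 1.0 - math.exp(-k * d_to_uorf2)
    ready_by_atf4 = 1.0 - math.exp(-k * d_to_atf4)
    return ready_by_uorf2, ready_by_atf4 - ready_by_uorf2


# Hypothetical distances (nt downstream of uORF1) and rates (per nt).
D_UORF2, D_ATF4 = 90, 190
for condition, k in (("no stress", 0.05), ("stress", 0.005)):
    uorf2, atf4 = reinitiation_outcomes(k, D_UORF2, D_ATF4)
    print(f"{condition}: uORF2 = {uorf2:.3f}, ATF4 = {atf4:.3f}")
```

Lowering k shifts ribosomes from uORF2 to ATF4, qualitatively reproducing the stress-induced switch; the model ignores leaky scanning past uORF1 and all other regulatory inputs.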

Fig. 6: uORFs can participate in the ISR in three ways to facilitate the expression of genes that alleviate stress damage or trigger apoptosis.

In the absence of stress, the uORF is translated to inhibit the expression of a coding-region protein by means of ribosome stalling (1) and promoting UPF1-dependent mRNA decay (2). Upon stress, uORF expression is downregulated, and inhibition is reduced, resulting in increased protein expression in the coding region. In addition, stress upregulates eIF2A levels,140 and leads to the constitutive translation of uORF, which promotes translation of the coding-region proteins (3).

NCG-derived peptides participate in the stress response in a variety of ways to protect against external damage. Once damage occurs, other NCG peptides can promote tissue repair through different mechanisms. SPAR, which is translated from LINC00961, stabilizes the v-ATPase–Ragulator–Rags supercomplex to suppress mTORC1 activation in response to amino acid stimulation. When muscle is damaged by external stimuli, SPAR peptide (Table 1) expression is suppressed, which upregulates mTORC1 signaling and thereby promotes damage repair and tissue regeneration.89 The aforementioned Minion/Myomixer protein is undetectable in uninjured adult mouse muscle but becomes significantly upregulated during tissue regeneration. Mechanistically, Minion/Myomixer and Myomaker together induce cell fusion to promote muscle regeneration.65,66

NCG peptides modulate tumor development

Thus far, the mechanism of tumorigenesis has not been fully elucidated. However, an increasing number of mechanisms have been explored,90,91 including those involving NCG peptides. A switch from pyruvate kinase M1 (PKM1) to PKM2 is common in cancers; it favors aerobic glycolysis and creates an advantage for tumorigenesis.92,93 HnRNP A1 is a splicing factor that inhibits the inclusion of exon 9 in pyruvate kinase M (PKM) transcripts and thereby promotes the formation of PKM2.94,95 LncRNA HOXB-AS3 can be translated into a 53-aa peptide (Table 1) that binds directly to the RGG domain of hnRNP A1, preventing hnRNP A1 from binding to exon 9 of PKM mRNA; this inhibits PKM2 formation and produces a tumor-suppressive effect.96 Thus, HOXB-AS3 peptides, rather than the HOXB-AS3 lncRNA, act competitively to inhibit tumor formation, providing another example of NCG peptides functioning through direct binding to another protein (Fig. 7). In addition, circPPP1R12A, through its encoded peptide circPPP1R12A-73aa, promotes the proliferation, migration, and invasion of colon cancer cells and enhances tumorigenesis and metastasis by activating the Hippo-YAP signaling pathway.97 SHPRH-146aa and FBXW7-185aa both act as tumor suppressors and can be used as independent prognostic markers.38,39 β-catenin-370aa acts as an oncogenic peptide that contributes to activation of the Wnt pathway and consequently promotes liver cancer growth and metastasis by protecting full-length β-catenin from GSK3β-mediated degradation.40 The transcript of cancer-associated small integral membrane open-reading frame 1 (CASIMO1) was considered to have no coding function but actually encodes an 84-aa integral membrane microprotein (Table 1). The CASIMO1 peptide can promote cell proliferation through the downstream SQLE/MAPK/ERK signaling pathway and increase the proportion of cells in the proliferative phase. CASIMO1 also affects the migration capacity of tumor cell lines by acting on the cytoskeleton.98 Pseudogenes are copies of protein-coding genes that, after losing selection pressure, accumulate deleterious mutations, degenerate, and eventually become genetic fossils.99,100 However, among the 11 pseudogenes of Nanog, NANOGP8 is expressed in multiple cancer cell lines and tissues,101 where it plays an important role in tumor development.102

Fig. 7: HOXB-AS3 peptides, instead of HOXB-AS3 lncRNA, suppress tumor development.

The HOXB-AS3 ORF, 5′UTR-ORF, and 5′UTR-ORFmut constructs were generated to study the effects of the HOXB-AS3 peptide and lncRNA on cancer progression. Both the HOXB-AS3 ORF and 5′UTR-ORF constructs expressed the HOXB-AS3 peptide, whereas the 5′UTR-ORFmut construct contains a mutated HOXB-AS3 start codon and therefore does not encode the peptide. All constructs were transfected into colorectal cancer (CRC) cells, which rarely express HOXB-AS3 peptides. The NC group is a negative control that was not transfected with any construct. a Tumor cell xenograft assay showing that tumor cells grew better in vivo in the NC and 5′UTR-ORFmut groups than in the ORF and 5′UTR-ORF groups. b Metastatic tumor model established by tail vein injection showing that tumor metastasis is also suppressed in the HOXB-AS3 ORF and 5′UTR-ORF groups. c Histological analysis of the pulmonary metastatic lesions shown in b.

Pathogenicity and the potential of NCG peptides in targeted therapy

The pathogenesis of many diseases is still unclear, and their treatment remains unsatisfactory. NCG peptides may offer a new perspective on the underlying mechanisms of disease. As the above-mentioned CASIMO1 peptide, circPPP1R12A-73aa, and β-catenin-370aa illustrate, aberrant expression of human endogenous NCG peptides can contribute to diseases, including cancer. NCG peptides derived from pathogenic microorganisms can also promote disease development. The E7 protein encoded by the HPV-derived circE7 can promote the growth and tumorigenic ability of CaSki cervical carcinoma cells, whereas circE7 by itself cannot.103

In addition to providing a new perspective on pathogenicity, NCG peptides are also promising targets for targeted therapy. Some achievements have been made in this regard. MOTS-c peptide treatment can inhibit osteolysis in a mouse model, which has potential in the therapy of osteolysis and other inflammation disorders.104 MOTS-c peptide treatment can also increase the ability of cold adaptation upon acute cold exposure and provide a potentially therapeutic drug for cold stress-related diseases.105 In addition, the role of MPM in mitochondrial respiration and muscle formation makes MPM a potential target for muscular dystrophy therapy.69 In terms of tumor-targeted therapy, NCG peptides, such as the SHPRH-146aa, FBXW7-185aa, and HOXB-AS3 peptides, can serve as tumor-targeting therapeutic drugs. The same is true for PINT87aa. Linc-PINT simultaneously generates a circular-form circPINTexon2, and circPINTexon2 produces an 87-amino acid peptide, PINT87aa. PINT87aa directly binds to polymerase-associated factor complex (PAF1c) and inhibits several oncogenes downstream of PAF1c, including CEBP1, cyclin D1, C-myc, Sox2, etc. In biological function, PINT87aa overexpression can suppress glioblastoma in vitro and in vivo.106 An ideal targeted therapeutic drug should effectively kill or inhibit tumor cells while not damaging normal tissue cells. These antitumor NCG peptides are naturally targeted therapeutic drugs with significantly reduced cytotoxicity, compared with the cytotoxicity induced by traditional drugs, as and substantially reduced immunogenicity. Furthermore, a relatively smaller molecular weight makes them more likely than traditional tumor suppressive proteins to be developed into drugs. With the development of applicable materials, these peptides can be packaged by suitable carriers and delivered into tumor cells, where they can specifically inhibit tumor cells.107 NCG peptides also have great potential in tumor immunotherapy. 
Ideal tumor-specific antigens (TSAs), which enable T lymphocytes to recognize tumor cells correctly, are a key factor in immunotherapy. A genome-wide search for TSAs found NCG peptides to be a major source of targetable TSAs. Tumor vaccines based on NCG peptides enabled mice to resist tumors, suggesting that NCG peptides can serve as therapeutic targets in tumor immunotherapy, particularly in tumor vaccines.108,109

Challenges and future trends

NCG peptides challenge the known features of coding genes

The first NCGs to be characterized were found to act as noncoding transcripts rather than being translated into peptides or proteins.110,111,112 Later, however, some NCGs were found to have coding capacity and arguably should have been defined as coding genes. For example, pri-miRNAs, the primary transcripts of miRNA genes, which are classified as NCGs, can encode peptides.113 Pri-miRNAs structurally resemble traditional mRNAs, with a 5′-cap and a 3′-poly(A) tail.114 Pri-miR165a and pri-miR171b, for instance, are translated into peptides (Table 1) that promote their own transcription. Further analysis shows that both are “ancient miRNAs”,115 which are conserved across many species, rather than “recent miRNAs”, which are more species-specific.116 Likewise, the genes encoding circRNA-derived SHPRH-146aa and FBXW7-185aa, mitochondrial genome-derived MOTS-c, and lncRNA-derived MLN and DWORF, among others, were previously defined as NCGs yet can encode peptides. These discoveries challenge earlier views in NCG research and the established features of coding genes.

In addition, some NCGs have dual roles: under some conditions they function as noncoding genes, and under others they encode peptides. For example, in Drosophila melanogaster, oskar acts through its translated protein at the embryonic stage117,118 but functions as a noncoding RNA during early oogenesis.119 In mammals, the SRA gene, regarded as an NCG, plays an important role in coactivating nuclear receptors120,121 and enhancing the activity of transcription factors.122 A newly identified isoform, SRA1, acts both as an NCG and as a coding gene, and the two states coexist in the same cells.123 Many questions about this type of NCG remain unanswered. Under what circumstances do these genes function as NCGs, and when are they translated into functional peptides? What factors regulate the balance between the coding and noncoding forms? Similar questions apply to the pri gene, which is expressed only at a specific stage of embryonic development, and to other NCG peptides whose expression depends on conditions. For example, the Minion/Myomixer peptide is absent from uninjured muscle but present in injured muscle,68 and the CASIMO1 peptide is upregulated in tumors, where it contributes to tumorigenesis, but is expressed at low levels in healthy tissues.98 Understanding the mechanisms and factors that switch NCGs between coding and noncoding forms, and the conditions under which NCG-peptide expression is promoted or inhibited, is therefore of paramount importance. Gaining such an understanding is a major challenge and should be a focus of future research.

Both the exact number and the regulatory mechanisms remain unclear

The traditionally defined NCGs constitute >90% of the genome, but the exact number of potentially coding NCGs remains unclear. Two main approaches are used to search for peptides encoded by NCGs: one predicts the coding potential of NCGs by bioinformatic analysis and then confirms it experimentally,124 and the other characterizes peptides by mass spectrometry and maps them back to genomic DNA.125
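The idea behind the second approach, relating a peptide back to the DNA that could encode it, can be illustrated with a minimal sketch: translate a nucleotide sequence in all six reading frames and look for exact matches to a peptide identified by mass spectrometry. This is a simplified, hypothetical illustration (`map_peptide` and the other helper names are ours, not from any published tool) that assumes the standard genetic code, an uppercase A/C/G/T input, and exact string matching; real proteogenomic pipelines instead search spectra against translated sequence databases with scoring and false-discovery control.

```python
from typing import List, Tuple

# Illustrative sketch; helper names are hypothetical, not from a published tool.
# Standard genetic code, enumerated in TCAG order (first codon base outermost).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks a stop codon

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def map_peptide(peptide: str, dna: str) -> List[Tuple[str, int, int]]:
    """Exact hits of `peptide` in the six-frame translation of `dna`.

    Each hit is (strand, frame, nt_offset); for the '-' strand the offset is
    measured on the reverse-complement sequence.
    """
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, frame + 3 * pos))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy example: the peptide MAKR is encoded on the + strand in frame 2.
print(map_peptide("MAKR", "CCATGGCTAAACGTTAAG"))  # [('+', 2, 2)]
```

Searching all six frames matters here because, unlike annotated mRNAs, a candidate NCG has no known reading frame or strand to start from.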

In the first approach, bioinformatic analysis helps select specific genes for experimental confirmation and forms the basis for subsequent experiments. However, several open questions limit its success. For instance, what characteristics distinguish NCGs that can encode peptides? When a transcript binds a ribosome, is it translated into a functional peptide, or does translation occur at random because of stochastic binding? In addition, a unified standard is needed to support research using this approach. In the second approach, the difficulty is that NCG peptides are more tissue- or state-specific than traditional functional proteins and are more readily affected by extracellular stimuli. Profiling NCG expression only in an unstressed state or in particular cell lines may therefore leave many peptides undiscovered. For example, translation of the linc00689-derived micropeptide STORM (stress- and TNF-α-activated ORF micropeptide) (Table 1) depends on eIF4E phosphorylation after TNF-α activates mammalian Ste20-like kinase (MST1).126 This peptide would be missed if mass spectrometry were used only to map protein profiles in the resting state. At the same time, in-depth study of the coding mechanism is likely to reveal new mechanisms and models of peptide translation, such as non-AUG-initiated translation, thereby refining and enriching the central dogma.127,128 Non-AUG-initiated translation of repeat polypeptides from some NCGs can even directly cause disease.129,130 In addition, because circRNAs are closed loops, translation can continue across the back-splice junction, so a stop codon that lies upstream of a start codon in the linear gene sequence can still terminate an ORF beginning at that start codon; this greatly increases the number of possible ORFs.34
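The circRNA point can be made concrete with a small ORF scan. The sketch below is illustrative only: `linear_orfs` and `circular_orfs` are hypothetical helper names, sequences are written in the DNA alphabet, only ATG starts and the three standard stop codons are considered, and non-AUG initiation and any requirement for IRES-like elements are ignored. On a circular template, reading wraps around the end of the sequence (the back-splice junction), so a stop codon located upstream of the start codon in the linear sequence can end the ORF, and an ORF that never meets an in-frame stop may, in principle, be read indefinitely.

```python
from typing import List, Optional, Tuple

# Illustrative sketch; helper names are hypothetical, not from a published tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def linear_orfs(seq: str) -> List[Tuple[int, int]]:
    """ATG-initiated ORFs fully contained in a linear sequence.

    Returns (start, end) with `end` exclusive and including the stop codon.
    """
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                orfs.append((start, pos + 3))
                break
    return orfs


def circular_orfs(seq: str) -> List[Tuple[int, Optional[int]]]:
    """ATG-initiated ORFs in a circular sequence, read across the junction.

    Returns (start, length_nt) including the stop codon, or (start, None) if no
    in-frame stop is ever reached (a potential rolling, 'infinite' ORF).
    """
    n = len(seq)
    wrapped = seq + seq[:2]  # lets codons straddling the junction be read directly

    def codon_at(i: int) -> str:
        i %= n
        return wrapped[i:i + 3]

    orfs = []
    for start in range(n):
        if codon_at(start) != "ATG":
            continue
        length = None
        # Visited positions cycle within n codons, so 3 * n nt is enough to check.
        for offset in range(0, 3 * n, 3):
            if codon_at(start + offset) in STOP_CODONS:
                length = offset + 3
                break
        orfs.append((start, length))
    return orfs


toy = "CTAACCCATGGC"  # the TAA at position 1 lies upstream of the ATG at 7
print(linear_orfs(toy))    # []
print(circular_orfs(toy))  # [(7, 9)]: ATG GCC, then TAA after the junction
```

In this toy sequence the linear scan finds no complete ORF, whereas the circular scan finds a three-codon ORF whose stop codon is reached only after crossing the junction.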

Therefore, developing standards for bioinformatic analysis and establishing experimental verification systems will be further challenges in this field. Peptides also need to be explored in a broader context if they are to be identified and characterized.

Hidden functions and applications need to be uncovered

Gene expression is regulated at multiple levels. Unlike regulation at the mRNA level, regulation at the protein level does not require changes in protein quantity. Because NCG peptides interact directly with functional proteins, they are suited to short-term responses to extracellular signals, whereas mRNA-level regulation is more oriented toward long-term adaptation. The role of NCG peptides in regulating gene expression therefore needs further exploration. Whereas mRNAs correspond to functional proteins, NCG peptides vary in length and act through flexible mechanisms. It remains unknown whether peptides with the same mode of action, such as MLN, DWORF, and Nobody, can be grouped together. These NCG peptides function by affecting structural proteins, so we propose that they be named nonstructural functional peptides. Whether this mode of action is universal among NCG peptides is currently unknown, and research on the modes and mechanisms of action of these peptides will be another future challenge. There are significantly more NCGs than coding genes.131 As new mechanisms and models continue to be explored, more peptides will be discovered, possibly far more than the proteins and peptides identified so far. On the one hand, NCG peptides offer a new key to unlocking the mysteries of life; on the other hand, they may become therapeutic targets. Because NCGs are expressed in a time- or tissue-specific manner, NCG-encoded peptides are also time-specific and can be expressed in specific disease states, making them potential targets for disease intervention. However, such efforts have barely begun. As NCG peptides are studied in depth, our understanding of both organismal development and disease intervention, including tumor treatment, should enter a new era.

Potential applications of NCG peptides in real-world studies

A real-world study (RWS) supplements the data obtained from traditional clinical trials.132,133 NCG-peptide research is still in its infancy, and NCG-peptide-based medical products have not yet been used in RWS research. More effort is needed to achieve clinical translation of NCG peptides. Because RWSs are by nature noninterventional, whereas the search for NCG peptides depends on experimental intervention, exploring the roles of NCG peptides in the natural state will remain a challenge.

Concluding remarks

An increasing number of NCGs have been verified to have coding functions,134,135 deepening our understanding of life activities and complementing the existing library of protein and peptide molecules. Epigenetics and alternative splicing have shown that the human genome is even more intricate than originally thought.136,137 The discovery of noncoding RNA opened up a new world for the regulation of protein expression, greatly enriching the complexity of life activities.138,139 The finding that NCGs can also encode peptides adds a new direction for interpreting the fundamental principles of life. As more NCG peptides are discovered, new mechanisms and key molecules are likely to be revealed. Success in this effort will help us not only to explain how many physiological and pathological processes are regulated but also to generate new ideas for understanding and intervening in diseases.