Abstract
Non-coding RNAs (ncRNAs) play important roles in regulating many biological processes, such as transcription initiation, post-transcriptional modification, epigenetic modification, and development. High-throughput sequencing has identified many novel transcripts. However, for newly sequenced species, identifying ncRNAs among these transcripts with alignment-based features is difficult. A fast and accurate method based on alignment-free features is therefore needed to identify ncRNAs among novel transcripts. In this study, we proposed a new approach, coding potential prediction based on alignment-free features (CPAF), to identify ncRNAs among large numbers of candidate transcripts. CPAF uses four types of features: the Fickett score; the hexamer score; composition, transition, and distribution (CTD) features; and modified k-mer features. CPAF outperformed previous state-of-the-art methods in predicting ncRNA transcripts, particularly small ncRNAs. Finally, we applied CPAF to Pacific oyster (Crassostrea gigas) transcripts, where it identified more ncRNAs than previously used methods.
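Three of the four feature families named above can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the CPAF implementation. The hexamer score follows a CPAT-style average log-likelihood ratio over in-frame hexamers. The k-mer weighting w_k = 1/4^(kmax - k) is borrowed from PLEK's scheme, and CPAF's own "modified k-mer" may differ. The CTD features follow one common 30-value convention (4 composition, 6 transition, 20 distribution). The Fickett score is omitted because it depends on published position and composition lookup tables. The function names (`kmer_features`, `hexamer_score`, `ctd_features`) are hypothetical, and the hexamer frequency tables in the usage example are made-up placeholders; real tables would be estimated from known coding and non-coding transcripts.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def kmer_features(seq: str, kmax: int = 3) -> list[float]:
    """Weighted k-mer frequencies for k = 1..kmax (PLEK-style weighting, assumed).

    Each count is divided by the number of k-length windows and multiplied by
    w_k = 1 / 4**(kmax - k), which down-weights short k-mers.
    """
    seq = seq.upper()
    features = []
    for k in range(1, kmax + 1):
        windows = max(len(seq) - k + 1, 1)
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        weight = 1 / 4 ** (kmax - k)
        for kmer in ("".join(p) for p in product(BASES, repeat=k)):
            features.append(counts[kmer] / windows * weight)
    return features


def hexamer_score(seq: str, coding_freq: dict, noncoding_freq: dict) -> float:
    """Mean log(F_coding / F_noncoding) over in-frame hexamers (CPAT-style sketch).

    Hexamers missing from either table, or with zero frequency, are skipped.
    """
    seq = seq.upper()
    logs = []
    for i in range(0, len(seq) - 5, 3):  # step 3 keeps hexamers in frame 0
        hexamer = seq[i:i + 6]
        fc = coding_freq.get(hexamer, 0.0)
        fn = noncoding_freq.get(hexamer, 0.0)
        if fc > 0 and fn > 0:
            logs.append(math.log(fc / fn))
    return sum(logs) / len(logs) if logs else 0.0


def ctd_features(seq: str) -> list[float]:
    """30 CTD values: 4 composition, 6 transition, 20 distribution (one convention)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    # Composition: fraction of each nucleotide.
    composition = [seq.count(b) / n for b in BASES]
    # Transition: fraction of adjacent pairs switching between two given bases.
    pairs = [(a, b) for i, a in enumerate(BASES) for b in BASES[i + 1:]]
    adjacent = list(zip(seq, seq[1:]))
    transition = [
        sum(1 for x, y in adjacent if {x, y} == {a, b}) / max(n - 1, 1)
        for a, b in pairs
    ]
    # Distribution: relative position of the first, 25%, 50%, 75%, and last
    # occurrence of each nucleotide.
    distribution = []
    for b in BASES:
        positions = [i + 1 for i, c in enumerate(seq) if c == b]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if positions:
                rank = max(math.ceil(frac * len(positions)), 1)
                distribution.append(positions[rank - 1] / n)
            else:
                distribution.append(0.0)
    return composition + transition + distribution


if __name__ == "__main__":
    transcript = "ATGGCCATTGTAATGGGCCGCTGA"
    coding = {"ATGGCC": 0.04}  # placeholder table, not real data
    noncoding = {"ATGGCC": 0.01}  # placeholder table, not real data
    print(len(kmer_features(transcript)), len(ctd_features(transcript)))
    print(hexamer_score(transcript, coding, noncoding))
```

Concatenating these vectors for each transcript gives the kind of fixed-length input a classifier can be trained on; the abstract does not state which classifier CPAF uses.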
Acknowledgement
This research was supported by the National Natural Science Foundation of China (No. 11701546).
Cite this article
Chai, W., Song, K. Identification of Non-Coding RNAs Based on Alignment-Free Features in Crassostrea gigas (Pacific Oyster) Transcriptome. J. Ocean Univ. China 21, 1633–1640 (2022). https://doi.org/10.1007/s11802-022-5058-3
DOI: https://doi.org/10.1007/s11802-022-5058-3