repRNA: a web server for generating various feature vectors of RNA sequences

Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen

doi:10.1007/s00438-015-1078-7

Methodology Article
Published: 18 June 2015

Volume 291, pages 473–481 (2016)

Molecular Genetics and Genomics

Bin Liu^1,2,3,
Fule Liu^1,
Longyun Fang^1,
Xiaolong Wang^1,2 &
Kuo-Chen Chou^3,4

Abstract

With the rapid growth of RNA sequence data in the postgenomic age, there is a strong need for a flexible method that can generate various kinds of vectors to represent these sequences by focusing on their different features. This is because nearly all existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can handle only vectors, not sequences. To meet this growing demand and speed up genome analyses, we have developed a new web server called “representations of RNA sequences” (repRNA). Compared with existing methods, repRNA is much more comprehensive, flexible, and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors, from which users can choose according to their investigation purposes; (2) it allows users to select features from 22 built-in physicochemical properties, as well as properties they define themselves; (3) the resultant feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/.
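To make the idea of turning a sequence into a fixed-length vector concrete, the following is a minimal sketch of a normalized k-mer frequency vector, a common sequence representation. It is illustrative only: the abstract does not list the 11 modes, so this should not be read as repRNA's implementation, and the function name `kmer_vector` and its parameters are assumptions introduced here.

```python
from itertools import product


def kmer_vector(seq: str, k: int = 2) -> list[float]:
    """Return normalized k-mer frequencies of an RNA sequence.

    Hypothetical helper, not part of repRNA. The vector has 4**k entries,
    ordered lexicographically over the alphabet A, C, G, U.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        # Sequence shorter than k, or no valid window: return a zero vector
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    # Every sequence maps to a vector of the same length (16 for k=2),
    # which is the form that classifiers such as SVM or KNN expect.
    print(kmer_vector("GGAUCC", k=2))
```

Sequences of any length map to vectors of the same dimension, so they can be passed directly to a vector-based learner. Modes that also use physicochemical properties or sequence-order information would need extra inputs that this sketch does not model.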

Acknowledgments

The authors wish to thank the three anonymous reviewers for their constructive comments, which were very useful for strengthening the presentation of this paper.

Conflict of interest

The authors declare no competing interests.

Funding

This work was supported by the National Natural Science Foundation of China (61300112 and 61272383), the Scientific Research Innovation Foundation in Harbin Institute of Technology (Project No. HIT.NSRIF.2013103), the Project Sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, the Shenzhen Municipal Science and Technology Innovation Council (Grant No. CXZZ20140904154910774), and the National High Technology Research and Development Program of China (863 Program) [2015AA015405].

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China
Bin Liu, Fule Liu, Longyun Fang & Xiaolong Wang
Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, 518055, Guangdong, China
Bin Liu & Xiaolong Wang
Gordon Life Science Institute, Boston, MA, 02478, USA
Bin Liu & Kuo-Chen Chou
Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
Kuo-Chen Chou

Corresponding author

Correspondence to Bin Liu.

Additional information

Communicated by S. Hohmann.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 1404 kb)

About this article

Cite this article

Liu, B., Liu, F., Fang, L. et al. repRNA: a web server for generating various feature vectors of RNA sequences. Mol Genet Genomics 291, 473–481 (2016). https://doi.org/10.1007/s00438-015-1078-7

Received: 24 April 2015
Accepted: 04 June 2015
Published: 18 June 2015
Issue Date: February 2016
DOI: https://doi.org/10.1007/s00438-015-1078-7
