Abstract
Because the secondary structure of an RNA molecule largely defines its biological role, researchers seeking RNA data would benefit if both sequence and structural information for RNA molecules were made available together. Current nucleotide data repositories archive only RNA sequence data. Furthermore, storage formats that can frugally represent RNA sequence as well as structure data in a single file are currently unavailable. This article proposes a novel storage format, ‘FASTR’, for concomitant representation of RNA sequence and structure. The storage efficiency of the proposed FASTR format has been evaluated using RNA data from various microorganisms. Results indicate that the size of FASTR-formatted files (containing both RNA sequence and structure information) is equivalent to that of FASTA-format files, which contain only RNA sequence information. RNA secondary structure is typically represented using a string of nucleotide characters together with a corresponding dot-bracket string indicating structural attributes. FASTR enables a frugal representation of both RNA sequence and structural information in the form of a single string. Despite their relatively smaller storage footprint, the resultant ‘fastr’ strings retain all of the sequence and secondary structure information that could be stored using dot-bracket notation. An implementation of the ‘FASTR’ methodology is available for download at http://metagenomics.atc.tcs.com/compression/fastr .
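To make the idea of a single-string representation concrete, the sketch below merges a nucleotide string and its dot-bracket string into one string of the same length and then recovers both. It is an illustration only: the symbol table and the function names `merge` and `split` are hypothetical and are not the published FASTR encoding, which the abstract does not specify. The sketch also assumes an unambiguous A/C/G/U alphabet and pseudoknot-free dot-bracket notation using only `.`, `(` and `)`.

```python
# Illustrative sketch only: this is NOT the published FASTR encoding.
# Hypothetical scheme: 4 nucleotides x 3 structure states = 12 symbols,
# so one character per position carries both sequence and structure.
ENCODE = {
    ".": {"A": "A", "C": "C", "G": "G", "U": "U"},  # unpaired
    "(": {"A": "a", "C": "c", "G": "g", "U": "u"},  # opens a base pair
    ")": {"A": "B", "C": "D", "G": "H", "U": "V"},  # closes a base pair
}

# Reverse lookup: merged symbol -> (nucleotide, structure character).
DECODE = {
    symbol: (nucleotide, state)
    for state, table in ENCODE.items()
    for nucleotide, symbol in table.items()
}


def merge(sequence: str, structure: str) -> str:
    """Combine a sequence and its dot-bracket string into one string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return "".join(ENCODE[s][n] for n, s in zip(sequence, structure))


def split(merged: str) -> tuple[str, str]:
    """Recover the (sequence, structure) pair from a merged string."""
    decoded = [DECODE[symbol] for symbol in merged]
    sequence = "".join(nucleotide for nucleotide, _ in decoded)
    structure = "".join(state for _, state in decoded)
    return sequence, structure


sequence = "GGGAAACCC"
structure = "(((...)))"
merged = merge(sequence, structure)
print(merged)         # gggAAADDD
print(split(merged))  # ('GGGAAACCC', '(((...)))')
```

Under these assumptions the merged string has the same length as the sequence alone, which is consistent with the observation that a combined file need not be larger than a sequence-only file, and the round trip through `split` loses no information.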
Acknowledgements
Tungadri Bose is also a PhD scholar at the Indian Institute of Technology, Bombay, and would like to acknowledge the Institute for its support. We also thank Mr Pranav Nawathe for assisting in the development of the FASTR webpage.
Additional information
Corresponding editor: Mandar V Deshmukh
[Bose T, Dutta A, Mohammed MH, Gandhi H and Mande SS 2015 FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information. J. Biosci.] DOI 10.1007/s12038-015-9546-0
Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/sep2015/supp/Bose.pdf
About this article
Cite this article
Bose, T., Dutta, A., MH, M. et al. FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information. J Biosci 40, 571–577 (2015). https://doi.org/10.1007/s12038-015-9546-0