FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information

Journal of Biosciences

Abstract

Given the importance of secondary structure in defining the biological role of RNA molecules, researchers seeking RNA data would benefit if both sequence and structural information were made available together. Current nucleotide data repositories archive only RNA sequence data. Furthermore, storage formats that can frugally represent RNA sequence as well as structure data in a single file are currently unavailable. This article proposes a novel storage format, ‘FASTR’, for concomitant representation of RNA sequence and structure. The storage efficiency of the proposed FASTR format has been evaluated using RNA data from various microorganisms. Results indicate that the size of FASTR-formatted files (containing both RNA sequence and structure information) is equivalent to that of FASTA-format files, which contain only RNA sequence information.

RNA secondary structure is typically represented as a string of nucleotide characters together with a corresponding dot-bracket string indicating structural attributes. ‘FASTR’, the novel storage format proposed in the present study, enables a frugal representation of both RNA sequence and structural information in the form of a single string. Despite their relatively small storage footprint, the resultant ‘fastr’ strings retain all of the sequence and secondary structure information that could be stored using dot-bracket notation. An implementation of the ‘FASTR’ methodology is available for download at http://metagenomics.atc.tcs.com/compression/fastr.
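
The idea of folding a nucleotide string and its dot-bracket string into one string of the same length can be illustrated with a small sketch. The code table below is hypothetical and chosen only for illustration; it is not the published FASTR specification, whose actual encoding is not described in this abstract. The function names `merge` and `split` are likewise assumptions. The sketch handles only the four RNA nucleotides and the three dot-bracket symbols `.`, `(` and `)`, so pseudoknot brackets and ambiguity codes are rejected.

```python
# Toy illustration only: this code table is hypothetical and is NOT the
# published FASTR encoding. It maps each (nucleotide, structure) pair to
# one character, so the merged string is as long as the sequence alone.
ENCODE = {}
for nt, unpaired, opening, closing in zip("ACGU", "acgu", "ACGU", "WXYZ"):
    ENCODE[(nt, ".")] = unpaired   # unpaired base  -> lowercase letter
    ENCODE[(nt, "(")] = opening    # opening bracket -> uppercase letter
    ENCODE[(nt, ")")] = closing    # closing bracket -> arbitrary letter W, X, Y or Z
DECODE = {symbol: pair for pair, symbol in ENCODE.items()}


def is_balanced(structure):
    """Return True if every '(' has a matching ')' in nested order."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def merge(sequence, structure):
    """Combine a sequence and its dot-bracket string into a single string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    if not is_balanced(structure):
        raise ValueError("unbalanced dot-bracket string")
    merged = []
    for nt, st in zip(sequence.upper(), structure):
        if (nt, st) not in ENCODE:
            raise ValueError(f"unsupported symbol pair: {nt!r}, {st!r}")
        merged.append(ENCODE[(nt, st)])
    return "".join(merged)


def split(merged):
    """Recover the (sequence, structure) pair from a merged string."""
    pairs = [DECODE[symbol] for symbol in merged]
    sequence = "".join(nt for nt, _ in pairs)
    structure = "".join(st for _, st in pairs)
    return sequence, structure


if __name__ == "__main__":
    seq, dot_bracket = "GGGAAAUCCC", "(((....)))"
    single = merge(seq, dot_bracket)
    print(single)
    print(split(single) == (seq, dot_bracket))
```

Because every (nucleotide, structure) pair has its own character, the round trip through `merge` and `split` is lossless for uppercase input, which mirrors the claim that a single string can carry everything a sequence plus dot-bracket pair carries without growing beyond the length of the sequence.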

Acknowledgements

Tungadri Bose is also a PhD scholar at the Indian Institute of Technology, Bombay, and would like to acknowledge the Institute for its support. We also thank Mr Pranav Nawathe for assisting in the development of the FASTR webpage.

Author information

Correspondence to Sharmila S Mande.

Additional information

Corresponding editor: Mandar V Deshmukh

[Bose T, Dutta A, Mohammed MH, Gandhi H and Mande SS 2015 FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information. J. Biosci.] DOI 10.1007/s12038-015-9546-0

Supplementary materials pertaining to this article are available on the Journal of Biosciences Website at http://www.ias.ac.in/jbiosci/sep2015/supp/Bose.pdf

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 89.9 kb)

Cite this article

Bose, T., Dutta, A., Mohammed, M.H. et al. FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information. J Biosci 40, 571–577 (2015). https://doi.org/10.1007/s12038-015-9546-0