Applied Bioinformatics, Volume 5, Issue 1, pp 49–53


A Tool for Processing of Sequence Similarity Analysis Reports
  • Marcos Catanho
  • Daniel Mascarenhas
  • Wim Degrave
  • Antonio Basílio de Miranda
Application Note


The widely used programs BLAST (in this article, ‘BLAST’ includes both the National Center for Biotechnology Information [NCBI] BLAST® and the Washington University version WU BLAST) and FASTA for similarity searches in nucleotide and protein databases usually produce copious output, and when large query sets are used, human inspection rapidly becomes impractical. BioParser is a Perl program for parsing BLAST and FASTA reports. Making extensive use of the BioPerl toolkit, the program filters, stores and returns components of these reports in either ASCII or HTML format. BioParser can also automatically feed a local MySQL® database with the parsed information, allowing subsequent filtering of hits and/or alignments with specific attributes. BioParser is therefore a valuable tool for large-scale similarity analyses: it improves access to the information present in BLAST or FASTA reports, facilitates extraction of useful information from large sets of sequence alignments, and allows easy handling and processing of the data.
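The parse, store and filter workflow described above can be illustrated with a minimal sketch. It is not BioParser's code, which is written in Perl on top of BioPerl and targets MySQL. As stated assumptions, the sketch reads the 12-column tabular format that NCBI BLAST+ writes with `-outfmt 6`, uses Python's built-in `sqlite3` module as a stand-in for a local MySQL database, and uses invented sample hits. The function names, the table name `hit` and its column names are hypothetical.

```python
import sqlite3

# Invented sample hits in BLAST+ tabular layout (assumption: "-outfmt 6"):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
SAMPLE_REPORT = (
    "q1\tsA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n"
    "q1\tsB\t45.0\t80\t40\t4\t10\t89\t30\t109\t0.5\t30.1\n"
    "q2\tsA\t76.2\t150\t35\t1\t1\t150\t20\t169\t2e-30\t120\n"
)


def parse_tabular(lines):
    """Yield one tuple per hit line, converting the numeric fields."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}")
        yield (
            fields[0],                        # query identifier
            fields[1],                        # subject identifier
            float(fields[2]),                 # percent identity
            *(int(x) for x in fields[3:10]),  # length, mismatches, gaps, coordinates
            float(fields[10]),                # E-value
            float(fields[11]),                # bit score
        )


def load_hits(conn, hits):
    """Create the (hypothetical) hit table if needed and insert all hits."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hit ("
        "query_id TEXT, subject_id TEXT, pct_identity REAL, aln_len INTEGER, "
        "mismatches INTEGER, gap_opens INTEGER, q_start INTEGER, q_end INTEGER, "
        "s_start INTEGER, s_end INTEGER, evalue REAL, bit_score REAL)"
    )
    conn.executemany("INSERT INTO hit VALUES (?,?,?,?,?,?,?,?,?,?,?,?)", hits)
    conn.commit()


def filter_hits(conn, max_evalue, min_identity):
    """Return hits at or below an E-value cutoff and at or above an identity cutoff."""
    cursor = conn.execute(
        "SELECT query_id, subject_id, pct_identity, evalue FROM hit "
        "WHERE evalue <= ? AND pct_identity >= ? "
        "ORDER BY query_id, evalue",
        (max_evalue, min_identity),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_hits(connection, parse_tabular(SAMPLE_REPORT.splitlines()))
    for row in filter_hits(connection, max_evalue=1e-5, min_identity=70.0):
        print(row)
```

Once the hits are in a relational table, each new selection criterion is a single query rather than another pass over the raw reports, which is the benefit the abstract attributes to the database step.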



We wish to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); the Programa de Apoio à Pesquisa Estratégica em Saúde, Fiocruz (PAPES-Fiocruz); the World Health Organization, Special Programme for Research and Training in Tropical Diseases (WHO/TDR); the United Nations University, Biotechnology for Latin America and the Caribbean, Bioinformatics Network for Latin-America and Caribbean (UNU-BIOLAC LacBioNet); and Ciencia y Tecnología para el Desarrollo, Red Iberoamericana de Bioinformática (CYTED-RIB) for support.

The authors have no conflicts of interest that are directly relevant to the content of this article.



Copyright information

© Adis Data Information BV 2006

Authors and Affiliations

  • Marcos Catanho (1, 2)
  • Daniel Mascarenhas (1)
  • Wim Degrave (1)
  • Antonio Basílio de Miranda (1)
  1. Department of Biochemistry and Molecular Biology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro, Brazil
  2. Department of Genetics, Fernandes Figueira Institute, Fiocruz, Rio de Janeiro, Brazil
