An Overview of the BioExtract Server: A Distributed, Web-Based System for Genomic Analysis

  • C. M. Lushbough
  • V. P. Brendel
Conference paper
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 680)


Genome research is becoming increasingly dependent on access to multiple, distributed data sources and bioinformatic tools. The importance of integration across distributed databases and Web services will continue to grow as the number of requisite resources expands. Use of bioinformatic workflows has grown considerably in recent years as scientific research relies more heavily on the analysis of large data sets and the use of distributed resources. The BioExtract Server is a Web-based system designed to aid researchers in the analysis of distributed genomic data by providing a platform that facilitates the creation of bioinformatic workflows. Scientific workflows are created within the system by recording the analytic tasks performed by researchers. These tasks may include querying multiple data sources, saving query results as searchable data extracts, and executing local and Web-accessible analytic tools. The series of recorded tasks can be saved as a computational workflow simply by providing a name and description.
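The record-then-save idea described above can be illustrated with a minimal Python sketch. It is not the BioExtract Server's implementation or API: the names `Step`, `Workflow`, `Recorder`, `replay`, and the three stand-in task functions are all hypothetical, and the tasks only return strings instead of contacting real data sources or tools.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One recorded analytic task (hypothetical structure)."""
    kind: str                # e.g. "query", "save_extract", "run_tool"
    params: Dict[str, Any]   # arguments the task was performed with


@dataclass
class Workflow:
    """A named, described sequence of recorded steps."""
    name: str
    description: str
    steps: List[Step]


class Recorder:
    """Performs tasks and records each one in the order it was run."""

    def __init__(self) -> None:
        self._steps: List[Step] = []

    def perform(self, kind: str, action: Callable[..., Any], **params: Any) -> Any:
        result = action(**params)
        self._steps.append(Step(kind, dict(params)))
        return result

    def save_as_workflow(self, name: str, description: str) -> Workflow:
        # Saving only needs a name and a description; the steps are already recorded.
        return Workflow(name, description, list(self._steps))


def replay(workflow: Workflow, actions: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Re-run every recorded step with the same parameters."""
    return [actions[step.kind](**step.params) for step in workflow.steps]


# Stand-in tasks (assumptions for illustration; no real service is called).
def query(source: str, term: str) -> str:
    return f"records from {source} matching {term!r}"


def save_extract(label: str) -> str:
    return f"extract {label!r} saved"


def run_tool(tool: str, extract: str) -> str:
    return f"{tool} run on {extract!r}"


ACTIONS = {"query": query, "save_extract": save_extract, "run_tool": run_tool}

recorder = Recorder()
recorder.perform("query", query, source="example_db", term="kinase")
recorder.perform("save_extract", save_extract, label="kinase_hits")
recorder.perform("run_tool", run_tool, tool="example_aligner", extract="kinase_hits")

workflow = recorder.save_as_workflow("kinase-analysis", "Query, extract, align")
results = replay(workflow, ACTIONS)
```

Keeping the recorded steps as plain data (a task kind plus its parameters) is one way to make a saved workflow repeatable; the actual system may represent and execute workflows quite differently.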


Database integration; Genomic analysis; Scientific provenance; Scientific workflows; Web services



The BioExtract Server project is currently supported in part by the National Science Foundation grant DBI-0606909.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science, University of South Dakota, Vermillion, USA
