BioDiff: An Effective Fast Change Detection Algorithm for Biological Annotations

  • Yang Song
  • Sourav S. Bhowmick
  • C. Forbes Dewey Jr.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4443)


Warehousing heterogeneous, dynamic biological data is a key technique for biological data integration because it greatly improves performance. However, it requires complex maintenance procedures to update the warehouse when the sources change. Consequently, a key issue is how to detect changes to the underlying biological data sources. In this paper, we present an algorithm called BioDiff for detecting exact changes to biological annotations. In our approach, we transform heterogeneous biological data to XML format and then detect changes between two versions of the XML representation of the biological data. Our algorithm extends X-Diff, a published XML change detection algorithm. X-Diff, being designed for any type of XML data, does not exploit the semantics of biological data to reduce the data set used in bipartite mapping. We have implemented BioDiff in Java and conducted an extensive performance study using data from EMBL, GenBank, SwissProt and PDB. Our experimental results show that BioDiff runs 1.5 to 6 times faster than X-Diff.
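To make the idea of shrinking the matching problem concrete, the following is a minimal Python sketch of order-insensitive change detection between two XML versions of an annotation entry. It is an illustration only, not the BioDiff or X-Diff algorithm: the abstract does not spell out the biological semantics BioDiff uses, so partitioning children by tag stands in for them here, and a naive pairing replaces a real min-cost bipartite matching. The element names (`entry`, `accession`, `description`, `keyword`, `reference`), the sample values, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def fingerprint(elem):
    """Order-insensitive fingerprint of a subtree: tag, attributes, text, children."""
    children = tuple(sorted(fingerprint(child) for child in elem))
    text = (elem.text or "").strip()
    return (elem.tag, tuple(sorted(elem.attrib.items())), text, children)


def group_by_tag(root):
    """Partition the root's children by tag so only same-tag nodes are compared."""
    groups = defaultdict(list)
    for child in root:
        groups[child.tag].append(fingerprint(child))
    return groups


def diff_unordered(old_xml, new_xml):
    """Return a sorted list of (operation, tag) pairs describing top-level changes."""
    old_groups = group_by_tag(ET.fromstring(old_xml))
    new_groups = group_by_tag(ET.fromstring(new_xml))
    changes = []
    for tag in set(old_groups) | set(new_groups):
        old_left = list(old_groups.get(tag, []))
        new_left = []
        # Pass 1: identical subtrees match exactly and need no further comparison.
        for fp in new_groups.get(tag, []):
            if fp in old_left:
                old_left.remove(fp)
            else:
                new_left.append(fp)
        # Pass 2: naive pairing of what is left (a stand-in for min-cost matching).
        paired = min(len(old_left), len(new_left))
        changes += [("update", tag)] * paired
        changes += [("delete", tag)] * (len(old_left) - paired)
        changes += [("insert", tag)] * (len(new_left) - paired)
    return sorted(changes)


# Hypothetical entry, before and after an update; child order differs on purpose.
OLD = """<entry>
  <accession>X00001</accession>
  <description>hypothetical protein</description>
  <keyword>kinase</keyword>
</entry>"""

NEW = """<entry>
  <keyword>kinase</keyword>
  <description>putative protein kinase</description>
  <accession>X00001</accession>
  <reference>made-up citation</reference>
</entry>"""

print(diff_unordered(OLD, NEW))
# [('insert', 'reference'), ('update', 'description')]
```

Reordering the children produces no reported change, because subtrees are compared by fingerprint and not by position. Only the `description` and `reference` groups ever reach the pairing step, which is the kind of reduction in matching work the abstract alludes to.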


Biological Data, Matching Phase, Bipartite Match, Biological Annotation, Change Detection Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Yang Song (1)
  • Sourav S. Bhowmick (1, 2)
  • C. Forbes Dewey Jr. (3)

  1. School of Computer Engineering, Nanyang Technological University, Singapore
  2. Singapore-MIT Alliance, Nanyang Technological University, Singapore
  3. Division of Biological Engineering, Massachusetts Institute of Technology, USA
