An Extensible Light-Weight XML-Based Monitoring System for Sequence Databases

  • Dieter Van de Craen
  • Frank Neven
  • Kerstin Koch
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4075)


Life science researchers want biological information of interest to them to become available as soon as possible. A monitoring system relieves biologists of the need to explore databases periodically: it lets them express their interest in certain data by means of queries and constraints, and notifies them when new data arrives that satisfies them. We describe XSeqM, a sequence monitoring system in which users can combine metadata queries on sequence records with constraints on an alignment against a given source sequence. The system is XML-based: constraints are specified through search fields in a user-friendly web interface and are then translated into corresponding XPath expressions. The system is easily extensible, as adding a new database only requires specifying new mappings from search fields to XPath expressions. To protect private source sequences obtained in labs, researchers must not have to upload their sequences to a general, untrusted system; they must be able to run XSeqM locally. To keep the system light-weight, we therefore introduce an optimization technique based on query containment that reduces the number of XPath evaluations, which constitute the bottleneck of the system. We experimentally validate this technique and show that it can drastically improve the running time.
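The two ideas in the abstract, translating search fields into XPath expressions and using query containment to skip evaluations, can be illustrated with a minimal sketch. It is not the XSeqM implementation: the element names (`organism`, `keyword`), the `FIELD_TO_XPATH` mapping, and the functions `to_xpaths`, `matches`, and `monitor` are all hypothetical. It also assumes a simplified notion of containment in which every query is a conjunction of field constraints, so a query is contained in any query whose constraints are a subset of its own; the containment test used in the paper may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from search fields to XPath templates for one database.
# Supporting another database would mean supplying another mapping like this.
# Values containing a single quote are not handled in this sketch.
FIELD_TO_XPATH = {
    "organism": ".//organism[.='{value}']",
    "keyword": ".//keyword[.='{value}']",
}


def to_xpaths(query):
    """Translate a set of (field, value) constraints into XPath expressions."""
    return [FIELD_TO_XPATH[field].format(value=value)
            for field, value in sorted(query)]


def matches(record, query, counter):
    """Return True if the record satisfies every constraint of the query.

    counter is a one-element list that counts XPath evaluations.
    """
    for xpath in to_xpaths(query):
        counter[0] += 1
        if record.find(xpath) is None:
            return False
    return True


def monitor(record, queries):
    """Evaluate all queries on one record, skipping contained queries.

    queries maps a query name to a frozenset of (field, value) constraints.
    If a query fails on the record, any query whose constraints are a
    superset of it must fail too, so it is skipped without evaluation.
    """
    counter = [0]
    failed = []
    hits = []
    # Less restrictive queries (fewer constraints) are evaluated first.
    for name, query in sorted(queries.items(), key=lambda kv: len(kv[1])):
        if any(f <= query for f in failed):
            continue  # contained in a query that already failed
        if matches(record, query, counter):
            hits.append(name)
        else:
            failed.append(query)
    return hits, counter[0]


if __name__ == "__main__":
    record = ET.fromstring(
        "<record><organism>Mus musculus</organism>"
        "<keyword>kinase</keyword></record>"
    )
    queries = {
        "q1": frozenset({("organism", "Homo sapiens")}),
        "q2": frozenset({("organism", "Homo sapiens"), ("keyword", "kinase")}),
        "q3": frozenset({("keyword", "kinase")}),
    }
    print(monitor(record, queries))
```

In this example `q2` is never evaluated, because `q1` has already failed on the record and `q2` only adds a constraint to it. The saving grows with the number of registered queries that refine one another.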


Directed Acyclic Graph · Disjunctive Normal Form · True Propagation · Sequence Record · Containment Test
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dieter Van de Craen (1)
  • Frank Neven (1)
  • Kerstin Koch (1)
  1. Hasselt University and Transnational University of Limburg, School for Information Technology
