Classification and Focused Crawling for Semistructured Data

  • Martin Theobald
  • Ralf Schenkel
  • Gerhard Weikum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2818)

Abstract

Despite the great advances in XML data management and querying, the currently prevalent XPath- or XQuery-centric approaches face severe limitations when applied to XML documents in large intranets, digital libraries, federations of scientific data repositories, and ultimately the Web. In such environments, data has much more diverse structure and annotations than in a business-data setting, and there is virtually no hope for a common schema or DTD that all the data complies with. Without a schema, however, database-style querying would often produce either empty result sets, when queries are overly specific, or far too many results, when search predicates are overly broad; the latter happens when the user does not know enough about the structure and annotations of the data.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Martin Theobald (1)
  • Ralf Schenkel (1)
  • Gerhard Weikum (1)
  1. Universität des Saarlandes, Fachrichtung 6.2 Informatik, Saarbrücken, Germany
