Similarity Search on XML Data (Ähnlichkeitssuche auf XML-Daten)

  • Sergej Sizov
  • Anja Theobald
  • Gerhard Weikum
Conference paper
Part of the Informatik aktuell book series (INFORMAT)

Abstract

Query languages for XML, such as XPath or XML-QL, support Boolean retrieval: query results are unordered sets of XML elements that satisfy the regular search patterns of a query. This search paradigm suits strongly schematized, "closed" XML document collections such as electronic catalogs. For searching for information on the World Wide Web or in "open" environments such as the intranets of large companies, however, ranked retrieval is preferable: query results are then ranked lists of XML elements sorted by descending relevance. Web search engines based on information retrieval concepts, on the other hand, cannot effectively exploit the additional information that arises from the structure of XML documents and from the semantic annotation provided by element names. This paper presents concepts that combine the search capabilities of XML query languages with ranked retrieval. In particular, it discusses how ontologies and special index structures can improve the effectiveness and efficiency of searching XML data. The concepts presented are used in the ongoing implementation of the query language XXL.
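The contrast between the two retrieval paradigms can be illustrated with a minimal Python sketch. It is not XXL and does not reproduce the paper's method: the sample document, the `NAME_SIMILARITY` table (a hypothetical stand-in for an ontology over element names), and the simple tf-idf style score are all assumptions chosen for illustration.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical sample collection (not from the paper).
DOC = """<catalog>
  <book><title>XML basics</title><text>querying xml data with path expressions</text></book>
  <book><title>Gardening</title><text>growing roses and tulips</text></book>
  <article><title>Ranked search</title><text>ranking xml elements by relevance for xml queries</text></article>
</catalog>"""

# Hypothetical mini-ontology: how similar each element name is to "book".
NAME_SIMILARITY = {"book": 1.0, "article": 0.8}


def words(element):
    """Lower-cased words of all text nested inside the element."""
    return " ".join(element.itertext()).lower().split()


def boolean_search(root, name, term):
    """Boolean retrieval: unordered set of titles of exactly matching elements."""
    return {el.findtext("title") for el in root.iter(name) if term in words(el)}


def ranked_search(root, term):
    """Ranked retrieval: (score, title) pairs sorted by descending relevance."""
    elements = [el for el in root if el.tag in NAME_SIMILARITY]
    containing = sum(1 for el in elements if term in words(el))
    idf = math.log(1 + len(elements) / containing) if containing else 0.0
    scored = []
    for el in elements:
        ws = words(el)
        tf = ws.count(term) / len(ws)
        # Element-name similarity scales the content score (assumed combination).
        score = NAME_SIMILARITY[el.tag] * tf * idf
        if score > 0:
            scored.append((score, el.findtext("title")))
    return sorted(scored, reverse=True)


root = ET.fromstring(DOC)
print(boolean_search(root, "book", "xml"))
for score, title in ranked_search(root, "xml"):
    print(f"{score:.3f}  {title}")
```

The Boolean query returns only the `book` element that matches exactly, whereas the ranked query also returns the `article` element, with a lower score because its element name is only similar to the one requested.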


Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Sergej Sizov (1)
  • Anja Theobald (1)
  • Gerhard Weikum (1)
  1. Universität des Saarlandes, Germany
