Towards a Statistically Semantic Web

  • Conference paper
Conceptual Modeling – ER 2004 (ER 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3288)

Abstract

The envisioned Semantic Web aims to provide richly annotated and explicitly structured Web pages in XML, RDF, or description logics, based upon underlying ontologies and thesauri. Ideally, this should enable a wealth of query processing and semantic reasoning capabilities using XQuery and logical inference engines. However, we believe that the diversity and uncertainty of terminologies and schema-like annotations will make precise querying on a Web scale extremely elusive, if not hopeless, and the same argument holds for large-scale dynamic federations of Deep Web sources. Therefore, ontology-based reasoning and querying need to be enhanced by statistical means, leading to relevance-ranked lists as query results.

This paper presents steps towards such a “statistically semantic” Web and outlines technical challenges. We discuss how statistically quantified ontological relations can be exploited in XML retrieval, how statistics can help in making Web-scale search efficient, and how statistical information extracted from users’ query logs and click streams can be leveraged for better search result ranking. We believe these are decisive issues for improving the quality of next-generation search engines for intranets, digital libraries, and the Web, and they are also crucial for peer-to-peer collaborative Web search.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Weikum, G., Graupmann, J., Schenkel, R., Theobald, M. (2004). Towards a Statistically Semantic Web. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, TW. (eds) Conceptual Modeling – ER 2004. ER 2004. Lecture Notes in Computer Science, vol 3288. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30464-7_2

  • DOI: https://doi.org/10.1007/978-3-540-30464-7_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23723-5

  • Online ISBN: 978-3-540-30464-7

  • eBook Packages: Springer Book Archive
