Towards a Statistically Semantic Web

Weikum, Gerhard; Graupmann, Jens; Schenkel, Ralf; Theobald, Martin

doi:10.1007/978-3-540-30464-7_2


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3288)

Included in the following conference series:

International Conference on Conceptual Modeling


Abstract

The envisioned Semantic Web aims to provide richly annotated and explicitly structured Web pages in XML, RDF, or description logics, based upon underlying ontologies and thesauri. Ideally, this should enable a wealth of query processing and semantic reasoning capabilities using XQuery and logical inference engines. However, we believe that the diversity and uncertainty of terminologies and schema-like annotations will make precise querying on a Web scale extremely elusive if not hopeless, and the same argument holds for large-scale dynamic federations of Deep Web sources. Therefore, ontology-based reasoning and querying needs to be enhanced by statistical means, leading to relevance-ranked lists as query results.

This paper presents steps towards such a “statistically semantic” Web and outlines technical challenges. We discuss how statistically quantified ontological relations can be exploited in XML retrieval, how statistics can help in making Web-scale search efficient, and how statistical information extracted from users’ query logs and click streams can be leveraged for better search result ranking. We believe these are decisive issues for improving the quality of next-generation search engines for intranets, digital libraries, and the Web, and they are crucial also for peer-to-peer collaborative Web search.
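As a rough illustration of how statistically quantified ontological relations might feed a relevance-ranked result list, the following minimal Python sketch expands each query term with related terms and scores documents by the best weighted match. It is not the authors' method: the `ONTOLOGY` table, its similarity weights, the `expand` and `rank` helpers, the threshold, and the max-then-sum scoring rule are all hypothetical choices made for this example.

```python
# Hypothetical ontology: term -> {related term: similarity weight in (0, 1]}.
# The weights are illustrative values, not figures from the paper.
ONTOLOGY = {
    "professor": {"lecturer": 0.8, "teacher": 0.6},
    "course": {"lecture": 0.9, "seminar": 0.7},
}


def expand(term, threshold=0.5):
    """Return the term itself (weight 1.0) plus related terms at or above the threshold."""
    expanded = {term: 1.0}
    for related, weight in ONTOLOGY.get(term, {}).items():
        if weight >= threshold:
            expanded[related] = weight
    return expanded


def rank(query_terms, documents, threshold=0.5):
    """Rank documents by the sum, over query terms, of the best weighted match."""
    scores = {}
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        score = 0.0
        for term in query_terms:
            candidates = expand(term, threshold)
            # An exact match counts 1.0; a related term counts its similarity weight.
            score += max((w for t, w in candidates.items() if t in words), default=0.0)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


documents = {
    "d1": "professor teaches course on databases",
    "d2": "lecturer gives seminar on xml",
    "d3": "teacher writes book",
    "d4": "weather report",
}
results = rank(["professor", "course"], documents)
for doc_id, score in results:
    print(doc_id, round(score, 2))
```

In this toy setting a document that never mentions "professor" or "course" can still be returned, ranked below exact matches, which is the kind of relevance-ranked answer the abstract argues for in place of precise Boolean matching.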



Author information

Authors and Affiliations

Max-Planck Institute of Computer Science, Saarbruecken, Germany
Gerhard Weikum, Jens Graupmann, Ralf Schenkel & Martin Theobald


Editor information

Editors and Affiliations

Informatica e Automazione, Università Roma Tre, Via Vasca Navale 79, 00146, Roma, Italy
Paolo Atzeni
Computer Science Department, University of California, 3731 Boelter Hall, 90095, Los Angeles, CA, USA
Wesley Chu
Department of Computer Science, Tsinghua University, 100084, Beijing, P.R. China
Hongjun Lu
Department of Computer Science and Engineering, Fudan University, 200433, China
Shuigeng Zhou
School of Computing, National University of Singapore
Tok-Wang Ling


About this paper

Cite this paper

Weikum, G., Graupmann, J., Schenkel, R., Theobald, M. (2004). Towards a Statistically Semantic Web. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, TW. (eds) Conceptual Modeling – ER 2004. ER 2004. Lecture Notes in Computer Science, vol 3288. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30464-7_2


DOI: https://doi.org/10.1007/978-3-540-30464-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23723-5
Online ISBN: 978-3-540-30464-7
eBook Packages: Springer Book Archive
