DIASPORA: A highly distributed web-query processing system

Abstract

Current proposals for web querying systems have assumed a centralized processing architecture in which data is shipped from remote sites to the user's site. We present the design and implementation of DIASPORA, a highly distributed query processing system for the web. It is based on the premise that several web applications are more naturally processed in a distributed manner, which opens up the possibility of significant reductions in network traffic and user response times. DIASPORA is built over an expressive graph-based data model that utilizes simple heuristics and lends itself to automatic generation. The model captures both the content of web documents and the hyperlink structure of a web site. Distributed queries on the model are expressed through a declarative language that permits users to explicitly specify navigation. DIASPORA implements a query-shipping model in which queries are autonomously forwarded from one web site to another, without requiring much coordination from the query-originating site. Its design addresses several issues that arise in the distributed web context: determining query completion, handling query rewriting, supporting query termination, and preventing a site from computing the same query more than once when it arrives through different paths in the hyperlink structure. The DIASPORA system is currently operational and is undergoing testing on our campus network. In this paper we describe the design of the system and report initial performance results that indicate significant performance improvements over comparable centralized approaches.
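
The query-shipping idea can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not DIASPORA's actual protocol or query language: the site names, the `SITES` table, the keyword-match "query", and the counter-based completion check are all hypothetical choices made for illustration. Each site evaluates the query once, ignores repeat arrivals of the same query id, sends matches straight to the originator, and forwards the query along its hyperlinks. The originator considers the query complete when its count of in-flight messages drops to zero.

```python
from collections import deque

# Hypothetical toy web: site -> (local documents, hyperlinks to other sites).
SITES = {
    "a.example": (["distributed query intro"], ["b.example", "c.example"]),
    "b.example": (["query shipping notes"], ["d.example"]),
    "c.example": (["campus map"], ["d.example"]),
    "d.example": (["query completion detection"], ["a.example"]),
}


def run_query(start, keyword):
    """Simulate shipping one keyword query from site to site."""
    query_id = 1
    # Per-site log of query ids already processed (assumed dedup mechanism).
    seen = {site: set() for site in SITES}
    results = []      # matches are reported directly to the originator
    pending = 1       # query messages in flight, as counted by the originator
    evaluations = 0   # how many times any site actually evaluated the query
    inbox = deque([start])

    while inbox:
        site = inbox.popleft()
        forwards = []
        if query_id not in seen[site]:
            # First arrival at this site: evaluate locally, then forward.
            seen[site].add(query_id)
            evaluations += 1
            docs, links = SITES[site]
            results += [(site, doc) for doc in docs if keyword in doc]
            forwards = links
            inbox.extend(forwards)
        # Report to the originator: one message consumed, len(forwards) created.
        # A duplicate arrival forwards nothing, so it only decrements the count.
        pending += len(forwards) - 1

    return results, evaluations, pending == 0


results, evaluations, complete = run_query("a.example", "query")
print(results, evaluations, complete)
```

In this toy graph the query reaches `d.example` twice (through `b.example` and `c.example`) and returns to `a.example` through a cycle, yet each site evaluates it only once. A real deployment would also need the query rewriting and termination support that the abstract mentions, which this sketch leaves out.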



About this article

Cite this article

Ramanath, M., Haritsa, J.R. DIASPORA: A highly distributed web-query processing system. World Wide Web 3, 111–124 (2000). https://doi.org/10.1023/A:1019233713818

  • DOI: https://doi.org/10.1023/A:1019233713818
