Web Information Integration Based on Compressed XML

  • Conference paper
Databases in Networked Information Systems (DNIS 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2822)

Abstract

Information integration over web data sources has become increasingly common, and XML has become a favored format for information exchange. New applications raise the problem that massive amounts of information are often transmitted over the network and must be processed within the limited buffer of the mediator. To process queries on massive data from web data sources effectively, we present a method of XML compression based on edit distance for information transmission in information integration. By compressing XML, this method reduces both transmission time and buffer space. Two different XML compression strategies are designed, one for transmission and one for processing in the mediator, and the optimal combination of these strategies is discussed. We also propose query execution algorithms on compressed XML data in the mediator's buffer. We focus on the main operators the mediator applies to data received from wrappers, namely sort, union, join and aggregation. This paper describes how these operators are implemented on compressed data using two different methods.
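The abstract does not spell out the encoding, so what follows is only a minimal sketch of what "compression based on edit distance" could look like. It assumes each XML record is shipped as an edit script against a reference record the mediator already holds. The functions `edit_script` and `apply_script` and the `copy`/`skip`/`insert` operation format are hypothetical illustrations, not the authors' algorithm. The sender computes a Levenshtein alignment and transmits only the differences, and the mediator rebuilds the record from its reference copy.

```python
from typing import List, Tuple, Union

# One edit operation against the reference string (hypothetical format):
#   ("copy", n)   -> reuse the next n characters of the reference
#   ("skip", n)   -> discard the next n characters of the reference
#   ("insert", s) -> emit the literal string s
Op = Tuple[str, Union[int, str]]


def edit_script(ref: str, target: str) -> Tuple[int, List[Op]]:
    """Return the Levenshtein distance and an edit script turning ref into target."""
    m, n = len(ref), len(target)
    # dist[i][j] = edit distance between ref[:i] and target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete a reference char
                             dist[i][j - 1] + 1,         # insert a target char
                             dist[i - 1][j - 1] + cost)  # match or substitute

    # Walk back from (m, n), emitting single-character steps in reverse order.
    steps: List[Op] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == target[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            steps.append(("copy", 1))
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            # Substitution: drop one reference char, emit one target char.
            steps.append(("insert", target[j - 1]))
            steps.append(("skip", 1))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            steps.append(("skip", 1))
            i -= 1
        else:
            steps.append(("insert", target[j - 1]))
            j -= 1
    steps.reverse()

    # Merge runs of the same kind so a long unchanged stretch costs one op.
    ops: List[Op] = []
    for kind, arg in steps:
        if ops and ops[-1][0] == kind:
            ops[-1] = (kind, ops[-1][1] + arg)  # int + int or str + str
        else:
            ops.append((kind, arg))
    return dist[m][n], ops


def apply_script(ref: str, ops: List[Op]) -> str:
    """Rebuild the target record from the reference and its edit script."""
    out: List[str] = []
    pos = 0
    for kind, arg in ops:
        if kind == "copy":
            out.append(ref[pos:pos + arg])
            pos += arg
        elif kind == "skip":
            pos += arg
        else:
            out.append(arg)
    return "".join(out)


if __name__ == "__main__":
    ref = "<book><title>XML</title><price>30</price></book>"
    rec = "<book><title>XQL</title><price>35</price></book>"
    d, ops = edit_script(ref, rec)
    print(d, apply_script(ref, ops) == rec)  # prints: 2 True
```

Because records from one wrapper tend to share most of their markup, the script for a near-duplicate record reduces to a few `copy` runs plus short literals. The quadratic dynamic program would likely be too slow for large documents, so a practical system would presumably need a cheaper alignment heuristic.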


Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, H., Li, J., He, Z., Luo, J. (2003). Web Information Integration Based on Compressed XML. In: Bianchi-Berthouze, N. (eds) Databases in Networked Information Systems. DNIS 2003. Lecture Notes in Computer Science, vol 2822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39845-5_11

  • DOI: https://doi.org/10.1007/978-3-540-39845-5_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20111-3

  • Online ISBN: 978-3-540-39845-5

  • eBook Packages: Springer Book Archive
