Materialization of Web Data Sources

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7538))

Abstract

Recent years have witnessed an exponential increase in the number of data services available on the Web. Many popular Web sites, including social networks, offer APIs for interacting with their information, and open data initiatives such as the Linked Data project promise to achieve the vision of the Web of data. Unfortunately, access to Web data is typically limited by the constraints imposed by the query interface and by technical limitations such as network latency or the number and frequency of allowed daily service invocations. Moreover, several sources may independently publish data about the same real-world objects; in such cases, combining them to assemble all available information about those objects requires duplicate removal, reconciliation, and integration. This paper describes various data materialization problems, defines properties of the materialized data such as source coverage and data alignment, and then focuses on one specific problem: the reseeding of data access methods, which uses information available from previous calls to build a materialization of maximum size.
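To make the reseeding idea concrete, the following is a minimal sketch and not the chapter's actual algorithm. It assumes a hypothetical access method `call(seed)` that returns records as tuples of values, treats every value in a returned record as a candidate input for a later call, and stops after a hypothetical budget `max_calls` that stands in for the invocation limits mentioned above. The breadth-first order is an arbitrary choice made for illustration; it does not try to pick the seeds that would maximize the size of the materialization.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical record shape: a tuple of string values returned by the source.
Record = Tuple[str, ...]


def materialize_by_reseeding(
    call: Callable[[str], List[Record]],
    initial_seeds: Iterable[str],
    max_calls: int,
) -> Tuple[Set[Record], int]:
    """Collect records by feeding values from earlier results back in as inputs.

    `call` is an assumed stand-in for a Web data access method; `max_calls`
    models a limit on the number of allowed service invocations.
    """
    # De-duplicate the initial seeds while preserving their order.
    queue = deque(dict.fromkeys(initial_seeds))
    seen_seeds: Set[str] = set(queue)
    materialized: Set[Record] = set()
    calls = 0

    while queue and calls < max_calls:
        seed = queue.popleft()
        calls += 1
        for record in call(seed):
            materialized.add(record)  # the set removes exact duplicates
            for value in record:
                # Reseeding step: any unseen value becomes a future input.
                if value not in seen_seeds:
                    seen_seeds.add(value)
                    queue.append(value)

    return materialized, calls


# Toy in-memory source standing in for a remote service (illustrative data).
TOY_SOURCE = {
    "a": [("a", "b"), ("a", "c")],
    "b": [("b", "d")],
    "c": [],
    "d": [("d", "a")],
    "e": [("e", "f")],
}


def toy_call(seed: str) -> List[Record]:
    return TOY_SOURCE.get(seed, [])


if __name__ == "__main__":
    records, used = materialize_by_reseeding(toy_call, ["a"], max_calls=10)
    print(sorted(records), used)
```

In this toy source the record `("e", "f")` is never retrieved from the seed `"a"`, because no earlier result exposes the value `"e"`. This illustrates why the choice of seeds bounds the coverage that a reseeding strategy can reach.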


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bozzon, A., Ceri, S., Zagorac, S. (2012). Materialization of Web Data Sources. In: Ceri, S., Brambilla, M. (eds) Search Computing. Lecture Notes in Computer Science, vol 7538. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34213-4_5

  • DOI: https://doi.org/10.1007/978-3-642-34213-4_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34212-7

  • Online ISBN: 978-3-642-34213-4

  • eBook Packages: Computer Science, Computer Science (R0)
