Materialization of Web Data Sources

Bozzon, Alessandro; Ceri, Stefano; Zagorac, Srđan

doi:10.1007/978-3-642-34213-4_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7538)

Abstract

Recent years have witnessed an exponential increase in the number of data services available on the Web. Many popular Web sites, including social networks, offer APIs for interacting with their information, and open data initiatives such as the Linked Data project promise to achieve the vision of the Web of data. Unfortunately, access to Web data is typically limited by the constraints imposed by the query interface and by technical limitations such as network latency or the number and frequency of allowed daily service invocations. Moreover, several sources may independently publish data about the same real-world objects; in such cases, combining them to assemble all available information about those objects requires duplicate removal, reconciliation, and integration. This paper describes various data materialization problems, defining properties such as source coverage and data alignment of the materialized data, and then focuses on a specific problem: the reseeding of data access methods, which uses information available from previous calls to build a materialization of maximum size.
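As a rough illustration of the reseeding idea mentioned in the abstract, the sketch below feeds values found in earlier results back into a service as new inputs until a call budget runs out. It is not the chapter's algorithm: the function `materialize_by_reseeding`, the toy service, and the field names `id` and `partner_city` are all hypothetical and exist only for this example.

```python
from collections import deque
from typing import Callable, Hashable, Iterable


def materialize_by_reseeding(
    call_service: Callable[[Hashable], Iterable[dict]],
    seeds: Iterable[Hashable],
    reseed_field: str,
    key_field: str,
    max_calls: int,
) -> dict:
    """Collect records by reusing result values as new inputs (hypothetical sketch)."""
    materialized = {}        # key_field value -> record, so duplicates are kept once
    pending = deque(seeds)   # input values still to be tried
    tried = set()            # input values already sent to the service
    calls = 0
    while pending and calls < max_calls:
        value = pending.popleft()
        if value in tried:
            continue
        tried.add(value)
        calls += 1           # each invocation consumes part of the call budget
        for record in call_service(value):
            materialized.setdefault(record[key_field], record)
            candidate = record.get(reseed_field)
            if candidate is not None and candidate not in tried:
                pending.append(candidate)  # reseed with a value seen in the results
    return materialized


# Toy in-memory stand-in for a Web data service (assumed data, not from the chapter).
_TOY_DATA = {
    "Milan": [
        {"id": 1, "city": "Milan", "partner_city": "Rome"},
        {"id": 2, "city": "Milan", "partner_city": "Turin"},
    ],
    "Rome": [{"id": 3, "city": "Rome", "partner_city": "Milan"}],
    "Turin": [{"id": 4, "city": "Turin", "partner_city": "Naples"}],
    "Naples": [{"id": 5, "city": "Naples", "partner_city": "Rome"}],
}


def toy_service(city):
    return _TOY_DATA.get(city, [])


full = materialize_by_reseeding(toy_service, ["Milan"], "partner_city", "id", max_calls=10)
limited = materialize_by_reseeding(toy_service, ["Milan"], "partner_city", "id", max_calls=2)
```

With a generous budget the single seed reaches all five toy records, while a budget of two calls stops after the first two inputs. This mirrors, in a very simplified way, how invocation limits bound the size of a materialization.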

Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
Alessandro Bozzon, Stefano Ceri & Srđan Zagorac

Editor information

Editors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Via Ponzio, 34/5, 20133, Milan, Italy
Stefano Ceri
Dipartimento di Elettronica e Informazione, Politecnico di Milano, 20133, Milan, Italy
Marco Brambilla

About this chapter

Cite this chapter

Bozzon, A., Ceri, S., Zagorac, S. (2012). Materialization of Web Data Sources. In: Ceri, S., Brambilla, M. (eds) Search Computing. Lecture Notes in Computer Science, vol 7538. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34213-4_5

DOI: https://doi.org/10.1007/978-3-642-34213-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34212-7
Online ISBN: 978-3-642-34213-4
eBook Packages: Computer Science, Computer Science (R0)
