Efficient Top-k Data Sources Ranking for Query on Deep Web

Shen, Derong; Li, Meifang; Yu, Ge; Kou, Yue; Nie, Tiezheng

doi:10.1007/978-3-540-85481-4_25

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5175)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Efficient query processing on the deep web has become increasingly important because of the large number of deep web data sources. Nevertheless, discovering the most relevant data sources on the deep web remains a challenging problem. Inspired by observations on the deep web, this paper presents a novel top-k ranking strategy that ranks relevant data sources according to the user's requirements. First, it applies an attribute-based dominant pattern growth (ADP-growth) algorithm to mine the most dominant attributes. It then employs a top-k style ranking algorithm on those attributes to find the most relevant data sources, using candidate pruning and early termination and taking the probability of result merging into account. Further, it improves the algorithm by incorporating a search strategy based on relevant attributes to find the data sources, which proves to be more efficient. Extensive experiments on a real-world dataset demonstrate the efficiency and effectiveness of the approach.

This work is supported by the National Science Foundation (60673139, 60573090) and the National High-Tech Development Program (2008AA01Z146).
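The abstract names top-k ranking with candidate pruning and early termination but does not spell out the algorithm. As a rough illustration of the general idea only, the sketch below uses a threshold-style aggregation in the spirit of Fagin, Lotem, and Naor's threshold algorithm. It is not the authors' method: the function name `top_k_sources`, the per-attribute score table, and the plain sum used as the aggregate are all assumptions made for this example, and ADP-growth and result merging are not modeled.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_sources(scores: Dict[str, Dict[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k sources with the highest summed score.

    scores[attribute][source] is a hypothetical relevance score of a data
    source on one attribute; a missing entry counts as 0.
    """
    columns = list(scores.values())
    # One list per attribute, sorted by descending score (sorted access).
    lists = [sorted(col.items(), key=lambda kv: -kv[1]) for col in columns]
    seen: Dict[str, float] = {}
    max_depth = max((len(lst) for lst in lists), default=0)

    for depth in range(max_depth):
        # Upper bound on the aggregate score of any source not yet seen.
        threshold = 0.0
        for lst in lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen sources score 0 here
            source, score = lst[depth]
            threshold += score
            if source not in seen:
                # Random access: look up the source in every attribute list.
                seen[source] = sum(col.get(source, 0.0) for col in columns)

        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Early termination: no unseen source can beat the current k-th best.
        if len(best) == k and best[-1][1] >= threshold:
            return best

    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical scores of four sources on two attributes.
example_scores = {
    "title": {"s1": 0.9, "s2": 0.8, "s3": 0.1, "s4": 0.05},
    "author": {"s1": 0.7, "s2": 0.9, "s3": 0.2, "s4": 0.1},
}
result = top_k_sources(example_scores, k=2)
```

On this toy input the loop stops after reading two entries from each list, ranking `s2` ahead of `s1` without ever touching `s3` or `s4`; that skipped work is what early termination buys.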

Author information

Authors and Affiliations

Northeastern University, Shenyang, China, 110004
Derong Shen, Meifang Li, Ge Yu, Yue Kou & Tiezheng Nie

Editor information

James Bailey, David Maier, Klaus-Dieter Schewe, Bernhard Thalheim, Xiaoyang Sean Wang

Cite this paper

Shen, D., Li, M., Yu, G., Kou, Y., Nie, T. (2008). Efficient Top-k Data Sources Ranking for Query on Deep Web. In: Bailey, J., Maier, D., Schewe, KD., Thalheim, B., Wang, X.S. (eds) Web Information Systems Engineering - WISE 2008. WISE 2008. Lecture Notes in Computer Science, vol 5175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85481-4_25

DOI: https://doi.org/10.1007/978-3-540-85481-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85480-7
Online ISBN: 978-3-540-85481-4
eBook Packages: Computer Science, Computer Science (R0)
