Abstract
Efficient query processing on the deep web has been gaining importance due to the large number of deep web data sources. Nevertheless, discovering the most relevant data sources on the deep web remains a challenging problem. Inspired by observations on the deep web, this paper presents a novel top-k ranking strategy that ranks relevant data sources according to the user's requirements. First, it applies an attribute-based dominant pattern growth (ADP-growth) algorithm to mine the most dominant attributes. It then runs a top-k style ranking algorithm over those attributes to identify the most relevant data sources, using candidate pruning and early termination and taking into account the probability of result merging. The algorithm is further improved with a search strategy based on relevant attributes, which proves more efficient at finding the data sources. Extensive experiments on a real-world dataset demonstrate the efficiency and effectiveness of the approach.
This work is supported by the National Science Foundation (60673139, 60573090) and the National High-Tech Development Program (2008AA01Z146).
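To make the idea of top-k ranking with early termination concrete, the following is a minimal sketch of the classic threshold algorithm of Fagin, Lotem, and Naor, applied to data sources scored per attribute. It is not the paper's algorithm: ADP-growth, the probabilistic candidate pruning, and the result-merging probability are not reproduced here, and the function name `top_k_sources`, the score layout, and the sample scores are all assumptions made for illustration.

```python
import heapq


def top_k_sources(attribute_scores, k):
    """Rank data sources by summed per-attribute scores (threshold algorithm).

    attribute_scores: hypothetical layout, dict attribute -> dict source -> score.
    Scores are assumed non-negative; a source missing from an attribute scores 0.
    """
    # One list per attribute, sorted by descending score (sorted access).
    sorted_lists = {
        attr: sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        for attr, scores in attribute_scores.items()
    }
    depth = max((len(lst) for lst in sorted_lists.values()), default=0)
    seen = {}  # source -> full aggregated score

    for pos in range(depth):
        threshold = 0.0
        for lst in sorted_lists.values():
            if pos >= len(lst):
                continue  # exhausted list: unseen sources score 0 here
            source, score = lst[pos]
            threshold += score
            if source not in seen:
                # Random access: fetch this source's score on every attribute.
                seen[source] = sum(
                    s.get(source, 0.0) for s in attribute_scores.values()
                )
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # Early termination: no unseen source can beat the threshold.
        if len(best) == k and best[-1][1] >= threshold:
            return best

    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Illustrative scores only; these values are not taken from the paper.
example_scores = {
    "title": {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.05},
    "author": {"A": 0.7, "B": 0.9, "C": 0.2, "D": 0.1},
}
top_two = [source for source, _ in top_k_sources(example_scores, 2)]
```

In this example the loop stops after reading two positions of each list, because the second-best aggregated score already meets the threshold, so sources C and D are never examined.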
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shen, D., Li, M., Yu, G., Kou, Y., Nie, T. (2008). Efficient Top-k Data Sources Ranking for Query on Deep Web. In: Bailey, J., Maier, D., Schewe, KD., Thalheim, B., Wang, X.S. (eds) Web Information Systems Engineering - WISE 2008. WISE 2008. Lecture Notes in Computer Science, vol 5175. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85481-4_25
DOI: https://doi.org/10.1007/978-3-540-85481-4_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85480-7
Online ISBN: 978-3-540-85481-4
eBook Packages: Computer Science, Computer Science (R0)