Abstract
Web databases (WDBs) provide different types of query interfaces for accessing the data records stored in their backend databases. Most existing work identifies the schema of a Web database through a complex query interface with multiple input fields. Little attention has been paid to identifying the schema through a simple query interface (SQI), which has only a single text input field. This paper proposes a new instance-based query probing method that identifies a WDB's interface schema and result schema through its SQI. The interface schema identification problem is defined as generating the full-condition query of the SQI, and a novel query probing strategy is proposed to solve it. The result schema is then identified from the result webpages returned by the SQI's full-condition query. An extended identification of non-query attributes is also proposed to improve the attribute recall rate. Experiments on Web databases from the online book, movie, and mobile phone shopping domains show that the method is effective and efficient.
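The abstract does not spell out the probing strategy. The sketch below is therefore not the authors' algorithm. It is a minimal illustration of the general idea of instance-based query probing through a single text field, under several assumptions. The prober already knows a few instance records with their attribute values. A result page can be reduced to a list of record ids. The backend matches a keyword by substring against a hidden set of searchable attributes. All names (`make_sqi`, `probe_interface_schema`, `attribute_recall`) and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, str]

def make_sqi(db: List[Record], searchable: Set[str]) -> Callable[[str], List[str]]:
    """Hypothetical SQI backend: a record matches when the keyword occurs
    (case-insensitively) in any attribute of the hidden `searchable` set.
    Returns matching record ids, standing in for a parsed result page."""
    def search(keyword: str) -> List[str]:
        kw = keyword.lower()
        return [r["id"] for r in db
                if any(kw in r.get(a, "").lower() for a in searchable)]
    return search

def probe_interface_schema(search: Callable[[str], List[str]],
                           instances: List[Record],
                           attributes: List[str],
                           min_support: float = 0.5) -> Set[str]:
    """Submit each known attribute value of each known instance as the single
    keyword. An attribute is treated as queryable when the probed instance
    itself comes back in at least `min_support` of its probes (assumed rule)."""
    schema: Set[str] = set()
    for attr in attributes:
        probes = [inst for inst in instances if inst.get(attr)]
        if not probes:
            continue
        hits = sum(1 for inst in probes if inst["id"] in search(inst[attr]))
        if hits / len(probes) >= min_support:
            schema.add(attr)
    return schema

def attribute_recall(identified: Set[str], truth: Set[str]) -> float:
    """Fraction of ground-truth attributes that were identified."""
    return len(identified & truth) / len(truth) if truth else 1.0

# Hypothetical book database; the prober never reads `searchable` directly.
db = [
    {"id": "b1", "title": "The Art of Computer Programming", "author": "Donald Knuth",
     "publisher": "Addison-Wesley", "isbn": "9780201896831", "price": "199.99"},
    {"id": "b2", "title": "Structure and Interpretation of Computer Programs",
     "author": "Harold Abelson", "publisher": "MIT Press",
     "isbn": "9780262510875", "price": "55.00"},
    {"id": "b3", "title": "Introduction to Algorithms", "author": "Thomas Cormen",
     "publisher": "MIT Press", "isbn": "9780262033848", "price": "89.00"},
]
searchable = {"title", "author", "isbn"}
all_attrs = ["title", "author", "publisher", "isbn", "price"]

search = make_sqi(db, searchable)
found = probe_interface_schema(search, db[:2], all_attrs)  # assume b1, b2 are known

print(sorted(found))                             # ['author', 'isbn', 'title']
print(attribute_recall(found, searchable))       # 1.0
print(attribute_recall(found, set(all_attrs)))   # 0.6
```

Checking that the probed instance itself is returned, rather than just any result, guards against a value such as an author name also appearing in another record's title. Probing several instances and applying a support threshold further reduces accidental matches. The last line is only illustrative. If the result schema were limited to the probed query attributes, recall against all five displayed attributes would be 3/5 = 0.6. This shows why identifying non-query attributes can raise recall.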
This work is supported by the National Natural Science Foundation of China (Grant No. 60833003).
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, L., Zhou, L. (2010). Web Database Schema Identification through Simple Query Interface. In: Lacroix, Z. (eds) Resource Discovery. RED 2009. Lecture Notes in Computer Science, vol 6162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14415-8_2
DOI: https://doi.org/10.1007/978-3-642-14415-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14414-1
Online ISBN: 978-3-642-14415-8
eBook Packages: Computer Science, Computer Science (R0)