Web Database Schema Identification through Simple Query Interface

Lin, Ling; Zhou, Lizhu

doi:10.1007/978-3-642-14415-8_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6162)

Included in the following conference series:

International Workshop on Resource Discovery

Abstract

Web databases provide different types of query interfaces for accessing the data records stored in their backend databases. While most existing work exploits complex query interfaces with multiple input fields to identify the schema of Web databases, little attention has been paid to identifying the schema through a simple query interface (SQI), which has only a single query text input field. This paper proposes a new method of instance-based query probing to identify the interface schema and result schema of Web databases (WDBs) through an SQI. The interface schema identification problem is defined as generating the full-condition query of the SQI, and a novel query probing strategy is proposed. The result schema is then identified from the result webpages of the SQI's full-condition query, and an extended identification of the non-query attributes is proposed to improve the attribute recall rate. Experimental results on Web databases for online shopping of books, movies, and mobile phones show that our method is effective and efficient.
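The abstract does not spell out the probing strategy, so the following is only a minimal illustrative sketch of the general idea of instance-based query probing against a single text field, not the authors' algorithm. Everything in it is an assumption made for illustration: the toy backend `RECORDS`, the hidden `INDEXED` fields, the helper names `toy_sqi`, `probe_interface_schema` and `full_condition_query`, the `min_hit_rate` threshold, and the reading of a "full-condition query" as one query string that combines values for every attribute found to be queryable.

```python
from typing import Callable, Dict, List

# Toy backend standing in for a Web database (hypothetical data).
# A real SQI would be an HTTP search form, not an in-memory list.
RECORDS = [
    {"title": "Dune", "author": "Frank Herbert", "publisher": "Ace", "price": "9.99"},
    {"title": "Emma", "author": "Jane Austen", "publisher": "Penguin", "price": "7.50"},
    {"title": "Ulysses", "author": "James Joyce", "publisher": "Vintage", "price": "12.00"},
]
# Fields the single search box actually matches; unknown to the prober.
INDEXED = ("title", "author")


def toy_sqi(query: str) -> List[dict]:
    """Single-text-field search: every query term must occur in an indexed field."""
    terms = query.lower().split()
    if not terms:
        return []
    results = []
    for rec in RECORDS:
        text = " ".join(rec[f] for f in INDEXED).lower()
        if all(t in text for t in terms):
            results.append(rec)
    return results


def probe_interface_schema(
    search: Callable[[str], List[dict]],
    instances: Dict[str, List[str]],
    min_hit_rate: float = 0.5,  # assumed threshold, not from the paper
) -> List[str]:
    """Submit known instance values one at a time; keep attributes that return hits."""
    queryable = []
    for attr, values in instances.items():
        hits = sum(1 for v in values if search(v))
        if values and hits / len(values) >= min_hit_rate:
            queryable.append(attr)
    return queryable


def full_condition_query(record: Dict[str, str], queryable: List[str]) -> str:
    """Combine one value per queryable attribute into a single query string."""
    return " ".join(record[a] for a in queryable)


# Sample instance values per candidate attribute (assumed domain knowledge).
instances = {
    "title": ["Dune", "Emma"],
    "author": ["Jane Austen", "James Joyce"],
    "publisher": ["Penguin", "Vintage"],
    "price": ["9.99"],
}

queryable = probe_interface_schema(toy_sqi, instances)
query = full_condition_query(RECORDS[1], queryable)
results = toy_sqi(query)
print(queryable, query, len(results))
```

In this toy setting the attributes that never return hits (`publisher`, `price`) play the role of non-query attributes: they can only be recovered from the result pages, which is the part of the problem the abstract addresses with its extended identification step.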

This work is supported by the National Natural Science Foundation of China (Grant No. 60833003).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Ling Lin & Lizhu Zhou

Editor information

Editors and Affiliations

Scientific Data Management Laboratory, Arizona State University, 85287-5706, Tempe, AZ, USA
Zoé Lacroix

About this paper

Cite this paper

Lin, L., Zhou, L. (2010). Web Database Schema Identification through Simple Query Interface. In: Lacroix, Z. (eds) Resource Discovery. RED 2009. Lecture Notes in Computer Science, vol 6162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14415-8_2

DOI: https://doi.org/10.1007/978-3-642-14415-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14414-1
Online ISBN: 978-3-642-14415-8
eBook Packages: Computer Science, Computer Science (R0)
