Web Database Schema Identification through Simple Query Interface

  • Conference paper
Resource Discovery (RED 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6162)

Abstract

Web databases provide different types of query interfaces for accessing the data records stored in their backend databases. While most existing work exploits complex query interfaces with multiple input fields to identify the schema of a Web database, little attention has been paid to how to identify the schema through a simple query interface (SQI), which has a single query text input field. This paper proposes a new method of instance-based query probing to identify the interface schema and the result schema of a Web database through its SQI. The interface schema identification problem is defined as generating the full-condition query of the SQI, and a novel query probing strategy is proposed to solve it. The result schema is then identified from the result webpages returned for the SQI's full-condition query, and an extended identification of the non-query attributes is proposed to improve the attribute recall rate. Experimental results on Web databases of online shops for books, movies and mobile phones show that our method is effective and efficient.

This work is supported by the National Natural Science Foundation of China (Grant No. 60833003).

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, L., Zhou, L. (2010). Web Database Schema Identification through Simple Query Interface. In: Lacroix, Z. (eds) Resource Discovery. RED 2009. Lecture Notes in Computer Science, vol 6162. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14415-8_2

  • DOI: https://doi.org/10.1007/978-3-642-14415-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14414-1

  • Online ISBN: 978-3-642-14415-8

  • eBook Packages: Computer Science, Computer Science (R0)
