Choosing Values for Text Fields in Web Forms

  • Conference paper
Advances in Databases and Information Systems

Abstract

Since the only way to gain access to Hidden Web data is through form submission, one of the challenges is how to fill Web forms automatically. In this paper, we propose algorithms that address this challenge. We describe an efficient method to select good values for text fields and a technique that minimizes the number of form submissions while maximizing the number of rows retrieved from the underlying database. Experiments using real Web forms show the advantages of our proposed approaches.
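The trade-off between few submissions and many retrieved rows can be illustrated with a small sketch. This is not the algorithm from the paper, whose details are not given in the abstract; it is a generic greedy coverage heuristic, and the function name `greedy_select`, the `budget` parameter, and the sample data are all hypothetical. It also assumes the rows returned by each candidate value are already known, which in a real crawl would only be estimated or observed after submitting the form.

```python
def greedy_select(candidates, budget):
    """Pick at most `budget` values, each time taking the value that adds
    the most rows not yet retrieved (greedy coverage heuristic).

    `candidates` maps a text-field value to the set of row ids that a
    submission with that value would return (assumed known here).
    """
    covered = set()               # row ids retrieved so far
    chosen = []                   # values selected, in submission order
    remaining = dict(candidates)  # copy so the caller's dict is untouched
    while remaining and len(chosen) < budget:
        # Sort the keys so that ties are broken deterministically.
        best = max(sorted(remaining), key=lambda v: len(remaining[v] - covered))
        gain = remaining[best] - covered
        if not gain:
            break                 # no remaining value retrieves anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


# Hypothetical data: candidate values and the rows each would retrieve.
sample = {
    "java": {1, 2, 3, 4},
    "python": {3, 4, 5},
    "sql": {6, 7},
    "xml": {1, 2},
}
chosen, covered = greedy_select(sample, budget=3)
print(chosen, sorted(covered))
```

With this sample, three submissions retrieve all seven rows, and the value "xml" is never submitted because it would return only rows already seen.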

Correspondence to Gustavo Zanini Kantorski.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kantorski, G.Z., Moraes, T.G., Moreira, V.P., Heuser, C.A. (2013). Choosing Values for Text Fields in Web Forms. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32741-4_12

  • DOI: https://doi.org/10.1007/978-3-642-32741-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32740-7

  • Online ISBN: 978-3-642-32741-4

  • eBook Packages: Engineering, Engineering (R0)
