Choosing Values for Text Fields in Web Forms

  • Gustavo Zanini Kantorski
  • Tiago Guimaraes Moraes
  • Viviane Pereira Moreira
  • Carlos Alberto Heuser
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 186)

Abstract

Since the only way to gain access to Hidden Web data is through form submission, one of the challenges is how to fill Web forms automatically. In this paper, we propose algorithms which address this challenge. We describe an efficient method to select good values for text fields and a technique which minimizes the number of form submissions and simultaneously maximizes the number of rows retrieved from the underlying database. Experiments using real Web forms show the advantages of our proposed approaches.
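The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the trade-off it names (few submissions, many rows), not the authors' method. It assumes a generic greedy coverage heuristic and assumes that the set of row identifiers each candidate value would return is already known, for example from earlier probing. The names `greedy_select`, `rows_by_value`, and `max_submissions` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(rows_by_value: Dict[str, Set[int]], max_submissions: int) -> List[str]:
    """Pick text-field values greedily by how many unseen rows each would add.

    Assumption: rows_by_value maps a candidate value to the row ids that
    submitting the form with that value would return.
    """
    seen: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(rows_by_value)
    while remaining and len(chosen) < max_submissions:
        # Iterate in sorted order so ties go to the alphabetically first value.
        best = max(sorted(remaining), key=lambda v: len(remaining[v] - seen))
        if not remaining[best] - seen:
            break  # no candidate retrieves anything new, so stop submitting
        chosen.append(best)
        seen |= remaining.pop(best)
    return chosen
```

Under these assumptions, each selected value corresponds to one form submission, and the loop stops as soon as another submission would retrieve no new rows or the submission budget is used up.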

Keywords

Crawling, Deep Web, Filling Forms, Hidden Web

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Gustavo Zanini Kantorski (1)
  • Tiago Guimaraes Moraes (1)
  • Viviane Pereira Moreira (1)
  • Carlos Alberto Heuser (1)

  1. UFRGS, Porto Alegre, Brazil
