Abstract
Because Hidden Web data can be accessed only by submitting forms, a key challenge is filling Web forms automatically. In this paper, we propose algorithms that address this challenge. We describe an efficient method for selecting good values for text fields and a technique that minimizes the number of form submissions while maximizing the number of rows retrieved from the underlying database. Experiments on real Web forms show the advantages of the proposed approaches.
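The abstract does not spell out the algorithms. The sketch below is only an illustration of the trade-off it describes: fewer submissions, more rows retrieved. It uses a generic greedy maximum-coverage heuristic and is not the authors' method. It assumes we already have an estimate of which rows each candidate text value would return, for example from a previously retrieved sample. The function name `greedy_select_values`, its parameters, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select_values(
    candidates: Dict[str, Set[int]],
    max_submissions: int,
    min_new_rows: int = 1,
) -> List[str]:
    """Hypothetical greedy selection of text-field values (not the paper's algorithm).

    candidates maps each candidate value to the set of row IDs it is
    *estimated* to retrieve. Each step submits the value that adds the most
    rows not yet retrieved, stopping at the submission budget or when no
    value adds at least min_new_rows new rows.
    """
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_submissions:
        # max() returns the first maximal item, so iterating in sorted order
        # breaks ties in favour of the alphabetically first value.
        best = max(sorted(remaining), key=lambda v: len(remaining[v] - covered))
        gain = len(remaining[best] - covered)
        if gain < min_new_rows:
            break  # further submissions would retrieve too few new rows
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


# Toy example: row IDs that each hypothetical keyword would return.
sample = {
    "data": {1, 2, 3, 4},
    "web": {3, 4, 5},
    "form": {5, 6},
    "query": {1, 2},
}
print(greedy_select_values(sample, max_submissions=3))  # ['data', 'form']
```

With a budget of three submissions, the heuristic stops after two: "data" and "form" already cover all six rows, so "web" and "query" would add nothing new. The greedy rule is a common approximation for maximum coverage; how good the real result is depends entirely on how accurate the per-value row estimates are.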
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kantorski, G.Z., Moraes, T.G., Moreira, V.P., Heuser, C.A. (2013). Choosing Values for Text Fields in Web Forms. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32741-4_12
DOI: https://doi.org/10.1007/978-3-642-32741-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32740-7
Online ISBN: 978-3-642-32741-4
eBook Packages: Engineering (R0)