Abstract
Existing wrappers cannot extract data records from search engine results pages that contain multiple sections, because such sections usually have complicated layouts and structures. Extracting data from search engine results pages is important for meta search engine applications and for evaluating comparison shopping lists. In this paper, we present a novel data extraction technique that uses visual cues to check for structural regularity in multiple sections data records. Our findings show that although multiple sections data records have no regularity in their structure, they do exhibit regularity in their visual cues. Our technique can serve as a model for future multiple sections data extraction, and it will be useful for meta search engine applications, which need an accurate tool to locate their sources of information.
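The abstract does not spell out the algorithm. As a rough, hypothetical illustration of what "checking regularity with visual cues" can mean, the Python sketch below assumes each candidate record has already been rendered and reduced to three visual cues: left offset, width, and dominant font size. It groups consecutive records into sections whenever their cues match within a tolerance. All names (`Block`, `visually_regular`, `split_sections`, `tol`) and the choice of cues are assumptions made for illustration. This is not Wong and Hong's method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    # Hypothetical visual cues for one rendered candidate record (assumed inputs).
    left: float       # x-coordinate of the block's left edge, in pixels
    width: float      # rendered width, in pixels
    font_size: float  # dominant font size, in points


def visually_regular(blocks: List[Block], tol: float = 2.0) -> bool:
    """Return True if every block shares the first block's visual cues within `tol`.

    Assumption: a single tolerance is applied to all cues for simplicity.
    """
    if len(blocks) < 2:
        return False  # regularity needs at least two records to compare
    ref = blocks[0]
    return all(
        abs(b.left - ref.left) <= tol
        and abs(b.width - ref.width) <= tol
        and abs(b.font_size - ref.font_size) <= tol
        for b in blocks[1:]
    )


def split_sections(blocks: List[Block], tol: float = 2.0) -> List[List[Block]]:
    """Group consecutive blocks into sections whose cues match each section's first block."""
    sections: List[List[Block]] = []
    for b in blocks:
        if sections and visually_regular([sections[-1][0], b], tol):
            sections[-1].append(b)   # same visual pattern: extend the current section
        else:
            sections.append([b])     # visual cues changed: start a new section
    return sections


# Usage: two visually distinct sections on one hypothetical results page.
page = [
    Block(10, 600, 12), Block(10, 600, 12), Block(11, 599, 12),  # section A
    Block(40, 300, 10), Block(40, 300, 10),                      # section B
]
print([len(g) for g in split_sections(page)])  # [3, 2]
```

Comparing each record with the first record of its section, rather than with the previous record, keeps gradual drift from merging two distinct sections. A practical system would also need to locate candidate records in the page and choose a separate tolerance for each cue.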
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wong, D., Hong, J.L. (2012). Multiple Sections Extraction Using Visual Cue. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds) Neural Information Processing. ICONIP 2012. Lecture Notes in Computer Science, vol 7667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34500-5_35
DOI: https://doi.org/10.1007/978-3-642-34500-5_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34499-2
Online ISBN: 978-3-642-34500-5
eBook Packages: Computer Science, Computer Science (R0)