WPPS: A Framework for Web Page Processing

  • Ruslan R. Fayzrakhmanov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7651)

Abstract

In this paper, we present WPPS, a new configurable Java-based framework for developing web page processing methods. The key innovations of WPPS are (1) a unified ontological model that describes the visual representation of web pages, and (2) an API and abstractions that allow both declarative and object-oriented mechanisms to be applied when developing new methods and approaches.

Keywords

web information extraction, web page understanding, ontological models, object-oriented paradigm, declarative approach

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ruslan R. Fayzrakhmanov
    • 1
  1. Institute of Information Systems, TU Vienna, Vienna, Austria
