Rule-Based Structural Analysis of Web Pages

  • Fabio Vitali
  • Angelo Di Iorio
  • Elisa Ventura Campori
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)


Structural analysis of web pages has been proposed several times and for a variety of purposes, such as re-flowing standard web pages to fit the smaller screen of a PDA. elISA is a rule-based system for analysing regularities and structures within web pages, and it is used for a rather different task: determining the editable text blocks within standard web pages, as required by the IsaWiki collaborative editing environment. The elISA analysis engine is implemented as an XSLT meta-stylesheet. When the meta-stylesheet is applied to a rule set, it generates an XSLT stylesheet, which is in turn applied to the original HTML document to produce the requested analysis.
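The two-stage pipeline described above (rule set → generated stylesheet → analysis of the page) can be illustrated by analogy in Python. This is a minimal sketch, not elISA's implementation. A Python closure stands in for the generated XSLT stylesheet. The rule format and the labels `editable` and `navigation` are hypothetical. ElementTree's limited XPath subset replaces full XPath, so the input must be well-formed XHTML rather than arbitrary HTML.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Hypothetical rule format (NOT the elISA rule language): each rule pairs an
# ElementTree-compatible path expression with a label for the matched blocks.
RULES = [
    {"path": ".//div[@class='content']", "label": "editable"},
    {"path": ".//div[@class='nav']", "label": "navigation"},
]

Analyzer = Callable[[ET.Element], list[tuple[str, str]]]


def compile_rules(rules: list[dict[str, str]]) -> Analyzer:
    """Stage 1, standing in for the meta-stylesheet: turn declarative rules
    into an analysis function, much as elISA turns rules into a stylesheet."""
    # Freeze the rules so the generated analyzer does not change if the
    # caller later mutates the input list.
    compiled = tuple((r["path"], r["label"]) for r in rules)

    def analyze(root: ET.Element) -> list[tuple[str, str]]:
        """Stage 2, standing in for the generated stylesheet: apply every
        rule to the document and report (label, text) per matched element."""
        results = []
        for path, label in compiled:
            for elem in root.findall(path):
                text = "".join(elem.itertext()).strip()
                results.append((label, text))
        return results

    return analyze


# Usage: a tiny well-formed XHTML page (hypothetical example input).
page = ET.fromstring(
    "<html><body>"
    "<div class='nav'>Home | About</div>"
    "<div class='content'><p>Editable article text.</p></div>"
    "</body></html>"
)
analyzer = compile_rules(RULES)
print(analyzer(page))
# [('editable', 'Editable article text.'), ('navigation', 'Home | About')]
```

Splitting compilation from application means a rule set is processed once and the resulting analyzer can be reused across many pages. This is one plausible advantage of generating a stylesheet over interpreting the rules anew for every document.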


Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Fabio Vitali (1)
  • Angelo Di Iorio (1)
  • Elisa Ventura Campori (1)

  1. Department of Computer Science, University of Bologna
