Summarizing Structured Documents through a Fractal Technique

  • M. Dolores Ruiz
  • Antonio B. Bailón
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 12)


Every day we search the Web for new information and find many documents whose pages contain a great amount of information. There is strong demand for automatic summarization that is both fast and precise. Many methods have been used for automatic extraction, but most of them do not take the hierarchical structure of documents into account. A novel method that uses the structure of the document was introduced by Yang and Wang in 2004. It is based on a fractal view method for controlling the information displayed. We explain its drawbacks and solve them using the new concept of the fractal dimension of a text document. This achieves a better diversification of the extracted sentences and improves the performance of the method.
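The abstract only names the fractal-view idea. The sketch below is an illustration under stated assumptions, not the authors' method. It assumes the fractal-view propagation rule in the form F(root) = 1 and F(child) = C · F(parent) · N^(-1/D), where N is the number of children of the parent. The parameters C and D are inputs here. How the paper derives a document's fractal dimension is not given in this abstract, so D is simply passed in. The tree, the node names, the display threshold, and the `leaf_quota` helper (which splits a sentence budget in proportion to fractal values) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """One unit of the document hierarchy (document, section, paragraph, ...).

    Names are assumed to be unique within a document.
    """
    name: str
    children: List["Node"] = field(default_factory=list)


def fractal_values(root: Node, C: float = 1.0, D: float = 1.0) -> Dict[str, float]:
    """Assumed fractal-view rule, propagated top-down:
    F(root) = 1 and F(child) = C * F(parent) * N ** (-1 / D),
    where N is the number of children of the parent."""
    values = {root.name: 1.0}
    stack = [root]
    while stack:
        node = stack.pop()
        n = len(node.children)
        for child in node.children:
            values[child.name] = C * values[node.name] * n ** (-1.0 / D)
            stack.append(child)
    return values


def visible(values: Dict[str, float], threshold: float) -> Set[str]:
    """Nodes whose fractal value reaches the (hypothetical) display threshold."""
    return {name for name, f in values.items() if f >= threshold}


def leaf_quota(root: Node, values: Dict[str, float], total: int) -> Dict[str, int]:
    """Hypothetical allocation: share `total` summary sentences among the
    leaves in proportion to their fractal values. Rounding means the parts
    may not always add up exactly to `total`."""
    leaves = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            stack.extend(node.children)
        else:
            leaves.append(node.name)
    mass = sum(values[name] for name in leaves)
    return {name: round(total * values[name] / mass) for name in leaves}


# Hypothetical document: two sections, one with three paragraphs, one with one.
doc = Node("doc", [
    Node("s1", [Node("p1"), Node("p2"), Node("p3")]),
    Node("s2", [Node("p4")]),
])
vals = fractal_values(doc, C=1.0, D=1.0)
print(sorted(visible(vals, 0.2)))   # ['doc', 'p4', 's1', 's2']
print(leaf_quota(doc, vals, total=6))
```

With C = 1 and D = 1, each parent's value is split evenly among its children. As a result, the lone paragraph p4 keeps weight 0.5 while p1 to p3 get about 0.167 each, so p4 receives three of the six sentences. A larger D makes the values decay more slowly down the tree. This is the knob that a document-specific fractal dimension would adjust.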


Keywords: fractal summarization, fractal dimension, summarization




References

  1. Buyukkokten, O., Garcia-Molina, H., Paepcke, A.: Seeing the whole in parts: Text summarization for web browsing on handheld devices. In: 10th International WWW Conference, Hong Kong (2001)
  2. Camastra, F., Vinciarelli, A.: Estimating the intrinsic dimension of data with a fractal-based method. IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
  3. Dalamagas, T., Sheng, T., Winkel, K.J., Sellis, T.: A methodology for clustering XML documents by structure. In: European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 137–148 (2004)
  4. Daume III, H., Marcu, D.: Induction of word and phrase alignments for automatic document summarization. Computational Linguistics 31(4), 505–530 (2005)
  5. Edmundson, H.P.: New methods in automatic extracting. Journal of the Association for Computing Machinery 16(2), 264–285 (1969)
  6. Goldstein, J., Kantrowitz, M., Mittal, V., Carbonell, J.: Summarizing text documents: Sentence selection and evaluation metrics. In: SIGIR 1999, pp. 121–128 (1999)
  7. Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica 9D, 189–208 (1983)
  8. Guerrini, G., Mesiti, M., Sanz, I.: An overview of similarity measures for clustering XML documents. In: Vakali, A., Pallis, G. (eds.) (2006)
  9. Hovy, E.: Text Summarization. In: Oxford Handbook of Computational Linguistics, ch. 32
  10. Koike, H.: Fractal views: a fractal-based method for controlling information display. ACM Transactions on Information Systems 13(3), 305–323 (1995)
  11. Kraft, R.: Fractals and dimensions. HTTP-Protocol (1995),
  12. Lian, W., Cheung, D., Mamoulis, N., Yiu, S.M.: An efficient and scalable algorithm for clustering XML documents by structure. IEEE Transactions on Knowledge and Data Engineering 16(1), 82–96 (2004)
  13. Liebovitch, L.S., Toth, T.: A fast algorithm to determine fractal dimensions by box counting. Physics Letters A 141(8,9), 386–390 (1989)
  14. Luhn, H.P.: The automatic creation of literature abstracts. IBM Journal, pp. 159–165 (April 1958)
  15. Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H. Freeman, New York (1983)
  16. Mandelbrot, B.B.: Self-affine fractal sets. In: Pietronero, L., Tosatti, E. (eds.) Fractals in Physics, Amsterdam (1986)
  17. Marcu, D.: Improving summarization through rhetorical parsing tuning. In: The COLING-ACL Workshop on Very Large Corpora, Montreal, Canada (1998)
  18. Morris, G., Kasper, G.M., Adams, D.A.: The effect and limitation of automated text condensing on reading comprehension performance. Information Systems Research, 17–35 (1992)
  19. Ruiz, M.D., Bailón, A.B.: Fractal dimension of text documents: Application in fractal summarization. In: IADIS International Conference WWW/Internet, vol. 2, pp. 349–353 (2006)
  20. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill Book Co., New York (1983)
  21. Sheskin, D.: Handbook of Parametric and Nonparametric Statistical Procedures, 3rd edn. Chapman & Hall/CRC (2003)
  22. Yang, C.C., Chen, H., Hong, K.: Visualization of large category map for Internet browsing. Decision Support Systems 35, 89–102 (2003)
  23. Yang, C.C., Wang, F.L.: Fractal summarization for mobile devices to access large documents on the Web. In: 12th International WWW Conference, Budapest, Hungary (2003)
  24. Yang, C.C., Wang, F.L.: Fractal summarization: Summarization based on fractal theory. In: SIGIR 2003, Toronto, Canada (2003)
  25. Yang, C.C., Wang, F.L.: A relevance feedback model for fractal summarization. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, E.-p. (eds.) ICADL 2004. LNCS, vol. 3334, pp. 368–377. Springer, Heidelberg (2004)
  26. Ko, Y., et al.: Topic keyword identification for text summarization using lexical clustering. IEICE Transactions on Information and Systems, vol. E86-D, pp. 1695–1701 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • M. Dolores Ruiz (1)
  • Antonio B. Bailón (1)
  1. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
