Summarizing Structured Documents through a Fractal Technique

  • Conference paper
Enterprise Information Systems (ICEIS 2007)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 12)

Abstract

Every day we search for new information on the web and find many documents whose pages contain a large amount of information. There is strong demand for fast and accurate automatic summarization. Many methods have been used for automatic extraction, but most of them do not take the hierarchical structure of documents into account. A novel method that uses the structure of the document was introduced by Yang and Wang in 2004. It is based on the fractal view method for controlling the information displayed. We explain its drawbacks and address them using the new concept of the fractal dimension of a text document, which yields better diversification of the extracted sentences and improves the performance of the method.
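
The fractal view idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the propagation rule commonly attributed to Koike's fractal views, in which the root has fractal value 1 and each child receives C * N^(-1/D) times its parent's value, where N is the number of siblings under that parent. The names `Node`, `fractal_values`, and `visible`, and the parameters `D` and `C`, are hypothetical and chosen for illustration; how this paper estimates the fractal dimension of a text document is not shown in this preview, so `D` is left as a plain input.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit of a structured document (document, section, paragraph...)."""
    title: str                      # assumed unique within the tree
    children: list = field(default_factory=list)


def fractal_values(root, D=1.0, C=1.0):
    """Propagate fractal values from the root down the document tree.

    Assumed rule: Fv(root) = 1 and Fv(child) = C * N**(-1/D) * Fv(parent),
    where N is the number of children of the parent.
    """
    values = {}
    stack = [(root, 1.0)]
    while stack:
        node, fv = stack.pop()
        values[node.title] = fv
        n = len(node.children)
        for child in node.children:
            stack.append((child, C * n ** (-1.0 / D) * fv))
    return values


def visible(values, threshold):
    """Keep only the nodes whose fractal value reaches the threshold."""
    return [title for title, fv in values.items() if fv >= threshold]


# Toy document: two sections, the first with four subsections.
doc = Node("root", [
    Node("s1", [Node("s1.1"), Node("s1.2"), Node("s1.3"), Node("s1.4")]),
    Node("s2"),
])
values = fractal_values(doc, D=1.0, C=1.0)
print(values)
print(sorted(visible(values, 0.2)))
```

In this toy tree, deeper and more crowded branches receive smaller values, so a threshold hides them first. A larger `D` slows that decay, which is one way a per-document dimension could change which parts of the hierarchy contribute to a summary.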

References

  1. Buyukkokten, O., Garcia-Molina, H., Paepcke, A.: Seeing the whole in parts: Text summarization for web browsing on handheld devices. In: 10th International WWW Conference, Hong Kong (2001)

  2. Camastra, F., Vinciarelli, A.: Estimating the intrinsic dimension of data with a fractal-based method. IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)

  3. Dalamagas, T., Sheng, T., Winkel, K.J., Sellis, T.: A methodology for clustering xml documents by structure. In: European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 137–148 (2004)

  4. Daume III, H., Marcu, D.: Induction of word and phrase alignments for automatic document summarization. Computational Linguistics 31(4), 505–530 (2005)

  5. Edmundson, H.P.: New methods in automatic extracting. Journal of the Association for Computing Machinery 16(2), 264–285 (1969)

  6. Goldstein, J., Kantrowitz, M., Mittal, V., Carbonell, J.: Summarizing text documents: Sentence selection and evaluation metrics. In: SIGIR 1999, pp. 121–128 (1999)

  7. Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica 9D, 189–208 (1983)

  8. Guerrini, G., Mesiti, M., Sanz, I.: An overview of similarity measures for clustering XML documents. In: Vakali, A., Pallis, G. (eds.) (2006)

  9. Hovy, E.: Text Summarization. Oxford Handbook of computational linguistics, ch. 32

  10. Koike, H.: Fractal views: a fractal-based method for controlling information display. ACM Transactions on Information Systems 13(3), 305–323 (1995)

  11. Kraft, R.: Fractals and dimensions. HTTP-Protocol (1995), http://www.weihenstephan.de

  12. Lian, W., Cheung, D., Mamoulis, N., Yiu, S.M.: An efficient and scalable algorithm for clustering XML documents by structure. TKDE 16(1), 82–96 (2004)

  13. Liebovitch, L.S., Toth, T.: A fast algorithm to determine fractal dimensions by box counting. Physics Letters A 141(8,9), 386–390 (1989)

  14. Luhn, H.P.: The automatic creation of literature abstracts. IBM Journal, pp.159–165 (April 1958)

  15. Mandelbrot, B.B.: The Fractal Geometry of Nature. W.H. Freeman, New York (1983)

  16. Mandelbrot, B.B.: Self-affine fractal sets. In: Pietronero, L., Tosatti, E. (eds.) Fractals in Physics, Amsterdam (1986)

  17. Marcu, D.: Improving summarization through rhetorical parsing tuning. In: The COLING-ACL Workshop on Very Large Corpora, Montreal, Canada (1998)

  18. Morris, G., Kasper, G.M., Adams, D.A.: The effect and limitation of automated text condensing on reading comprehension performance. Information Systems Research, 17–35 (1992)

  19. Ruiz, M.D., Bailón, A.B.: Fractal dimension of text documents: Application in fractal summarization. In: IADIS International Conference WWW/Internet, vol. 2, pp. 349–353 (2006)

  20. Salton, G., McGill, M.J.: Introduction to modern Information Retrieval. McGraw-Hill Book Co., New York (1983)

  21. Sheskin, D.: Handbook of parametric and nonparametric statistical procedures, 3rd edn. Chapman & Hall/CRC (2003)

  22. Yang, C.C., Chen, H., Hong, K.: Visualization of large category map for Internet browsing. Decision Support Systems 35, 89–102 (2003)

  23. Yang, C.C., Wang, F.L.: Fractal summarization for mobile devices to access large documents on the Web. In: 12th International WWW Conference, Budapest, Hungary (2003)

  24. Yang, C.C., Wang, F.L.: Fractal summarization: Summarization based on fractal theory. In: SIGIR 2003, Toronto, Canada (2003)

  25. Yang, C.C., Wang, F.L.: A relevance feedback model for fractal summarization. In: Chen, Z., Chen, H., Miao, Q., Fu, Y., Fox, E., Lim, E.-p. (eds.) ICADL 2004. LNCS, vol. 3334, pp. 368–377. Springer, Heidelberg (2004)

  26. Ko, Y., et al.: Topic keyword identification for text summarization using lexical clustering. IEICE Transactions on Information and Systems, vol. E86-D, pp. 1695–1701 (2003)

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruiz, M.D., Bailón, A.B. (2008). Summarizing Structured Documents through a Fractal Technique. In: Filipe, J., Cordeiro, J., Cardoso, J. (eds) Enterprise Information Systems. ICEIS 2007. Lecture Notes in Business Information Processing, vol 12. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88710-2_26

  • DOI: https://doi.org/10.1007/978-3-540-88710-2_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88709-6

  • Online ISBN: 978-3-540-88710-2

  • eBook Packages: Computer Science (R0)
