Questionnaire Free Text Summarisation Using Hierarchical Classification

Abstract

This paper presents an investigation into summarising the free text element of questionnaire data using hierarchical text classification. The approach assumes that text summarisation can be achieved by classification: several class labels are associated with each document, and together these labels constitute the summary. A hierarchical classification approach is proposed, which has the advantage that different levels of classification can be used and the summarisation customised according to the branch of the tree in which the current document is located. The approach is evaluated using free text from questionnaires used in the SAVSNET (Small Animal Veterinary Surveillance Network) project. The results demonstrate the viability of using hierarchical classification to generate free text summaries.
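The idea of building a summary from the class labels collected along one branch of a hierarchy can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the node classifiers are toy keyword counters standing in for trained classifiers, and the hierarchy, the labels ("digestive", "skin", "vomiting", "diarrhoea") and the names `Node`, `keyword_classifier`, `summarise` and `ROOT` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A node classifier maps a document to one of the node's child labels,
# or to None when no child label applies.
Classifier = Callable[[str], Optional[str]]


def keyword_classifier(keywords: Dict[str, List[str]]) -> Classifier:
    """Toy stand-in for a trained classifier (assumption, for illustration):
    pick the label whose keywords occur most often in the text."""
    def classify(text: str) -> Optional[str]:
        tokens = text.lower().split()
        scores = {label: sum(tokens.count(w) for w in words)
                  for label, words in keywords.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None
    return classify


@dataclass
class Node:
    label: str
    classify: Optional[Classifier] = None  # None marks a leaf node
    children: Dict[str, "Node"] = field(default_factory=dict)


def summarise(text: str, root: Node) -> List[str]:
    """Walk from the root towards a leaf, collecting one label per level.
    The collected labels together form the summary of the document."""
    labels: List[str] = []
    node = root
    while node.classify is not None:
        choice = node.classify(text)
        if choice is None:
            break  # no label applies at this level
        labels.append(choice)
        child = node.children.get(choice)
        if child is None:
            break  # chosen label has no deeper level
        node = child
    return labels


# Hypothetical two-level hierarchy; these are not the SAVSNET classes.
ROOT = Node(
    label="root",
    classify=keyword_classifier({
        "digestive": ["vomiting", "diarrhoea"],
        "skin": ["itching", "rash"],
    }),
    children={
        "digestive": Node(
            label="digestive",
            classify=keyword_classifier({
                "vomiting": ["vomiting"],
                "diarrhoea": ["diarrhoea"],
            }),
        ),
        "skin": Node(label="skin"),
    },
)

print(summarise("dog presented with vomiting since yesterday", ROOT))
```

Because each node has its own classifier, the labels offered at the second level depend on the branch chosen at the first, which is the customisation the abstract describes.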

Keywords

  • Support Vector Machine
  • Class Label
  • Child Node
  • Parent Node
  • Free Text

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Author information

Corresponding author

Correspondence to Matias Garcia-Constantino.

Copyright information

© 2012 Springer-Verlag London

About this paper

Cite this paper

Garcia-Constantino, M., Coenen, F., Noble, PJ., Radford, A. (2012). Questionnaire Free Text Summarisation Using Hierarchical Classification. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXIX. SGAI 2012. Springer, London. https://doi.org/10.1007/978-1-4471-4739-8_3

  • DOI: https://doi.org/10.1007/978-1-4471-4739-8_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4738-1

  • Online ISBN: 978-1-4471-4739-8

  • eBook Packages: Computer Science, Computer Science (R0)