A Semi-Automated Approach to Building Text Summarisation Classifiers

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2012)

Abstract

An investigation into the extraction of useful information from the free text element of questionnaires, using a semi-automated summarisation extraction technique to generate text summarisation classifiers, is described. A realisation of the proposed technique, SARSET (Semi-Automated Rule Summarisation Extraction Tool), is presented and evaluated using real questionnaire data. The results of this approach are compared against the results obtained using two alternative techniques to build text summarisation classifiers. The first of these uses standard rule-based classifier generators, and the second is founded on the concept of building classifiers using secondary data. The results demonstrate that the proposed semi-automated approach outperforms the other two approaches considered.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Garcia-Constantino, M., Coenen, F., Noble, P.J., Radford, A., Setzkorn, C. (2012). A Semi-Automated Approach to Building Text Summarisation Classifiers. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_39

  • DOI: https://doi.org/10.1007/978-3-642-31537-4_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31536-7

  • Online ISBN: 978-3-642-31537-4

  • eBook Packages: Computer Science, Computer Science (R0)
