A Comparative Study of Text Preprocessing Techniques for Natural Language Call Routing

  • Chapter
Dialogues with Social Robots

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 427))

Abstract

This article presents a comparative study of text preprocessing techniques for natural language call routing. Seven unsupervised and supervised term weighting methods were considered. Four dimensionality reduction methods were applied: stop-word filtering with stemming, feature selection based on term weights, feature transformation based on term clustering, and a novel feature transformation method based on terms belonging to classes. The classification algorithms were k-NN and Fast Large Margin, an SVM-based algorithm. In the numerical experiments, Term Relevance Ratio (TRR) was the most effective term weighting method. Unlike the other dimensionality reduction methods, feature transformation based on term clustering significantly decreases dimensionality without significantly changing classification effectiveness. The novel feature transformation method reduces dimensionality radically: the number of features equals the number of classes.
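
The abstract does not spell out how the class-based feature transformation works, so the following is only a minimal sketch of one way such a transformation could yield exactly one feature per class. It assumes, purely for illustration, that each term is assigned to the class in which its relative frequency is highest, and that an utterance's feature for a class is the count of its terms assigned to that class. The function names `fit_term_classes` and `transform`, the relative-frequency weight, and the toy utterances are all hypothetical and are not taken from the chapter.

```python
from collections import Counter, defaultdict


def fit_term_classes(docs, labels):
    """Assign each term to the class where its relative frequency is highest.

    Assumption: relative frequency stands in for a supervised term weight;
    the chapter's actual weighting (e.g. TRR) is not reproduced here.
    """
    # Count term occurrences separately for every class.
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)

    classes = sorted(counts)
    totals = {c: sum(counts[c].values()) for c in classes}
    vocab = set().union(*(counts[c].keys() for c in classes))

    term_class = {}
    for term in vocab:
        # Ties are resolved in favour of the alphabetically first class.
        term_class[term] = max(classes, key=lambda c: counts[c][term] / totals[c])
    return classes, term_class


def transform(tokens, classes, term_class):
    """Map a tokenized utterance to a vector with one feature per class."""
    index = {c: i for i, c in enumerate(classes)}
    features = [0] * len(classes)
    for token in tokens:
        # Terms unseen during fitting are ignored.
        if token in term_class:
            features[index[term_class[token]]] += 1
    return features


# Toy call-routing data (hypothetical).
train_docs = [
    ["pay", "my", "bill"],
    ["bill", "is", "wrong"],
    ["reset", "my", "password"],
    ["password", "not", "working"],
]
train_labels = ["billing", "billing", "account", "account"]

classes, term_class = fit_term_classes(train_docs, train_labels)
vector = transform(["my", "bill", "is", "wrong"], classes, term_class)
print(classes, vector)
```

The resulting vector has as many entries as there are classes regardless of vocabulary size, which is the property the abstract highlights. Such vectors could then be passed to any classifier, for example k-NN or a linear SVM.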



Author information

Corresponding author

Correspondence to Roman Sergienko.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this chapter

Cite this chapter

Sergienko, R., Shan, M., Schmitt, A. (2017). A Comparative Study of Text Preprocessing Techniques for Natural Language Call Routing. In: Jokinen, K., Wilcock, G. (eds) Dialogues with Social Robots. Lecture Notes in Electrical Engineering, vol 427. Springer, Singapore. https://doi.org/10.1007/978-981-10-2585-3_2

  • DOI: https://doi.org/10.1007/978-981-10-2585-3_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2584-6

  • Online ISBN: 978-981-10-2585-3

  • eBook Packages: Engineering (R0)
