Document representation based on probabilistic word clustering in customer-voice classification

  • Younghoon Lee
  • Seokmin Song
  • Sungzoon Cho
  • Jinhae Choi
Industrial and commercial application


Customer-voice data play an important role in fields such as marketing, product planning, and quality assurance. However, the manual processes involved make the classification of customer-voice data problematic. This study builds automatic classifiers for customer-voice data using newly proposed document representation methods based on neural embedding and probabilistic word clustering. Semantically similar terms are grouped into a common cluster: the word vectors generated by neural embedding are clustered according to the membership strength of each word in each cluster, derived from a probabilistic clustering method such as fuzzy C-means or a Gaussian mixture model. By accounting for membership strength, the proposed method is expected to be well suited to classifying customer-voice data consisting of unstructured text. The results show that the proposed method achieved an accuracy of 89.24% with respect to representational effectiveness and 87.76% with respect to classification performance on customer-voice data comprising 12 classes. Furthermore, the method provides an intuitive interpretation of the generated representation.
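The representation described in the abstract can be sketched as follows: embed each word as a vector, cluster the vectors with a soft (probabilistic) method, and represent a document as the aggregated cluster-membership strengths of its words. This is a minimal illustrative sketch, not the authors' actual code; the fuzzy C-means implementation, the toy vocabulary, and the random stand-in embeddings (in place of trained word2vec vectors) are all assumptions for demonstration.

```python
import numpy as np

def fuzzy_cmeans(X, n_clusters, m=2.0, n_iter=50, seed=0):
    """Fuzzy C-means: returns cluster centers and the membership
    matrix U (n_words x n_clusters), each row summing to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Weighted cluster centers from current memberships.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distances from every word vector to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Standard FCM membership update, then row-normalize.
        U = 1.0 / (d ** (2.0 / (m - 1.0)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

def represent(doc_tokens, vocab_index, U):
    """Document vector = sum of the membership rows of its words,
    normalized -- a soft analogue of a bag-of-clusters count."""
    rows = [U[vocab_index[w]] for w in doc_tokens if w in vocab_index]
    v = np.sum(rows, axis=0)
    return v / v.sum()

# Toy setup: random 2-D vectors around three centers stand in for
# neural word embeddings of a hypothetical customer-voice vocabulary.
vocab = ["battery", "charge", "screen", "display", "price", "cost"]
rng = np.random.default_rng(1)
emb = np.concatenate([rng.normal(c, 0.1, size=(2, 2)) for c in (0.0, 2.0, 4.0)])
_, U = fuzzy_cmeans(emb, n_clusters=3)
doc_vec = represent(["battery", "charge", "screen"],
                    {w: i for i, w in enumerate(vocab)}, U)
```

The resulting `doc_vec` is a fixed-length, cluster-level document representation that a standard classifier (e.g., SVM or k-NN) could consume; a Gaussian mixture model would play the same role, with its posterior responsibilities supplying the membership matrix.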


Keywords: Probabilistic word clustering · Document representation · Customer-voice · Classification



We would like to express our appreciation to LG Electronics, which provided the customer-voice dataset used in the experiments section of this study.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Industrial Engineering and Institute for Industrial Systems Innovation, Seoul National University, Seoul, Korea
  2. Data Driven User Experience Team, Mobile Communication Lab, LG Electronics, Seoul, Korea
