Abstract
In recent years, many methods have been proposed for recognizing spoken language and converting it to text. In this paper, we propose a novel iterative clustering algorithm that takes the transcribed text as input and reduces the errors in it. The proposed method runs three steps over multiple iterations: (1) unknown word probability assignment, (2) multi-probability normalization, and (3) probability filtering. In the first step, each iteration assigns new probabilities to the unknown words based on the intermediate results of the previous iteration; this continues until no unknown words remain. The second step normalizes the multiple probabilities assigned to a single word by taking the probabilities of its neighbouring words into account. The third step eliminates probabilities below a threshold, which reduces noise. We evaluate the clustering quality on multiple real-world benchmark datasets. The results show that our optimized algorithm produces more accurate clusters than the other clustering algorithms it was compared with.
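The abstract describes the three steps only at a high level. The following Python sketch illustrates one possible reading of them; it is not the authors' implementation. All details are assumptions: each recognized word carries a dictionary mapping hypothetical cluster labels to probabilities, an empty dictionary marks an unknown word, "neighbour" means the immediately preceding and following word, unknown words are filled with the average of the neighbours known in the previous pass, normalization blends a word's own probabilities with its neighbours' using a hypothetical weight `alpha`, and the threshold of 0.2 is arbitrary.

```python
from typing import Dict, List

# Hypothetical representation: cluster label -> probability; {} marks an unknown word.
Dist = Dict[str, float]


def _average(dists: List[Dist]) -> Dist:
    """Average several distributions label by label (a missing label counts as 0)."""
    out: Dist = {}
    for d in dists:
        for label, p in d.items():
            out[label] = out.get(label, 0.0) + p / len(dists)
    return out


def _renormalize(d: Dist) -> Dist:
    """Rescale probabilities so they sum to 1 (an empty dict stays empty)."""
    total = sum(d.values())
    return {k: v / total for k, v in d.items()} if total > 0 else {}


def _known_neighbours(tokens: List[Dist], i: int) -> List[Dist]:
    """Assumed neighbourhood: the known words immediately before and after position i."""
    return [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens) and tokens[j]]


def assign_unknown(tokens: List[Dist]) -> List[Dist]:
    """Step 1 (assumed rule): each pass fills an unknown word with the average of
    the neighbours that were known in the previous pass, until none are unknown."""
    current = [dict(d) for d in tokens]
    while any(not d for d in current):
        previous = [dict(d) for d in current]  # results of the previous iteration
        progressed = False
        for i, d in enumerate(previous):
            if d:
                continue  # already known
            neighbours = _known_neighbours(previous, i)
            if neighbours:
                current[i] = _average(neighbours)
                progressed = True
        if not progressed:  # no known word anywhere: stop instead of looping forever
            break
    return current


def normalize_with_neighbours(tokens: List[Dist], alpha: float = 0.7) -> List[Dist]:
    """Step 2 (assumed rule): blend a word's own probabilities with the average of
    its neighbours' probabilities, weighting its own by alpha, then renormalize."""
    out = []
    for i, d in enumerate(tokens):
        neighbours = _known_neighbours(tokens, i)
        if not neighbours:
            out.append(_renormalize(d))
            continue
        context = _average(neighbours)
        labels = set(d) | set(context)
        blended = {k: alpha * d.get(k, 0.0) + (1 - alpha) * context.get(k, 0.0)
                   for k in labels}
        out.append(_renormalize(blended))
    return out


def filter_probabilities(tokens: List[Dist], threshold: float = 0.2) -> List[Dist]:
    """Step 3: drop probabilities below the threshold, then renormalize."""
    out = []
    for d in tokens:
        kept = {k: p for k, p in d.items() if p >= threshold}
        if not kept and d:  # keep the single best label rather than emptying the word
            best = max(d, key=d.get)
            kept = {best: d[best]}
        out.append(_renormalize(kept))
    return out


def refine(tokens: List[Dist], alpha: float = 0.7, threshold: float = 0.2) -> List[Dist]:
    """Run step 1 to convergence, then steps 2 and 3 once each."""
    filled = assign_unknown(tokens)
    return filter_probabilities(normalize_with_neighbours(filled, alpha), threshold)


if __name__ == "__main__":
    # Hypothetical recognizer output: one distribution per word; {} = unknown word.
    words = [{"A": 0.9, "B": 0.1}, {}, {}, {"B": 0.8, "A": 0.2}]
    for dist in refine(words):
        print({k: round(v, 3) for k, v in sorted(dist.items())})
```

The abstract does not say whether steps 2 and 3 are also repeated across iterations, so the sketch runs them once after step 1 has converged. Filling unknown words only from the previous pass's snapshot mirrors the abstract's description of using "the intermediate results of the previous iteration": a long run of unknown words is filled from its known edges inward, one position per pass.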
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Palanivinayagam, A., Nagarajan, S. An optimized iterative clustering framework for recognizing speech. Int J Speech Technol 23, 767–777 (2020). https://doi.org/10.1007/s10772-020-09728-5
DOI: https://doi.org/10.1007/s10772-020-09728-5