Ensemble of multiple kNN classifiers for societal risk classification
Societal risk classification is a fundamental and complex problem in societal risk perception. To study it, posts from Tianya Forum are selected as the data source, and four representations of BBS posts are applied: string representation, term-frequency representation, TF-IDF representation and distributed representation. Using edit distance or cosine similarity as the distance metric, four k-Nearest Neighbor (kNN) classifiers, one per representation, are developed and compared. Because the neural network model Paragraph Vector takes word order into account and extracts semantics, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) is shown to be effective for societal risk classification. To further improve performance, kNN-PV is combined with the other three kNN classifiers in a weighted ensemble model, and the optimal weights for the individual kNN classifiers are determined by brute-force grid search. The experimental results reveal that, compared with kNN-PV alone, the ensemble method significantly improves Macro-F for societal risk classification.
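The weighted ensemble and the grid search over weights can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the combination rule, so the sketch assumes each kNN classifier contributes the fraction of its k neighbors per label, and the weights are searched on a grid of non-negative values that sum to 1. The function names, the grid resolution `steps`, and the toy posts and vectors are all hypothetical, and only two of the four classifiers (edit distance on strings, cosine distance on vectors) are shown for brevity.

```python
from collections import Counter
from itertools import product
import math

def edit_distance(a, b):
    """Levenshtein distance between two strings (string representation)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0

def knn_votes(train_items, train_labels, query, dist, k):
    """Fraction of the k nearest training items carrying each label."""
    order = sorted(range(len(train_items)), key=lambda i: dist(train_items[i], query))
    counts = Counter(train_labels[i] for i in order[:k])
    return {label: n / k for label, n in counts.items()}

def ensemble_predict(votes_per_clf, weights):
    """Assumed rule: weighted sum of vote fractions; ties broken alphabetically."""
    scores = Counter()
    for votes, w in zip(votes_per_clf, weights):
        for label, frac in votes.items():
            scores[label] += w * frac
    return max(sorted(scores), key=scores.get)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    f1s = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

def grid_search_weights(votes_by_doc, y_true, steps=10):
    """Brute force: try every weight vector with entries i/steps that sums to 1."""
    n_clf = len(votes_by_doc[0])
    best_score, best_weights = -1.0, None
    for combo in product(range(steps + 1), repeat=n_clf):
        if sum(combo) != steps:
            continue
        weights = [c / steps for c in combo]
        preds = [ensemble_predict(v, weights) for v in votes_by_doc]
        score = macro_f1(y_true, preds)
        if score > best_score:
            best_score, best_weights = score, weights
    return best_score, best_weights

# Hypothetical toy data: each post has a string form and a 2-d vector form.
train_text = ["price rise", "price up", "air smog", "smog haze"]
train_vec = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = ["economy", "economy", "environment", "environment"]
val_text = ["price drop", "haze smog"]
val_vec = [[0.8, 0.2], [0.2, 0.8]]
val_y = ["economy", "environment"]

# One list of per-classifier votes for every validation post.
votes_by_doc = [
    [knn_votes(train_text, train_y, t, edit_distance, k=3),
     knn_votes(train_vec, train_y, v, cosine_distance, k=3)]
    for t, v in zip(val_text, val_vec)
]
best_score, best_weights = grid_search_weights(votes_by_doc, val_y)
print(best_score, best_weights)
```

The search is exhaustive, so its cost grows quickly with the number of classifiers and the grid resolution; with four classifiers and a coarse grid, as in the setting described above, it remains cheap. In practice the weights would be tuned on a held-out validation set rather than on the test data.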
Keywords: Societal risk classification, Tianya Forum, k-Nearest Neighbor ensemble, Paragraph Vector
This study is supported by the National Key Research and Development Program of China under grant No. 2016YFB1000902 and National Natural Science Foundation of China under grant Nos. 61473284, 71601023 and 71371107.