Ensemble of multiple kNN classifiers for societal risk classification

Chen, Jindong; Tang, Xijin

doi:10.1007/s11518-017-5346-4

Published: 22 April 2017

Volume 26, pages 433–447, (2017)

Journal of Systems Science and Systems Engineering

Jindong Chen¹,² & Xijin Tang¹

Abstract

Societal risk classification is a fundamental and complex issue for societal risk perception. To conduct societal risk classification, Tianya Forum posts are selected as the data source, and four representations of BBS posts are applied: string representation, term-frequency representation, TF-IDF representation and distributed representation. Using edit distance or cosine similarity as the distance metric, four k-Nearest Neighbor (kNN) classifiers based on the different representations are developed and compared. Owing to the advantages of the neural network model Paragraph Vector in capturing word order and extracting semantics, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) proves effective for societal risk classification. Furthermore, to improve the performance of societal risk classification, kNN-PV is combined with the other three kNN classifiers, each with its own weight, into an ensemble model. The optimal weights for the different kNN classifiers are found by a brute-force grid search. The experimental results show that, compared with kNN-PV, the ensemble method significantly improves Macro-F for societal risk classification.
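The weighted combination and grid search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the vote-fraction scoring, the tie-breaking rule and the weight grid are all assumptions made for illustration, and the distances passed to `knn_votes` are assumed to be precomputed with whichever metric suits a representation (for example edit distance, or one minus cosine similarity).

```python
from collections import Counter
from itertools import product


def knn_votes(distances, labels, k):
    """Fraction of the k nearest training items that belong to each class."""
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    counts = Counter(labels[i] for i in nearest)
    return {c: n / k for c, n in counts.items()}


def ensemble_predict(vote_dicts, weights):
    """Weighted sum of per-classifier vote fractions (assumed combination rule)."""
    scores = Counter()
    for votes, w in zip(vote_dicts, weights):
        for c, v in votes.items():
            scores[c] += w * v
    # Ties are broken by class name so the result is deterministic.
    return max(sorted(scores), key=lambda c: scores[c])


def macro_f(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    f_scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


def grid_search_weights(all_votes, y_true, steps=5):
    """Try every weight tuple on an evenly spaced grid in [0, 1].

    all_votes[i] holds one vote dict per classifier for validation sample i.
    steps must be at least 2; the grid itself is a hypothetical choice.
    """
    n_clf = len(all_votes[0])
    grid = [s / (steps - 1) for s in range(steps)]
    best_w, best_f = None, -1.0
    for weights in product(grid, repeat=n_clf):
        if sum(weights) == 0:
            continue  # an all-zero weighting carries no information
        preds = [ensemble_predict(v, weights) for v in all_votes]
        f = macro_f(y_true, preds)
        if f > best_f:
            best_w, best_f = weights, f
    return best_w, best_f
```

With four classifiers and `steps=5` the search evaluates 5^4 - 1 = 624 weight tuples, which is why an exhaustive search stays cheap for an ensemble of this size.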

Acknowledgement

This study is supported by the National Key Research and Development Program of China under grant No. 2016YFB1000902 and the National Natural Science Foundation of China under grant Nos. 61473284, 71601023 and 71371107.

Author information

Authors and Affiliations

Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
Jindong Chen & Xijin Tang
China Aerospace Academy of Systems Science and Engineering, Beijing, 100048, China
Jindong Chen

Corresponding author

Correspondence to Xijin Tang.

Additional information

Jindong Chen is a senior engineer at the China Aerospace Academy of Systems Science and Engineering. He received his BEng (2006) in electrical engineering and automation and his PhD (2013) in control theory and control engineering from Jiangnan University. After his PhD, he worked for two years as a post-doctoral fellow at the CAS Academy of Mathematics and Systems Science, where he investigated the mechanism of societal risk and effective methods for societal risk identification. His research interests include knowledge science, systems science and text mining.

Xijin Tang is a full professor in the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. She received her BEng (1989) in computer science and engineering from Zhejiang University, her MEng (1992) in management science and engineering from the University of Science and Technology of China and her PhD (1995) from the CAS Institute of Systems Science. During her early systems research and practice, she developed several decision support systems for water resources management, weapon system evaluation, e-commerce evaluation, etc. Her recent interests are meta-synthesis and advanced modeling, decision support systems, opinion dynamics and opinion mining, systems approaches to societal complex problems, knowledge management and creativity support systems. She co-authored and published two influential books in Chinese, on the meta-synthesis system approach and on an oriental systems approach. She was one of the 99 winners of the 10th National Award for Youth in Science and Technology in China in 2007. Currently Professor Tang is one of the vice presidents and the secretary general of the International Society for Knowledge and Systems Sciences (ISKSS), which is a member of the International Federation for Systems Studies. She serves as deputy editor-in-chief of the Chinese Journal of Systems Engineering and as an Editorial Board member of the Journal of Systems Science and Complexity, the Journal of System Science and Mathematical Science (Chinese series) and Systema.

About this article

Cite this article

Chen, J., Tang, X. Ensemble of multiple kNN classifiers for societal risk classification. J. Syst. Sci. Syst. Eng. 26, 433–447 (2017). https://doi.org/10.1007/s11518-017-5346-4

Issue Date: August 2017
DOI: https://doi.org/10.1007/s11518-017-5346-4
