
Which Type of Classifier to Use for Networked Data, Connectivity Based or Feature Based?

  • Zan Zhang (corresponding author)
  • Jiuyong Li
  • Hao Wang
  • Lin Liu
  • Jixue Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11233)

Abstract

Multi-label classification of social network data has become an important problem. Two types of information have been used to classify nodes in a social network: the characteristics of nodes and the connectivity between nodes. Accordingly, existing classification methods fall into two types: feature based methods and connectivity based methods. We observe that there is no one-size-fits-all classification method, since performance is data dependent. In general, a node's class labels are determined by two factors, personal preference and peer influence: some data sets are personal preference dominated and suit feature based methods, whereas others are peer influence dominated and suit connectivity based methods. The challenge is then to judge whether a data set is personal preference dominated or peer influence dominated, so that a suitable classification method can be selected for it. In this paper, we develop a causality based criterion to determine this characteristic of a data set. Experiments on real-world data sets demonstrate that the criterion can predict the suitability of a classification method for a data set.
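
The abstract does not spell out the criterion itself. As a rough illustration of the kind of propensity-score based reasoning it refers to, the sketch below contrasts an inverse-propensity-weighted estimate of a peer-influence effect with the predictability of a label from node features alone, on synthetic data. The variable names (node_features, adjacency, labels), the toy data, and the final comparison are assumptions for illustration only, not the authors' method.

```python
# Minimal sketch (assumed, not the paper's exact criterion): contrast a
# propensity-score based estimate of peer influence with the predictive
# power of node features, on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy networked data: 200 nodes, 5 features, a sparse symmetric adjacency
# matrix, and one binary label per node (one label of a multi-label problem).
n = 200
node_features = rng.normal(size=(n, 5))
adjacency = (rng.random((n, n)) < 0.01).astype(int)
adjacency = np.maximum(adjacency, adjacency.T)
np.fill_diagonal(adjacency, 0)
labels = (node_features[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# "Treatment": the node has at least one positively labelled neighbour.
treatment = ((adjacency @ labels) > 0).astype(int)

# Propensity score: probability of treatment given the node's own features.
prop_model = LogisticRegression(max_iter=1000).fit(node_features, treatment)
propensity = np.clip(prop_model.predict_proba(node_features)[:, 1], 0.05, 0.95)

# Inverse-propensity-weighted (Hajek) estimate of the peer-influence effect.
w1, w0 = treatment / propensity, (1 - treatment) / (1 - propensity)
peer_effect = np.sum(w1 * labels) / np.sum(w1) - np.sum(w0 * labels) / np.sum(w0)

# Crude proxy for personal preference: accuracy gain of a feature-only
# classifier over always predicting the majority label.
feature_model = LogisticRegression(max_iter=1000).fit(node_features, labels)
baseline = max(labels.mean(), 1 - labels.mean())
feature_effect = feature_model.score(node_features, labels) - baseline

print(f"peer-influence effect (IPW):      {peer_effect:.3f}")
print(f"personal-preference signal (acc): {feature_effect:.3f}")
# Intuition: if the peer-influence effect dominates, a connectivity based
# classifier is likely the better choice; if the feature signal dominates,
# a feature based classifier is.
```

Under this reading, the criterion amounts to comparing the strength of a causal (peer-influence) effect against the strength of the feature signal; the paper's propensity-score machinery serves to make that comparison fair when node features also drive who is connected to whom.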

Keywords

Networked data · Multi-label classification · Causal analysis · Propensity score

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zan Zhang (1, 2), corresponding author
  • Jiuyong Li (2)
  • Hao Wang (1)
  • Lin Liu (2)
  • Jixue Liu (2)

  1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
  2. School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
