Science China Mathematics, Volume 61, Issue 4, pp 627–640

Network-based naive Bayes model for social network

  • Danyang Huang
  • Guoyu Guan
  • Jing Zhou
  • Hansheng Wang

Abstract

Naive Bayes (NB) is one of the most popular classification methods. It is particularly useful when the dimension of the predictor is high and the data are generated independently. Meanwhile, social network data are becoming increasingly accessible owing to the rapid development of social network services and websites. In contrast to the independence assumed by NB, data generated by a social network are most likely to be dependent, and the dependency is mainly determined by the social network relationships. How to extend the classical NB method to social network data is therefore a problem of great interest. To this end, we propose a network-based naive Bayes (NNB) method, which generalizes the classical NB model to social network data. The key advantage of the NNB method is that it takes the network relationships into consideration. Its computational efficiency makes the NNB method feasible even for large-scale social networks. The statistical properties of the NNB model are investigated theoretically. Simulation studies are conducted to demonstrate its finite-sample performance. A real data example is also analyzed for illustration.
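For readers unfamiliar with the classical NB classifier that the abstract takes as its starting point, the following is a minimal sketch for binary predictors. It illustrates only the standard independence-based model, not the NNB method proposed in the paper. The function names, the Laplace smoothing parameter `alpha`, and the toy data are all assumptions made for illustration.

```python
import math

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and per-feature Bernoulli probabilities.

    X is a list of 0/1 feature vectors, y the class labels.
    alpha is a Laplace smoothing constant (an illustrative choice).
    """
    n, p = len(X), len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / n
        # Smoothed estimate of P(feature j = 1 | class c).
        theta = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(p)
        ]
        model[c] = (prior, theta)
    return model

def predict_bernoulli_nb(model, x):
    """Return the class with the largest log posterior (up to a constant)."""
    best_class, best_score = None, -math.inf
    for c, (prior, theta) in model.items():
        # Independence assumption: the log-likelihood is a sum over features.
        score = math.log(prior) + sum(
            math.log(t) if xj else math.log(1.0 - t)
            for xj, t in zip(x, theta)
        )
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy data, made up for illustration only.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
model = fit_bernoulli_nb(X, y)
print(predict_bernoulli_nb(model, [1, 0]))
```

The sum over features in `predict_bernoulli_nb` is where the independence assumption enters. The sketch also treats the observations themselves as independent, which is the assumption the abstract identifies as unrealistic for social network data.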

Keywords

classification, naive Bayes, Sina Weibo, social network data

MSC(2010)

62H30, 91D30


Notes

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 11701560, 11501093, 11631003, 11690012, 71532001, 11525101), the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University of China (Grant No. 16XNLF01), the Beijing Municipal Social Science Foundation (Grant No. 17GLC051), the Fund for Building World-Class Universities (Disciplines) of Renmin University of China, the Fundamental Research Funds for the Central Universities (Grant Nos. 130028613, 130028729 and 2412017FZ030), China’s National Key Research Special Program (Grant No. 2016YFC0207700) and the Center for Statistical Science at Peking University.


Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  • Danyang Huang (1)
  • Guoyu Guan (2)
  • Jing Zhou (1)
  • Hansheng Wang (3)

  1. School of Statistics, Renmin University of China, Beijing, China
  2. KLAS of MOE, and School of Economics, Northeast Normal University, Changchun, China
  3. Guanghua School of Management, Peking University, Beijing, China
