Annals of Operations Research, Volume 263, Issue 1–2, pp 501–527

Evaluating the importance of different communication types in romantic tie prediction on social media

  • Matthias Bogaert
  • Michel Ballings
  • Dirk Van den Poel
Data Mining and Analytics


The purpose of this paper is to evaluate which communication types on social media are most indicative of romantic ties. Rather than analyzing communication as a composite measure, we take a disaggregated approach, modeling separate measures for commenting, liking and tagging on an alter’s status updates, photos, videos, check-ins, locations and links. To ensure that we have the best possible model, we benchmark eight classifiers using different data sampling techniques. The results indicate that romantic ties can be predicted with very high accuracy. The top-performing classification algorithm is AdaBoost, with an accuracy of up to 97.89%, an AUC of up to 97.56%, a G-mean of up to 81.81%, and an F-measure of up to 81.45%. The top drivers of romantic ties were related to socio-demographic similarity and to the frequency and recency of commenting, liking and tagging on photos, albums, videos and statuses. Previous research has largely focused on aggregate measures, whereas this study focuses on disaggregate measures. Therefore, to the best of our knowledge, this study is the first to provide such an extensive analysis of romantic tie prediction on social media.
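The gap between the reported accuracy and the reported G-mean and F-measure is typical of imbalanced classification problems, where romantic ties are a small minority of all ego-alter pairs. The following is a minimal sketch of how these three measures can be derived from a confusion matrix, assuming the common definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall). The function name and the counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def imbalance_metrics(tp, fp, tn, fn):
    """Return accuracy, G-mean and F-measure from confusion-matrix counts.

    Assumes the common definitions; the paper's exact formulas are not
    given in the abstract.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): share of true romantic ties that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of non-romantic ties that are correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: share of predicted romantic ties that are real.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Hypothetical counts: 1,000 ego-alter pairs, 20 of which are romantic ties.
metrics = imbalance_metrics(tp=14, fp=4, tn=976, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, accuracy is dominated by the many correctly rejected non-romantic pairs, while G-mean and F-measure are pulled down by the missed and falsely predicted romantic ties.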


Online social network, Tie strength, Predictive models, Social media, Machine learning, Facebook



The authors are thankful to the three anonymous reviewers whose comments have helped significantly improve an earlier version of this paper. The authors are also grateful to the Guest Editor of the Data Mining & Analytics Special Issue, Dr. Asil Oztekin, for his guidance and very timely management of this manuscript.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Matthias Bogaert (1)
  • Michel Ballings (2)
  • Dirk Van den Poel (1)
  1. Department of Marketing, Ghent University, Ghent, Belgium
  2. Department of Business Analytics and Statistics, The University of Tennessee, Knoxville, USA
