Evaluating the importance of different communication types in romantic tie prediction on social media

Bogaert, Matthias; Ballings, Michel; Van den Poel, Dirk

doi:10.1007/s10479-016-2295-0

Data Mining and Analytics
Published: 17 August 2016

Volume 263, pages 501–527 (2018)

Annals of Operations Research

Abstract

The purpose of this paper is to evaluate which communication types on social media are most indicative of romantic ties. Rather than analyzing communication as a composite measure, we take a disaggregated approach and model separate measures for commenting, liking and tagging on an alter’s status updates, photos, videos, check-ins, locations and links. To ensure that we have the best possible model, we benchmark 8 classifiers using different data sampling techniques. The results indicate that romantic ties can be predicted with very high accuracy. The top-performing classification algorithm is AdaBoost, with an accuracy of up to 97.89%, an AUC of up to 97.56%, a G-mean of up to 81.81%, and an F-measure of up to 81.45%. The top drivers of romantic ties are socio-demographic similarity and the frequency and recency of commenting, liking and tagging on photos, albums, videos and statuses. Previous research has largely focused on aggregate measures, whereas this study focuses on disaggregate measures. To the best of our knowledge, this study is therefore the first to provide such an extensive analysis of romantic tie prediction on social media.
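The gap between the reported accuracy and the reported G-mean and F-measure is typical of imbalanced classification problems. The following is a minimal sketch, assuming the common definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall); the function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Return accuracy, G-mean and F-measure from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): share of true romantic ties that are detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of non-romantic ties that are correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Hypothetical counts: 1,000 ego-alter pairs, of which 50 are romantic ties.
metrics = imbalance_metrics(tp=35, fp=5, tn=945, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these illustrative counts, accuracy is dominated by the large non-romantic class, while G-mean and F-measure drop as soon as a share of the rare romantic ties is missed.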

Notes

All time-related variables are expressed in days.
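As an illustration of a time-related variable measured in days, the sketch below computes the recency of an interaction. The function name, the reference date and the interaction dates are hypothetical; the paper's exact variable definitions are not reproduced here.

```python
from datetime import date


def recency_in_days(last_interaction, reference):
    """Number of whole days between the last interaction and a reference date."""
    return (reference - last_interaction).days


# Hypothetical dates of an ego's comments on an alter's photos.
comment_dates = [date(2016, 7, 3), date(2016, 7, 20), date(2016, 8, 1)]
reference_date = date(2016, 8, 17)

comment_recency = recency_in_days(max(comment_dates), reference_date)
print(comment_recency)
```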

Acknowledgments

The authors are thankful to the three anonymous reviewers whose comments have helped significantly improve an earlier version of this paper. The authors are also grateful to the Guest Editor of the Data Mining & Analytics Special Issue, Dr. Asil Oztekin, for his guidance and very timely management of this manuscript.

Author information

Authors and Affiliations

Department of Marketing, Ghent University, Tweekerkenstraat 2, 9000, Ghent, Belgium
Matthias Bogaert & Dirk Van den Poel
Department of Business Analytics and Statistics, The University of Tennessee, 249 Stokely Management Center, 916 Volunteer Blvd, Knoxville, TN, 37996, USA
Michel Ballings

Corresponding author

Correspondence to Michel Ballings.

Appendix

See Table 8.

Table 8 Median \(5\times 2\)cv accuracy, G-mean, F-measure and AUC per algorithm and per data sampling technique
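A median 5×2cv score can be obtained by repeating 2-fold cross-validation five times and taking the median of the ten resulting scores. The sketch below assumes plain, non-stratified random halves and a hypothetical `evaluate` callback; the toy majority-class baseline and the labels are illustrative only and are not the paper's data or classifiers.

```python
import random
import statistics


def five_by_two_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for 5 repetitions of 2-fold CV."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    half = n_samples // 2
    for _ in range(5):
        rng.shuffle(indices)
        fold_a, fold_b = indices[:half], indices[half:]
        # Each half serves once as training set and once as test set.
        yield fold_a, fold_b
        yield fold_b, fold_a


def median_5x2cv(n_samples, evaluate, seed=0):
    """Median of the 10 scores returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(train, test)
              for train, test in five_by_two_splits(n_samples, seed)]
    return statistics.median(scores)


# Hypothetical imbalanced labels: 10 romantic ties among 100 pairs.
labels = [1] * 10 + [0] * 90


def majority_accuracy(train_idx, test_idx):
    """Toy baseline: predict the majority class seen in the training half."""
    ones = sum(labels[i] for i in train_idx)
    majority = 1 if ones * 2 > len(train_idx) else 0
    hits = sum(1 for i in test_idx if labels[i] == majority)
    return hits / len(test_idx)


median_score = median_5x2cv(len(labels), majority_accuracy, seed=1)
print(median_score)
```

In practice `evaluate` would fit a classifier on the training half and return accuracy, G-mean, F-measure or AUC on the test half; stratified splits may be preferable when the positive class is rare.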

About this article

Cite this article

Bogaert, M., Ballings, M. & Van den Poel, D. Evaluating the importance of different communication types in romantic tie prediction on social media. Ann Oper Res 263, 501–527 (2018). https://doi.org/10.1007/s10479-016-2295-0

Published: 17 August 2016
Issue Date: April 2018
DOI: https://doi.org/10.1007/s10479-016-2295-0
