Combining Behaviors and Demographics to Segment Online Audiences: Experiments with a YouTube Channel

  • Bernard J. Jansen
  • Soon-gyo Jung
  • Joni Salminen
  • Jisun An
  • Haewoon Kwak
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11193)


Social media channels with audiences in the millions are increasingly common. Segmenting populations of this size can yield hundreds of audience segments, because the composition of the overall audience tends to be complex. Although understanding audience segments is important for strategic planning, tactical decision making, and content creation, it is unrealistic for human decision makers to use hundreds of segments effectively in these tasks. In this research, we present efforts at simplifying the segmentation of audience populations to increase its practical utility. Using millions of interactions by hundreds of thousands of viewers with an organization’s online content collection, we first isolate the maximum number of audience segments, based on behavioral profiling, and then demonstrate a computational approach that uses non-negative matrix factorization to reduce this number to 42 segments that are both impactful and representative of the overall population. Initial results are promising, and we present avenues for future research leveraging our approach.
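The reduction step can be illustrated with a minimal sketch of non-negative matrix factorization. Everything in it is an assumption made for illustration, not the authors' implementation or data: the toy interaction matrix `V` (rows as fine-grained segments, columns as content items), the helper names `matmul`, `transpose`, `frobenius_error`, and `nmf`, the multiplicative-update rule, and the choice of two latent segments.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x k) and H (k x m), all non-negative,
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

# Hypothetical interaction counts: 6 fine-grained segments x 4 content items.
V = [
    [9, 8, 0, 1],
    [8, 9, 1, 0],
    [7, 7, 0, 0],
    [0, 1, 9, 8],
    [1, 0, 8, 9],
    [0, 0, 7, 7],
]

w, h = nmf(V, k=2)

# Assign each fine-grained segment to its strongest latent segment.
dominant = [max(range(len(row)), key=lambda a: row[a]) for row in w]
print("dominant latent segment per row:", dominant)
print("reconstruction error:", round(frobenius_error(V, w, h), 3))
```

Rows that share a dominant component can be read as one merged segment, so the number of latent components `k` plays the role of the reduced segment count.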


Keywords: Audience segmentation, Audience analytics, User profiling



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Bernard J. Jansen (1)
  • Soon-gyo Jung (1)
  • Joni Salminen (1, 2), corresponding author
  • Jisun An (1)
  • Haewoon Kwak (1)

  1. Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  2. University of Turku, Turku, Finland
