Automatic Persona Generation for Online Content Creators: Conceptual Rationale and a Research Agenda

  • Joni Salminen
  • Bernard J. Jansen
  • Jisun An
  • Haewoon Kwak
  • Soon-Gyo Jung
Part of the Human–Computer Interaction Series (HCIS)


As the quantity of social media and online analytics data has increased drastically, a wide variety of methods, typically computational and algorithmic, are deployed to make sense of this data. However, in many cases, these approaches trade one form of complexity for another by ignoring the principles of human cognitive processing. In this perspective chapter, we propose employing Personas as an alternative way of making large volumes of online user analytics information useful to the end users of user and customer analytics, with results applicable in software development, business, the communication industry, and other domains where understanding online user behavior is deemed important. Toward this end, we have developed a system that automatically generates data-driven Personas from social media and online analytics data. The system can handle hundreds of millions of user interactions with tens of thousands of pieces of content on YouTube, Facebook, and Google Analytics, while preserving the privacy of the individual users of those channels. Our approach (1) identifies and prioritizes user segments by their online behavior, (2) associates the segments with demographic data, and (3) creates rich Persona profiles by dynamically adding characteristics such as names, photos, and descriptive quotes. This chapter characterizes the currently open research problems in automatic Persona generation, such as de-aggregation of data, cross-platform data mapping, filtering of toxic comments, and choosing the right information content according to end-user needs. Addressing these problems requires combining state-of-the-art computer and information science techniques within one system and benefits greatly from interdisciplinary collaboration.
Overall, the research agenda set out in this work aims to realize the vision of automatic user profiling that draws on diverse online and social media platforms and advanced data processing methods, with the end goal of making complex analytics data more useful for human decision makers, especially those working with online content.
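The three-step approach summarized above (identify and prioritize behavioral segments, associate them with demographics, enrich them into profiles) can be illustrated with a small sketch. This is not the authors' system. It assumes the analytics platform exposes only aggregate view counts per content item and demographic group, which is consistent with the privacy constraint, and it uses non-negative matrix factorization (NMF) with Lee–Seung multiplicative updates as one possible way to uncover behavioral segments. The functions `nmf` and `build_personas`, the group labels, the view counts, and the name pool are all hypothetical.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, k: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (n x m) as W (n x k) @ H (k x m) using
    Lee-Seung multiplicative updates for the squared Frobenius error."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def build_personas(views: Matrix, groups: List[str], k: int,
                   names: Dict[str, str]) -> List[Dict]:
    """Hypothetical pipeline: segments from NMF, ranked, tagged, enriched."""
    w, h = nmf(views, k)
    personas = []
    for seg in range(k):
        # Step 1: segment size = total views reconstructed by its rank-one
        # component, i.e. sum over i, j of W[i][seg] * H[seg][j].
        weight = sum(row[seg] for row in w) * sum(h[seg])
        # Step 2: tie the segment to the demographic group with the largest loading.
        top = max(range(len(groups)), key=lambda j: h[seg][j])
        personas.append({"weight": weight, "demographic": groups[top]})
    # Prioritize: the largest segments come first.
    personas.sort(key=lambda p: p["weight"], reverse=True)
    # Step 3: enrich each profile with a name matching its demographic group;
    # photos and quotes could be attached the same way.
    for persona in personas:
        persona["name"] = names.get(persona["demographic"], "Unnamed")
    return personas


# Hypothetical aggregate data: rows = videos, columns = demographic groups.
GROUPS = ["female 18-24", "male 18-24", "female 45-54", "male 45-54"]
VIEWS = [
    [90, 10, 0, 0],
    [80, 5, 0, 0],
    [100, 15, 0, 0],
    [0, 0, 5, 60],
    [0, 0, 10, 50],
]
NAMES = {"female 18-24": "Aisha", "male 45-54": "Omar"}  # hypothetical name pool

personas = build_personas(VIEWS, GROUPS, k=2, names=NAMES)
for p in personas:
    print(p["name"], p["demographic"], round(p["weight"]))
```

Ranking segments by the views their component reconstructs is only one simple way to prioritize them; the number of segments `k` is likewise an assumed parameter that a real system would need to choose.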



We would like to thank the employees of the Al Jazeera Media Network, Qatar Airways, and Qatar Foundation who have collaborated with us on this project.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Joni Salminen (1, 2)
  • Bernard J. Jansen (1)
  • Jisun An (1)
  • Haewoon Kwak (1)
  • Soon-Gyo Jung (1)

  1. Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
  2. Turku School of Economics, Turku, Finland
