Abstract
In this chapter, we first give a brief overview of the mobile instant messaging landscape. We then focus on the instant messaging application “WhatsApp”, describing its current features and the kinds of data that can be extracted from it. Based on the existing literature, we provide practical advice for researchers seeking to work with WhatsApp data with respect to data collection, participant incentivization, data processing, informed consent, anonymization, and reproducibility of research. These insights may also prove useful to researchers working with other kinds of chat log data. We conclude that WhatsApp is an intriguing data source for social science research questions, but that the data must be treated with great caution to ensure ethical conduct. To facilitate this, we present several issues to consider when designing studies and briefly introduce “WhatsR”, our own R package for parsing and visualizing data from exported WhatsApp chat logs, with convenience features for tailoring, anonymizing, and extracting metadata from them.
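The parsing, anonymization, and metadata steps mentioned above can be illustrated with a minimal Python sketch. It is not the WhatsR package (which is written in R) and does not reproduce its functions; `parse_chat`, `pseudonymize`, and `messages_per_sender` are hypothetical names. The sketch assumes an Android export in an English locale with day-first dates and a 24-hour clock, for example `12/03/2021, 14:05 - Alice: See you tomorrow!`. The exact layout of exported chat logs varies with operating system, locale, and app version, so the pattern would need adapting for real data.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# ASSUMPTION: Android export, English locale, day-first dates, 24-hour clock, e.g.
#   "12/03/2021, 14:05 - Alice: See you tomorrow!"
# iOS exports and other locales use different layouts and need another pattern.
TIMESTAMP_PREFIX = r"(?P<date>\d{2}/\d{2}/\d{4}), (?P<time>\d{2}:\d{2}) - "
MESSAGE_RE = re.compile(TIMESTAMP_PREFIX + r"(?P<sender>[^:]+): (?P<text>.*)$")
TIMESTAMP_RE = re.compile(TIMESTAMP_PREFIX)


@dataclass
class Message:
    timestamp: datetime
    sender: str
    text: str


def parse_chat(lines):
    """Parse exported chat lines into Message records (hypothetical helper)."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = MESSAGE_RE.match(line)
        if match:
            timestamp = datetime.strptime(
                f"{match['date']} {match['time']}", "%d/%m/%Y %H:%M"
            )
            messages.append(Message(timestamp, match["sender"], match["text"]))
        elif TIMESTAMP_RE.match(line):
            # Timestamped line without "Sender: " is treated as a system notice
            # (e.g. "Alice added Bob") and skipped. This is a heuristic only.
            continue
        elif messages:
            # Line without a timestamp: continuation of a multi-line message.
            messages[-1].text += "\n" + line
    return messages


def pseudonymize(messages):
    """Replace sender names with sequential pseudonyms (hypothetical helper).

    Names, phone numbers, or other identifiers inside the message text are
    NOT touched, so the result is pseudonymized, not anonymized.
    """
    mapping = {}
    result = []
    for msg in messages:
        if msg.sender not in mapping:
            mapping[msg.sender] = f"Person_{len(mapping) + 1}"
        result.append(Message(msg.timestamp, mapping[msg.sender], msg.text))
    return result


def messages_per_sender(messages):
    """Simple metadata: number of messages per (pseudonymized) sender."""
    return Counter(msg.sender for msg in messages)


if __name__ == "__main__":
    sample = [
        "12/03/2021, 14:00 - Alice added Bob",
        "12/03/2021, 14:05 - Alice: See you tomorrow!",
        "12/03/2021, 14:06 - Bob: Sure, I will",
        "bring the notes",
        "12/03/2021, 14:07 - Alice: Great",
    ]
    chat = pseudonymize(parse_chat(sample))
    for msg in chat:
        print(msg.timestamp.isoformat(), msg.sender, repr(msg.text))
    print(messages_per_sender(chat))
```

Sequential pseudonyms assigned in order of first appearance keep per-person statistics such as message counts intact while removing sender names. Because message content can still identify people, data processed this way would generally still count as personal data under the GDPR and need further treatment before it could be called anonymous.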
Notes
- 9. This can be done using WhatsApp Web and the Selenium package for R or Python.
- 13. The following corpora were used in the study: https://db.mocoda2.de/c/home, https://smsdbms.sprache-interaktion.de/.
- 18. See also the download_emoji() function in the same package.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kohne, J., Elhai, J.D., Montag, C. (2023). A Practical Guide to WhatsApp Data in Social Science Research. In: Montag, C., Baumeister, H. (eds) Digital Phenotyping and Mobile Sensing. Studies in Neuroscience, Psychology and Behavioral Economics. Springer, Cham. https://doi.org/10.1007/978-3-030-98546-2_11
DOI: https://doi.org/10.1007/978-3-030-98546-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98545-5
Online ISBN: 978-3-030-98546-2
eBook Packages: Engineering, Engineering (R0)