
Lexicon-Based Sentiment Analysis in Behavioral Research

  • SI: Big Data and Behavior Science

Abstract

A complete science of human behavior requires a comprehensive account of the verbal behavior those humans exhibit. Existing behavioral theories of such verbal behavior have produced compelling insight into language’s underlying function, but the expansive program of research those theories deserve has unfortunately been slow to develop. We argue that the status quo’s manually implemented and study-specific coding systems are too resource intensive to be worthwhile for most behavior analysts. These high input costs in turn discourage research on verbal behavior overall. We propose lexicon-based sentiment analysis as a more modern and efficient approach to the study of human verbal products, especially naturally occurring ones (e.g., psychotherapy transcripts, social media posts). In the present discussion, we introduce the reader to principles of sentiment analysis, highlighting its usefulness as a behavior analytic tool for the study of verbal behavior. We conclude with an outline of approaches for handling some of the more complex forms of speech, like negation, sarcasm, and speculation. The appendix also provides a worked example of how sentiment analysis could be applied to existing questions in behavior analysis, complete with code that readers can incorporate into their own work.


Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Notes

  1. Whether these key assumptions hold well enough for a given application is of course an empirical question. However, some version of them must be true in order for humans to communicate at all. For if the meanings of words were totally unique from context to context, verbal communication itself would be impossible.

  2. If the reader is having trouble visualizing how all of the components fit together, this is understandable. A sentiment analysis often involves a few different components that come together to produce the desired output. Because understanding that process is often easier with a concrete example, we have prepared a worked illustration with a familiar analysis question (e.g., related to the matching law) in the Appendix.

  3. This intimidating acronym stands for Representational State Transfer Application Programming Interface, but that expansion is not especially informative to most readers.

References

  • Araujo, M., Reis, J., Pereira, A., & Benevenuto, F. (2016). An evaluation of machine translation for multilingual sentence-level sentiment analysis. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing, (pp. 1140–1145). https://doi.org/10.1145/2851613.2851817

  • Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1(1), 91.

  • Bailey, J. D., Baker, J. C., Rzeszutek, M. J., & Lanovaz, M. J. (2021). Machine learning for supplementing behavioral assessment. Perspectives on Behavior Science, 44(4), 605–619.

  • Barnes-Holmes, D., Hayden, E., Barnes-Holmes, Y., & Stewart, I. (2008). The implicit relational assessment procedure (IRAP) as a response-time and event-related-potentials methodology for testing natural verbal relations: A preliminary study. Psychological Record, 58(4), 497–515.

  • Barrie, C., Ho, J. C., Chan, C., Rico, N., König, T., & Davidson, T. (2022). academictwitteR: Access the Twitter Academic Research Product Track V2 API Endpoint (0.3.1) [Computer software]. https://CRAN.R-project.org/package=academictwitteR

  • Becirevic, A., Critchfield, T. S., & Reed, D. D. (2016). On the social acceptability of behavior-analytic terms: Crowdsourced comparisons of lay and technical language. The Behavior Analyst, 39, 305–317.

  • Becirevic, A., Reed, D. D., Amlung, M., Murphy, J. G., Stapleton, J. L., & Hillhouse, J. J. (2017). An initial study of behavioral addiction symptom severity and demand for indoor tanning. Experimental and Clinical Psychopharmacology, 25(5), 346.

  • Boyd, R. L., Ashokkumar, A., Seraj, S., & Pennebaker, J. W. (2022). The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin, pp 1–47.

  • Brandt, P. M., & Herzberg, P. Y. (2020). Is a cover letter still needed? Using LIWC to predict application success. International Journal of Selection & Assessment, 28(4), 417–429.

  • Cero, I., & Witte, T. K. (2020). Assortativity of suicide-related posting on social media. American Psychologist, 75(3), 365–379. https://doi.org/10.1037/amp0000477

  • Cieliebak, M., Dürr, O., & Uzdilli, F. (2013). Potential and limitations of commercial sentiment detection tools. In: ESSEM@ AI* IA, (pp. 47–58).

  • Critchfield, T. S., Becirevic, A., & Reed, D. D. (2016). In Skinner's early footsteps: Analyzing verbal behavior in large published corpora. The Psychological Record, 66, 639–647. 

  • Critchfield, T. S., & Doepke, K. J. (2018). Emotional overtones of behavior analysis terms in English and five other languages. Behavior Analysis in Practice, 11, 97–105.

  • Critchfield, T. S., Doepke, K. J., Kimberly Epting, L., Becirevic, A., Reed, D. D., Fienup, D. M., ... & Ecott, C. L. (2017). Normative emotional responses to behavior analysis jargon or how not to use words to win friends and influence people. Behavior Analysis in Practice, 10, 97–106.

  • Cutler, A. D., Carden, S. W., Dorough, H. L., & Holtzman, N. S. (2021). Inferring grandiose narcissism from text: LIWC versus machine learning. Journal of Language & Social Psychology, 40(2), 260–276.

  • De Choudhury, M., Counts, S., Horvitz, E. J., & Hoff, A. (2014). Characterizing and predicting postpartum depression from shared facebook data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing—CSCW 14, 626–638. https://doi.org/10.1145/2531602.2531675

  • De Choudhury, M., Kiciman, E., Dredze, M., Coppersmith, G., & Kumar, M. (2016). Discovering shifts to suicidal ideation from mental health content in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems—CHI Conference, 2016, (pp. 2098–2110). https://doi.org/10.1145/2858036.2858207

  • Dodds, P. S., Harris, K. D., Kloumann, I. M., Bliss, C. A., & Danforth, C. M. (2011). Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLoS ONE, 6(12), 1–26. https://doi.org/10.1371/journal.pone.0026752

  • Dragut, E., & Fellbaum, C. (2014, June). The role of adverbs in sentiment analysis. In Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014) (pp. 38–41).

  • Dragut, E. C., Wang, H., Sistla, P., Yu, C., & Meng, W. (2014). Polarity consistency checking for domain independent sentiment dictionaries. IEEE Transactions on Knowledge and Data Engineering, 27(3), 838–851. 

  • Dubey, S., Biswas, P., Ghosh, R., Chatterjee, S., Dubey, M. J., Chatterjee, S., & Lavie, C. J. (2020). Psychosocial impact of COVID-19. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(5), 779–788.

  • Duong, V., Luo, J., Pham, P., Yang, T., & Wang, Y. (2020). The ivory tower lost: How college students respond differently than the general public to the covid-19 pandemic. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020, 126–130.

  • Emerson, G., & Declerck, T. (2014, August). SentiMerge: Combining sentiment lexicons in a Bayesian framework. In Proceedings of workshop on lexical and grammatical resources for language processing (pp. 30–38). 

  • Friman, P. C., Hayes, S. C., & Wilson, K. G. (1998). Why behavior analysts should study emotion: The example of anxiety. Journal of Applied Behavior Analysis, 31(1), 137–156.

  • Hayes, S. C., Barnes-Holmes, D., & Roche, B. (Eds.). (2001). Relational frame theory: A post-Skinnerian account of human language and cognition (2001st ed.). Springer.

  • Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13(2), 243–266.

  • Hii, D. (2019). Using meaning specificity to aid negation handling in sentiment analysis.

  • Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (pp. 168–177).

  • Hussey, I., Daly, T., & Barnes-Holmes, D. (2015). Life is good, but death ain’t bad either: Counter-intuitive implicit biases to death in a normative population. Psychological Record, 65(4), 731–742. https://doi.org/10.1007/s40732-015-0142-3

  • Imtiaz, A., Khan, D., Lyu, H., & Luo, J. (2022). Taking sides: Public opinion over the Israel-Palestine Conflict in 2021. arXiv Preprint arXiv:2201.05961.

  • Jia, J. (2009). An AI framework to teach English as a foreign language: CSIEC. AI Magazine, 30(2), 59–59. 

  • Joshi, A., Bhattacharyya, P., & Carman, M. J. (2016). Automatic sarcasm detection: A survey (arXiv:1602.03426). arXiv. http://arxiv.org/abs/1602.03426

  • Jurafsky, D., & Martin, J. (2008). Speech and language processing (2nd ed.). Prentice Hall.

  • Kaity, M., & Balakrishnan, V. (2020). Sentiment lexicons and non-English languages: A survey. Knowledge & Information Systems, 62(12), 4445–4480. https://doi.org/10.1007/s10115-020-01497-6

  • Khoo, C. S., & Johnkhan, S. B. (2018). Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons. Journal of Information Science, 44(4), 491–511. https://doi.org/10.1177/0165551517703514

  • Kiritchenko, S., & Mohammad, S. (2017). The effect of negators, modals, and degree adverbs on sentiment composition. arXiv Preprint arXiv:1712.01794.

  • Kotelnikova, A., Paschenko, D., Bochenina, K., & Kotelnikov, E. (2021). Lexicon-based Methods vs. BERT for Text Sentiment Analysis. arXiv Preprint arXiv:2111.10097.

  • Lanovaz, M. J., Giannakakos, A. R., & Destras, O. (2020). Machine learning to analyze single-case data: A proof of concept. Perspectives on Behavior Science, 43(1), 21–38.

  • Lanovaz, M. J., & Hranchuk, K. (2021). Machine learning to analyze single-case graphs: A comparison to visual inspection. Journal of Applied Behavior Analysis, 54(4), 1541–1552.

  • Liu, B. (2020). Sentiment analysis: Mining opinions, sentiments, and emotions (2nd ed.). Cambridge University Press.

  • Lumontod III, R. Z. (2020). Seeing the invisible: Extracting signs of depression and suicidal ideation from college students' writing using LIWC a computerized text analysis. International Journal of Research Studies in Education, 9, 31–44.

  • Luna, O. (2019). Matching analyses as an evaluative tool: Characterizing behavior in juvenile residential settings.

  • McDowell, J. J. (2013). On the theoretical and empirical status of the matching law and matching theory. Psychological Bulletin, 139(5), 1000–1028. https://doi.org/10.1037/a0029924

  • McDowell, J. J., & Caron, M. L. (2010). Matching in an undisturbed natural human environment. Journal of the Experimental Analysis of Behavior, 93(3), 415–433.

  • Mohammad, S., & Turney, P. (2010, June). Emotions evoked by common words and phrases: Using mechanical turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text (pp. 26–34).

  • Mohammad, S., & Turney, P. D. (2013). NRC emotion lexicon. National Research Council, Canada, 2.

  • Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs (arXiv:1103.2903). arXiv. https://doi.org/10.48550/arXiv.1103.2903

  • Normand, M. P., & Donohue, H. E. (2022). Behavior analytic jargon does not seem to influence treatment acceptability ratings. Journal of Applied Behavior Analysis, 55(4), 1294–1305.

  • O’Reilly, A., Roche, B., Ruiz, M., Tyndall, I., & Gavin, A. (2012). The function acquisition speed test (fast): A behavior analytic implicit test for assessing stimulus relations. Psychological Record, 62(3), 507–528.

  • Palmer, D. C. (2023). Toward a behavioral interpretation of English grammar. Perspectives on Behavior Science. https://doi.org/10.1007/s40614-023-00368-z

  • Pröllochs, N., Feuerriegel, S., & Neumann, D. (2015). Enhancing sentiment analysis of financial news by detecting negation scopes. In: 48th Hawaii International Conference on System Sciences, (pp. 959–968). https://doi.org/10.1109/HICSS.2015.119

  • Reed, D. D. (2016). Matching theory applied to MLB team-fan social media interactions: An opportunity for behavior analysis.

  • Rhanoui, M., Mikram, M., Yousfi, S., & Barzali, S. (2019). A CNN-BiLSTM model for document-level sentiment analysis. Machine Learning & Knowledge Extraction, 1(3), 832–847.

  • Salameh, M., Mohammad, S., & Kiritchenko, S. (2015). Sentiment after translation: A case-study on arabic social media posts. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 767–777. https://doi.org/10.3115/v1/N15-1078

  • Sarsam, S. M., Al-Samarraie, H., Alzahrani, A. I., Alnumay, W., & Smith, A. P. (2021). A lexicon-based approach to detecting suicide-related messages on Twitter. Biomedical Signal Processing and Control, 65, 102355.

  • Schneider, A., & Dragut, E. (2015, July). Towards debugging sentiment lexicons. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1024–1034). 

  • Silge, J., & Robinson, D. (2022). Text mining with R: A tidy approach (2022-05-03 ed.). https://www.tidytextmining.com/

  • Simon, C., & Baum, W. M. (2017). Allocation of speech in conversation. Journal of the Experimental Analysis of Behavior, 107(2), 258–278. https://doi.org/10.1002/jeab.249

  • Skinner, B. F. (1939). Alliteration in Shakespeare’s sonnets: A study in literary behavior. The Psychological Record, 3, 185.

  • Skinner, B. F. (1957). Verbal behavior. Copley Publishing Group.

  • Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., & Lin, J. (2019). Distilling task-specific knowledge from bert into simple neural networks. arXiv Preprint arXiv:1903.12136.

  • Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language & Social Psychology, 29(1), 24–54.

  • Taylor, T., & Lanovaz, M. J. (2021). Machine learning to support visual inspection of data: A clinical application. Behavior Modification, 46(5), 1109–1136. https://doi.org/10.1177/01454455211038208

  • Turgeon, S., & Lanovaz, M. J. (2020). Tutorial: Applying machine learning in behavioral research. Perspectives on Behavior Science, 43(4), 697–723.

  • Turgeon, S., & Lanovaz, M. J. (2021). Perceptions of behavior analysis in France: Accuracy and tone of posts in an internet forum on autism. Behavior & Social Issues, 30, 308–322.

  • Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, tidy, transform, visualize, and model data. O’Reilly Media.

  • Wickham, H., & RStudio. (2017). tidyverse: Easily install and load the “tidyverse” [Computer software]. https://CRAN.R-project.org/package=tidyverse

  • Wong, C. A., Sap, M., Schwartz, A., Town, R., Baker, T., Ungar, L., & Merchant, R. M. (2015). Twitter sentiment predicts Affordable Care Act marketplace enrollment. Journal of Medical Internet Research, 17(2), e51.

  • Yeung, N., Lai, J., & Luo, J. (2020). Face off: Polarized public opinions on personal face mask usage during the COVID-19 pandemic. IEEE International Conference on Big Data (Big Data), 2020, 4802–4810.

  • Zhang, H., Gan, W., & Jiang, B. (2014). Machine learning and lexicon based methods for sentiment classification: A survey. In: 11th Web Information System and Application Conference, (pp. 262–265).

  • Zhang, X., Wang, Y., Lyu, H., Zhang, Y., Liu, Y., & Luo, J. (2021). The influence of COVID-19 on the well-being of people: Big data methods for capturing the well-being of working adults and protective factors nationwide. Frontiers in Psychology, 12, 2327.

Funding

This work was supported by a grant (KL2 TR001999) from the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (NIH). It was also supported by a National Institutes of Health Extramural Loan Repayment Award for Clinical Research (L30 MH120727).

Author information

Corresponding author

Correspondence to Ian Cero.

Ethics declarations

Conflicts of Interest

The authors declare that they have no financial or nonfinancial interests that are directly or indirectly related to the work submitted for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

For this worked example, we assume only a basic familiarity with the R programming language and the tidyverse suite of packages within it (Wickham & RStudio, 2017). We have intentionally written the code for maximum readability (sometimes at the cost of brevity), so even readers without this background should be able to follow along. Readers interested in brushing up on R and the tidyverse are encouraged to work through any of the excellent tutorials freely available online (Wickham & Grolemund, 2017). Readers who want a more copy/paste-able format of this appendix can find the annotated raw code in our supplemental file here: https://osf.io/sp6mx/?view_only=cdcd6ff0df71417590672e34386e6beb

A basic behaviorally informed sentiment analysis involves several steps, which we now demonstrate in order.

  1. Select a previously validated lexicon or create a new one.
  2. Acquire raw verbal data (documents).
  3. Tokenize your documents and wrangle them into a “tidy” format.
  4. Remove stop words / stop tokens.
  5. Use the lexicon to score each token.
  6. Compute summary statistics (e.g., proportion of positive words).
  7. Analyze with standard behavior analytic methods (e.g., regression, visual analysis).

We will implement these steps to perform an analysis reminiscent of McDowell and Caron’s (2010) work connecting rule-break talk to received praise, in accordance with the GML. Except in this case, we will be examining whether two U.S. politicians—vice presidents Mike Pence and Kamala Harris—post tweets in accordance with the GML.

  • Step 1: Acquire an Appropriate Lexicon

Technically, steps 1 and 2 can be conducted out of order. We begin with the lexicon in this discussion simply because we needed to begin somewhere. When acquiring a lexicon, a researcher has two options: they can either utilize a prevalidated lexicon from previous research or create a new one. We encourage anyone new to sentiment analysis to use a prevalidated lexicon, which is both safer and faster. The lexicon you choose can come from a range of sources (Khoo & Johnkhan, 2018). The easiest to use will be those already available in an R package like tidytext (Silge & Robinson, 2022), which includes a helper function to download the lexicons displayed in Table 1. For most sentiment analyses a behavior analyst would want to conduct, these will be sufficient because they include several emotional categories that can carry a researcher through their first few studies. By that point, the researcher should already have a sense of the kinds of things they would want in their next lexicon.

One other lexicon behavioral researchers should be aware of right away, however, is the Linguistic Inquiry and Word Count (LIWC; Boyd et al., 2022; Tausczik & Pennebaker, 2010). This lexicon was created for psychological research and has been evaluated and revised several times. It is especially valuable for its comprehensiveness, including many more word categories than is common in other lexicons (e.g., words related to cognitive processes, social processes, hierarchy). For this reason, LIWC has already been used extensively to study the connection between subjects’ linguistic content and a range of psychological topics and in a number of languages (Brandt & Herzberg, 2020; Cutler et al., 2021; Lumontod, 2020). Researchers who find themselves saying “I feel like the basic lexicons aren’t enough, I wish I had a lexicon that covered my niche topic” should immediately check whether LIWC covers their particular case.

In this case, we will use the National Research Council (NRC) Word-Emotion Association Lexicon, which was built from a range of sources, including the preexisting WordNet affective lexicon and 8,000 terms from the General Inquirer (Mohammad & Turney, 2010, 2013). Previous work has used it specifically to study tweets, including identifying suicide-related posts, predicting Affordable Care Act enrollment, and evaluating global pandemic reactions (Dubey et al., 2020; Sarsam et al., 2021; Wong et al., 2015). This diversity of topics, including one use of the NRC to predict overt behavior (e.g., insurance enrollment), increases the plausibility that this lexicon tracks behaviorally meaningful verbal content. It has the added advantage of being included in the tidytext R package, so we can load it directly like this.

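A minimal sketch of this step, assuming the tidytext and textdata packages are installed (the lexicon is downloaded the first time it is requested):

library(tidyverse)
library(tidytext)

# The NRC lexicon arrives as a tidy two-column table: word, sentiment
nrc <- get_sentiments("nrc")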

With the NRC lexicon loaded into memory, we can narrow down the kind of sentiment we want to study in this analysis. Here, we retain only words that are related to the dimension of trust / mistrust. We expect this dimension is especially relevant to the occupational success of our two subjects, so it is likely to be a function of some salient reinforcer—like the number of “likes” from Twitter followers. Below, we also provide a random sample of the remaining trust-related words.

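A sketch of that filtering step using dplyr:

# Keep only the trust-related entries, then peek at a random handful of them
nrc_trust <- nrc %>%
  filter(sentiment == "trust")

nrc_trust %>%
  slice_sample(n = 10)
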
  • Step 2: Acquire Raw Verbal Data

Most researchers will already be aware of verbal data sources relevant to their research (e.g., intervention session transcripts), so we will not repeat the most common sources here. Instead, we point out a few data sources researchers may not have previously considered. For example, some video conference platforms (e.g., Zoom) have built-in support for automatic transcription of recorded meetings, and readers will be pleased to learn that these transcriptions are both accurate and arrive in a standardized format. In an as-yet unpublished study, our own research group has taken advantage of these resources, finding that the latency, duration, and content of speech are associated with intervention satisfaction, recall, and self-reported adoption at 1-month follow-up (manuscript in preparation).

Another example is Project Gutenberg, which provides digital versions of public domain literature. Although this is outside the scope of most modern behavior studies, we mention it to interested readers who might want to follow in Skinner’s early footsteps, which actually began with an analysis of alliteration in the works of William Shakespeare (Skinner, 1939).

The last approach—and the one we use for our worked example—is to use a REST API (see Note 3). Usually shortened to just “API,” this is a system for communicating with a web server via code, rather than through a point-and-click interface. The process requires some initial effort, but it is often simpler than it sounds and is a quick way to access a substantial amount of data. One of the most well-known APIs in research is the Twitter API, which allows people outside of Twitter to access a substantial amount of granular data on the activity of Twitter’s users. To give readers a sense of the scope, the first author was able to gather 64 million tweets from 17 million different users for a recent study—all for free (Cero & Witte, 2020).

Although a comprehensive introduction to APIs is beyond the scope of the current discussion, Twitter’s own tutorial is a good starting point and will remain up to date whenever they implement changes (Twitter, 2022). In practice, the process involves filling out a brief application to Twitter, which will then provide a set of tokens that function like a username and password. Researchers can then pass these tokens and a search query to an R package (“academictwitteR”) that knows how to handle the Twitter API and does most of the work under the hood (Barrie et al., 2022).

For example, to save the roughly 30,000 posts Pence and Harris have produced from 2016 through 2021, a researcher simply provides their bearer token from Twitter, a formatted search query for tweets from Pence’s and Harris’s accounts, the dates to search through, and a data path (folder) in which to save the results.

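A sketch of such a call; the account handles, date window, tweet ceiling, and folder name here are illustrative assumptions, and the bearer token placeholder must be replaced with the researcher’s own credentials:

library(academictwitteR)

get_all_tweets(
  query        = "(from:Mike_Pence OR from:KamalaHarris)",  # assumed handles
  start_tweets = "2016-01-01T00:00:00Z",
  end_tweets   = "2021-12-31T23:59:59Z",
  bearer_token = "YOUR_BEARER_TOKEN",
  n            = 100000,         # generous ceiling; roughly 30,000 posts expected
  data_path    = "tweet_data/",  # folder where the raw JSON files are saved
  bind_tweets  = FALSE           # keep the files on disk; we bind them in the next step
)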

The get_all_tweets() function unfortunately saves tweet and user data in separate places, so we’ll need to load and merge them ourselves. For the loading, we can use the bind_tweets() function from the academictwitteR package to bring the tweets and user data into memory. Along the way, we extract (unnest(public_metrics)) some information about each tweet, including the like_count—our putative reinforcer in this mini matching study. We’ll also filter out retweets (which always start with an “RT”), retaining only the tweets generated by Pence and Harris themselves. By coincidence, this leaves exactly 18,000 tweets in total.

We can then use the left_join() function of the tidyverse package to add user information to each tweet.

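A sketch of the loading and merging steps; the column names assume the standard Twitter v2 payload returned by academictwitteR, and the intermediate object names are ours for illustration:

# Bring the tweet-level and user-level records saved by get_all_tweets() into memory
tweets_df <- bind_tweets(data_path = "tweet_data/")
users_df  <- bind_tweets(data_path = "tweet_data/", user = TRUE)

tweet_small <- tweets_df %>%
  unnest(public_metrics) %>%              # exposes like_count (among others) as columns
  filter(!str_detect(text, "^RT")) %>%    # retweets always begin with "RT"
  select(id, author_id, created_at, text, like_count)

user_small <- users_df %>%
  select(author_id = id, username, name) %>%
  distinct()

# Attach each subject's account information to their tweets
full_df <- tweet_small %>%
  left_join(user_small, by = "author_id")
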
  • Step 3: Tokenize Your Documents and Wrangle Them into a “Tidy” Format

In its current form, our full_df dataframe stores each tweet as a line of text. Although it is easy to read, this makes it hard for our code to access each individual word and compare it to the entries in our NRC lexicon. To get around this, we need to tokenize all of our tweets, so that each row of our dataframe will represent a single word. This is called the tidy format in R. Fortunately, the tidytext package makes this process easy, providing us with the unnest_tokens() function that handles everything automatically. We simply tell it we want a new column named word, which is made up of the individual words from the old text column. Careful readers will thus notice the first several entries in the word column of the tokenized_df now represent the first several words of the first text in the text column of the full_df.

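A sketch of the tokenization step:

# One row per word; unnest_tokens() also lowercases words and strips punctuation,
# while carrying the other columns (id, like_count, etc.) along with each word
tokenized_df <- full_df %>%
  unnest_tokens(word, text)
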
  • Step 4: Remove Stopwords from the Dataset

Stop words, or stop tokens (in the case of multiword n-grams), are those that occur so often that they carry little information about the meaning of a text (e.g., “of,” “and,” “the”). Fortunately, just by loading the tidytext package, we have already loaded a precompiled list of stop words called stop_words in the background. Thus, the quickest way to get these stop words out of our tokenized_df dataframe is simply to anti_join() them. In an anti-join (or anti-merge), only the records from the first dataset (tokenized_df) that do NOT match anything in the second dataset (stop_words) are retained.

While we are removing unhelpful tokens, we’ll also filter out “t.co” and “https.” Visual inspection of our tokenized_df revealed these are both fragments of web links Pence and Harris posted in some of their tweets, which were accidentally included during the tokenization process (unnest_tokens() thought they were words worth retaining). Because our lexicon does not cover them, we can explicitly filter them out here too.

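A sketch of that cleanup:

# stop_words ships with tidytext; anti_join() keeps only words NOT on that list.
# We also drop the leftover URL fragments noted above.
tokenized_df <- tokenized_df %>%
  anti_join(stop_words, by = "word") %>%
  filter(!word %in% c("t.co", "https"))
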
  • Step 5: Use the Lexicon to Score Each Token

We expect this step likely sounds as though it will be the most intensive part of a sentiment analysis. After all, we estimated that it likely took McDowell and Caron’s group more than 140 person-hours to hand-score a much smaller sample of text. Scoring all the words from 18,000 tweets must be quite laborious, right? In fact, all of our words are effectively scored with just two lines of code, which join the words from our observed dataset to the values in our NRC trust lexicon.

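A sketch of the scoring join (we store the result in a new scored_df object for illustration):

# Words absent from the trust lexicon receive NA in the sentiment column
scored_df <- tokenized_df %>%
  left_join(nrc_trust, by = "word")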

A minor snag is that our nrc_trust lexicon only includes words that are trust-related, so the join produces missing values for everything on which the lexicon is silent (i.e., nontrust words). To simplify our upcoming analysis, we’ll compute a new true/false column called trust_word, which will indicate whether a given word in our dataset is a trust word, based on the values in the adjacent sentiment column.

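A sketch of that recoding:

# TRUE when the word matched the trust lexicon, FALSE otherwise
scored_df <- scored_df %>%
  mutate(trust_word = !is.na(sentiment))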

As a quick sanity check, we’ll now peek at a random sample of trust and nontrust words from both subjects.

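A sketch of such a sanity check, assuming the username column from the user data identifies each subject:

# A few randomly sampled trust and nontrust words for each subject
scored_df %>%
  group_by(username, trust_word) %>%
  slice_sample(n = 5) %>%
  select(username, word, trust_word)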

Here, we get a quick sense of the kinds of trust- and nontrust-related words each subject might be using. These randomly selected words are overall somewhat banal, but they are consistent with what we would expect. Words like “system” imply something that needs to be relied on, so they sit somewhere along a dimension of trustworthiness. Words like “fair” are morally salient, but they do not imply something related to reliance and thus are not scored as trust-related. The same is true of words like “persecution,” which is unfair to be sure, but does not indicate a dimension of trust.

  • Step 6: Compute Summary Statistics

For our upcoming matching analysis, we’ll want to know whether each subject produces tweets with trust-related words in proportion to the likes those tweets received. To get this far, we needed to break up (“tokenize”) whole tweets into individual words, so that we could score those words with a lexicon. Now that they have been scored in the trust_word column, we need to start going in reverse. We need to recombine words into tweets and summarize each tweet by whether any of its words is a trust word.

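A sketch of that recombination:

# Re-assemble words into tweets: a tweet counts as a "trust tweet" if any of its
# remaining words matched the trust lexicon
tweet_df <- scored_df %>%
  group_by(username, id, created_at, like_count) %>%
  summarize(is_trust_tweet = any(trust_word), .groups = "drop")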

Once individual tweets have been scored in the is_trust_tweet column, we arrange tweets by their ID numbers (which are strictly in order of date produced) and assign them to blocks of 50 tweets. The final line, block = floor((row_number() - 1) / 50), is just a shorthand way of saying “take the row number of each tweet, subtract 1, divide by 50, round down to the nearest integer, and treat that as the tweet’s block number.”

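A sketch of the blocking step; here we sort by created_at, which matches the ascending tweet-ID ordering described above:

# Within each subject, order tweets chronologically and assign blocks of 50 tweets
blocked_df <- tweet_df %>%
  group_by(username) %>%
  arrange(created_at, .by_group = TRUE) %>%
  mutate(block = floor((row_number() - 1) / 50)) %>%
  ungroup()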

With blocks assigned, we simply compute the familiar matching statistics. One trick to note is that R will treat TRUE and FALSE values as 1 and 0 when they are forced into mathematical computations. Thus, sum(like_count*is_trust_tweet) can be read “the sum of likes produced when is_trust_tweet is true.” We also proactively filter() to retain only cases where the log_b and log_r are still finite, which in this case is all of them because there were no blocks with 0 trust/non-trust tweets or 0 likes for either of those cases.

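A sketch of those computations:

# Block-level matching statistics: log2 behavior ratio and log2 reinforcer ratio
match_df <- blocked_df %>%
  group_by(username, block) %>%
  summarize(
    log_b = log2(sum(is_trust_tweet) / sum(!is_trust_tweet)),
    log_r = log2(sum(like_count * is_trust_tweet) /
                 sum(like_count * !is_trust_tweet)),
    .groups = "drop"
  ) %>%
  filter(is.finite(log_b), is.finite(log_r))  # drop any degenerate blocks
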
  • Step 7: Analyze with Standard Behavior-Analytic Methods (e.g., Visual Analysis, Regression)

At this point, all that is left to do is perform a matching analysis. Because we have two subjects who will need separate regressions, we use the group-nest-map-tidy-unnest approach. It is probably overkill for only two regressions, but in the common case that a matching analysis includes a half-dozen or more subjects to regress, this strategy is both faster and safer than copy-pasting code.

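A sketch of the group-nest-map-tidy-unnest approach, using purrr and broom; the regression formula is the familiar log form of the GML:

library(broom)

# One regression per subject: log_b ~ log_r
fit_df <- match_df %>%
  group_by(username) %>%
  nest() %>%
  mutate(
    fit   = map(data, ~ lm(log_b ~ log_r, data = .x)),
    coefs = map(fit, tidy),    # intercept = log bias, slope = sensitivity
    stats = map(fit, glance)   # R-squared and related fit statistics
  )

# Coefficients (bias and sensitivity) for each subject
fit_df %>% select(username, coefs) %>% unnest(coefs)

# Model fit summaries (e.g., R-squared)
fit_df %>% select(username, stats) %>% unnest(stats)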

Unnesting the regression coefficients reveals some interesting results. Both subjects are highly sensitive to the likes associated with trust-related words.


What is even more interesting are the substantial bias terms, which suggest that even if trust-related tweets produced likes in equal proportion to nontrust-related tweets, both subjects would still produce trust-related tweets in substantial excess. In particular, even if the likes received for each tweet type were perfectly balanced, Harris would be expected to produce 2^0.278 = 1.21x more trust-related tweets than nontrust-related ones—and Pence would produce 1.34x more.

Examining how well the matching model explains such behavior, note that the R-squared values for both subjects are significant, but much higher for Pence. Combined with a sensitivity very near 1.0 for this subject, such a finding suggests this learning model is a compelling (if, as yet, nonexperimental) account of his verbal behavior over many years.


We can see this by visually examining the log behavior and log reinforcement rates for each subject in each block, observing that Pence’s blocks conform much more closely to theoretically perfect matching (the dashed line in Fig. 1).
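
A sketch of such a plot using ggplot2:

# Observed points and fitted slope per subject, plus the perfect-matching diagonal
ggplot(match_df, aes(x = log_r, y = log_b)) +
  geom_point() +
  geom_smooth(method = "lm") +                                  # solid line + gray ribbon
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +  # theoretically perfect matching
  facet_wrap(~ username) +
  labs(x = "Log2 reinforcement ratio", y = "Log2 behavior ratio")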

Fig. 1. Matching analyses. Note. R output for the log2 behavior and reinforcement rates by block for both subjects. Solid lines represent empirically observed regression slopes and gray ribbons represent confidence bounds. Dashed lines represent theoretically perfect matching.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cero, I., Luo, J. & Falligant, J.M. Lexicon-Based Sentiment Analysis in Behavioral Research. Perspect Behav Sci 47, 283–310 (2024). https://doi.org/10.1007/s40614-023-00394-x
