
Lexicon-Based Sentiment Analysis in Behavioral Research

  • SI: Big Data and Behavior Science

Abstract

A complete science of human behavior requires a comprehensive account of the verbal behavior those humans exhibit. Existing behavioral theories of such verbal behavior have produced compelling insight into language’s underlying function, but the expansive program of research those theories deserve has unfortunately been slow to develop. We argue that the status quo’s manually implemented and study-specific coding systems are too resource intensive to be worthwhile for most behavior analysts. These high input costs in turn discourage research on verbal behavior overall. We propose lexicon-based sentiment analysis as a more modern and efficient approach to the study of human verbal products, especially naturally occurring ones (e.g., psychotherapy transcripts, social media posts). In the present discussion, we introduce the reader to principles of sentiment analysis, highlighting its usefulness as a behavior analytic tool for the study of verbal behavior. We conclude with an outline of approaches for handling some of the more complex forms of speech, like negation, sarcasm, and speculation. The appendix also provides a worked example of how sentiment analysis could be applied to existing questions in behavior analysis, complete with code that readers can incorporate into their own work.


Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Notes

  1. Whether these key assumptions hold well enough for a given application is of course an empirical question. However, some version of them must be true in order for humans to communicate at all. For if the meanings of words were totally unique from context to context, verbal communication itself would be impossible.

  2. If the reader is having trouble visualizing how all of the components fit together, this is understandable. A sentiment analysis often involves a few different components that come together to produce the desired output. Because understanding that process is often easier with a concrete example, we have prepared a worked illustration with a familiar analysis question (e.g., related to the matching law) in the Appendix.

  3. This intimidating acronym stands for Representational State Transfer Application Programming Interface, but that expansion is not especially informative to most readers.

References

  • Araujo, M., Reis, J., Pereira, A., & Benevenuto, F. (2016). An evaluation of machine translation for multilingual sentence-level sentiment analysis. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing, (pp. 1140–1145). https://doi.org/10.1145/2851613.2851817

  • Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1(1), 91.

  • Bailey, J. D., Baker, J. C., Rzeszutek, M. J., & Lanovaz, M. J. (2021). Machine learning for supplementing behavioral assessment. Perspectives on Behavior Science, 44(4), 605–619.

  • Barnes-Holmes, D., Hayden, E., Barnes-Holmes, Y., & Stewart, I. (2008). The implicit relational assessment procedure (IRAP) as a response-time and event-related-potentials methodology for testing natural verbal relations: A preliminary study. Psychological Record, 58(4), 497–515.

  • Barrie, C., Ho, J. C., Chan, C., Rico, N., König, T., & Davidson, T. (2022). academictwitteR: Access the Twitter Academic Research Product Track V2 API Endpoint (0.3.1) [Computer software]. https://CRAN.R-project.org/package=academictwitteR

  • Becirevic, A., Critchfield, T. S., & Reed, D. D. (2016). On the social acceptability of behavior-analytic terms: Crowdsourced comparisons of lay and technical language. The Behavior Analyst, 39, 305–317.

  • Becirevic, A., Reed, D. D., Amlung, M., Murphy, J. G., Stapleton, J. L., & Hillhouse, J. J. (2017). An initial study of behavioral addiction symptom severity and demand for indoor tanning. Experimental and Clinical Psychopharmacology, 25(5), 346.

  • Boyd, R. L., Ashokkumar, A., Seraj, S., & Pennebaker, J. W. (2022). The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin, pp 1–47.

  • Brandt, P. M., & Herzberg, P. Y. (2020). Is a cover letter still needed? Using LIWC to predict application success. International Journal of Selection & Assessment, 28(4), 417–429.

  • Cero, I., & Witte, T. K. (2020). Assortativity of suicide-related posting on social media. American Psychologist, 75(3), 365–379. https://doi.org/10.1037/amp0000477

  • Cieliebak, M., Dürr, O., & Uzdilli, F. (2013). Potential and limitations of commercial sentiment detection tools. In: ESSEM@ AI* IA, (pp. 47–58).

  • Critchfield, T. S., Becirevic, A., & Reed, D. D. (2016). In Skinner's early footsteps: Analyzing verbal behavior in large published corpora. The Psychological Record, 66, 639–647. 

  • Critchfield, T. S., & Doepke, K. J. (2018). Emotional overtones of behavior analysis terms in English and five other languages. Behavior Analysis in Practice, 11, 97–105.

  • Critchfield, T. S., Doepke, K. J., Kimberly Epting, L., Becirevic, A., Reed, D. D., Fienup, D. M., ... & Ecott, C. L. (2017). Normative emotional responses to behavior analysis jargon or how not to use words to win friends and influence people. Behavior Analysis in Practice, 10, 97–106.

  • Cutler, A. D., Carden, S. W., Dorough, H. L., & Holtzman, N. S. (2021). Inferring grandiose narcissism from text: LIWC versus machine learning. Journal of Language & Social Psychology, 40(2), 260–276.

  • De Choudhury, M., Counts, S., Horvitz, E. J., & Hoff, A. (2014). Characterizing and predicting postpartum depression from shared facebook data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing—CSCW 14, 626–638. https://doi.org/10.1145/2531602.2531675

  • De Choudhury, M., Kiciman, E., Dredze, M., Coppersmith, G., & Kumar, M. (2016). Discovering shifts to suicidal ideation from mental health content in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems—CHI Conference, 2016, (pp. 2098–2110). https://doi.org/10.1145/2858036.2858207

  • Dodds, P. S., Harris, K. D., Kloumann, I. M., Bliss, C. A., & Danforth, C. M. (2011). Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLoS ONE, 6(12), 1–26. https://doi.org/10.1371/journal.pone.0026752

  • Dragut, E., & Fellbaum, C. (2014, June). The role of adverbs in sentiment analysis. In Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014) (pp. 38–41).

  • Dragut, E. C., Wang, H., Sistla, P., Yu, C., & Meng, W. (2014). Polarity consistency checking for domain independent sentiment dictionaries. IEEE Transactions on Knowledge and Data Engineering, 27(3), 838–851. 

  • Dubey, S., Biswas, P., Ghosh, R., Chatterjee, S., Dubey, M. J., Chatterjee, S., & Lavie, C. J. (2020). Psychosocial impact of COVID-19. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(5), 779–788.

  • Duong, V., Luo, J., Pham, P., Yang, T., & Wang, Y. (2020). The ivory tower lost: How college students respond differently than the general public to the covid-19 pandemic. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020, 126–130.

  • Emerson, G., & Declerck, T. (2014, August). SentiMerge: Combining sentiment lexicons in a Bayesian framework. In Proceedings of workshop on lexical and grammatical resources for language processing (pp. 30–38). 

  • Friman, P. C., Hayes, S. C., & Wilson, K. G. (1998). Why behavior analysts should study emotion: The example of anxiety. Journal of Applied Behavior Analysis, 31(1), 137–156.

  • Hayes, S. C., Barnes-Holmes, D., & Roche, B. (Eds.). (2001). Relational frame theory: A post-Skinnerian account of human language and cognition (2001st ed.). Springer.

  • Herrnstein, R. J. (1970). On the law of effect. Journal of the Experimental Analysis of Behavior, 13(2), 243–266.

  • Hii, D. (2019). Using meaning specificity to aid negation handling in sentiment analysis.

  • Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (pp. 168–177).

  • Hussey, I., Daly, T., & Barnes-Holmes, D. (2015). Life is good, but death ain’t bad either: Counter-intuitive implicit biases to death in a normative population. Psychological Record, 65(4), 731–742. https://doi.org/10.1007/s40732-015-0142-3

  • Imtiaz, A., Khan, D., Lyu, H., & Luo, J. (2022). Taking sides: Public opinion over the Israel-Palestine Conflict in 2021. arXiv Preprint arXiv:2201.05961.

  • Jia, J. (2009). An AI framework to teach English as a foreign language: CSIEC. AI Magazine, 30(2), 59–59. 

  • Joshi, A., Bhattacharyya, P., & Carman, M. J. (2016). Automatic sarcasm detection: A survey (arXiv:1602.03426). arXiv. http://arxiv.org/abs/1602.03426

  • Jurafsky, D., & Martin, J. (2008). Speech and language processing (2nd ed.). Prentice Hall.

  • Kaity, M., & Balakrishnan, V. (2020). Sentiment lexicons and non-English languages: A survey. Knowledge & Information Systems, 62(12), 4445–4480. https://doi.org/10.1007/s10115-020-01497-6

  • Khoo, C. S., & Johnkhan, S. B. (2018). Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons. Journal of Information Science, 44(4), 491–511. https://doi.org/10.1177/0165551517703514

  • Kiritchenko, S., & Mohammad, S. (2017). The effect of negators, modals, and degree adverbs on sentiment composition. arXiv Preprint arXiv:1712.01794.

  • Kotelnikova, A., Paschenko, D., Bochenina, K., & Kotelnikov, E. (2021). Lexicon-based Methods vs. BERT for Text Sentiment Analysis. arXiv Preprint arXiv:2111.10097.

  • Lanovaz, M. J., Giannakakos, A. R., & Destras, O. (2020). Machine learning to analyze single-case data: A proof of concept. Perspectives on Behavior Science, 43(1), 21–38.

  • Lanovaz, M. J., & Hranchuk, K. (2021). Machine learning to analyze single-case graphs: A comparison to visual inspection. Journal of Applied Behavior Analysis, 54(4), 1541–1552.

  • Liu, B. (2020). Sentiment analysis: Mining opinions, sentiments, and emotions (2nd ed.). Cambridge University Press.

  • Lumontod III, R. Z. (2020). Seeing the invisible: Extracting signs of depression and suicidal ideation from college students' writing using LIWC a computerized text analysis. International Journal of Research Studies in Education, 9, 31–44.

  • Luna, O. (2019). Matching analyses as an evaluative tool: Characterizing behavior in juvenile residential settings.

  • McDowell, J. J. (2013). On the theoretical and empirical status of the matching law and matching theory. Psychological Bulletin, 139(5), 1000–1028. https://doi.org/10.1037/a0029924

  • McDowell, J. J., & Caron, M. L. (2010). Matching in an undisturbed natural human environment. Journal of the Experimental Analysis of Behavior, 93(3), 415–433.

  • Mohammad, S., & Turney, P. (2010, June). Emotions evoked by common words and phrases: Using mechanical turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text (pp. 26–34).

  • Mohammad, S., & Turney, P. D. (2013). NRC emotion lexicon. National Research Council, Canada, 2.

  • Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs (arXiv:1103.2903). arXiv. https://doi.org/10.48550/arXiv.1103.2903

  • Normand, M. P., & Donohue, H. E. (2022). Behavior analytic jargon does not seem to influence treatment acceptability ratings. Journal of Applied Behavior Analysis, 55(4), 1294–1305.

  • O’Reilly, A., Roche, B., Ruiz, M., Tyndall, I., & Gavin, A. (2012). The function acquisition speed test (fast): A behavior analytic implicit test for assessing stimulus relations. Psychological Record, 62(3), 507–528.

  • Palmer, D. C. (2023). Toward a behavioral interpretation of English grammar. Perspectives on Behavior Science. https://doi.org/10.1007/s40614-023-00368-z

  • Pröllochs, N., Feuerriegel, S., & Neumann, D. (2015). Enhancing sentiment analysis of financial news by detecting negation scopes. In: 48th Hawaii International Conference on System Sciences, (pp. 959–968). https://doi.org/10.1109/HICSS.2015.119

  • Reed, D. D. (2016). Matching theory applied to MLB team-fan social media interactions: An opportunity for behavior analysis.

  • Rhanoui, M., Mikram, M., Yousfi, S., & Barzali, S. (2019). A CNN-BiLSTM model for document-level sentiment analysis. Machine Learning & Knowledge Extraction, 1(3), 832–847.

  • Salameh, M., Mohammad, S., & Kiritchenko, S. (2015). Sentiment after translation: A case-study on arabic social media posts. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 767–777. https://doi.org/10.3115/v1/N15-1078

  • Sarsam, S. M., Al-Samarraie, H., Alzahrani, A. I., Alnumay, W., & Smith, A. P. (2021). A lexicon-based approach to detecting suicide-related messages on Twitter. Biomedical Signal Processing and Control, 65, 102355.

  • Schneider, A., & Dragut, E. (2015, July). Towards debugging sentiment lexicons. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1024–1034). 

  • Silge, J., & Robinson, D. (2022). Text mining with R: A tidy approach (2022-05-03 ed.). https://www.tidytextmining.com/

  • Simon, C., & Baum, W. M. (2017). Allocation of speech in conversation. Journal of the Experimental Analysis of Behavior, 107(2), 258–278. https://doi.org/10.1002/jeab.249

  • Skinner, B. F. (1939). Alliteration in Shakespeare’s sonnets: A study in literary behavior. The Psychological Record, 3, 185.

  • Skinner, B. F. (1957). Verbal behavior. Copley Publishing Group.

  • Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., & Lin, J. (2019). Distilling task-specific knowledge from bert into simple neural networks. arXiv Preprint arXiv:1903.12136.

  • Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language & Social Psychology, 29(1), 24–54.

  • Taylor, T., & Lanovaz, M. J. (2021). Machine learning to support visual inspection of data: A clinical application. Behavior Modification, 46(5), 1109–1136. https://doi.org/10.1177/01454455211038208

  • Turgeon, S., & Lanovaz, M. J. (2020). Tutorial: Applying machine learning in behavioral research. Perspectives on Behavior Science, 43(4), 697–723.

  • Turgeon, S., & Lanovaz, M. J. (2021). Perceptions of behavior analysis in France: Accuracy and tone of posts in an internet forum on autism. Behavior & Social Issues, 30, 308–322.

  • Wickham, H., & Grolemund, G. (2017). R for Data Science: Import, tidy, transform, visualize, and model data. O’Reilly Media.

  • Wickham, H., & RStudio. (2017). tidyverse: Easily install and load the “tidyverse” [Computer software]. https://CRAN.R-project.org/package=tidyverse

  • Wong, C. A., Sap, M., Schwartz, A., Town, R., Baker, T., Ungar, L., & Merchant, R. M. (2015). Twitter sentiment predicts Affordable Care Act marketplace enrollment. Journal of Medical Internet Research, 17(2), e51.

  • Yeung, N., Lai, J., & Luo, J. (2020). Face off: Polarized public opinions on personal face mask usage during the COVID-19 pandemic. IEEE International Conference on Big Data (Big Data), 2020, 4802–4810.

  • Zhang, H., Gan, W., & Jiang, B. (2014). Machine learning and lexicon based methods for sentiment classification: A survey. In: 11th Web Information System and Application Conference, (pp. 262–265).

  • Zhang, X., Wang, Y., Lyu, H., Zhang, Y., Liu, Y., & Luo, J. (2021). The influence of COVID-19 on the well-being of people: Big data methods for capturing the well-being of working adults and protective factors nationwide. Frontiers in Psychology, 12, 2327.

Funding

This work was supported by a grant (KL2 TR001999) from the National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (NIH). It was also supported by a National Institutes of Health Extramural Loan Repayment Award for Clinical Research (L30 MH120727).

Author information

Corresponding author

Correspondence to Ian Cero.

Ethics declarations

Conflicts of Interest

The authors declare that they have no financial or nonfinancial interests that are directly or indirectly related to the work submitted for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

For this worked example, we assume only a basic familiarity with the R programming language and the tidyverse suite of packages within it (Wickham & RStudio, 2017). We have intentionally written the code for maximum readability (sometimes at the cost of brevity), so even readers without this background should be able to follow along. Readers interested in brushing up on R and the tidyverse are encouraged to work through any of the excellent tutorials freely available online (Wickham & Grolemund, 2017). Readers who want a more copy/paste-able format of this appendix can find the annotated raw code in our supplemental file here: https://osf.io/sp6mx/?view_only=cdcd6ff0df71417590672e34386e6beb

A basic behaviorally informed sentiment analysis involves several steps, which we now demonstrate in order.

  1. Select a previously validated lexicon or create a new one.
  2. Acquire raw verbal data (documents).
  3. Tokenize your documents and wrangle them into a “tidy” format.
  4. Remove stop words / stop tokens.
  5. Use the lexicon to score each token.
  6. Compute summary statistics (e.g., proportion of positive words).
  7. Analyze with standard behavior analytic methods (e.g., regression, visual analysis).

We will implement these steps to perform an analysis reminiscent of McDowell and Caron’s (2010) work connecting rule-break talk to received praise, in accordance with the GML. Except in this case, we will be examining whether two U.S. politicians—vice presidents Mike Pence and Kamala Harris—post tweets in accordance with the GML.

  • Step 1: Acquire an Appropriate Lexicon

Technically, steps 1 and 2 can be conducted out of order. We begin with the lexicon in this discussion simply because we needed to begin somewhere. When acquiring a lexicon, a researcher has two options: they can either utilize a prevalidated lexicon from previous research or create a new one. We encourage anyone new to sentiment analysis to use a prevalidated lexicon, which is both safer and faster. The lexicon you choose can come from a range of sources (Khoo & Johnkhan, 2018). The easiest to use will be those already available in an R package like tidytext (Silge & Robinson, 2022), which includes a helper function to download the lexicons displayed in Table 1. For most sentiment analyses a behavior analyst would want to conduct, these will be sufficient because they include several emotional categories that can carry a researcher through their first few studies. By that point, the researcher should already have a sense of the kinds of things they would want in their next lexicon.

One other lexicon behavioral researchers should be aware of right away, however, is the Linguistic Inquiry and Word Count (LIWC; Boyd et al., 2022; Tausczik & Pennebaker, 2010). This lexicon was created for psychological research and has been evaluated and revised several times. It is especially valuable for its comprehensiveness, including many more word categories than is common in other lexicons (e.g., words related to cognitive processes, social processes, hierarchy). For this reason, LIWC has already been used extensively to study the connection between subjects’ linguistic content and a range of psychological topics and in a number of languages (Brandt & Herzberg, 2020; Cutler et al., 2021; Lumontod, 2020). Researchers who find themselves saying “I feel like the basic lexicons aren’t enough, I wish I had a lexicon that covered my niche topic” should immediately check whether LIWC covers their particular case.

In this case, we will use the National Research Council (NRC) Word-Emotion Association Lexicon, which was built from a range of sources, including the preexisting WordNet affective lexicon and 8,000 terms from the General Inquirer (Mohammad & Turney, 2010, 2013). Previous work has used it specifically to study tweets, including identifying suicide-related posts, predicting Affordable Care Act enrollment, and evaluating global pandemic reactions (Dubey et al., 2020; Sarsam et al., 2021; Wong et al., 2015). This diversity of topics, including one use of the NRC to predict overt behavior (e.g., insurance enrollment), increases the plausibility that this lexicon tracks behaviorally meaningful verbal content. It has the added advantage of being included in the tidytext R package, so we can load it directly like this.

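A minimal sketch of this step, assuming the tidytext and textdata packages are installed (the lexicon is downloaded the first time it is requested):

library(tidyverse)
library(tidytext)

# The NRC lexicon arrives as a tidy two-column table: word, sentiment
nrc <- get_sentiments("nrc")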

With the NRC lexicon loaded into memory, we can narrow down the kind of sentiment we want to study in this analysis. Here, we retain only words that are related to the dimension of trust / mistrust. We expect this dimension is especially relevant to the occupational success of our two subjects, so it is likely to be a function of some salient reinforcer—like the number of “likes” from Twitter followers. Below, we also provide a random sample of the remaining trust-related words.

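A sketch of that filtering step using dplyr:

# Keep only the trust-related entries, then peek at a random handful of them
nrc_trust <- nrc %>%
  filter(sentiment == "trust")

nrc_trust %>%
  slice_sample(n = 10)
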
  • Step 2: Acquire Raw Verbal Data

Most researchers will already be aware of verbal data sources relevant to their research (e.g., intervention session transcripts), so we will not repeat the most common sources here. Instead, we point out a few data sources researchers may not have previously considered. For example, some video conference platforms (e.g., Zoom) have built-in support for automatic transcription of recorded meetings, and readers will be pleased to learn that these transcriptions are both accurate and arrive in a standardized format. In an as-yet unpublished study, our own research group has taken advantage of these resources, finding that the latency, duration, and content of speech are associated with intervention satisfaction, recall, and self-reported adoption at 1-month follow-up (manuscript in preparation).

Another example is Project Gutenberg, which provides digital versions of public domain literature. Although this is outside the scope of most modern behavior studies, we mention it to interested readers who might want to follow in Skinner’s early footsteps, which actually began with an analysis of alliteration in the works of William Shakespeare (Skinner, 1939).

The last approach—and the one we use for our worked example—is to use a REST API (see Note 3). Usually shortened to just “API,” this is a system for communicating with a web server via code, rather than through a point-and-click interface. The process requires some initial effort, but it is often simpler than it sounds and is a quick way to access a substantial amount of data. One of the most well-known APIs in research is the Twitter API, which allows people outside of Twitter to access a substantial amount of granular data on the activity of Twitter’s users. To give readers a sense of the scope, the first author was able to gather 64 million tweets from 17 million different users for a recent study—all for free (Cero & Witte, 2020).

Although a comprehensive introduction to APIs is beyond the scope of the current discussion, Twitter’s own tutorial is a good starting point and will remain up to date whenever they implement changes (Twitter, 2022). In practice, the process involves filling out a brief application to Twitter, which will then provide a set of tokens that function like a username and password. Researchers can then pass these tokens and a search query to an R package (“academictwitteR”) that knows how to handle the Twitter API and does most of the work under the hood (Barrie et al., 2022).

For example, to save the roughly 30,000 posts Pence and Harris have produced from 2016 through 2021, a researcher simply provides their bearer token from Twitter, a formatted search query for tweets from Pence’s and Harris’s accounts, the dates to search through, and a data path (folder) in which to save the results.

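A sketch of such a call; the account handles, date window, tweet ceiling, and folder name here are illustrative assumptions, and the bearer token placeholder must be replaced with the researcher’s own credentials:

library(academictwitteR)

get_all_tweets(
  query        = "(from:Mike_Pence OR from:KamalaHarris)",  # assumed handles
  start_tweets = "2016-01-01T00:00:00Z",
  end_tweets   = "2021-12-31T23:59:59Z",
  bearer_token = "YOUR_BEARER_TOKEN",
  n            = 100000,         # generous ceiling; roughly 30,000 posts expected
  data_path    = "tweet_data/",  # folder where the raw JSON files are saved
  bind_tweets  = FALSE           # keep the files on disk; we bind them in the next step
)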

The get_all_tweets() function unfortunately saves tweet and user data in separate places, so we’ll need to load and merge them ourselves. For the loading, we can use the bind_tweets() function from the academictwitteR package to bring the tweets and user data into memory. Along the way, we extract (unnest(public_metrics)) some information about each tweet, including the like_count—our putative reinforcer in this mini matching study. We’ll also filter out retweets (which always start with an “RT”), retaining only the tweets generated by Pence and Harris themselves. By coincidence, this leaves exactly 18,000 tweets in total.

We can then use the left_join() function of the tidyverse package to add user information to each tweet.

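A sketch of the loading and merging steps; the column names assume the standard Twitter v2 payload returned by academictwitteR, and the intermediate object names are ours for illustration:

# Bring the tweet-level and user-level records saved by get_all_tweets() into memory
tweets_df <- bind_tweets(data_path = "tweet_data/")
users_df  <- bind_tweets(data_path = "tweet_data/", user = TRUE)

tweet_small <- tweets_df %>%
  unnest(public_metrics) %>%              # exposes like_count (among others) as columns
  filter(!str_detect(text, "^RT")) %>%    # retweets always begin with "RT"
  select(id, author_id, created_at, text, like_count)

user_small <- users_df %>%
  select(author_id = id, username, name) %>%
  distinct()

# Attach each subject's account information to their tweets
full_df <- tweet_small %>%
  left_join(user_small, by = "author_id")
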
  • Step 3: Tokenize Your Documents and Wrangle Them into a “Tidy” Format

In its current form, our full_df dataframe stores each tweet as a line of text. Although it is easy to read, this makes it hard for our code to access each individual word and compare it to the entries in our NRC lexicon. To get around this, we need to tokenize all of our tweets, so that each row of our dataframe will represent a single word. This is called the tidy format in R. Fortunately, the tidytext package makes this process easy, providing us with the unnest_tokens() function that handles everything automatically. We simply tell it we want a new column named word, which is made up of the individual words from the old text column. Careful readers will thus notice the first several entries in the word column of the tokenized_df now represent the first several words of the first text in the text column of the full_df.

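A sketch of the tokenization step:

# One row per word; unnest_tokens() also lowercases words and strips punctuation,
# while carrying the other columns (id, like_count, etc.) along with each word
tokenized_df <- full_df %>%
  unnest_tokens(word, text)
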
  • Step 4: Remove Stopwords from the Dataset

Stop words, or stop tokens (in the case of multiword n-grams), are those that occur so often that they carry little information about the meaning of a text (e.g., “of,” “and,” “the”). Fortunately, just by loading the tidytext package, we have already loaded a precompiled list of stop words called stop_words in the background. Thus, the quickest way to get these stop words out of our tokenized_df dataframe is simply to anti_join() them. In an anti-join (or anti-merge), only the records from the first dataset (tokenized_df) that do NOT match anything in the second dataset (stop_words) are retained.

While we are removing unhelpful tokens, we’ll also filter out “t.co” and “https.” Visual inspection of our tokenized_df revealed these are both fragments of web links Pence and Harris posted in some of their tweets, which were accidentally included during the tokenization process (unnest_tokens() thought they were words worth retaining). Because our lexicon does not cover them, we can explicitly filter them out here too.

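A sketch of that cleanup:

# stop_words ships with tidytext; anti_join() keeps only words NOT on that list.
# We also drop the leftover URL fragments noted above.
tokenized_df <- tokenized_df %>%
  anti_join(stop_words, by = "word") %>%
  filter(!word %in% c("t.co", "https"))
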
  • Step 5: Use the Lexicon to Score Each Token

We expect this step likely sounds as though it will be the most intensive part of a sentiment analysis. After all, we estimated that it likely took McDowell and Caron’s group more than 140 person-hours to hand-score a much smaller sample of text. Scoring all the words from 18,000 tweets must be quite laborious, right? In fact, all of our words are effectively scored with just two lines of code, which join the words from our observed dataset to the values in our NRC trust lexicon.

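A sketch of the scoring join (we store the result in a new scored_df object for illustration):

# Words absent from the trust lexicon receive NA in the sentiment column
scored_df <- tokenized_df %>%
  left_join(nrc_trust, by = "word")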

A minor snag is that our nrc_trust lexicon only includes words that are trust-related, so the join produces missing values for everything on which the lexicon is silent (i.e., nontrust words). To simplify our upcoming analysis, we’ll compute a new true/false column called trust_word, which will indicate whether a given word in our dataset is a trust word, based on the values in the adjacent sentiment column.

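A sketch of that recoding:

# TRUE when the word matched the trust lexicon, FALSE otherwise
scored_df <- scored_df %>%
  mutate(trust_word = !is.na(sentiment))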

As a quick sanity check, we’ll now peek at a random sample of trust and nontrust words from both subjects.

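A sketch of such a sanity check, assuming the username column from the user data identifies each subject:

# A few randomly sampled trust and nontrust words for each subject
scored_df %>%
  group_by(username, trust_word) %>%
  slice_sample(n = 5) %>%
  select(username, word, trust_word)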

Here, we get a quick sense of the kinds of trust- and nontrust-related words each subject might be using. These randomly selected words are overall somewhat banal, but they are consistent with what we would expect. Words like “system” imply something that needs to be relied on, so they sit somewhere along a dimension of trustworthiness. Words like “fair” are morally salient, but they do not imply something related to reliance and thus are not scored as trust-related. The same is true of words like “persecution,” which is unfair to be sure, but does not indicate a dimension of trust.

  • Step 6: Compute Summary Statistics

For our upcoming matching analysis, we’ll want to know whether each subject produces tweets with trust-related words in proportion to the likes those tweets received. To get this far, we needed to break up (“tokenize”) whole tweets into individual words, so that we could score those words with a lexicon. Now that they have been scored in the trust_word column, we need to start going in reverse. We need to recombine words into tweets and summarize each tweet by whether any of its words is a trust word.

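A sketch of that recombination:

# Re-assemble words into tweets: a tweet counts as a "trust tweet" if any of its
# remaining words matched the trust lexicon
tweet_df <- scored_df %>%
  group_by(username, id, created_at, like_count) %>%
  summarize(is_trust_tweet = any(trust_word), .groups = "drop")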

Once individual tweets have been scored in the is_trust_tweet column, we arrange tweets by their ID numbers (which are strictly in order of date produced) and assign them to blocks of 50 tweets. The final line, block = floor((row_number() - 1) / 50), is just a shorthand way of saying “take the row number of each tweet, subtract 1, divide by 50, round down to the nearest integer, and treat that as the tweet’s block number.”

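A sketch of the blocking step; here we sort by created_at, which matches the ascending tweet-ID ordering described above:

# Within each subject, order tweets chronologically and assign blocks of 50 tweets
blocked_df <- tweet_df %>%
  group_by(username) %>%
  arrange(created_at, .by_group = TRUE) %>%
  mutate(block = floor((row_number() - 1) / 50)) %>%
  ungroup()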

With blocks assigned, we simply compute the familiar matching statistics. One trick to note is that R will treat TRUE and FALSE values as 1 and 0 when they are forced into mathematical computations. Thus, sum(like_count*is_trust_tweet) can be read “the sum of likes produced when is_trust_tweet is true.” We also proactively filter() to retain only cases where the log_b and log_r are still finite, which in this case is all of them because there were no blocks with 0 trust/non-trust tweets or 0 likes for either of those cases.

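A sketch of those computations:

# Block-level matching statistics: log2 behavior ratio and log2 reinforcer ratio
match_df <- blocked_df %>%
  group_by(username, block) %>%
  summarize(
    log_b = log2(sum(is_trust_tweet) / sum(!is_trust_tweet)),
    log_r = log2(sum(like_count * is_trust_tweet) /
                 sum(like_count * !is_trust_tweet)),
    .groups = "drop"
  ) %>%
  filter(is.finite(log_b), is.finite(log_r))  # drop any degenerate blocks
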
  • Step 7: Analyze with Standard Behavior-Analytic Methods (e.g., Visual Analysis, Regression)

At this point, all that is left to do is perform a matching analysis. Because we have two subjects who will need separate regressions, we use the group-nest-map-tidy-unnest approach. It is probably overkill for only two regressions, but in the common case that a matching analysis includes a half-dozen or more subjects to regress, this strategy is both faster and safer than copy-pasting code.

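A sketch of the group-nest-map-tidy-unnest approach, using purrr and broom; the regression formula is the familiar log form of the GML:

library(broom)

# One regression per subject: log_b ~ log_r
fit_df <- match_df %>%
  group_by(username) %>%
  nest() %>%
  mutate(
    fit   = map(data, ~ lm(log_b ~ log_r, data = .x)),
    coefs = map(fit, tidy),    # intercept = log bias, slope = sensitivity
    stats = map(fit, glance)   # R-squared and related fit statistics
  )

# Coefficients (bias and sensitivity) for each subject
fit_df %>% select(username, coefs) %>% unnest(coefs)

# Model fit summaries (e.g., R-squared)
fit_df %>% select(username, stats) %>% unnest(stats)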

Unnesting the regression coefficients reveals some interesting results. Both subjects are highly sensitive to the likes associated with trust-related words.


What is even more interesting are the substantial bias terms, which suggest that even if trust-related tweets produced likes in equal proportion to nontrust-related tweets, both subjects would still produce trust-related tweets in substantial excess. In particular, even if the likes received for each tweet type were perfectly balanced, Harris would be expected to produce 2^0.278 = 1.21x more trust-related tweets than nontrust-related ones—and Pence would produce 1.34x more.

Examining how well the matching model explains such behavior, note that the R-squared values for both subjects are significant, but much higher for Pence. Combined with a sensitivity very near 1.0 for this subject, such a finding suggests this learning model is a compelling (if, as yet, nonexperimental) account of his verbal behavior over many years.


We can see this by visually examining the log behavior and log reinforcement rates for each subject in each block, observing that Pence’s blocks conform much more closely to theoretically perfect matching (the dashed line in Fig. 1).
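
A sketch of such a plot using ggplot2:

# Observed points and fitted slope per subject, plus the perfect-matching diagonal
ggplot(match_df, aes(x = log_r, y = log_b)) +
  geom_point() +
  geom_smooth(method = "lm") +                                  # solid line + gray ribbon
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +  # theoretically perfect matching
  facet_wrap(~ username) +
  labs(x = "Log2 reinforcement ratio", y = "Log2 behavior ratio")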

Fig. 1. Matching analyses. Note. R output for the log2 behavior and reinforcement rates by block for both subjects. Solid lines represent empirically observed regression slopes and gray ribbons represent confidence bounds. Dashed lines represent theoretically perfect matching.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cero, I., Luo, J. & Falligant, J.M. Lexicon-Based Sentiment Analysis in Behavioral Research. Perspect Behav Sci 47, 283–310 (2024). https://doi.org/10.1007/s40614-023-00394-x
