
Token Distribution Analysis

Text Analysis with R

Abstract

This chapter expands on the introduction to regular expressions and introduces several new functions, including seq_along, rbind, apply, and do.call. It also presents if conditionals and for loops, which are used to identify chapter breaks and to build a plot of how word tokens are distributed across chapters.
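The workflow summarized above can be sketched in R as follows. This is a minimal sketch, not the chapter's own code. It assumes a character vector with one line of text per element, chapter headings that begin with "CHAPTER" followed by a digit, and a final line reading "THE END". The sample text, the tracked words, and the names novel_lines_v, chap_positions_v, target_words_v, chapter_rows_l, freqs_m, and mean_freqs_v are hypothetical.

```r
# Hypothetical sample: a tiny "novel" stored one line per element,
# assumed to end with the line "THE END".
novel_lines_v <- c(
  "CHAPTER 1. Loomings.",
  "Call me Ishmael. The whale was white.",
  "CHAPTER 2. The Carpet-Bag.",
  "A whale, a whale, and a ship.",
  "THE END"
)

# Find chapter headings; the caret anchors the pattern at the start of a line.
chap_positions_v <- grep("^CHAPTER \\d", novel_lines_v)
# Add the position of the final line so the last chapter has an end boundary.
chap_positions_v <- c(chap_positions_v, length(novel_lines_v))

target_words_v <- c("whale", "ship")  # hypothetical words to track
chapter_rows_l <- list()

for (i in seq_along(chap_positions_v)) {
  # The last position only marks where the final chapter stops, so skip it.
  if (i != length(chap_positions_v)) {
    start_pos <- chap_positions_v[i] + 1
    end_pos <- chap_positions_v[i + 1] - 1
    chapter_text <- tolower(paste(novel_lines_v[start_pos:end_pos], collapse = " "))
    # Split on non-word characters and drop the empty strings that result.
    words_v <- unlist(strsplit(chapter_text, "\\W"))
    words_v <- words_v[words_v != ""]
    # Relative frequency (per 100 words) of each target word in this chapter.
    chapter_rows_l[[i]] <- sapply(target_words_v, function(w) {
      100 * sum(words_v == w) / length(words_v)
    })
  }
}

# Stack the per-chapter vectors into one matrix: one row per chapter.
freqs_m <- do.call(rbind, chapter_rows_l)
# apply() over columns (MARGIN = 2) gives each word's mean across chapters.
mean_freqs_v <- apply(freqs_m, 2, mean)

# Grouped bar plot: one group of bars per chapter, one bar per word.
barplot(t(freqs_m), beside = TRUE, names.arg = seq_len(nrow(freqs_m)),
        legend.text = target_words_v, xlab = "Chapter",
        ylab = "Relative frequency (per 100 words)")
```

Calling do.call(rbind, ...) on the list stacks any number of per-chapter vectors in a single step, which avoids growing a matrix row by row inside the loop.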


Notes

  1. Do not be alarmed if you see backslash characters in the printed text. These are escape characters that R displays before embedded quotation marks so that they are read as literal characters rather than as the end of the string; they are not part of the text itself.

  2. Recall that in a regular expression the start of a line is marked by the caret symbol: ^.

  3. You might wonder what to do if the text you are analyzing does not end with the line THE END. A simple solution is to append a new element to the end of the novel_lines_v vector, containing either the words THE END or nothing at all (a blank line). There are, naturally, more sophisticated ways of writing the code so that this step is unnecessary, but they are beyond the scope of this introductory text.

  4. Using i as the loop variable is a matter of convention. You could give this variable any name you wish, e.g., my.int or x.

  5. It might seem odd, but in R even an object containing only one item is a vector. In this example, y is a vector of length one. If you enter y into the console, R prints the bracketed index [1] followed by the value 2, which is the value held in the first (and only) position of y.
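Several of the points in these notes can be checked directly at the R console. The following is a small illustrative sketch, not code from the chapter; the sample strings and the names quote_v, novel_lines_v, my.int, and y are hypothetical.

```r
# Illustrations of points made in the notes; all sample values are hypothetical.

# Backslashes in printed output are escapes, not part of the string itself.
quote_v <- "He said \"Call me Ishmael.\""
print(quote_v)                      # [1] "He said \"Call me Ishmael.\""
cat(quote_v, "\n")                  # He said "Call me Ishmael."
grepl("\\", quote_v, fixed = TRUE)  # FALSE: the string holds no backslash

# In a regular expression, the caret anchors a match to the start of the line.
grepl("^CHAPTER", c("CHAPTER 1. Loomings.", "ANOTHER CHAPTER"))  # TRUE FALSE

# If the text lacks a closing "THE END" line, append one to the vector.
novel_lines_v <- c("CHAPTER 1. Loomings.", "Call me Ishmael.")
if (novel_lines_v[length(novel_lines_v)] != "THE END") {
  novel_lines_v <- c(novel_lines_v, "THE END")
}

# The loop variable need not be called i; any valid name works.
for (my.int in seq_along(novel_lines_v)) {
  cat(my.int, novel_lines_v[my.int], "\n")
}

# Even a single value is a vector, of length one.
y <- 2
y             # [1] 2
length(y)     # [1] 1
is.vector(y)  # [1] TRUE
```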


Copyright information

© 2020 Springer Nature Switzerland AG


Cite this chapter

Jockers, M.L., Thalken, R. (2020). Token Distribution Analysis. In: Text Analysis with R. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-39643-5_5
