Token Distribution Analysis

Matthew L. Jockers; Rosamond Thalken

doi:10.1007/978-3-030-39643-5_5

Part of the book series: Quantitative Methods in the Humanities and Social Sciences (QMHSS)

Abstract

This chapter expands upon the introduction to regular expressions and introduces several new functions, including seq_along, rbind, apply, and do.call. It also presents if conditionals and for loops as we explore how to identify chapter breaks and build a distribution plot based on chapters.
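
The workflow summarized above can be sketched roughly as follows. This is a minimal illustration, not the chapter's own code: the sample lines, the heading pattern "^CHAPTER \\d", the target word "whale", and every object name other than novel_lines_v are assumptions made for the example.

```r
# Hypothetical toy stand-in for a novel, one line per element.
novel_lines_v <- c(
  "CHAPTER 1. Loomings",
  "Call me Ishmael. The whale was far away.",
  "CHAPTER 2. The Carpet-Bag",
  "I stuffed a shirt or two into my old carpet-bag.",
  "The whale, the whale!",
  "THE END"
)

# Find chapter headings: lines that start with "CHAPTER" and a digit.
chap_positions_v <- grep("^CHAPTER \\d", novel_lines_v)
# Use the final line as a closing boundary so the last chapter has an end.
chap_positions_v <- c(chap_positions_v, length(novel_lines_v))

chapter_counts_l <- list()
for (i in seq_along(chap_positions_v)) {
  # The last position is only a boundary, not the start of a chapter.
  if (i != length(chap_positions_v)) {
    title <- novel_lines_v[chap_positions_v[i]]
    start_i <- chap_positions_v[i] + 1
    end_i <- chap_positions_v[i + 1] - 1
    chapter_text <- tolower(paste(novel_lines_v[start_i:end_i], collapse = " "))
    words_v <- unlist(strsplit(chapter_text, "\\W+"))
    words_v <- words_v[words_v != ""]
    chapter_counts_l[[title]] <- c(whale = sum(words_v == "whale"),
                                   total = length(words_v))
  }
}

# Bind the per-chapter vectors into one matrix, one row per chapter.
counts_m <- do.call(rbind, chapter_counts_l)
# Relative frequency (percent) of the target word in each chapter.
whale_pct_v <- apply(counts_m, 1,
                     function(row_v) 100 * row_v["whale"] / row_v["total"])
```

Passing whale_pct_v to barplot() would then draw one bar per chapter, which is one simple way to view the distribution of a word across a text.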

Notes

1. Do not be alarmed if you see a series of backslash characters in the text. These are escape characters that R adds before quotation marks and apostrophes so that they will not be treated as special characters and parsed by R.
2. Recall that the start of a line is marked by use of the caret symbol: ^.
3. You might be wondering what to do if the text you are analyzing does not happen to include the final line THE END. A simple solution would be to add a new line to the end of the novel_lines_v vector. You could add the words THE END as a final line, or it could simply be blank. And, naturally, there are other more sophisticated ways of writing your code so that you do not have to do any of this, but that is more than we want to get into in this introductory text.
4. Using i is a matter of convention. You could name this variable anything that you wish: e.g., my.int, x, etc.
5. It might seem a bit odd, but in R even objects containing only one item are vectors. So in this example the y object is a vector of one item. If you simply enter y into the console, you will get a bracketed number 1 [1] followed by the value 2, which is the value held in the first (and only) position of the y vector.
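
The points made in these notes can be tried out in a short session. The sketch below is illustrative only: the sample strings and the names quote_v, lines_v, and the other helper objects are assumptions for the example, not code from the chapter.

```r
# Note 1: print() displays escape backslashes; cat() shows the raw text.
quote_v <- "He said \"Call me Ishmael.\""
print(quote_v)      # backslashes appear before the inner quotation marks
cat(quote_v, "\n")  # the same string written out without them
# The backslashes are display only; they are not stored in the string.
has_backslash <- grepl("\\", quote_v, fixed = TRUE)

# Note 2: the caret anchors a pattern to the start of a line.
lines_v <- c("CHAPTER 1", "In this CHAPTER we sail")
anchored_v <- grep("^CHAPTER", lines_v)    # only lines that begin with CHAPTER
unanchored_v <- grep("CHAPTER", lines_v)   # any line that contains CHAPTER

# Note 3: append a closing line when the text does not already have one.
if (lines_v[length(lines_v)] != "THE END") {
  lines_v <- c(lines_v, "THE END")
}

# Note 4: the loop variable name is arbitrary; i and my.int behave alike.
lengths_i_v <- integer(0)
for (i in seq_along(lines_v)) {
  lengths_i_v <- c(lengths_i_v, nchar(lines_v[i]))
}
lengths_my_v <- integer(0)
for (my.int in seq_along(lines_v)) {
  lengths_my_v <- c(lengths_my_v, nchar(lines_v[my.int]))
}

# Note 5: a single value is still a vector, here of length one.
y <- 2
y_is_vector <- is.vector(y)
y_length <- length(y)
y_first <- y[1]
```

Entering any of these object names at the console prints its value, which makes it easy to compare, for example, anchored_v with unanchored_v.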

Author information

Authors and Affiliations

College of Arts and Sciences, Washington State University, Pullman, WA, USA
Matthew L. Jockers
Digital Technology and Culture Program, Washington State University, Pullman, WA, USA
Rosamond Thalken

About this chapter

Cite this chapter

Jockers, M. L., Thalken, R. (2020). Token Distribution Analysis. In: Text Analysis with R. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-39643-5_5

DOI: https://doi.org/10.1007/978-3-030-39643-5_5
Published: 31 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39642-8
Online ISBN: 978-3-030-39643-5
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
