Abstract
This chapter moves readers from the analysis of one or two texts to a larger corpus. Machine clustering is introduced in the context of an authorship attribution problem.
Notes
1. If you have been working on other sections of this book or on R projects of your own, it is a good idea either to restart R or to clear the R workspace. To do the latter, click the Session menu of the RStudio GUI and select Clear Workspace. This removes all R objects and functions that you may have been using, wiping the R slate clean, as it were.
2. Patrick Burns has written a 125-page book documenting many of R’s unusual behaviors. The book is informative and entertaining to read. You can find it online at http://www.burns-stat.com.
3. Enter ?regex at the prompt to learn more about regular expressions in R.
4. You can learn more about the useInternalNodes argument in the documentation for the xmlTreeParse function. Basically, setting it to TRUE avoids converting the contents into R objects, which saves a bit of processing time.
5.
6. For a brief overview of how this work is conducted, see Jockers, Matthew L. Macroanalysis: Digital Methods and Literary History. University of Illinois Press, 2013, pp. 63–67.
7. seq_along is a simple R function for generating a sequence of numbers. Check the R help documentation for details. In this example, I could just as easily have used 1:43 or 1:length(book.freqs.l).
8. Factors are explained in a later section.
9. Other options include using reshape and expressions that leverage the apply family of functions.
10. Remember that the getTEIWordTableList function that we built multiplies all the relative frequencies by 100.
11. For details, consult the documentation for the dist and hclust functions.
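As a companion to the last note, here is a minimal sketch of how dist and hclust fit together. The matrix freqs.m, its row and column names, and its values are hypothetical and made up for illustration; they are not data from the chapter, and this is not necessarily the exact sequence of calls the chapter uses.

```r
# Toy matrix of relative word frequencies (hypothetical values):
# each row is a text, each column is a word.
freqs.m <- matrix(
  c(5.0, 3.0, 1.0,
    5.1, 2.9, 1.1,
    2.0, 6.0, 4.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("textA", "textB", "textC"), c("the", "of", "and"))
)

# dist computes pairwise distances between the rows
# (Euclidean distance is the default method).
dm <- dist(freqs.m)

# hclust builds a hierarchical clustering from the distance object
# (complete linkage is the default method).
cluster <- hclust(dm)

# cutree cuts the tree into a chosen number of groups; here, two.
groups <- cutree(cluster, k = 2)
print(groups)
```

Calling plot(cluster) afterwards draws the dendrogram. With these toy values, textA and textB are far closer to each other than either is to textC, so they should land in the same group.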
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Jockers, M.L. (2014). Clustering. In: Text Analysis with R for Students of Literature. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-03164-4_11
DOI: https://doi.org/10.1007/978-3-319-03164-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03163-7
Online ISBN: 978-3-319-03164-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)