Abstract
This chapter details the process of converting documents into an analysis-ready term-document representation. Preprocessed text documents are first transformed into an inverted index for demonstration purposes. The inverted index is then reshaped into a term-document or document-term matrix. The chapter concludes by describing different weighting schemes for the analysis-ready term-document representation.
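The pipeline summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own implementation: the toy documents, the function names, and the choice of tf-idf with idf = log(N / df) as the weighting scheme are all hypothetical and chosen only for demonstration.

```python
from collections import defaultdict
import math

# Toy, already-preprocessed documents (hypothetical data for illustration).
docs = {
    "d1": ["text", "mining", "text"],
    "d2": ["data", "mining"],
    "d3": ["text", "analytics"],
}

def build_inverted_index(docs):
    """Map each term to a {doc_id: raw term frequency} posting dictionary."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            index[tok][doc_id] = index[tok].get(doc_id, 0) + 1
    return dict(index)

def to_term_document_matrix(index, doc_ids):
    """Rows are terms (sorted), columns are documents in doc_ids order."""
    terms = sorted(index)
    matrix = [[index[t].get(d, 0) for d in doc_ids] for t in terms]
    return terms, matrix

def tfidf(matrix):
    """Weight raw counts by log(N / df); one common variant, assumed here."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        df = sum(1 for count in row if count > 0)  # document frequency
        idf = math.log(n_docs / df)
        weighted.append([count * idf for count in row])
    return weighted

doc_ids = sorted(docs)
index = build_inverted_index(docs)
terms, tdm = to_term_document_matrix(index, doc_ids)

# The document-term matrix is simply the transpose of the term-document matrix.
dtm = [list(col) for col in zip(*tdm)]

weights = tfidf(tdm)
```

Plain nested lists keep the sketch dependency-free. For a real corpus a sparse matrix type would be the more usual choice, since most term-document entries are zero.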
References
Berry, M. W., Drmac, Z., & Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Review, 41(2), 335–362.
Dumais, S. T. (1991). Improving the retrieval of information from external sources. Behavior Research Methods, Instruments, & Computers, 23(2), 229–236.
Jessup, E. R., & Martin, J. H. (2001). Taking a new look at the latent semantic analysis approach to information retrieval. Computational Information Retrieval, 2001, 121–144.
Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511809071.
Further Reading
For more about the term-document representation of text data, see Berry et al. (1999) and Manning et al. (2008).
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Anandarajan, M., Hill, C., Nolan, T. (2019). Term-Document Representation. In: Practical Text Analytics. Advances in Analytics and Data Science, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-319-95663-3_5
DOI: https://doi.org/10.1007/978-3-319-95663-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95662-6
Online ISBN: 978-3-319-95663-3
eBook Packages: Business and Management (R0)