Term-Document Representation

Anandarajan, Murugan; Hill, Chelsey; Nolan, Thomas

doi:10.1007/978-3-319-95663-3_5

Part of the book series: Advances in Analytics and Data Science (AADS, volume 2)

Abstract

This chapter details the process of converting documents into an analysis-ready term-document representation. For demonstration purposes, preprocessed text documents are first transformed into an inverted index. The inverted index is then restructured into a term-document matrix or its transpose, the document-term matrix. The chapter concludes by describing different weighting schemes for the analysis-ready term-document representation.
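The pipeline summarized above (preprocessed documents, then an inverted index, then a term-document or document-term matrix, then weighting) can be sketched in a few lines of plain Python. This is a minimal illustration and not the chapter's own code. The toy corpus and the function names (build_inverted_index, term_document_matrix, transpose, tfidf) are hypothetical. The documents are assumed to be already preprocessed into token lists. The weighting shown is one common tf-idf variant (raw term frequency times log(N/df)); the chapter describes several weighting schemes and may use different formulas.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus, assumed already preprocessed
# (tokenized, lowercased, stop words removed).
docs = {
    "d1": ["text", "mining", "text", "analysis"],
    "d2": ["data", "mining", "text"],
    "d3": ["text", "data", "data", "analysis"],
}

def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            index[tok][doc_id] = index[tok].get(doc_id, 0) + 1
    return dict(index)

def term_document_matrix(index, doc_ids):
    """Rows are terms (sorted alphabetically), columns are documents.
    Each cell holds the raw frequency of the term in that document."""
    terms = sorted(index)
    matrix = [[index[t].get(d, 0) for d in doc_ids] for t in terms]
    return terms, matrix

def transpose(matrix):
    """The document-term matrix is the transpose of the term-document matrix."""
    return [list(row) for row in zip(*matrix)]

def tfidf(tdm):
    """One common variant (assumed here): weight = tf * log(N / df),
    where N is the number of documents and df is the number of
    documents containing the term."""
    n_docs = len(tdm[0]) if tdm else 0
    weighted = []
    for row in tdm:
        df = sum(1 for tf in row if tf > 0)  # always >= 1 for indexed terms
        idf = math.log(n_docs / df)
        weighted.append([tf * idf for tf in row])
    return weighted

doc_ids = sorted(docs)
index = build_inverted_index(docs)
terms, tdm = term_document_matrix(index, doc_ids)
dtm = transpose(tdm)
weights = tfidf(tdm)

for term, row in zip(terms, tdm):
    print(term, row)
```

In this toy corpus the term "text" occurs in every document, so its idf is log(3/3) = 0 and its weighted row is all zeros. This illustrates why idf-style weighting down-weights terms that do little to distinguish one document from another.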

Author information

Authors and Affiliations

LeBow College of Business, Drexel University, Philadelphia, PA, USA
Murugan Anandarajan
Feliciano School of Business, Montclair State University, Montclair, NJ, USA
Chelsey Hill
Mercury Data Science, Houston, TX, USA
Thomas Nolan

About this chapter

Cite this chapter

Anandarajan, M., Hill, C., Nolan, T. (2019). Term-Document Representation. In: Practical Text Analytics. Advances in Analytics and Data Science, vol 2. Springer, Cham. https://doi.org/10.1007/978-3-319-95663-3_5

DOI: https://doi.org/10.1007/978-3-319-95663-3_5
Published: 20 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95662-6
Online ISBN: 978-3-319-95663-3
eBook Packages: Business and Management (R0)
