Text Pre-Processing

Abstract

This chapter focuses on the theoretical framework of text data pre-processing. It describes the three levels of text representation: lexical, syntactic, and semantic. It then explains the concepts of bag of words, word embedding, term frequency and weighting, named entity extraction, and parsing. The chapter closes with a case study: a text analysis of Tolkien’s books, a web project developed by Emil Johanson.
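As a rough illustration of two of the concepts named above, bag of words and term frequency weighting, the following is a minimal sketch in plain Python. It is not taken from the chapter: the function names (`tokenize`, `bag_of_words`, `tf_idf`), the sample sentences, and the choice of `log(N / df)` as the inverse document frequency are all assumptions made for illustration, and other weighting variants are common.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Represent a document as unordered token counts."""
    return Counter(tokenize(text))


def tf_idf(docs):
    """Weight each term by relative frequency times log(N / document frequency)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for bag in bags:
        df.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in bag.items()}
        )
    return weights


# Hypothetical sample sentences, not drawn from the case study.
docs = [
    "The ring was found by the hobbit.",
    "The hobbit left the shire.",
]
bags = [bag_of_words(d) for d in docs]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document (such as "the" in the sample) receives a weight of zero, while a term unique to one document (such as "ring") receives a positive weight.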

  • DOI: 10.1007/978-3-030-85085-2_3
  • Chapter length: 25 pages

References

  1. Qi W, Procter R, Zhang J, Guo W (2019) Mapping consumer sentiment toward wireless services using geospatial Twitter data. IEEE Access 7:113726–113739. https://doi.org/10.1109/ACCESS.2019.2935200

  2. Lotr Project: An analysis of Tolkien’s books. http://lotrproject.com/statistics/books/. Accessed 15 July 2020


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Lamba, M., Madhusudhan, M. (2022). Text Pre-Processing. In: Text Mining for Information Professionals. Springer, Cham. https://doi.org/10.1007/978-3-030-85085-2_3

  • DOI: https://doi.org/10.1007/978-3-030-85085-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85084-5

  • Online ISBN: 978-3-030-85085-2

  • eBook Packages: Computer Science, Computer Science (R0)