This chapter focuses on the theoretical framework of text data pre-processing. It describes the three levels of text representation: lexical, syntactic, and semantic. It further explains the concepts of bag of words, word embedding, term frequency and weighting, named entity extraction, and parsing. The chapter is followed by a case study of a web project, developed by Emil Johanson, that analyzes the text of Tolkien’s books.