
Natural Language Processing (NLP): An Introduction

Making Sense of Textual Data

Applied Data Science in Tourism

Part of the book series: Tourism on the Verge ((TV))

Abstract

With the growth of internet usage, the amount of available textual data has increased rapidly, and more powerful computers have made processing such data much easier. The tourism field has strong potential to exploit the data available online; however, a high proportion of it is unlabelled and unprocessed, so new methods and approaches are needed to use it effectively. In this regard, Natural Language Processing (NLP) helps researchers utilize textual data and develop an understanding of text analysis. Machine learning approaches can greatly expand the potential of text mining, leading to deeper insights, a better understanding of social phenomena, and thus a better basis for decision-making. This chapter provides the reader with the basics of NLP and presents the text pre-processing procedure in detail.



Author information

Corresponding author

Correspondence to Roman Egger.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Egger, R., Gokce, E. (2022). Natural Language Processing (NLP): An Introduction. In: Egger, R. (ed.) Applied Data Science in Tourism. Tourism on the Verge. Springer, Cham. https://doi.org/10.1007/978-3-030-88389-8_15
