A Text Preprocessing Approach for Efficacious Information Retrieval

  • Conference paper
Smart Innovations in Communication and Computational Sciences

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 669)

Abstract

Information retrieval is the task of obtaining relevant information from a large collection of databases. Preprocessing plays an important role in information retrieval because it determines how well the relevant information can be extracted. In this paper, a text preprocessing approach, text preprocessing for information retrieval (TPIR), is proposed. The proposed approach works in two steps. First, a spell-check utility is used to enhance stemming. Second, synonyms of similar tokens are combined. The proposed technique is applied to a case study on the International Monetary Fund. The experimental results show the efficiency of the proposed approach in terms of complexity, time, and performance.
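The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the vocabulary, the synonym table, and the function names are hypothetical, `difflib.get_close_matches` stands in for whatever spell-check utility the paper uses, and the suffix stripper is a deliberately naive substitute for a real stemmer such as Porter's.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical toy resources; the paper's dictionary and thesaurus are not given here.
VOCABULARY = ["fund", "funds", "loan", "loans", "lending", "monetary", "credit"]
SYNONYMS = {"credit": "loan", "lend": "loan"}  # stem -> canonical token


def correct_spelling(token, vocabulary=VOCABULARY):
    """Step 1a: map a misspelled token to its closest vocabulary entry, if any."""
    if token in vocabulary:
        return token
    matches = get_close_matches(token, vocabulary, n=1, cutoff=0.75)
    return matches[0] if matches else token


def naive_stem(token):
    """Step 1b: strip a few common suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Spell-correct, stem, then merge synonyms; return token frequencies."""
    tokens = text.lower().split()
    corrected = [correct_spelling(t) for t in tokens]
    stems = [naive_stem(t) for t in corrected]
    # Step 2: tokens with the same meaning collapse onto one canonical token.
    merged = [SYNONYMS.get(s, s) for s in stems]
    return Counter(merged)


if __name__ == "__main__":
    print(preprocess("monetry funds lending credit loans"))
```

Correcting spelling before stemming means that a misspelled word such as "monetry" is counted together with "monetary" instead of becoming a separate index term, and merging synonyms afterwards reduces the number of distinct tokens further.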

Corresponding author

Correspondence to Deepali Virmani.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Virmani, D., Taneja, S. (2019). A Text Preprocessing Approach for Efficacious Information Retrieval. In: Panigrahi, B., Trivedi, M., Mishra, K., Tiwari, S., Singh, P. (eds) Smart Innovations in Communication and Computational Sciences. Advances in Intelligent Systems and Computing, vol 669. Springer, Singapore. https://doi.org/10.1007/978-981-10-8968-8_2
