Analysis of Text Mining Tools in Disease Prediction

  • Conference paper
Hybrid Intelligent Systems (HIS 2020)

Abstract

The rapid creation of digital data by Internet of Things and other smart devices calls for new mining techniques that can handle and analyse such large volumes. More than 90% of today’s data is in unstructured or semi-structured form, and most of it was generated within the past decade. Discovering appropriate patterns and trends in the text documents contained in this big data is a major challenge. Text mining is the process of extracting interesting, non-trivial patterns from large collections of text documents. Different techniques and tools exist to mine text (and other data formats) and to discover valuable information for future prediction and decision making. Two terms are central to extracting relevant information from a data set: predictive modelling and text mining. Predictive models are often used to detect crimes and identify suspects after a crime has taken place, or to estimate how likely an email is to be spam. Text mining, similarly, is used in applications such as digital libraries, academic research, life sciences, social media, and business intelligence. The text mining techniques available today for analysing text patterns include document classification (text classification, document standardization), information retrieval (keyword search/querying and indexing), document clustering (phrase clustering), natural language processing (spelling correction, lemmatization, grammatical parsing, and word sense disambiguation), information extraction (relationship extraction/link analysis), and web mining (web link analysis).

This article discusses and analyses text mining techniques and their applications in diverse fields. It covers several use cases and efficient algorithms, such as the Apriori algorithm and association rule mining, which are used for frequent itemset extraction (information retrieval and information extraction) and rule generation. As a result, several rules generated from a collected data set to predict a disease (as an example) are discussed. Finally, this work describes in detail classification, clustering, regression, association rule mining, and outlier detection as a workflow for analysing data to produce a decision or prediction, and closes with some useful research gaps, challenges, and issues.
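
The frequent itemset extraction and rule generation mentioned above can be illustrated with a minimal sketch of the Apriori algorithm. This is not the authors' implementation: the function names `apriori` and `rules`, the toy symptom records, and the support and confidence thresholds are all assumptions invented here for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples meeting min_confidence."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found


# Hypothetical toy records (invented for this sketch, not data from the paper).
records = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "flu"},
    {"fever", "headache"},
    {"cough", "flu"},
    {"fever", "cough", "headache", "flu"},
]

frequent = apriori(records, min_support=0.6)
found_rules = rules(frequent, min_confidence=0.9)

for antecedent, consequent, confidence in found_rules:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 2))
```

In this sketch, the support threshold controls which itemsets survive each level, and the confidence threshold controls which of the resulting itemsets are turned into rules. Real disease-prediction data would need preprocessing (for example, mapping free text to discrete items) before a step like this could be applied.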

Acknowledgements

This research is funded by the Anumit Academy’s Research and Innovation Network (AARIN), India. The authors would like to thank AARIN India, an education foundation body and a research network, for supporting the project through its financial assistance.

Author information

Contributions

All authors contributed equally to this work. Amit Kumar Tyagi analysed and approved the manuscript.

Ethics declarations

The authors declare that they do not have any conflict of interest with respect to publication of this research work.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kumari, S., Vani, V., Malik, S., Tyagi, A.K., Reddy, S. (2021). Analysis of Text Mining Tools in Disease Prediction. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_55
