Abstract
Because Internet of Things devices and other smart devices create digital data so rapidly, modern mining techniques are required to handle and analyse this large amount of data. Note that more than 90% of today's data is generated in an unstructured or semi-structured format, and most of it has been generated only in the past decade. Discovering appropriate patterns and trends for analysing text documents within such big data (i.e., a large volume of data) is a major challenge. Text mining is the process of extracting interesting and non-trivial patterns from a huge number of text documents. Different techniques and tools exist to mine text (and other data formats) and to discover valuable information for future prediction and decision making. Two terms are commonly used for extracting relevant information from a data-set: predictive modelling and text mining. Predictive models are often used to detect crimes and identify suspects after a crime has taken place, or to estimate how likely it is that an email is spam. Similarly, text mining is used in applications such as digital libraries, academic research, life sciences, social media, and business intelligence. Many text mining techniques are now available for analysing text patterns, including: document classification (text classification, document standardization), information retrieval (keyword search/querying and indexing), document clustering (phrase clustering), natural language processing (spelling correction, lemmatization, grammatical parsing, and word sense disambiguation), information extraction (relationship extraction/link analysis), and web mining (web link analysis).
This article discusses and analyses text mining techniques and their applications in diverse fields. It covers several use cases and efficient algorithms, such as the Apriori algorithm and association rule mining, which are used for frequent item set extraction (information retrieval and information extraction) and rule generation. As a result, several rules generated from a collected data-set to predict a disease (as an example) are also discussed. Finally, this work describes in detail the terms classification, clustering, regression, association rule mining, and outlier detection as a work-flow for analysing data to produce a decision or make a prediction, and its concluding remarks discuss some useful research gaps, challenges, and issues.
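To make the frequent item set extraction and rule generation step concrete, the following is a minimal sketch of the textbook Apriori procedure in Python. It is not the implementation used in this work: the function names (`apriori`, `generate_rules`), the toy records, and the thresholds are all hypothetical and chosen only for illustration. Each record stands for the set of terms assumed to have been extracted from one text document.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)
    # Level-1 candidates: every distinct single item.
    current = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while current:
        # Support = fraction of transactions that contain the candidate.
        level = {}
        for cand in current:
            support = sum(1 for t in tx if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> B) = support(A and B) / support(A)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (antecedent, itemset - antecedent, support, confidence)
                    )
    return rules


# Hypothetical toy data, invented for this sketch (not from the paper).
records = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "fatigue", "flu"},
    {"fever", "headache"},
    {"cough", "fatigue"},
    {"fever", "cough", "flu"},
]

frequent = apriori(records, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.8)

for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f}")
```

In this sketch a rule such as {fever, cough} -> {flu} is kept only when both the support and the confidence thresholds are met. On a real data-set the thresholds would need tuning, and a rule of this kind indicates co-occurrence in the documents rather than a clinical diagnosis.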
Acknowledgements
This research is funded by the Anumit Academy’s Research and Innovation Network (AARIN), India. The authors would like to thank AARIN India, an education foundation body and a research network for supporting the project through its financial assistance.
Author information
Contributions
All authors contributed equally to this work. Amit Kumar Tyagi analysed and approved this manuscript.
Ethics declarations
The authors declare that they do not have any conflict of interest with respect to publication of this research work.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kumari, S., Vani, V., Malik, S., Tyagi, A.K., Reddy, S. (2021). Analysis of Text Mining Tools in Disease Prediction. In: Abraham, A., Hanne, T., Castillo, O., Gandhi, N., Nogueira Rios, T., Hong, TP. (eds) Hybrid Intelligent Systems. HIS 2020. Advances in Intelligent Systems and Computing, vol 1375. Springer, Cham. https://doi.org/10.1007/978-3-030-73050-5_55
DOI: https://doi.org/10.1007/978-3-030-73050-5_55
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73049-9
Online ISBN: 978-3-030-73050-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)