Associative Feature Information Extraction Using Text Mining from Health Big Data

Kim, Joo-Chang; Chung, Kyungyong

doi:10.1007/s11277-018-5722-5

Published: 18 April 2018

Wireless Personal Communications, Volume 105, pages 691–707 (2019)

Joo-Chang Kim¹ &
Kyungyong Chung²


Abstract

Advances in big data computing have led to the digitization of most documents in areas such as politics, economics, society, culture, daily life, and public health. Because the structure of conventional documents varies with the author or the organization that produced them, policies and studies have addressed how to digitize and use them efficiently. Text mining is a technology for classifying, clustering, extracting, searching, and analyzing data to find patterns or features in a set of structured or unstructured documents written in natural language. This paper proposes a method for extracting associative feature information from health big data using text mining. Health big data are built from health documents gathered on the Web as raw data: the documents are collected through Web scraping and saved on a file server, and the useful information they contain is extracted through text mining. Because the collected raw data are in sentence form, morphological analysis is applied to create a corpus. In a preprocessing procedure, the file server performs stop word removal, tagging, and analysis of polysemous words to create a candidate corpus. TF-C-IDF is then applied to the candidate corpus to evaluate the importance of each word across the set of documents. Words that TF-C-IDF rates as highly important are added to a keyword set, and a transaction is created for each document. An Apriori mining algorithm analyzes the association rules among the keywords in these transactions and generates associative keywords. The TF-C-IDF weights and associative keywords are extracted from the health big data as associative features. The proposed method is a base technology for creating added value in the healthcare industry in the era of the fourth industrial revolution. In an evaluation of F-measure and efficiency, it performed well. The method is expected to contribute to healthcare big data management and information search.
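
The pipeline described in the abstract (preprocessing, word weighting, per-document transactions, Apriori rule mining) can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not define TF-C-IDF, so plain TF-IDF is used here as a stand-in; morphological analysis, tagging, and polysemy handling are omitted; and the toy corpus, the stop word list, the `top_k` cutoff, and all function names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stop word list; the paper's actual list is not given.
STOP_WORDS = {"the", "of", "and", "in", "can", "may", "with", "some", "a"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(corpus):
    """Return one {word: weight} dict per document.

    Plain TF-IDF, used as a stand-in for the paper's TF-C-IDF.
    """
    n_docs = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(doc))
    return [
        {w: (c / len(doc)) * math.log(n_docs / doc_freq[w])
         for w, c in Counter(doc).items()}
        for doc in corpus
    ]


def to_transactions(weights, top_k=4):
    """Keep the top_k highest-weighted words of each document (assumed rule)."""
    return [
        frozenset(sorted(wts, key=lambda t: (-wts[t], t))[:top_k])
        for wts in weights
    ]


def apriori(txns, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(txns)

    def support(itemset):
        return sum(itemset <= t for t in txns) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    level = keep_frequent({frozenset([i]) for t in txns for i in t})
    frequent, k = {}, 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
    return frequent


def rules(frequent, min_confidence):
    """Derive (antecedent, consequent, confidence) from frequent itemsets."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    out.append((set(ante), set(itemset - ante), conf))
    return out


if __name__ == "__main__":
    # Hypothetical toy documents standing in for scraped health text.
    docs = [
        "aspirin may reduce the risk of heart attack",
        "aspirin can cause stomach bleeding in some patients",
        "ibuprofen may cause stomach bleeding",
    ]
    weights = tf_idf([tokenize(d) for d in docs])
    frequent = apriori(to_transactions(weights), min_support=0.5)
    for ante, cons, conf in rules(frequent, min_confidence=0.8):
        print(sorted(ante), "->", sorted(cons), round(conf, 2))
```

In this sketch the associative features for a document would be its keyword weights together with the rules whose antecedents appear among its keywords; how the paper combines the two is not stated in the abstract.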

Acknowledgements

This work was supported by Kyonggi University Research Grant 2017.

Author information

Authors and Affiliations

Data Mining Laboratory, Department of Computer Science, Kyonggi University, 154-42, Gwanggyosan-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16227, Korea
Joo-Chang Kim
Division of Computer Science and Engineering, Kyonggi University, 154-42, Gwanggyosan-ro, Yeongtong-gu, Suwon-si, Gyeonggi-do, 16227, Korea
Kyungyong Chung

Corresponding author

Correspondence to Kyungyong Chung.

About this article

Cite this article

Kim, JC., Chung, K. Associative Feature Information Extraction Using Text Mining from Health Big Data. Wireless Pers Commun 105, 691–707 (2019). https://doi.org/10.1007/s11277-018-5722-5

Published: 18 April 2018
Issue Date: 30 March 2019
DOI: https://doi.org/10.1007/s11277-018-5722-5
