Abstract
This paper describes how data mining is used to identify the primary factors of cancer incidence and the living habits of cancer patients from a set of health and living-habit questionnaires. Decision tree, radial basis function and back-propagation neural network methods are employed in this case study. Decision tree classification uncovers the primary factors of cancer patients in the form of rules. The radial basis function method is well suited to comparing the living habits of a group of cancer patients with those of a group of healthy people. The back-propagation neural network helps elicit the important factors of cancer incidence. This case study provides a useful data mining template for identifying characteristics in healthcare and other areas.
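The abstract says that decision tree classification exposes the primary factors as rules, but it does not say which split criterion the authors' tool used. The following is a minimal sketch that assumes an ID3/C4.5-style information-gain criterion and a hypothetical two-question dataset. The attribute names `smoking` and `exercise` and every record are invented for illustration and are not taken from the study's questionnaires. The code picks the root split that best separates cancer patients from healthy respondents and then reads one IF-THEN rule off each branch.

```python
import math
from collections import Counter

# Hypothetical toy questionnaire records (NOT the study's data): each answer is
# categorical, and "label" marks whether the respondent is a cancer patient.
RECORDS = [
    {"smoking": "yes", "exercise": "rarely", "label": "cancer"},
    {"smoking": "yes", "exercise": "often", "label": "cancer"},
    {"smoking": "yes", "exercise": "rarely", "label": "cancer"},
    {"smoking": "yes", "exercise": "often", "label": "healthy"},
    {"smoking": "no", "exercise": "rarely", "label": "healthy"},
    {"smoking": "no", "exercise": "often", "label": "healthy"},
    {"smoking": "no", "exercise": "often", "label": "cancer"},
    {"smoking": "no", "exercise": "often", "label": "healthy"},
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(records, attribute, target="label"):
    """Drop in label entropy obtained by splitting the records on `attribute`."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        # Weight each branch's entropy by the fraction of records it receives.
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def best_attribute(records, attributes, target="label"):
    """Attribute with the highest information gain (the assumed split criterion)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


def root_rules(records, attribute, target="label"):
    """One IF-THEN rule per value of the split attribute, predicting the majority class."""
    rules = []
    for value in sorted({r[attribute] for r in records}):
        subset = [r[target] for r in records if r[attribute] == value]
        label, count = Counter(subset).most_common(1)[0]
        rules.append(f"IF {attribute} = {value} THEN {label} ({count}/{len(subset)})")
    return rules


if __name__ == "__main__":
    attrs = ["smoking", "exercise"]
    for a in attrs:
        print(f"{a}: gain = {information_gain(RECORDS, a):.3f}")
    root = best_attribute(RECORDS, attrs)
    for rule in root_rules(RECORDS, root):
        print(rule)
```

On this toy data, `smoking` yields a gain of about 0.189 bits against about 0.049 for `exercise`, so it becomes the root split. The printed rules are `IF smoking = no THEN healthy (3/4)` and `IF smoking = yes THEN cancer (3/4)`. A full tree would recurse on each branch. The single level here is kept only to show how rules fall out of the splits.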
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, X., Narita, T. (2005). Integrated Mining for Cancer Incidence Factors from Healthcare Data. In: Tsumoto, S., Yamaguchi, T., Numao, M., Motoda, H. (eds) Active Mining. Lecture Notes in Computer Science, vol 3430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11423270_19
DOI: https://doi.org/10.1007/11423270_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26157-5
Online ISBN: 978-3-540-31933-7
eBook Packages: Computer Science, Computer Science (R0)