Abstract
This work addresses the problem of finding the mortality distribution of lung cancer across Mexican districts through the discovery of clustering patterns. A data mining system was developed that consists of a pattern generator and a visualization subsystem. Such an approach may contribute to biomarker discovery by identifying risk regions for a given cancer type, and may further reduce the cost and time spent conducting cancer studies. The k-means algorithm was used to generate the patterns, which allows them to be expressed as groups of districts with similar location and mortality rate attributes. The source data were obtained from official Mexican institutions. As a result, a set of grouping patterns reflecting the mortality distribution of lung cancer in Mexico was generated. Two interesting patterns with high mortality rates were detected, one in northeastern and one in northwestern Mexico. We consider that the patterns generated by the data mining system can be useful for identifying high-risk cancer areas and for biomarker discovery.
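The grouping step described above can be illustrated with a minimal sketch. It assumes each district is reduced to three numeric attributes (longitude, latitude, and a mortality rate), that attributes are min-max scaled before clustering, and that centroids are seeded with a deterministic farthest-first rule. The district names, coordinates, rates, the scaling, and the seeding rule are all assumptions made for illustration; the abstract does not state how the chapter's system prepares or initializes its data.

```python
import math

# Hypothetical districts: (name, longitude, latitude, deaths per 100,000).
# All values are invented for illustration; they are not the chapter's data.
DISTRICTS = [
    ("A", -110.0, 29.0, 14.0),
    ("B", -109.5, 28.5, 13.0),
    ("C", -100.2, 25.7, 12.5),
    ("D", -100.0, 26.0, 13.5),
    ("E", -99.1, 19.4, 5.0),
    ("F", -98.9, 19.0, 4.5),
]


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no attribute dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((v - lo) / span for v, lo, span in zip(row, lows, spans))
        for row in rows
    ]


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """k-means: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # No point changed cluster, so the partition is stable.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


features = min_max_scale([d[1:] for d in DISTRICTS])
labels, centroids = kmeans(features, k=3)
for (name, *_), label in zip(DISTRICTS, labels):
    print(name, "-> cluster", label)
```

On this toy input, districts that are close in both position and rate end up in the same group, which is the sense in which a pattern is "a group of districts with similar location and mortality rate attributes". Scaling matters here because coordinates and rates are in different units; without it, whichever attribute has the widest numeric range would dominate the Euclidean distance.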
Copyright information
© 2012 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Pérez, J., Henriques, F., Santaolaya, R., Fragoso, O., Mexicano, A. (2012). Data Mining System Applied to Population Databases for Studies on Lung Cancer. In: Pardalos, P., Xanthopoulos, P., Zervakis, M. (eds) Data Mining for Biomarker Discovery. Springer Optimization and Its Applications, vol 65. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-2107-8_13
DOI: https://doi.org/10.1007/978-1-4614-2107-8_13
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-2106-1
Online ISBN: 978-1-4614-2107-8
eBook Packages: Mathematics and Statistics (R0)