Data Mining System Applied to Population Databases for Studies on Lung Cancer

Pérez, J.; Henriques, F.; Santaolaya, R.; Fragoso, O.; Mexicano, A.

doi:10.1007/978-1-4614-2107-8_13

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 65)

Abstract

This work addresses the problem of finding the mortality distribution of lung cancer across Mexican districts through the discovery of clustering patterns. A data mining system was developed that consists of a pattern generator and a visualization subsystem. Such an approach may contribute to biomarker discovery by identifying risk regions for a given cancer type, and may further reduce the cost and time spent conducting cancer studies. The k-means algorithm was used to generate the patterns, which allows them to be expressed as groups of districts with similar location and mortality rate attributes. The source data were obtained from official Mexican institutions. As a result, a set of grouping patterns reflecting the mortality distribution of lung cancer in Mexico was generated. Two interesting patterns with high mortality rates were detected in northeastern and northwestern Mexico. We consider that the patterns generated by the data mining system can be useful for identifying high-risk cancer areas and for biomarker discovery.
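
The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the district names, coordinates, and rates below are synthetic, the helper names `min_max_scale` and `kmeans` are hypothetical, and the min-max scaling and the choice of k = 2 are assumptions made only so that location and mortality rate contribute comparably to the distance.

```python
import math
import random

# Synthetic, illustrative records (NOT the chapter's data):
# (district, latitude, longitude, lung cancer deaths per 100,000)
districts = [
    ("North-1", 31.0, -115.0, 14.0),
    ("North-2", 30.5, -114.5, 13.0),
    ("North-3", 31.5, -115.5, 15.0),
    ("South-1", 17.0, -96.0, 4.0),
    ("South-2", 17.5, -96.5, 5.0),
    ("South-3", 16.5, -95.5, 3.0),
]


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no attribute dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd-style k-means on equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as starting centroids
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


features = min_max_scale([d[1:] for d in districts])
assignment, centroids = kmeans(features, k=2)

# Summarize each group by its mean raw mortality rate and pick the highest.
rates_by_cluster = {}
for (_name, _lat, _lon, rate), label in zip(districts, assignment):
    rates_by_cluster.setdefault(label, []).append(rate)
mean_rate = {label: sum(r) / len(r) for label, r in rates_by_cluster.items()}
high_risk_label = max(mean_rate, key=mean_rate.get)
high_risk = sorted(d[0] for d, a in zip(districts, assignment) if a == high_risk_label)
print(high_risk, mean_rate[high_risk_label])
```

Each resulting group is a set of districts that are close both geographically and in mortality rate, which is the sense in which the abstract speaks of patterns. With real data, the number of clusters and the relative weight of the attributes would need to be chosen and validated rather than assumed.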

Author information

Authors and Affiliations

Centro Nacional de Investigación y Desarrollo Tecnológico, Cuernavaca, México
J. Pérez, R. Santaolaya, O. Fragoso & A. Mexicano
Fundação Nacional de Saúde, Recife, Brazil
F. Henriques

Corresponding author

Correspondence to J. Pérez.

Editor information

Editors and Affiliations

Department of Industrial & Systems Engin, University of Florida, Weil Hall 401, Gainesville, 32611-6595, Florida, USA
Panos M. Pardalos
Department of Industrial & Systems Engin, University of Florida, Weil Hall 401, Gainesville, 32611, Florida, USA
Petros Xanthopoulos
Dept. Electronic & Computer Engineering, Technical University of Crete, Chania, Crete, 731 00, Greece
Michalis Zervakis

About this chapter

Cite this chapter

Pérez, J., Henriques, F., Santaolaya, R., Fragoso, O., Mexicano, A. (2012). Data Mining System Applied to Population Databases for Studies on Lung Cancer. In: Pardalos, P., Xanthopoulos, P., Zervakis, M. (eds) Data Mining for Biomarker Discovery. Springer Optimization and Its Applications, vol 65. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-2107-8_13

DOI: https://doi.org/10.1007/978-1-4614-2107-8_13
Published: 07 January 2012
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4614-2106-1
Online ISBN: 978-1-4614-2107-8
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
