Data Mining System Applied to Population Databases for Studies on Lung Cancer

Chapter in: Data Mining for Biomarker Discovery

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 65)

Abstract

This work addresses the problem of finding the mortality distribution for lung cancer in Mexican districts through the discovery of clustering patterns. A data mining system was developed, consisting of a pattern generator and a visualization subsystem. Such an approach may contribute to biomarker discovery by identifying risk regions for a given cancer type, and may further reduce the cost and time spent in conducting cancer studies. The k-means algorithm was used to generate the patterns, which allows them to be expressed as groups of districts with affinity in their location and mortality rate attributes. The source data were obtained from Mexican official institutions. As a result, a set of grouping patterns reflecting the mortality distribution of lung cancer in Mexico was generated. Two interesting patterns with high mortality rates were detected, in northeastern and northwestern Mexico. We consider that the patterns generated by the data mining system can be useful for identifying high-risk cancer areas and for biomarker discovery.
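
The grouping step described in the abstract can be illustrated with a minimal Python sketch of k-means over districts described by location and mortality rate. This is not the authors' implementation: the district values are invented for illustration, the choice of longitude, latitude, and deaths per 100,000 as attributes is an assumption, and so are the min-max scaling and the farthest-first choice of starting centroids (used here only to make the run deterministic).

```python
import math

def min_max_scale(rows):
    """Rescale each column to [0, 1] so no attribute dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / sp for v, lo, sp in zip(row, lows, spans))
            for row in rows]

def kmeans(points, k, max_iter=100):
    """Lloyd-style k-means; returns (labels, centroids)."""
    # Assumption: farthest-first initialization, starting from the first point.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assignment step: each district joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no district changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical districts: (longitude, latitude, deaths per 100,000 inhabitants).
districts = [
    (-110.9, 29.1, 14.2),
    (-110.3, 28.6, 13.5),
    (-109.9, 27.5, 15.1),
    (-96.7, 17.1, 3.9),
    (-96.1, 16.8, 4.4),
    (-97.0, 16.5, 3.2),
]

labels, centroids = kmeans(min_max_scale(districts), k=2)

# Summarize each group in the original units to spot high-mortality regions.
for j in range(2):
    rates = [d[2] for d, lab in zip(districts, labels) if lab == j]
    print(f"cluster {j}: {len(rates)} districts, mean rate {sum(rates) / len(rates):.1f}")
```

Scaling matters in a setup like this because coordinates and rates are measured in different units; without it, whichever attribute has the largest numeric range would dominate the Euclidean distance.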

Corresponding author

Correspondence to J. Pérez.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Pérez, J., Henriques, F., Santaolaya, R., Fragoso, O., Mexicano, A. (2012). Data Mining System Applied to Population Databases for Studies on Lung Cancer. In: Pardalos, P., Xanthopoulos, P., Zervakis, M. (eds) Data Mining for Biomarker Discovery. Springer Optimization and Its Applications, vol 65. Springer, Boston, MA. https://doi.org/10.1007/978-1-4614-2107-8_13
