Modelling of Cancer Patient Records: A Structured Approach to Data Mining and Visual Analytics

  • Conference paper
Information Technology in Bio- and Medical Informatics (ITBAM 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10443)

Abstract

This research presents a methodology for health data analytics through a case study for modelling cancer patient records. Timeline-structured clinical data systems represent a new approach to the understanding of the relationship between clinical activity, disease pathologies and health outcomes. The novel Southampton Breast Cancer Data System contains episode and timeline-structured records on more than 17,000 patients who have been treated in University Hospital Southampton and affiliated hospitals since the late 1970s. The system is under continuous development and validation. Modern data mining software and visual analytics tools permit new insights into temporally-structured clinical data. The challenges and outcomes of the application of such software-based systems to this complex data environment are reported here. The core data was anonymised and put through a series of pre-processing exercises to identify and exclude anomalous and erroneous data, before restructuring within a remote data warehouse. A range of approaches was tested on the resulting dataset, including multi-dimensional modelling, sequential patterns mining and classification. Visual analytics software has enabled the comparison of survival times and surgical treatments. The systems tested proved to be powerful in identifying episode sequencing patterns which were consistent with real-world clinical outcomes. It is concluded that, subject to further refinement and selection, modern data mining techniques can be applied to large and heterogeneous clinical datasets to inform decision making.
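
As an illustration of the kind of episode sequencing pattern the abstract refers to, the following is a minimal sketch only, not the authors' method or tooling. It assumes each patient timeline is an ordered list of episode labels and counts how many patients share each ordered subsequence of a given length. The function name `frequent_sequences`, the episode labels and the three patient timelines are all hypothetical and invented for this example; none of it is drawn from the Southampton Breast Cancer Data System.

```python
from collections import Counter
from itertools import combinations


def frequent_sequences(timelines, length=2, min_support=2):
    """Return ordered episode subsequences shared by enough patients.

    Support is the number of patient timelines that contain the
    subsequence with order preserved (gaps between episodes allowed).
    """
    support = Counter()
    for episodes in timelines.values():
        # combinations() keeps the input order, so every tuple is an
        # ordered subsequence; the set counts a pattern once per patient.
        patterns = set(combinations(episodes, length))
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


# Hypothetical, made-up timelines used purely for illustration.
timelines = {
    "p1": ["diagnosis", "surgery", "radiotherapy"],
    "p2": ["diagnosis", "chemotherapy", "surgery"],
    "p3": ["diagnosis", "surgery", "recurrence"],
}

result = frequent_sequences(timelines, length=2, min_support=2)
print(result)
```

With these invented timelines, only the pattern "diagnosis followed by surgery" reaches the support threshold. Dedicated sequential pattern mining algorithms handle longer patterns and larger datasets far more efficiently than this brute-force enumeration.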

Acknowledgements

This research project has been supported in part by a Southampton Solent Research Innovation and Knowledge Exchange (RIKE) award for “Solent Health Informatics Partnership” (Project ID: 1326). The authors would like to thank Solent students who made some contribution to the work: in particular Chantel Biddle, Adam Kershaw and Alex Potter. We are also pleased to acknowledge the generous support of colleagues in the University Hospital Southampton Informatics Team, in particular Adrian Byrne and David Cable.

Author information

Corresponding author

Correspondence to Jing Lu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lu, J., Hales, A., Rew, D. (2017). Modelling of Cancer Patient Records: A Structured Approach to Data Mining and Visual Analytics. In: Bursa, M., Holzinger, A., Renda, M., Khuri, S. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2017. Lecture Notes in Computer Science, vol 10443. Springer, Cham. https://doi.org/10.1007/978-3-319-64265-9_4

  • DOI: https://doi.org/10.1007/978-3-319-64265-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64264-2

  • Online ISBN: 978-3-319-64265-9

  • eBook Packages: Computer Science, Computer Science (R0)
