On Stability of Feature Selection Based on MALDI Mass Spectrometry Imaging Data and Simulated Biopsy

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1033)

Abstract

In this work we analyse MALDI mass spectrometry imaging data for thyroid cancer samples. Such data, which contain information about the spatial distribution of proteins and peptides, make it possible to analyse virtually how fine needle aspiration (FNA) biopsy, a routine diagnostic procedure for the thyroid, influences the outcome, i.e. the set of features that discriminate between cancerous and normal tissue. We hypothesised that an impure dataset (consisting of cancer samples contaminated with normal cells) would be beneficial in terms of stable feature selection. We compared several predictor selection methods on different datasets to perform an in-depth analysis of feature ranking stability for thyroid cancer mass spectrometry data. Furthermore, we examined the impact of the sample contamination level on the selection.
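
The idea of measuring how contamination affects ranking stability can be illustrated with a minimal sketch. It is not the authors' pipeline: the synthetic data, the linear mixing model in `contaminate`, the Welch t-statistic ranking, the bootstrap scheme and the Jaccard-based stability score are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def t_scores(x, y):
    """Absolute Welch t-statistic per feature (column) for two sample groups."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0, ddof=1), y.var(axis=0, ddof=1)
    return np.abs(mx - my) / np.sqrt(vx / len(x) + vy / len(y))

def top_k(x, y, k):
    """Indices of the k features with the largest |t|, returned as a set."""
    return set(np.argsort(t_scores(x, y))[-k:].tolist())

def contaminate(cancer, normal, level, rng):
    """Mix each cancer spectrum with a randomly chosen normal spectrum.

    `level` is the assumed fraction of normal tissue in a simulated biopsy
    (0.0 = pure cancer sample, 1.0 = pure normal sample).
    """
    partners = normal[rng.integers(0, len(normal), size=len(cancer))]
    return (1.0 - level) * cancer + level * partners

def stability(cancer, normal, k, n_boot, rng):
    """Mean pairwise Jaccard index of top-k feature sets over bootstrap resamples."""
    sets = []
    for _ in range(n_boot):
        ci = rng.integers(0, len(cancer), size=len(cancer))
        ni = rng.integers(0, len(normal), size=len(normal))
        sets.append(top_k(cancer[ci], normal[ni], k))
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    return float(np.mean([len(a & b) / len(a | b) for a, b in pairs]))

# Synthetic stand-in for spectra: n samples per class, p features,
# of which the first k are shifted in the "cancer" class (assumed values).
rng = np.random.default_rng(0)
n, p, k = 40, 200, 10
normal = rng.normal(0.0, 1.0, size=(n, p))
cancer = rng.normal(0.0, 1.0, size=(n, p))
cancer[:, :k] += 3.0

# Stability of the top-k ranking at several assumed contamination levels.
for level in (0.0, 0.3, 0.6):
    mixed = contaminate(cancer, normal, level, rng)
    print(level, round(stability(mixed, normal, k, n_boot=20, rng=rng), 3))
```

A score of 1 means every bootstrap resample selected the same top-k features, and values near 0 mean the selections barely overlap. Other stability measures or ranking methods could be substituted without changing the overall structure.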

Keywords

  • MALDI imaging mass spectrometry
  • Machine learning
  • Feature selection
  • Fine needle biopsy

Chapter length: 12 pages

References

  1. Aha, D.W., Bankert, R.L.: A Comparative Evaluation of Sequential Feature Selection Algorithms, pp. 199–206. Springer, New York (1996)

  2. Bensz, W., Borys, D., Fujarewicz, K., Herok, K., Jaksik, R., Krasucki, M., Kurczyk, A., Matusik, K., Mrozek, D., Ochab, M., et al.: Integrated system supporting research on environment related cancers. In: Król, D., Madeyski, L., Nguyen, N. (eds.) Recent Developments in Intelligent Information and Database Systems, pp. 399–409. Springer, Cham (2016)

  3. Filipczuk, P., Fevens, T., Krzyzak, A., Monczak, R.: Computer-aided breast cancer diagnosis based on the analysis of cytological images of fine needle biopsies. IEEE Trans. Med. Imaging 32(12), 2169–2178 (2013)

  4. Fujarewicz, K., Student, S., Zielański, T., Jakubczak, M., Pieter, J., Pojda, K., Świerniak, A.: Large-scale data classification system based on galaxy server and protected from information leak. In: ACIIDS 2017, pp. 765–773. Springer, Cham (2017)

  5. Gaweł, D., Fujarewicz, K.: On the sensitivity of feature ranked lists for large-scale biological data. Math. Biosci. Eng. MBE 10(3), 677–690 (2013)

  6. Hand, D.J.: Data Mining. Based in part on the article ‘Data mining’ by David Hand, which appeared in the Encyclopedia of Environmetrics. American Cancer Society (2013)

  7. Haury, A.-C., Gestraud, P., Vert, J.-P.: The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLOS ONE 6(12), 1–12 (2011)

  8. Kalousis, A., Prados, J., Hilario, M.: Stability of feature selection algorithms. In: Fifth IEEE International Conference on Data Mining (ICDM 2005), p. 8, November 2005

  9. Kim, Y., Jeon, J., Mejia, S., Yao, C.Q., Ignatchenko, V., Nyalwidhe, J.O., Gramolini, A.O., Lance, R.S., Troyer, D.A., Drake, R.R., Boutros, P.C., Semmes, O.J., Kislinger, T.: Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer. Nat. Commun. 7, 11906 (2016)

  10. MathWorks. Two sample t-test, 23 March 2019

  11. Nakamura, T., Furukawa, Y., Nakagawa, H., Tsunoda, T., Ohigashi, H., Murata, K., Ishikawa, O., Ohgaki, K., Kashimura, N., Miyamoto, M., Hirano, S., Kondo, S., Katoh, H., Nakamura, Y., Katagiri, T.: Genome-wide cDNA microarray analysis of gene expression profiles in pancreatic cancers using populations of tumor cells and normal ductal epithelial cells selected for purity by laser microdissection. Oncogene 23(13), 2385–2400 (2004)

  12. Oreski, D., Oreski, S., Klicek, B.: Effects of dataset characteristics on the performance of feature selection techniques. Appl. Soft Comput. 52, 109–119 (2017)

  13. Pankratz, D.G., Choi, Y., Imtiaz, U., Fedorowicz, G.M., Anderson, J.D., Colby, T.V., Myers, J.L., Lynch, D.A., Brown, K.K., Flaherty, K.R., Steele, M.P., Groshong, S.D., Raghu, G., Barth, N.M., Walsh, P.S., Huang, J., Kennedy, G.C., Martinez, F.J.: Usual interstitial pneumonia can be detected in transbronchial biopsies using machine learning. Ann. Am. Thoracic Soc. 14(11), 1646–1654 (2017). PMID: 28640655

  14. Pietrowska, M., Diehl, H.C., Mrukwa, G., Kalinowska-Herok, M., Gawin, M., Chekan, M., Elm, J., Drazek, G., Krawczyk, A., Lange, D., Meyer, H.E., Polanska, J., Henkel, C., Widlak, P.: Molecular profiles of thyroid cancer subtypes: classification based on features of tissue revealed by mass spectrometry imaging. Biochimica et Biophysica Acta (BBA) Proteins Proteomics 1865(7), 837–845 (2017). MALDI Imaging

  15. Polanski, A., Marczyk, M., Pietrowska, M., Widlak, P., Polanska, J.: Signal partitioning algorithm for highly efficient Gaussian mixture modeling in mass spectrometry. PLOS ONE 10(7), 1–19 (2015)

  16. Psiuk-Maksymowicz, K., Płaczek, A., Jaksik, R., Student, S., Borys, D., Mrozek, D., Fujarewicz, K., Świerniak, A.: A holistic approach to testing biomedical hypotheses and analysis of biomedical data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015, pp. 449–462. Springer, Cham (2015)

  17. Quon, G., Haider, S., Deshwar, A.G., Cui, A., Boutros, P.C., Morris, Q.: Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Med. 5(3), 29 (2013)

  18. Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)

  19. Student, S., Fujarewicz, K.: Stable feature selection and classification algorithms for multiclass microarray data. Biol. Direct 7, 33 (2012). PMID: 23031190

  20. Student, S., Fujarewicz, K.: Stable feature selection and classification algorithms for multiclass microarray data. Biol. Direct 7(1), 33 (2012)

  21. Türeci, Ö., Ding, J., Hilton, H., Bian, H., Ohkawa, H., Braxenthaler, M., Seitz, G., Raddrizzani, L., Friess, H., Buchler, M., Sahin, U., Hammer, J.: Computational dissection of tissue contamination for identification of colon cancer-specific expression profiles. FASEB J. 17(3), 376–385 (2003). PMID: 12631577

  22. Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bull. 1(6), 80–83 (1945)

  23. Zhang, S., Zhang, C., Yang, Q.: Data preparation for data mining. Appl. Artif. Intell. 17(5–6), 375–381 (2003)

Acknowledgement

This work was supported by the Polish National Centre for Research and Development under Grant Strategmed2/267398/4/NCBR/2015 and by Silesian University of Technology Grant 02/010/BK-18/0102. Data analysis was partially carried out using the Biotest Platform developed within Project no. PBS3/B3/32/2015, financed by the Polish National Centre for Research and Development (NCBiR) and described in [2, 4, 16]. Calculations were performed using the infrastructure of the computer cluster Ziemowit (www.ziemowit.hpc.polsl.pl), funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08 and expanded within project POIG.02.03.01-00-040/13, in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre at the Silesian University of Technology.

Author information

Corresponding author

Correspondence to Krzysztof Fujarewicz.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wilk, A., Gawin, M., Frątczak, K., Widłak, P., Fujarewicz, K. (2020). On Stability of Feature Selection Based on MALDI Mass Spectrometry Imaging Data and Simulated Biopsy. In: Korbicz, J., Maniewski, R., Patan, K., Kowal, M. (eds) Current Trends in Biomedical Engineering and Bioimages Analysis. PCBEE 2019. Advances in Intelligent Systems and Computing, vol 1033. Springer, Cham. https://doi.org/10.1007/978-3-030-29885-2_8

  • DOI: https://doi.org/10.1007/978-3-030-29885-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29884-5

  • Online ISBN: 978-3-030-29885-2

  • eBook Packages: Engineering, Engineering (R0)