Abstract
In this work we analyse MALDI mass spectrometry imaging data for thyroid cancer samples. Such data, which contain information about the spatial distribution of proteins and peptides, make it possible to analyse virtually how fine needle aspiration (FNA) biopsy, a routine diagnostic procedure for the thyroid, influences the outcome, i.e. the set of features that discriminate between cancerous and normal tissue. We hypothesised that an impure dataset (consisting of cancer samples contaminated with normal cells) would be beneficial in terms of stable feature selection. We compared several predictor selection methods on different datasets to perform an in-depth analysis of feature ranking stability for thyroid cancer mass spectrometry data. Furthermore, we examined the impact of the sample contamination level on the selection.
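The kind of experiment described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic Gaussian features, contamination is assumed to be a simple linear mix of a cancer spectrum with a randomly chosen normal spectrum, features are ranked by the absolute Welch t-statistic, and stability is taken as the mean pairwise Jaccard overlap of the top-k feature sets over bootstrap resamples. All function names (`welch_t`, `top_k`, `jaccard`, `ranking_stability`, `contaminate`) and all parameter values are hypothetical.

```python
import numpy as np


def welch_t(x, y):
    """Per-feature Welch t-statistic between two sample matrices (rows = samples)."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    vx, vy = x.var(axis=0, ddof=1), y.var(axis=0, ddof=1)
    return (mx - my) / np.sqrt(vx / len(x) + vy / len(y))


def top_k(cancer, normal, k):
    """Set of indices of the k features with the largest |t|."""
    t = np.abs(welch_t(cancer, normal))
    return set(np.argsort(t)[-k:].tolist())


def jaccard(a, b):
    """Jaccard index of two sets."""
    return len(a & b) / len(a | b)


def ranking_stability(cancer, normal, k, n_resamples, rng):
    """Mean pairwise Jaccard overlap of top-k sets over bootstrap resamples."""
    selections = []
    for _ in range(n_resamples):
        ci = rng.integers(0, len(cancer), len(cancer))
        ni = rng.integers(0, len(normal), len(normal))
        selections.append(top_k(cancer[ci], normal[ni], k))
    overlaps = [
        jaccard(selections[i], selections[j])
        for i in range(n_resamples)
        for j in range(i + 1, n_resamples)
    ]
    return float(np.mean(overlaps))


def contaminate(cancer, normal, fraction, rng):
    """Assumed mixing model: blend each cancer spectrum with a random normal one."""
    partners = normal[rng.integers(0, len(normal), len(cancer))]
    return (1.0 - fraction) * cancer + fraction * partners


# Synthetic stand-in for spectral features (assumption, not real MALDI data).
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 60, 200, 10
normal = rng.normal(0.0, 1.0, (n_samples, n_features))
cancer = rng.normal(0.0, 1.0, (n_samples, n_features))
cancer[:, :n_informative] += 1.5  # shift a few features in the cancer class

results = {}
for fraction in (0.0, 0.3, 0.6):
    mixed = contaminate(cancer, normal, fraction, rng)
    results[fraction] = ranking_stability(mixed, normal, k=10, n_resamples=20, rng=rng)
    print(f"contamination={fraction:.1f}  stability={results[fraction]:.2f}")
```

In a real study the ranking statistic, the stability measure and the contamination model would each be replaced by the methods actually under comparison; the sketch only shows how the three pieces fit together.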
Keywords
- MALDI imaging mass spectrometry
- Machine learning
- Feature selection
- Fine needle biopsy
References
Aha, D.W., Bankert, R.L.: A Comparative Evaluation of Sequential Feature Selection Algorithms, pp. 199–206. Springer, New York (1996)
Bensz, W., Borys, D., Fujarewicz, K., Herok, K., Jaksik, R., Krasucki, M., Kurczyk, A., Matusik, K., Mrozek, D., Ochab, M., et al.: Integrated system supporting research on environment related cancers. In: Król, D., Madeyski, L., Nguyen, N. (eds.) Recent Developments in Intelligent Information and Database Systems, pp. 399–409. Springer, Cham (2016)
Filipczuk, P., Fevens, T., Krzyzak, A., Monczak, R.: Computer-aided breast cancer diagnosis based on the analysis of cytological images of fine needle biopsies. IEEE Trans. Med. Imaging 32(12), 2169–2178 (2013)
Fujarewicz, K., Student, S., Zielański, T., Jakubczak, M., Pieter, J., Pojda, K., Świerniak, A.: Large-scale data classification system based on galaxy server and protected from information leak. In: ACIIDS 2017, pp. 765–773. Springer, Cham (2017)
Gaweł, D., Fujarewicz, K.: On the sensitivity of feature ranked lists for large-scale biological data. Math. Biosci. Eng. MBE 10(3), 677–690 (2013)
Hand, D.J.: Data Mining. Based in part on the article ‘Data mining’ by David Hand, which appeared in the Encyclopedia of Environmetrics. American Cancer Society (2013)
Haury, A.-C., Gestraud, P., Vert, J.-P.: The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PLOS ONE 6(12), 1–12 (2011)
Kalousis, A., Prados, J., Hilario, M.: Stability of feature selection algorithms. In: Fifth IEEE International Conference on Data Mining (ICDM 2005), p. 8, November 2005
Kim, Y., Jeon, J., Mejia, S., Yao, C.Q., Ignatchenko, V., Nyalwidhe, J.O., Gramolini, A.O., Lance, R.S., Troyer, D.A., Drake, R.R., Boutros, P.C., Semmes, O.J., Kislinger, T.: Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer. Nat. Commun. 7, 11906 (2016)
MathWorks. Two sample t-test, 23 March 2019
Nakamura, T., Furukawa, Y., Nakagawa, H., Tsunoda, T., Ohigashi, H., Murata, K., Ishikawa, O., Ohgaki, K., Kashimura, N., Miyamoto, M., Hirano, S., Kondo, S., Katoh, H., Nakamura, Y., Katagiri, T.: Genome-wide cDNA microarray analysis of gene expression profiles in pancreatic cancers using populations of tumor cells and normal ductal epithelial cells selected for purity by laser microdissection. Oncogene 23(13), 2385–2400 (2004)
Oreski, D., Oreski, S., Klicek, B.: Effects of dataset characteristics on the performance of feature selection techniques. Appl. Soft Comput. 52, 109–119 (2017)
Pankratz, D.G., Choi, Y., Imtiaz, U., Fedorowicz, G.M., Anderson, J.D., Colby, T.V., Myers, J.L., Lynch, D.A., Brown, K.K., Flaherty, K.R., Steele, M.P., Groshong, S.D., Raghu, G., Barth, N.M., Walsh, P.S., Huang, J., Kennedy, G.C., Martinez, F.J.: Usual interstitial pneumonia can be detected in transbronchial biopsies using machine learning. Ann. Am. Thoracic Soc. 14(11), 1646–1654 (2017). PMID: 28640655
Pietrowska, M., Diehl, H.C., Mrukwa, G., Kalinowska-Herok, M., Gawin, M., Chekan, M., Elm, J., Drazek, G., Krawczyk, A., Lange, D., Meyer, H.E., Polanska, J., Henkel, C., Widlak, P.: Molecular profiles of thyroid cancer subtypes: classification based on features of tissue revealed by mass spectrometry imaging. Biochimica et Biophysica Acta (BBA) Proteins Proteomics 1865(7), 837–845 (2017). MALDI Imaging
Polanski, A., Marczyk, M., Pietrowska, M., Widlak, P., Polanska, J.: Signal partitioning algorithm for highly efficient gaussian mixture modeling in mass spectrometry. PLOS ONE 10(7), 1–19 (2015)
Psiuk-Maksymowicz, K., Płaczek, A., Jaksik, R., Student, S., Borys, D., Mrozek, D., Fujarewicz, K., Świerniak, A.: A holistic approach to testing biomedical hypotheses and analysis of biomedical data. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds.) BDAS 2015, pp. 449–462. Springer, Cham (2015)
Quon, G., Haider, S., Deshwar, A.G., Cui, A., Boutros, P.C., Morris, Q.: Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction. Genome Med. 5(3), 29 (2013)
Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
Student, S., Fujarewicz, K.: Stable feature selection and classification algorithms for multiclass microarray data. Biol. Direct 7, 33 (2012). PMID: 23031190
Student, S., Fujarewicz, K.: Stable feature selection and classification algorithms for multiclass microarray data. Biol. Direct 7(1), 33 (2012)
Türeci, Ö., Ding, J., Hilton, H., Bian, H., Ohkawa, H., Braxenthaler, M., Seitz, G., Raddrizzani, L., Friess, H., Buchler, M., Sahin, U., Hammer, J.: Computational dissection of tissue contamination for identification of colon cancer-specific expression profiles. FASEB J. 17(3), 376–385 (2003). PMID: 12631577
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bull. 1(6), 80–83 (1945)
Zhang, S., Zhang, C., Yang, Q.: Data preparation for data mining. Appl. Artif. Intell. 17(5–6), 375–381 (2003)
Acknowledgement
This work was supported by the Polish National Centre for Research and Development under Grant Strategmed2/267398/4/NCBR/2015 and by Silesian University of Technology Grant 02/010/BK-18/0102. Data analysis was partially carried out using the Biotest Platform developed within Project no. PBS3/B3/32/2015, financed by the Polish National Centre for Research and Development (NCBiR) and described in [2, 4, 16]. Calculations were performed on the computer cluster Ziemowit (www.ziemowit.hpc.polsl.pl), funded by the Silesian BIO-FARMA project No. POIG.02.01.00-00-166/08 and expanded under project POIG.02.03.01-00-040/13, in the Computational Biology and Bioinformatics Laboratory of the Biotechnology Centre at the Silesian University of Technology.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wilk, A., Gawin, M., Frątczak, K., Widłak, P., Fujarewicz, K. (2020). On Stability of Feature Selection Based on MALDI Mass Spectrometry Imaging Data and Simulated Biopsy. In: Korbicz, J., Maniewski, R., Patan, K., Kowal, M. (eds) Current Trends in Biomedical Engineering and Bioimages Analysis. PCBEE 2019. Advances in Intelligent Systems and Computing, vol 1033. Springer, Cham. https://doi.org/10.1007/978-3-030-29885-2_8
DOI: https://doi.org/10.1007/978-3-030-29885-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29884-5
Online ISBN: 978-3-030-29885-2
eBook Packages: Engineering, Engineering (R0)