Interpretable Per Case Weighted Ensemble Method for Cancer Associations

Jalali, Adrin; Pfeifer, Nico

doi:10.1007/978-3-662-44753-6_26

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8701)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Over the past decades, biology has become a high-throughput research field, both in the number of different measurement techniques and in the number of variables measured by each technique (e.g., from Sanger sequencing to deep sequencing), and it is increasingly targeted at individual cells [3]. This has led to an unprecedented growth of biological information. Consequently, techniques that help researchers find the important insights in the data are becoming more and more important. Molecular measurements from cancer patients, such as gene expression and DNA methylation, are usually very noisy. Furthermore, cancer types can be very heterogeneous. Therefore, one of the main assumptions of machine learning, that the underlying unknown distribution is the same for all samples in training and test data, might not be completely fulfilled.

In this work, we introduce a method that is aware of this potential bias and uses an estimate of the differences when generating the final prediction method. For this, we introduce a set of sparse classifiers based on L1-SVMs [1], under the constraint that the classifiers use disjoint sets of features. Furthermore, for each feature chosen by one of the classifiers, we introduce a regression model based on Gaussian process regression that uses additional features. For a given test sample, we can then use these regression models to estimate, for each classifier, how well its features are predictable by the corresponding Gaussian process regression models. This information is then used for a confidence-based weighting of the classifiers for the test sample. Schapire and Singer showed that incorporating confidences of classifiers can improve the performance of an ensemble method [2]. However, in their setting, confidences of classifiers are estimated using the training data and are thus fixed for all test samples, whereas in our setting we estimate confidences of individual classifiers per given test sample.
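The per-sample weighting described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the confidence formula, so the score `exp(-z^2 / 2)` used here is an assumption, as are the function names, the RBF kernel, and the noise level. The sketch assumes each sparse classifier has already produced a decision value for the test sample, and it shows only the Gaussian process prediction of one selected feature from additional features and the resulting confidence-weighted vote.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(X_train, y_train, x_test, noise=0.1):
    # Standard zero-mean GP regression: posterior mean and variance of the
    # target (one selected feature) at a single test point, given the
    # additional features as inputs. The noise level is an assumed value.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    k_star = rbf_kernel(X_train, x_test[None, :])[:, 0]
    mean = k_star @ np.linalg.solve(K, y_train)
    var = 1.0 + noise - k_star @ np.linalg.solve(K, k_star)
    return float(mean), float(max(var, 1e-12))

def feature_confidence(observed, mean, var):
    # Assumed confidence score (hypothetical, not from the paper):
    # close to 1 when the observed feature value matches the GP prediction,
    # close to 0 when it lies many predictive standard deviations away.
    z2 = (observed - mean) ** 2 / var
    return float(np.exp(-0.5 * z2))

def per_case_prediction(decision_values, confidences):
    # Normalise the per-sample confidences into weights and take the sign
    # of the weighted sum of the classifiers' decision values.
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()
    return float(np.sign(w @ np.asarray(decision_values, dtype=float)))
```

In use, one would call `gp_predict` once per selected feature, aggregate the resulting `feature_confidence` values per classifier (for example by averaging, which is again an assumption), and pass one confidence per classifier to `per_case_prediction`. Because the confidences are computed from the test sample itself, two test samples can receive different ensemble weights from the same trained classifiers.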

In our evaluation, the new method achieved state-of-the-art performance on many different cancer data sets with measured DNA methylation or gene expression. Moreover, we developed a method to visualize our learned classifiers to find interesting associations with the target label. Applying it to a leukemia data set, we found several ribosomal proteins associated with leukemia that might be interesting targets for follow-up studies and that support the hypothesis that ribosomes are a new frontier in gene regulation.

References

[1] Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: ICML, vol. 98, pp. 82–90 (1998)
[2] Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning 37(3), 297–336 (1999)
[3] Shapiro, E., Biezuner, T., Linnarsson, S.: Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat. Rev. Genet. 14(9), 618–630 (2013)

Author information

Authors and Affiliations

Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Campus E1 4, 66123, Saarbrücken, Germany
Adrin Jalali & Nico Pfeifer
Saarbrücken Graduate School of Computer Science, Saarland University, Saarbrücken, Germany
Adrin Jalali

Editor information

Editors and Affiliations

David R. Cheriton School of Computer Science, University of Waterloo, ON, Canada
Dan Brown
Institute of Microbiology and Genetics, Department of Bioinformatics, University of Göttingen, Germany, Goldschmidtstr. 1, 37077, Göttingen, Germany
Burkhard Morgenstern

About this paper

Cite this paper

Jalali, A., Pfeifer, N. (2014). Interpretable Per Case Weighted Ensemble Method for Cancer Associations. In: Brown, D., Morgenstern, B. (eds) Algorithms in Bioinformatics. WABI 2014. Lecture Notes in Computer Science, vol 8701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44753-6_26

DOI: https://doi.org/10.1007/978-3-662-44753-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44752-9
Online ISBN: 978-3-662-44753-6
eBook Packages: Computer Science, Computer Science (R0)
