Semi-supervised Naive Hubness Bayesian k-Nearest Neighbor for Gene Expression Data

Buza, Krisztian

doi:10.1007/978-3-319-26227-7_10


Conference paper
First Online: 05 March 2016

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 403)

Abstract

Classification of gene expression data is the common denominator of various biomedical recognition tasks. However, in many cases it is difficult or even impossible to obtain class labels for a large training sample. Semi-supervised classifiers are therefore needed, because they can also take advantage of unlabeled data. Furthermore, gene expression data is high-dimensional, which gives rise to the phenomena collectively known as the curse of dimensionality. One recently explored aspect of this is the presence of hubs, or hubness for short. In response, hubness-aware classifiers such as Naive Hubness Bayesian k-Nearest Neighbor (NHBNN) have recently been developed. In this paper, we propose a semi-supervised extension of NHBNN and show, in experiments on publicly available gene expression data, that the proposed classifier outperforms all of its examined competitors.
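The following minimal Python/NumPy sketch makes hubness and NHBNN more concrete. It covers only the supervised NHBNN building blocks. It does not show the semi-supervised extension proposed in the paper, whose details are not given here. Hubness is measured through the k-occurrence of a point, which is how often that point appears among the k nearest neighbours of other points. Hubs are points with an unusually high k-occurrence. NHBNN treats each of the k nearest neighbours of a query as a feature in a naive Bayes model and estimates P(x_i ∈ D_k(x) | c) from class-conditional k-occurrences. The following are illustrative assumptions rather than the original specification: the Euclidean distance, the hypothetical function names, and the Laplace smoothing with parameter `lam`. The original NHBNN also treats rarely occurring neighbours specially, and this sketch omits that step.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours (Euclidean) of every row of X, excluding the point itself."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def class_k_occurrences(X, y, k, n_classes):
    """N[i, c] = how often x_i occurs among the k nearest neighbours of training points of class c.

    Row sums N[i].sum() are the plain k-occurrences N_k(x_i); hubs have large values.
    """
    N = np.zeros((len(X), n_classes))
    for j, nbrs in enumerate(knn_indices(X, k)):
        for i in nbrs:
            N[i, y[j]] += 1
    return N


def nhbnn_predict(X_train, y_train, x, k=3, lam=1.0):
    """Simplified NHBNN-style prediction for a single query x (assumes every class 0..C-1 occurs)."""
    n_classes = int(y_train.max()) + 1
    N = class_k_occurrences(X_train, y_train, k, n_classes)
    n_c = np.bincount(y_train, minlength=n_classes)
    log_post = np.log(n_c / len(y_train))  # log class priors
    # k nearest training neighbours of the query
    nbrs = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    for i in nbrs:
        # Laplace-smoothed estimate of P(x_i in D_k(x) | class c), one value per class
        # (hypothetical smoothing choice; the original estimator may differ)
        p = (N[i] + lam) / (n_c + lam * len(X_train))
        log_post += np.log(p)  # naive Bayes: neighbour occurrences treated as independent
    return int(np.argmax(log_post))
```

Take two well-separated clusters labelled 0 and 1 as an example. A query close to the first cluster then receives class 0, because its neighbours occur only in the neighbour lists of class-0 points. The brute-force O(n²) distance matrix keeps the sketch short. For large gene expression datasets, a dedicated nearest-neighbour index would likely be preferable.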



Acknowledgments

This research was performed within the framework of the grant of the Hungarian Scientific Research Fund—OTKA 111710 PD. This paper was supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.

Author information

Authors and Affiliations

BioIntelligence Lab, Institute of Genomic Medicine and Rare Disorders, Semmelweis University, Budapest, Hungary
Krisztian Buza


Corresponding author

Correspondence to Krisztian Buza.

Editor information

Editors and Affiliations

Department of Systems, Wrocław University of Technology, Wroclaw, Poland
Robert Burduk
Department of Systems and Computer, Wrocław University of Technology, Wroclaw, Poland
Konrad Jackowski
Department of Systems and Computer, Wrocław University of Technology, Wroclaw, Poland
Marek Kurzyński
Dept. of Systems and Computer Networks, Wrocław University of Technology, Wroclaw, Poland
Michał Woźniak
Department of Systems, Wrocław University of Technology, Wroclaw, Poland
Andrzej Żołnierek


About this paper

Cite this paper

Buza, K. (2016). Semi-supervised Naive Hubness Bayesian k-Nearest Neighbor for Gene Expression Data. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_10


DOI: https://doi.org/10.1007/978-3-319-26227-7_10
Published: 05 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26225-3
Online ISBN: 978-3-319-26227-7
eBook Packages: Engineering (R0)
