Optimization Problem of k-NN Classifier in DNA Microarray Methods

  • Urszula Bentkowska
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 378)


Microarrays are a particularly interesting tool in modern molecular biology, not only because of their wide spectrum of applications, such as analysis of genome structure, gene expression profiling, genotyping, and sequencing, but also because they make it possible to test a large number of objects in one experiment (cf. [1, 2]). However, identifying the relevant information in the huge amount of data obtained from microarrays requires sophisticated bioinformatic methods, typically clustering methods or machine learning algorithms. The performance of such methods on test data tends to drop when the number of attributes (columns) is large. In this chapter, microarray methods are applied to the identification of marker genes. Our aim is to show that the quality of classification for data with a large number of attributes may be improved by combining microarray methods with interval modeling. To this end, the so-called vertical decomposition of the table representing a given data set is considered.
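As a rough illustration of the idea, the sketch below assumes that vertical decomposition means splitting the attributes (columns) into groups, running a k-NN classifier on each group separately, and summarizing the per-group outputs as an interval. The abstract does not specify the aggregation functions used in the chapter, so the min/max envelope, the midpoint decision rule, the function names, and the toy data are all assumptions made for this example only.

```python
import math

def vertical_decomposition(n_attributes, n_blocks):
    """Split column indices 0..n_attributes-1 into at most n_blocks contiguous groups."""
    size = math.ceil(n_attributes / n_blocks)
    return [list(range(start, min(start + size, n_attributes)))
            for start in range(0, n_attributes, size)]

def knn_positive_share(train_rows, train_labels, query, columns, k):
    """Fraction of the k nearest neighbours (Euclidean distance on `columns` only) with label 1."""
    def dist(row):
        return math.sqrt(sum((row[c] - query[c]) ** 2 for c in columns))
    nearest = sorted(range(len(train_rows)), key=lambda i: dist(train_rows[i]))[:k]
    return sum(train_labels[i] for i in nearest) / k

def classify_by_interval(train_rows, train_labels, query, n_blocks=2, k=3):
    """Run k-NN per column block and combine the results into an interval.

    Assumption: the interval is the [min, max] envelope of the per-block
    shares, and the decision compares its midpoint with 0.5.
    """
    blocks = vertical_decomposition(len(query), n_blocks)
    shares = [knn_positive_share(train_rows, train_labels, query, cols, k)
              for cols in blocks]
    interval = (min(shares), max(shares))
    midpoint = (interval[0] + interval[1]) / 2
    return interval, int(midpoint >= 0.5)

# Made-up toy data: 6 samples, 4 attributes, binary labels.
train_rows = [
    [1.0, 1.1, 5.0, 5.2],
    [0.9, 1.0, 5.1, 4.9],
    [1.2, 0.8, 4.8, 5.0],
    [3.0, 3.1, 1.0, 1.2],
    [3.2, 2.9, 0.9, 1.1],
    [2.9, 3.0, 1.1, 0.8],
]
train_labels = [0, 0, 0, 1, 1, 1]
query = [3.1, 3.0, 1.0, 1.0]

result = classify_by_interval(train_rows, train_labels, query)
print(result)
```

A wide interval signals that the column groups disagree about the query, which is the kind of uncertainty an interval-valued model can carry forward instead of collapsing it early.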


  1. Bodrossy, L.: Diagnostic oligonucleotide microarrays for microbiology. In: Blalock, E. (ed.) A Beginner's Guide to Microarrays, pp. 43–92. Kluwer Academic Publisher, New York (2003)
  2. Heller, M.J.: DNA microarray technology: devices, systems, and applications. Annu. Rev. Biomed. Eng. 4, 129–153 (2002)
  3. Fukushima, M., Kakinuma, K., Hayashi, H., et al.: Detection and Identification of mycobacterium species isolates by DNA microarray. J. Clin. Microbiol. 41, 2605–2615 (2003)
  4. Karczmarczyk, M., Bartoszcze, M.: DNA microarrays - new tool in identification of biological agents. Przegl. Epidemiol. (In Pol.) 60, 803–811 (2006)
  5. Robertson, B.H., Nicholson, J.K.A.: New microbiology tools for public health and their implications. Annu. Rev. Public Health 26, 281–302 (2005)
  6. Stenger, D.A., Andreadis, J.D., Vora, G.J., et al.: Potential applications of DNA microarrays in biodefense-related diagnostics. Curr. Opin. Biotechnol. 13, 208–212 (2002)
  7. Straub, T.M., Quinonez-Diaz, M.D., Valdez, C.O., et al.: Using DNA microarrays to detect multiple pathogen threats in water. Water Supply 2, 107–114 (2004)
  8. Yu, X., Susa, M., Knabbe, C., et al.: Development and Validation of a Diagnostic DNA Microarray To Detect Quinolone-Resistant Escherichia coli among Clinical Isolates. J. Clin. Microbiol. 42, 4083–4091 (2004)
  9. Stȩpniak, P., Handschuh, L., Figlerowicz, M.: DNA microarray data analysis (in Polish). Biotechnologia 4(83), 68–87 (2008)
  10. Singh, D., et al.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209 (2002)
  11. Deegalla, S., Boström, H.: Reducing high-dimensional data by principal component analysis vs. random projection for nearest neighbor classification. In: Proceedings of the 5th International Conference on Machine Learning and Applications, ICMLA 2006, pp. 245–250. IEEE Computer Society, Washington, DC, USA (2006)
  12. Deegalla, S., Boström, H.: Classification of microarrays with kNN: comparison of dimensionality reduction methods. In: Yin, H., et al. (eds.) IDEAL 2007, LNCS 4881, pp. 800–809. Springer, Berlin (2007)
  13. Bentkowska, U., Bazan, J.G., Rza̧sa, W., Zarȩba, L.: Application of interval-valued aggregation to optimization problem of \(k\)-\(NN\) classifiers in DNA microarray methods (under preparation)
  14. Frank, E., Hall, M.A., Witten, I.H.: The WEKA Workbench. Online Appendix for Data Mining Practical Machine Learning Tools and Techniques, 4th edn. Morgan Kaufmann, Burlington (2016)
  15. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
  16. ELVIRA Biomedical data set repository.
  17. Alon, U., et al.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 96, 6745–6750 (1999)
  18. Golub, T., Slonim, D., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J., Coller, H., Loh, M., Downing, J., Caligiuri, M., Bloomfield, C., Lander, E.: Molecular classification of cancer. Class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)
  19. Alizadeh, A., Eisen, M., Davis, R., Ma, C., et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511 (2000)
  20. Gordon, G., Jensen, R., Hsiao, L., Gullans, S., Blumenstock, J., Ramaswamy, S., Richards, W., Sugarbaker, D., Bueno, R.: Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 62(17), 4963–4967 (2002)
  21. Petricoin III, E.F., et al.: Use of proteomic patterns in serum to identify ovarian cancer. Lancet 359, 572–577 (2002)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Faculty of Mathematics and Natural Sciences, University of Rzeszów, Rzeszów, Poland
