Automatic Control and Computer Sciences, Volume 53, Issue 1, pp 28–38

Classification Methodology for Bioinformatics Data Analysis

  • M. Gasparovica-Asīte
  • L. Aleksejeva

Abstract

The paper presents a methodology for bioinformatics data analysis. First, it describes how data analysis is used in bioinformatics: data preprocessing approaches, missing data processing approaches, data dimensionality reduction, and classification algorithms. It then determines the most appropriate data analysis methods to include in the methodology for solving a diagnostic classification task. The methodology was validated in experiments using the WEKA software and real-world bioinformatics data sets. These experiments identified the specific method implementations that give the best classification results; all intermediate results are recorded. Finally, the best sequence of preprocessing methods for this methodology is determined.
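
As a rough illustration of the kind of step sequence the abstract describes (missing data processing, then dimensionality reduction, then classification), the following is a minimal Python sketch. It is not the authors' methodology and does not use WEKA: the mean imputation, the class-mean-gap feature filter, the 1-nearest-neighbour classifier, the sensitivity/specificity evaluation, and the toy data are all assumptions chosen only to keep the example small and self-contained.

```python
from statistics import mean

def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def select_features(rows, labels, k):
    """Filter-style selection: keep the k columns whose class means differ most."""
    def gap(j):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        return abs(mean(pos) - mean(neg))
    ranked = sorted(range(len(rows[0])), key=gap, reverse=True)
    return sorted(ranked[:k])

def project(row, columns):
    """Keep only the selected columns of one row."""
    return [row[j] for j in columns]

def predict_1nn(train_rows, train_labels, x):
    """Return the label of the training row closest to x (squared Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_rows)), key=lambda i: dist(train_rows[i], x))
    return train_labels[best]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Invented toy data (not from the paper): 3 features, label 1 = positive diagnosis.
train = [
    [5.0, 1.0, 9.0], [4.8, 3.0, None], [5.2, 2.0, 8.5],
    [1.0, 2.0, 2.0], [1.2, 1.0, None], [0.8, 3.0, 2.5],
]
train_labels = [1, 1, 1, 0, 0, 0]
test = [[4.9, 2.5, 8.0], [1.1, 1.5, 3.0]]
test_labels = [1, 0]

imputed = impute_mean(train)                             # step 1: missing data
selected = select_features(imputed, train_labels, k=2)   # step 2: dimensionality reduction
reduced_train = [project(r, selected) for r in imputed]
predictions = [predict_1nn(reduced_train, train_labels, project(x, selected))
               for x in test]                            # step 3: classification
sens, spec = sensitivity_specificity(test_labels, predictions)
print(selected, predictions, sens, spec)
```

Reordering or swapping the three steps and comparing the resulting sensitivity and specificity is one simple way to explore which preprocessing sequence works best on a given data set.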

Keywords: data mining, bioinformatics, preprocessing

Copyright information

© Allerton Press, Inc. 2019

Authors and Affiliations

  1. Riga Technical University, Riga, Latvia
