Biological Sequence Data Preprocessing for Classification: A Case Study in Splice Site Identification

  • A. K. M. A. Baten
  • S. K. Halgamuge
  • Bill Chang
  • Nalin Wickramarachchi
Conference paper

DOI: 10.1007/978-3-540-72393-6_144

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4492)
Cite this paper as:
Baten A.K.M.A., Halgamuge S.K., Chang B., Wickramarachchi N. (2007) Biological Sequence Data Preprocessing for Classification: A Case Study in Splice Site Identification. In: Liu D., Fei S., Hou Z., Zhang H., Sun C. (eds) Advances in Neural Networks – ISNN 2007. ISNN 2007. Lecture Notes in Computer Science, vol 4492. Springer, Berlin, Heidelberg

Abstract

The rapid growth of biological sequence data demands better and more efficient analysis methods. Effective detection of regulatory signals in these sequences requires knowledge of the characteristics, dependencies, and relationships of the nucleotides in the region surrounding each signal. A higher order Markov model is generally regarded as a useful technique for modeling higher order dependencies among nucleotides. However, implementing it requires estimating a large number of parameters, which is computationally expensive. In this paper, we propose a hybrid method consisting of a first order Markov model for sequence data preprocessing and a multilayer perceptron neural network for classification. The Markov model captures the compositional features and dependencies of nucleotides as probabilistic parameters, which are used as inputs to the classifier. The classifier combines the Markov probabilities nonlinearly for signal detection. When applied to the splice site detection problem on three widely used data sets, the proposed hybrid method is able to model higher order dependencies with better classification accuracies.
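
To make the preprocessing step concrete, the following is a minimal sketch of how a first order Markov model could turn a nucleotide sequence into a vector of probabilities for a classifier. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether the transition probabilities are position-specific or how zero counts are handled, so position-specific transitions and a pseudocount are assumed here. The names `fit_first_order_markov`, `encode`, and `pseudocount`, and the toy training sequences, are hypothetical.

```python
ALPHABET = "ACGT"


def fit_first_order_markov(sequences, pseudocount=1.0):
    """Estimate position-specific first order Markov parameters.

    Assumes all sequences have the same length and contain only A, C, G, T.
    Returns (initial, transitions), where initial[b] estimates P(x_0 = b) and
    transitions[i][(a, b)] estimates P(x_i = b | x_{i-1} = a) for i >= 1.
    The pseudocount (an assumption, not from the paper) avoids zero estimates.
    """
    length = len(sequences[0])
    init_counts = {b: pseudocount for b in ALPHABET}
    trans_counts = [
        {(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
        for _ in range(length)
    ]
    for seq in sequences:
        init_counts[seq[0]] += 1
        for i in range(1, length):
            trans_counts[i][(seq[i - 1], seq[i])] += 1

    total = sum(init_counts.values())
    initial = {b: c / total for b, c in init_counts.items()}

    # Index 0 stays empty: the first position has no predecessor.
    transitions = [{} for _ in range(length)]
    for i in range(1, length):
        for a in ALPHABET:
            row_total = sum(trans_counts[i][(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                transitions[i][(a, b)] = trans_counts[i][(a, b)] / row_total
    return initial, transitions


def encode(seq, initial, transitions):
    """Map a sequence to one Markov probability per position."""
    features = [initial[seq[0]]]
    for i in range(1, len(seq)):
        features.append(transitions[i][(seq[i - 1], seq[i])])
    return features


# Toy example with made-up sequences (not data from the paper).
train = ["AGGTA", "AGGTG", "CGGTA", "AGGTA"]
initial, transitions = fit_first_order_markov(train)
features = encode("AGGTA", initial, transitions)
```

In the hybrid scheme the abstract describes, vectors like `features` would serve as the inputs to the multilayer perceptron, which combines them nonlinearly rather than simply multiplying them as a plain Markov classifier would.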

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • A. K. M. A. Baten (1)
  • S. K. Halgamuge (1)
  • Bill Chang (1)
  • Nalin Wickramarachchi (2)
  1. Dynamic Systems and Control Research Group, DoMME, Faculty of Engineering, The University of Melbourne, Parkville 3010, Australia
  2. Department of Electrical Engineering, The University of Moratuwa, Sri Lanka
