A One-Class Classification Approach for Protein Sequences and Structures

Bánhalmi, András; Busa-Fekete, Róbert; Kégl, Balázs

doi:10.1007/978-3-642-01551-9_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 5542)

Abstract

The One-Class Classification (OCC) approach is based on the assumption that samples are available only from a target class in the training phase. OCC methods have been applied successfully to problems where the classes differ greatly in size. Because class imbalance is typical in protein classification tasks, we tested one-class classification algorithms for detecting distant similarities in protein sequences and structures. We found that the OCC approach yielded a small improvement in classification performance over binary classifiers (SVM, ANN, Random Forest). More importantly, training time improved substantially (50- to 100-fold). OCCs may provide an especially useful alternative for processing those protein groups where discriminative classifiers cannot be easily trained.
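To make the one-class setting concrete, the sketch below trains on target-class samples only and accepts a new point if it lies within a learned distance of the target centroid. It is a generic illustration, not one of the OCC algorithms evaluated in the paper: the function names `fit_one_class` and `is_target`, the spherical centroid model, the 0.95 quantile, and the synthetic 2-D data are all assumptions made for this example.

```python
import math
import random


def fit_one_class(targets, quantile=0.95):
    """Fit a spherical one-class model from target samples only.

    Hypothetical helper: returns the centroid of the targets and a
    distance threshold chosen so that roughly `quantile` of the
    training targets fall inside the sphere.
    """
    dim = len(targets[0])
    centroid = [sum(x[i] for x in targets) / len(targets) for i in range(dim)]
    dists = sorted(math.dist(x, centroid) for x in targets)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]


def is_target(model, x):
    """Accept x as a target if it lies inside the fitted sphere."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


# Synthetic stand-in for feature vectors of one protein group (assumption).
random.seed(0)
targets = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]

# No negative (non-target) examples are used during training.
model = fit_one_class(targets)

print(is_target(model, [0.1, -0.2]))  # a point near the target cloud
print(is_target(model, [8.0, 8.0]))   # a point far from the target cloud
```

Because fitting touches only the target samples, training cost does not grow with the size of the much larger non-target class, which is one intuition for why one-class training can be cheaper than training a binary classifier on imbalanced data.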

Author information

Authors and Affiliations

Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, Aradi vértanúk tere 1., H-6720, Szeged, Hungary
András Bánhalmi & Róbert Busa-Fekete
LAL, University of Paris-Sud, CNRS, 91898, Orsay, France
Róbert Busa-Fekete & Balázs Kégl

Editor information

Editors and Affiliations

Computer Science & Engineering Department, University of Connecticut, 371 Fairfield Way, Unit 2155, CT 06269, Storrs, USA
Ion Măndoiu
Bioinformatics Research Group (BioRG), School of Computing and Information Sciences, Florida International University, 11200 SW 8th Street, Room ECS254, University Park, FL 33199, Miami, USA
Giri Narasimhan
Department of Computer Science, Georgia State University, P.O. Box 3994, GA 30302-3994, Atlanta, USA
Yanqing Zhang

About this paper

Cite this paper

Bánhalmi, A., Busa-Fekete, R., Kégl, B. (2009). A One-Class Classification Approach for Protein Sequences and Structures. In: Măndoiu, I., Narasimhan, G., Zhang, Y. (eds) Bioinformatics Research and Applications. ISBRA 2009. Lecture Notes in Computer Science, vol 5542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01551-9_30

DOI: https://doi.org/10.1007/978-3-642-01551-9_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01550-2
Online ISBN: 978-3-642-01551-9
eBook Packages: Computer Science, Computer Science (R0)