Properties of Object-Level Cross-Validation Schemes for Symmetric Pair-Input Data

Heimonen, Juho; Salakoski, Tapio; Pahikkala, Tapio

doi:10.1007/978-3-662-44415-3_39

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8621)

Included in the following conference series:

Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR)

Abstract

In bioinformatics, many learning tasks involve pair-input data (i.e., inputs representing object pairs) where inputs are not independent. Two cross-validation schemes for symmetric pair-input data are considered. The mean and variance of cross-validation estimate deviations from respective generalization performances are examined in the situation where the learned model is applied to pairs of two previously unseen objects. In experiments with the task of learning protein functional similarities, large positive mean deviations were observed with the relaxed scheme due to training–validation dependencies while the strict scheme yielded small negative mean deviations and higher variances. The properties of the strict scheme can be explained by the reduction in cross-validation training set sizes when avoiding training–validation dependencies. The results suggest that the strict scheme is preferable in the given setting.
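The difference between the two schemes can be illustrated with a small sketch. The abstract does not define the schemes in detail, so the following rests on an assumption: objects are partitioned into folds, validation pairs are those whose two objects both lie in the held-out fold, the strict scheme trains only on pairs with no held-out object, and the relaxed scheme additionally trains on pairs that share one object with the held-out fold. The function names and the round-robin fold assignment are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def object_level_folds(objects, n_folds):
    """Partition objects into n_folds disjoint groups (round-robin; illustrative only)."""
    return [objects[i::n_folds] for i in range(n_folds)]


def split_pairs(objects, held_out, scheme):
    """Split all unordered object pairs into training and validation pairs.

    Assumed definitions (not taken verbatim from the paper):
      - validation: both objects are in the held-out fold
      - strict training: neither object is in the held-out fold
      - relaxed training: at most one object is in the held-out fold
    """
    if scheme not in ("strict", "relaxed"):
        raise ValueError("scheme must be 'strict' or 'relaxed'")
    held = set(held_out)
    train, valid = [], []
    for a, b in combinations(objects, 2):  # symmetric pairs: (a, b) == (b, a)
        n_held = (a in held) + (b in held)
        if n_held == 2:
            valid.append((a, b))
        elif n_held == 0 or scheme == "relaxed":
            train.append((a, b))
        # strict scheme: pairs with exactly one held-out object are discarded
    return train, valid


objects = list("ABCDEF")
folds = object_level_folds(objects, 3)
strict_train, strict_valid = split_pairs(objects, folds[0], "strict")
relaxed_train, relaxed_valid = split_pairs(objects, folds[0], "relaxed")
print(len(strict_train), len(relaxed_train), strict_valid)
```

Under these assumptions both schemes validate on the same pairs, but the strict training set is smaller because every pair touching a held-out object is dropped. That is consistent with the abstract's explanation of the strict scheme's behaviour by reduced training set sizes, while the relaxed training set keeps pairs that share an object with the validation pairs.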

Author information

Authors and Affiliations

TUCS - Turku Centre for Computer Science, Turku, Finland
Juho Heimonen, Tapio Salakoski & Tapio Pahikkala
University of Turku, Turku, Finland
Juho Heimonen, Tapio Salakoski & Tapio Pahikkala

Editor information

Editors and Affiliations

School of Computing, University of Eastern Finland, 80101, Joensuu, Finland
Pasi Fränti
School of Computer Science, The University of Manchester, Manchester, UK
Gavin Brown
Delft University of Technology, Delft, The Netherlands
Marco Loog
Universidad de Alicante, Spain
Francisco Escolano
Università Ca’ Foscari Venezia, Venezia Mestre, Italy
Marcello Pelillo

About this paper

Cite this paper

Heimonen, J., Salakoski, T., Pahikkala, T. (2014). Properties of Object-Level Cross-Validation Schemes for Symmetric Pair-Input Data. In: Fränti, P., Brown, G., Loog, M., Escolano, F., Pelillo, M. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2014. Lecture Notes in Computer Science, vol 8621. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44415-3_39

DOI: https://doi.org/10.1007/978-3-662-44415-3_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44414-6
Online ISBN: 978-3-662-44415-3
eBook Packages: Computer Science, Computer Science (R0)