Correcting Bias in Statistical Tests for Network Classifier Evaluation

Wang, Tao; Neville, Jennifer; Gallagher, Brian; Eliassi-Rad, Tina

doi:10.1007/978-3-642-23808-6_33

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6913)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

It is difficult to directly apply conventional significance tests to compare the performance of network classification models because network data instances are not independent and identically distributed. Recent work [6] has shown that paired t-tests applied to overlapping network samples result in unacceptably high levels (e.g., up to 50%) of Type I error (i.e., the tests incorrectly conclude that models differ when they do not). Thus, we need new strategies to accurately evaluate network classifiers. In this paper, we analyze the sources of bias (e.g., dependencies among network data instances) theoretically and propose analytical corrections to standard significance tests that reduce the Type I error rate to more acceptable levels while maintaining reasonable statistical power to detect true performance differences. We validate the effectiveness of the proposed corrections empirically on both synthetic and real networks.
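
The inflation of Type I error described above can be illustrated with a toy simulation. The sketch below is not the paper's analysis or its proposed correction: it assumes, purely for illustration, that the per-sample performance differences between two equally good models share a common Gaussian component with pairwise correlation rho (a stand-in for overlap between samples), and it applies an uncorrected t-test to those differences. The function names, the correlation model, and the choice of 10 samples are all hypothetical.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom,
# hard-coded because this sketch always uses k = 10 samples.
T_CRIT_DF9 = 2.262


def naive_t_rejects(diffs, t_crit=T_CRIT_DF9):
    """Uncorrected t-test on paired differences: True if H0 (mean = 0) is rejected."""
    k = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (k - 1 denominator)
    t_stat = mean / (sd / math.sqrt(k))
    return abs(t_stat) > t_crit


def type1_error_rate(rho, k=10, trials=20000, seed=0):
    """Fraction of trials in which the naive test rejects although H0 is true.

    Assumed model (illustrative only): each difference is
    sqrt(rho) * shared + sqrt(1 - rho) * noise, so every difference has
    mean 0 and variance 1, and any two differences have correlation rho.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, 1.0)  # component common to all k samples
        diffs = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(k)
        ]
        rejections += naive_t_rejects(diffs)
    return rejections / trials


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.5):
        print(f"rho={rho:.1f}  empirical Type I error={type1_error_rate(rho):.3f}")
```

Under these assumptions, the rejection rate should stay near the nominal 5% when rho is 0 and rise well above it as rho grows, because the naive standard error ignores the variance contributed by the shared component.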

References

1. Bengio, Y., Grandvalet, Y.: No unbiased estimator of the variance of k-fold cross-validation. Journal of Machine Learning Research 5, 1089–1105 (2004)
2. Dietterich, T.: Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation 10, 1895–1923 (1998)
3. Franklin, J.N.: Matrix Theory. Dover Publications, Mineola (1993)
4. Macskassy, S., Provost, F.: Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research 8, 935–983 (2007)
5. Nadeau, C., Bengio, Y.: Inference for the generalization error. Machine Learning Journal 52(3), 239–281 (2003)
6. Neville, J., Gallagher, B., Eliassi-Rad, T., Wang, T.: Correcting evaluation bias of relational classifiers with network cross validation. Knowledge and Information Systems, 1–25 (2011)
7. Owen, A.B.: Variance of the number of false discoveries. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67, 411–426 (2005)
8. Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., Eliassi-Rad, T.: Collective classification in network data. AI Magazine 29(3), 93–106 (2008)

Author information

Authors and Affiliations

Department of Computer Science, Purdue University, West Lafayette, IN, USA
Tao Wang
Department of Computer Science and Statistics, Purdue University, West Lafayette, IN, USA
Jennifer Neville
Lawrence Livermore National Laboratory, Livermore, CA, USA
Brian Gallagher
Department of Computer Science, Rutgers University, Piscataway, NJ, USA
Tina Eliassi-Rad

Editor information

Editors and Affiliations

Department of Informatics and Telecommunications, University of Athens, Panepistimioupolis, Ilisia, 15784, Athens, Greece
Dimitrios Gunopulos
Google Switzerland GmbH, Brandschenkestrasse 110, 8002, Zurich, Switzerland
Thomas Hofmann
Department of Computer Science, University of Bari “Aldo Moro”, via Orabona 4, 70125, Bari, Italy
Donato Malerba
Department of Informatics, Athens University of Economics and Business, Patision 76, 10434, Athens, Greece
Michalis Vazirgiannis

Cite this paper

Wang, T., Neville, J., Gallagher, B., Eliassi-Rad, T. (2011). Correcting Bias in Statistical Tests for Network Classifier Evaluation. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23808-6_33

DOI: https://doi.org/10.1007/978-3-642-23808-6_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23807-9
Online ISBN: 978-3-642-23808-6
eBook Packages: Computer Science, Computer Science (R0)
