Abstract
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models. More specifically, the complex link structure and attribute dependencies in relational data violate the assumptions of many conventional statistical tests and make it difficult to use these tests to assess the models in an unbiased manner. In this work, we examine the task of within-network classification and the question of whether two algorithms will learn models that result in significantly different levels of performance. We show that the commonly used form of evaluation (a paired t-test on overlapping network samples) can result in an unacceptable level of Type I error. Furthermore, we show that Type I error increases as (1) the correlation among instances increases and (2) the size of the evaluation set increases (i.e., the proportion of labeled nodes in the network decreases). We propose a method for network cross-validation that, combined with paired t-tests, produces more acceptable levels of Type I error while still providing reasonable levels of statistical power (i.e., 1 − Type II error).
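The effect of overlapping evaluation samples on a paired t-test can be illustrated with a small simulation. The sketch below is a toy under strong assumptions and is not the procedure evaluated in the article: nodes are treated as i.i.d., no models are learned, and no network structure is used. Two hypothetical classifiers, A and B, have the same true accuracy, so every rejection is a Type I error. The function names, the pool of 200 nodes, the 10 trials, and the accuracy of 0.7 are all illustrative choices. Even without any correlation among instances, reusing the same nodes across test samples makes the per-trial differences dependent, which the test does not account for.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom,
# which matches the 10 trials used below. Change it if `trials` changes.
T_CRIT_DF9 = 2.262


def paired_t_rejects(diffs, t_crit=T_CRIT_DF9):
    """Return True if a paired t-test on per-trial differences rejects H0."""
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # Degenerate case: all differences identical.
        return mean != 0.0
    t = mean / (sd / math.sqrt(len(diffs)))
    return abs(t) > t_crit


def type1_error_rate(overlapping, reps=1000, n_nodes=200, trials=10,
                     p_correct=0.7, seed=0):
    """Estimate how often the test rejects when A and B are equally accurate."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Per-node correctness difference (A minus B), each in {-1, 0, 1}.
        d = [(rng.random() < p_correct) - (rng.random() < p_correct)
             for _ in range(n_nodes)]
        if overlapping:
            # Each trial scores a random half of the nodes; trials share nodes.
            samples = [rng.sample(range(n_nodes), n_nodes // 2)
                       for _ in range(trials)]
        else:
            # Cross-validation style: disjoint folds, each node scored once.
            order = list(range(n_nodes))
            rng.shuffle(order)
            fold = n_nodes // trials
            samples = [order[i * fold:(i + 1) * fold] for i in range(trials)]
        diffs = [statistics.fmean([d[j] for j in s]) for s in samples]
        rejections += paired_t_rejects(diffs)
    return rejections / reps


if __name__ == "__main__":
    print("overlapping samples:", type1_error_rate(overlapping=True))
    print("disjoint folds:     ", type1_error_rate(overlapping=False))
```

In this toy setting the overlapping design rejects far more often than the nominal 5% level, while the disjoint folds stay close to it. The gap arises because overlapping samples all track the same chance difference in the shared pool, so the variance across trials understates the true uncertainty.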
Acknowledgments
We thank Rongjing Xiang for her assistance in experimental implementation. This work was performed under the auspices of the US Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 (LLNL-JRNL-455699). This work was also supported by DARPA and NSF under contract numbers NBCH1080005 and SES-0823313.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Cite this article
Neville, J., Gallagher, B., Eliassi-Rad, T. et al. Correcting evaluation bias of relational classifiers with network cross validation. Knowl Inf Syst 30, 31–55 (2012). https://doi.org/10.1007/s10115-010-0373-1