Statistical Hypothesis Testing in Positive Unlabelled Data

  • Konstantinos Sechidis
  • Borja Calvo
  • Gavin Brown
Conference paper

DOI: 10.1007/978-3-662-44845-8_5

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8726)
Cite this paper as:
Sechidis K., Calvo B., Brown G. (2014) Statistical Hypothesis Testing in Positive Unlabelled Data. In: Calders T., Esposito F., Hüllermeier E., Meo R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8726. Springer, Berlin, Heidelberg

Abstract

We propose a set of novel methodologies that enable valid statistical hypothesis testing when only positive and unlabelled (PU) examples are available. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we make three key contributions: (1) a proof that assuming all unlabelled examples are negative cases is sufficient for independence testing, but not for power analysis; (2) a new methodology that compensates for this and enables power analysis, allowing sample size determination for observing an effect with a desired power; and (3) a new capability, supervision determination, which can determine a priori the number of labelled examples the user must collect before being able to observe a desired statistical effect. Beyond general hypothesis testing, we suggest the tools will also be useful for information-theoretic feature selection and Bayesian network structure learning.
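
To make the first contribution concrete, the following is a minimal sketch (not the authors' implementation) of a generalised likelihood ratio test, the G-test, applied to PU data by treating every unlabelled example as a negative. It assumes a single binary feature and a binary surrogate label, so the test statistic G = 2 Σ O ln(O/E) is compared against a chi-square distribution with one degree of freedom. The function names `surrogate_table` and `g_test_2x2` and the toy counts are hypothetical, chosen only for illustration.

```python
import math


def surrogate_table(features, labelled):
    """Build a 2x2 contingency table from PU data.

    features[k] is a binary feature value (0 or 1).
    labelled[k] is 1 for a labelled positive and 0 for an unlabelled
    example; unlabelled examples are treated as negatives (the
    assumption discussed in the abstract).
    """
    table = [[0, 0], [0, 0]]
    for x, s in zip(features, labelled):
        table[x][s] += 1
    return table


def g_test_2x2(counts):
    """G-test of independence for a 2x2 table of observed counts.

    Returns (G, p_value). The p-value uses the chi-square
    distribution with 1 degree of freedom.
    """
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [counts[0][j] + counts[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = counts[i][j]
            if observed > 0:  # the term 0 * ln(0) is taken to be 0
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Chi-square survival function for 1 d.o.f.: P(X > g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Toy counts (hypothetical): rows are feature values, columns are
# [unlabelled, labelled positive].
g_stat, p = g_test_2x2([[30, 10], [10, 30]])
print(f"G = {g_stat:.2f}, p = {p:.2e}")
```

This sketch covers only the independence test. The abstract states that the same surrogate-label assumption is not sufficient for power analysis, which is why the paper introduces a separate correction; that correction is not reproduced here.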


Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Konstantinos Sechidis (1)
  • Borja Calvo (2)
  • Gavin Brown (1)
  1. School of Computer Science, University of Manchester, Manchester, UK
  2. Department of Computer Science and Artificial Intelligence, University of the Basque Country, Spain
