Semi-parametric analysis of multi-rater data
Datasets that are subjectively labeled by several experts are becoming more common in tasks such as biological text annotation, where class definitions are necessarily somewhat subjective. Standard classification and regression models are not suited to multiple labels per instance, so a pre-processing step (normally assigning the majority class) is typically performed. We propose Bayesian models for classification and ordinal regression that naturally incorporate multiple expert opinions in defining predictive distributions. The models use Gaussian process priors, which makes them highly flexible and particularly suitable for text-based problems where the number of covariates can far exceed the number of data instances. We show that using all labels rather than just the majority label improves performance on a recent biological dataset.
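The majority-class pre-processing step mentioned above can be illustrated with a minimal Python sketch. This is not the proposed Bayesian model. It only contrasts collapsing each instance to one label with keeping every rating as an empirical distribution. The toy `ratings` data, the class names, and both helper functions are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy data: each row holds the labels three raters gave one instance.
ratings = [
    ["A", "A", "B"],
    ["B", "B", "B"],
    ["A", "B", "C"],
]
classes = ["A", "B", "C"]


def majority_label(labels):
    """Collapse one instance's ratings to a single label.

    Ties are broken in favour of the label that appears first in the list.
    """
    return Counter(labels).most_common(1)[0][0]


def label_distribution(labels, classes):
    """Keep every rating as an empirical distribution over the classes."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in classes]


# Majority pre-processing: one hard label per instance.
majority = [majority_label(r) for r in ratings]

# Alternative: retain the full pattern of rater agreement and disagreement.
soft = [label_distribution(r, classes) for r in ratings]
```

In this toy example the third instance, on which all three raters disagree, receives the same hard label as the first instance, on which two of three agree. The per-class proportions keep that difference visible, and that disagreement is the information a model using all labels can exploit.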
Keywords: Semi-parametric, Gaussian processes, Machine learning, Multi-rater, Classification