Abstract
Many emerging digital library applications rely on automated classifiers that are trained using manually assigned labels. Accurately labeling training data for text classification requires either highly trained coders or multiple annotations, either of which can be costly. Previous studies have shown that there is a quality-quantity trade-off for this labeling process, and the optimal balance between quality and quantity varies depending on the annotation task. In this paper, we present a method that learns to choose between higher-quality annotation that results from dual annotation and higher-quantity annotation that results from the use of a single annotator per item. We demonstrate the effectiveness of this approach through an experiment in which a binary classifier is constructed for assigning human value categories to sentences in newspaper editorials.
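The choice the abstract describes can be illustrated as a two-armed bandit: arm 0 produces one cheap single-annotated label, arm 1 produces one higher-quality dual-annotated label at twice the cost. The sketch below uses Thompson sampling with Bernoulli rewards; the class name, the reward definition (1 = the produced label helped the classifier, 0 = it did not), and the simulated reward rates are all illustrative assumptions, not the authors' implementation.

```python
import random

class ThompsonChooser:
    """Two-armed Thompson sampling: arm 0 = single annotation,
    arm 1 = dual annotation. Rewards are modeled as Bernoulli;
    Beta(1, 1) priors encode no initial preference."""

    def __init__(self, n_arms=2):
        self.successes = [1] * n_arms  # Beta prior alpha
        self.failures = [1] * n_arms   # Beta prior beta

    def choose(self):
        # Sample a plausible reward rate per arm, pick the best sample.
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return samples.index(max(samples))

    def update(self, arm, reward):
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Toy simulation (rates are made up): if dual annotation (arm 1)
# yields useful labels more often, pulls should shift toward it.
random.seed(0)
true_rates = [0.4, 0.7]
chooser = ThompsonChooser()
pulls = [0, 0]
for _ in range(2000):
    arm = chooser.choose()
    pulls[arm] += 1
    chooser.update(arm, random.random() < true_rates[arm])
print(pulls)
```

In practice the reward signal would come from observed classifier improvement rather than a fixed simulated rate, and the cost asymmetry between the arms would also need to enter the accounting.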
Notes
- 1.
The name comes from a colloquial term for slot machines, “one-armed bandits.” In the imagined multi-armed bandit scenario, a gambler faces several such machines and seeks to pull the arm that would yield the greatest profit.
- 2.
- 3.
- 4.
Note the difference in annotation strategy between constructing the annotated data in [11] and applying the multi-armed bandit (MAB) method: in [11], coders assigned several labels to each sentence in one sitting, whereas under the MAB method each label is assigned (or not) to each sentence individually.
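The notes above refer to MAB-style arm selection. One standard selection rule for this setting is the UCB1 index (Auer et al., listed in the references), sketched below as a concrete illustration; the function name and the example numbers are assumptions for this sketch, not taken from the paper.

```python
import math

def ucb1_select(counts, means, t):
    """UCB1: pick the arm maximizing empirical mean plus an
    exploration bonus sqrt(2 ln t / n_i), where n_i is the number
    of times arm i was pulled and t the total pulls so far.
    Any never-pulled arm is tried first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(2 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return scores.index(max(scores))

# Example: the less-explored arm gets a larger bonus, so it is
# chosen even though its empirical mean is lower.
print(ucb1_select(counts=[8, 2], means=[0.6, 0.5], t=10))  # → 1
```

The bonus term shrinks as an arm accumulates pulls, which is what drives the exploration–exploitation balance the MAB framing provides.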
References
Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
Bennett, E.M., Alpert, R., Goldstein, A.C.: Communications through limited response questioning. Public Opin. Q. 18(3), 303–308 (1954)
Cai, W., Zhang, Y., Zhou, J.: Maximizing expected model change for active learning in regression. In: Proceedings of the ICDM, pp. 51–60 (2013)
Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)
Cohen, J.: Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 70(4), 213–220 (1968)
Culotta, A., McCallum, A.: Reducing labeling effort for structured prediction tasks. In: Proceedings of the AAAI, pp. 746–751 (2005)
Fort, K., François, C., Galibert, O., Ghribi, M.: Analyzing the impact of prevalence on the evaluation of a manual annotation campaign. In: Proceedings of the LREC, pp. 1474–1480 (2012)
Garivier, A., Moulines, E.: On upper-confidence bound policies for switching bandit problems. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) Algorithmic Learning Theory. ALT 2011. LNCS, vol. 6925, pp. 174–188. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24412-4_16
Howe, J.: Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business. Crown Publishing Group, New York (2008)
Ishita, E., Fukuda, S., Oga, T., Tomiura, Y., Oard, D.W., Fleischmann, K.R.: Cost-effective learning for classifying human values. In: Proceedings of the iConference (2020)
Kuriyama, K., Kando, N., Nozue, T., Eguchi, K.: Pooling for a large-scale test collection: an analysis of the search results from the first NTCIR workshop. Inf. Retr. 5(1), 41–59 (2002)
Nguyen, A.T., Wallace, B.C., Lease, M.: Combining crowd and expert labels using decision theoretic active learning. In: Proceedings of the HCOMP, pp. 120–129 (2015)
Raj, V., Kalyani, S.: Taming non-stationary bandits: a Bayesian approach. arXiv preprint arXiv:1707.09727 (2017)
Scott, W.: Reliability of content analysis: the case of nominal scale coding. Public Opin. Q. 19, 321–325 (1955)
Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4), 285–294 (1933)
Voorhees, E.M., Harman, D.K.: TREC: Experiment and Evaluation in Information Retrieval. The MIT Press, Cambridge (2005)
Welinder, P., Branson, S., Belongie, S., Perona, P.: The multidimensional wisdom of crowds. In: Proceedings of the NIPS, pp. 2424–2432 (2010)
Zhang, Y., Cui, L., Huang, J., Miao, C.: CrowdMerge: achieving optimal crowdsourcing quality management by sequent merger. In: Proceedings of the ICCSE, pp. 1–8 (2018)
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP18H03495.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Fukuda, S., Ishita, E., Tomiura, Y., Oard, D.W. (2021). Automating the Choice Between Single or Dual Annotation for Classifier Training. In: Ke, H.R., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_19
Print ISBN: 978-3-030-91668-8
Online ISBN: 978-3-030-91669-5