Automating the Choice Between Single or Dual Annotation for Classifier Training

  • Conference paper
  • In: Towards Open and Trustworthy Digital Societies (ICADL 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13133)

Abstract

Many emerging digital library applications rely on automated classifiers that are trained using manually assigned labels. Accurately labeling training data for text classification requires either highly trained coders or multiple annotations per item, either of which can be costly. Previous studies have shown that there is a quality-quantity trade-off in this labeling process, and that the optimal balance between quality and quantity varies with the annotation task. In this paper, we present a method that learns to choose between the higher-quality labels produced by dual annotation and the larger number of labels produced by a single annotator per item. We demonstrate the effectiveness of this approach through an experiment in which a binary classifier is constructed to assign human value categories to sentences in newspaper editorials.
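The method's core decision can be framed as a two-armed bandit problem (see Notes 1 and 4 below), with one arm for single annotation and one for dual annotation. The following is a minimal sketch of that framing under the UCB1 policy of Auer et al. [2]; the cost model and the synthetic reward standing in for measured classifier improvement are illustrative assumptions, not the paper's actual reward definition.

    import math
    import random

    # Two "arms": send the next item to one annotator, or to two.
    ARMS = ["single", "dual"]
    COST = {"single": 1, "dual": 2}  # dual annotation spends twice the budget

    def ucb1_choose(totals, counts, step):
        """Pick the arm with the highest upper confidence bound (UCB1)."""
        for arm in ARMS:  # play each arm once before applying the formula
            if counts[arm] == 0:
                return arm
        def ucb(arm):
            mean = totals[arm] / counts[arm]
            return mean + math.sqrt(2.0 * math.log(step) / counts[arm])
        return max(ARMS, key=ucb)

    def allocate(budget=200):
        """Spend an annotation budget while learning which arm pays off."""
        totals = {arm: 0.0 for arm in ARMS}
        counts = {arm: 0 for arm in ARMS}
        step = 1
        while budget >= COST["dual"]:
            arm = ucb1_choose(totals, counts, step)
            # Hypothetical reward: a stand-in for the observed gain in
            # classifier quality per labeled batch (illustrative only).
            reward = random.betavariate(3, 2) if arm == "dual" else random.betavariate(2, 2)
            totals[arm] += reward
            counts[arm] += 1
            budget -= COST[arm]
            step += 1
        return counts

    print(allocate())

The point of the bandit policy is that the single-versus-dual choice is not fixed in advance: UCB1 keeps exploring the less-used arm while exploiting the one that has paid off so far, so the annotation strategy can shift as evidence about the task accumulates.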

Notes

  1. The name comes from “one-armed bandit,” a colloquial term among gamblers for a slot machine. In the imagined multi-armed bandit scenario, the gambler seeks to pull the arm that will yield the greatest profit.

  2. http://chasen.org/~taku/software/TinySVM/

  3. http://nlp.ist.i.kyoto-u.ac.jp/EN/index.php?JUMAN

  4. Note the difference in annotation strategy between constructing the annotated data in [11] and applying the multi-armed bandit (MAB) method: in [11], coders assigned several labels to each sentence in one sitting, whereas in the MAB method a single label is either assigned or not assigned to each sentence.

References

  1. Artstein, R., Poesio, M.: Inter-coder agreement for computational linguistics. Comput. Linguist. 34(4), 555–596 (2008)

  2. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)

  3. Bennett, E.M., Alpert, R., Goldstein, A.C.: Communications through limited response questioning. Public Opin. Q. 18(3), 303–308 (1954)

  4. Cai, W., Zhang, Y., Zhou, J.: Maximizing expected model change for active learning in regression. In: Proceedings of the ICDM, pp. 51–60 (2013)

  5. Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)

  6. Cohen, J.: Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 70(4), 213–220 (1968)

  7. Culotta, A., McCallum, A.: Reducing labeling effort for structured prediction tasks. In: Proceedings of the AAAI, pp. 746–751 (2005)

  8. Fort, K., François, C., Galibert, O., Ghribi, M.: Analyzing the impact of prevalence on the evaluation of a manual annotation campaign. In: Proceedings of the LREC, pp. 1474–1480 (2012)

  9. Garivier, A., Moulines, E.: On upper-confidence bound policies for switching bandit problems. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) Algorithmic Learning Theory. ALT 2011. LNCS, vol. 6925, pp. 174–188. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24412-4_16

  10. Howe, J.: Crowdsourcing: Why the Power of the Crowd is Driving the Future of Business. Crown Publishing Group, New York (2008)

  11. Ishita, E., Fukuda, S., Oga, T., Tomiura, Y., Oard, D.W., Fleischmann, K.R.: Cost-effective learning for classifying human values. In: Proceedings of the iConference (2020)

  12. Kuriyama, K., Kando, N., Nozue, T., Eguchi, K.: Pooling for a large-scale test collection: an analysis of the search results from the first NTCIR workshop. Inf. Retr. 5(1), 41–59 (2002)

  13. Nguyen, A.T., Wallace, B.C., Lease, M.: Combining crowd and expert labels using decision theoretic active learning. In: Proceedings of the HCOMP, pp. 120–129 (2015)

  14. Raj, V., Kalyani, S.: Taming non-stationary bandits: a Bayesian approach. arXiv preprint arXiv:1707.09727 (2017)

  15. Scott, W.: Reliability of content analysis: the case of nominal scale coding. Public Opin. Q. 19, 321–325 (1955)

  16. Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4), 285–294 (1933)

  17. Voorhees, E.M., Harman, D.K.: TREC: Experiment and Evaluation in Information Retrieval. The MIT Press, Cambridge (2005)

  18. Welinder, P., Branson, S., Belongie, S., Perona, P.: The multidimensional wisdom of crowds. In: Proceedings of the NIPS, pp. 2424–2432 (2010)

  19. Zhang, Y., Cui, L., Huang, J., Miao, C.: CrowdMerge: achieving optimal crowdsourcing quality management by sequent merger. In: Proceedings of the ICCSE, pp. 1–8 (2018)

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP18H03495.

Author information

Corresponding author

Correspondence to Satoshi Fukuda.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Fukuda, S., Ishita, E., Tomiura, Y., Oard, D.W. (2021). Automating the Choice Between Single or Dual Annotation for Classifier Training. In: Ke, H.R., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_19

  • DOI: https://doi.org/10.1007/978-3-030-91669-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91668-8

  • Online ISBN: 978-3-030-91669-5

  • eBook Packages: Computer Science, Computer Science (R0)
