Abstract
In this paper, we introduce EPAD, an open-source text annotation tool supported by a pre-annotation module based on deep learning. EPAD proposes a novel annotation workflow that combines pre-annotation with manual annotation, improving both the efficiency and the quality of annotation. The pre-annotation module effectively reduces annotation time while improving the precision and recall of the annotations. EPAD also includes mechanisms that make the pre-annotation module easier to use. Designed for collaborative use, EPAD provides administrators with annotation statistics and analysis functions. Experiments showed that EPAD reduced total annotation time by almost 60.0% and improved the F-measure of annotation quality by 12.7%.
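The precision, recall, and F-measure named in the abstract can be illustrated with a minimal sketch. It assumes exact-match scoring of labelled spans against a gold standard; the abstract does not state the matching criterion used in the experiments, so the function `span_prf`, the `Span` type, and the sample data below are hypothetical and do not reproduce the paper's evaluation.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, entity label).
Span = Tuple[int, int, str]


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F-measure over annotated spans."""
    # A predicted span counts as correct only if offsets and label all match.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only (not taken from the paper).
gold = {(0, 2, "PER"), (5, 9, "ORG"), (12, 15, "LOC"), (20, 24, "ORG")}
annotated = {(0, 2, "PER"), (5, 9, "ORG"), (12, 15, "ORG")}

p, r, f = span_prf(gold, annotated)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

In this toy case two of the three annotated spans match the gold standard exactly, and the third has the right offsets but the wrong label, so it counts as an error under exact matching.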
Supported by Sichuan Science and Technology Program (No. 2017SZYZF0002).
Notes
- 4. On average, there are 24 sentences per document, 100 characters per sentence, and 5 entities per sentence.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Teng, F., Ma, M., Ma, Z., Huang, L., Xiao, M., Li, X. (2019). A Text Annotation Tool with Pre-annotation Based on Deep Learning. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science(), vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_39
DOI: https://doi.org/10.1007/978-3-030-29551-6_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29550-9
Online ISBN: 978-3-030-29551-6
eBook Packages: Computer Science, Computer Science (R0)