AATOS – A Configurable Tool for Automatic Annotation

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)


This paper presents AATOS, an automatic annotation tool for providing documents with semantic annotations. The tool links entities found in the texts to ontologies defined by the user. The application is highly configurable and can be used with different kinds of Finnish natural-language texts. It was developed as part of the WarSampo and Semantic Finlex projects and tested using Kansa Taisteli magazine articles and the consolidated Finnish legislation of Semantic Finlex. The quality of the automatic annotation was evaluated by measuring precision and recall against existing manual annotations. The evaluation showed that the quality of the input text, as well as the selection and configuration of the ontologies, affected the results.
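The evaluation above compares automatic annotations with existing manual ones using precision and recall. The Python sketch below shows that computation under stated assumptions: each document's annotations are treated as a set of ontology URIs, and counts are micro-averaged over all documents. The function name `precision_recall`, the document IDs, and the `ex:` URIs are hypothetical illustrations, not part of AATOS or its actual evaluation setup.

```python
from typing import Dict, Set, Tuple

# Assumption (not the AATOS data model): each document ID maps to the
# set of ontology URIs it was annotated with.
Annotations = Dict[str, Set[str]]


def precision_recall(automatic: Annotations, manual: Annotations) -> Tuple[float, float]:
    """Micro-averaged precision and recall of automatic vs. manual annotations."""
    true_positives = 0
    n_automatic = 0
    n_manual = 0
    # Iterate over every document that appears in either annotation source.
    for doc_id in automatic.keys() | manual.keys():
        auto = automatic.get(doc_id, set())
        gold = manual.get(doc_id, set())
        true_positives += len(auto & gold)  # links both sources agree on
        n_automatic += len(auto)
        n_manual += len(gold)
    # Precision: share of automatic links that the manual annotations confirm.
    precision = true_positives / n_automatic if n_automatic else 0.0
    # Recall: share of manual links that the tool also produced.
    recall = true_positives / n_manual if n_manual else 0.0
    return precision, recall


# Hypothetical example data for a single article.
auto = {"article-1": {"ex:Mannerheim", "ex:Viipuri", "ex:Tank"}}
gold = {"article-1": {"ex:Mannerheim", "ex:Viipuri", "ex:Summa", "ex:Infantry"}}
p, r = precision_recall(auto, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Micro-averaging pools the counts over all documents, so long articles with many annotations weigh more; computing precision and recall per document and averaging them (macro-averaging) is an equally plausible alternative.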


Keywords: Automatic Annotation, Named Entity Recognition, SPARQL Query, Proper Noun, Linked Open Data
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Our work was funded by the Ministry of Education and Culture, the Finnish Cultural Foundation, and the Ministry of Justice. The Association for Military History in Finland and Bonnier Publications provided the project with resources and published the Kansa Taisteli magazine articles for public use. Kasper Apajalahti originally converted the metadata into RDF. Timo Hakala provided the manual annotations for the Kansa Taisteli magazine articles.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Semantic Computing Research Group (SeCo), Aalto University, Espoo, Finland
  2. HELDIG – Helsinki Centre for Digital Humanities, University of Helsinki, Helsinki, Finland
