A linked open data architecture for the historical archives of the Getulio Vargas Foundation

  • Alexandre Rademaker
  • Dário Augusto Borges Oliveira
  • Valeria de Paiva
  • Suemi Higuchi
  • Asla Medeiros e Sá
  • Moacyr Alvim


This paper presents an architecture for maintaining historical archives that is based on Linked Open Data technologies and on an open-source, distributed development model and tools. The proposed architecture is being implemented for the archives of the Centro de Pesquisa e Documentação de História Contemporânea do Brasil (Center for Research and Documentation of Brazilian Contemporary History, CPDOC) of the Fundação Getulio Vargas (Getulio Vargas Foundation, FGV). We discuss the benefits of this initiative, suggest ways of implementing it, and describe the preliminary milestones already achieved. We also present several ways, some in progress and some planned, of extending the accessibility and usefulness of the archives' information using Semantic Web technologies, natural language processing, image analysis tools, and audio–textual alignment.


Keywords: Historical archives · Digital humanities · Semantic Web · NLP · Image processing · Audio processing · Open data
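The abstract describes publishing archive data as Linked Open Data but does not show what such data looks like. As a minimal sketch, assuming an archive item and the person who created it are described with the FOAF and Dublin Core Terms vocabularies, a single record can be serialized as N-Triples. The base URI `http://example.org/cpdoc/`, the `ArchiveItem` fields, and all identifiers are hypothetical illustrations, not the CPDOC data model.

```python
"""Minimal sketch: serialize a hypothetical archive record as N-Triples.

The base URI and the record fields are illustrative assumptions,
not the actual CPDOC schema.
"""
from __future__ import annotations

from dataclasses import dataclass

# Well-known vocabulary namespaces (FOAF, Dublin Core Terms, RDF).
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base URI under which archive resources would be published.
BASE = "http://example.org/cpdoc/"


@dataclass
class ArchiveItem:
    item_id: str       # hypothetical local identifier of the document
    title: str
    creator_id: str    # hypothetical local identifier of the creator
    creator_name: str


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(item: ArchiveItem) -> list[str]:
    """Return one N-Triples statement per line for the item and its creator."""
    doc = f"<{BASE}item/{item.item_id}>"
    person = f"<{BASE}person/{item.creator_id}>"
    return [
        f"{doc} <{DCTERMS}title> {literal(item.title)} .",
        f"{doc} <{DCTERMS}creator> {person} .",
        f"{person} <{RDF_TYPE}> <{FOAF}Person> .",
        f"{person} <{FOAF}name> {literal(item.creator_name)} .",
    ]


if __name__ == "__main__":
    # Placeholder values, chosen only to exercise the serializer.
    example = ArchiveItem("item-001", "Example letter", "p1", "Example Person")
    for statement in to_ntriples(example):
        print(statement)
```

Plain string formatting keeps the sketch dependency-free; a production pipeline would more typically build the graph with an RDF library such as rdflib and load it into a triple store that can be queried with SPARQL.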


Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Alexandre Rademaker (1)
  • Dário Augusto Borges Oliveira (2)
  • Valeria de Paiva (3)
  • Suemi Higuchi (2)
  • Asla Medeiros e Sá (4)
  • Moacyr Alvim (4)

  1. IBM Research and FGV/EMAp, Rio de Janeiro, Brazil
  2. FGV/CPDOC, Rio de Janeiro, Brazil
  3. Nuance Communications, Sunnyvale, USA
  4. FGV/EMAp, Rio de Janeiro, Brazil
