Connecting Data for Digital Libraries: The Library, the Dictionary and the Corpus

  • Maciej Ogrodniczuk
  • Włodzimierz Gruszczyński
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11853)

Abstract

The paper presents two experiments in enhancing the content of a digital library with data from external repositories. The concept involves three related resources: a digital library of Middle Polish prints, where items are stored as images; a linguistically annotated corpus containing the same items in textual form; and a dictionary of Middle Polish. The first experiment demonstrates how the results of automated OCR obtained with open source tools can be replaced with transcribed content from the corpus, enabling the user to search within individual prints. The second experiment links the print content with the electronic dictionary, filtering the entries against a dictionary of modern Polish to eliminate redundant results. Interconnecting all relevant resources in a digital library-centered platform creates new possibilities both for the researchers developing these resources and for scholars studying the Polish language of the 17th and 18th centuries.

Keywords

Digital library, Linguistic corpus, Electronic dictionary, Middle Polish

Notes

Acknowledgements

The authors would like to thank Grzegorz Kulesza for his diligent proofreading of this paper.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Computer Science, Polish Academy of Sciences, Warszawa, Poland
  2. Institute of Polish Language, Polish Academy of Sciences, Kraków, Poland