Diagnostic Tools in plWordNet Development Process

  • Conference paper
Human Language Technology. Challenges for Computer Science and Linguistics (LTC 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9561)

Abstract

With the growing size of a wordnet, it becomes increasingly difficult to avoid, identify and eliminate errors in it, especially when a group of editors work in parallel, as is the case with plWordNet. We therefore need elaborate tools both for preventing errors during editing and for detecting errors after the work has been completed. In this paper, we first present the error-prevention mechanisms built into the plWordNet editor application and into the system supporting the group work of a linguistic team. Next, we discuss diagnostic tests and diagnostic tools dedicated to plWordNet, the Polish wordnet. plWordNet has been in steady development for almost ten years and has reached the size of 193k synsets and 255k lexical meanings. We propose a typology of diagnostic levels, describe formal, structural and semantic rules for detecting errors in plWordNet, and present a new method for the automated induction of diagnostic rules. Finally, we discuss the results and benefits of the approach.
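Structural diagnostic rules are, in general, checks over the relation graph. The abstract does not list the paper's actual rules, so the following is only a minimal sketch of one common structural check, assumed here for illustration: detecting cycles in the hypernymy hierarchy, which should be acyclic. The input format (a list of `(hyponym_id, hypernym_id)` pairs), the synset ids and the function name `find_hypernymy_cycles` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_hypernymy_cycles(edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Detect cycles in a hyponym -> hypernym graph with a depth-first search.

    `edges` holds (hyponym_id, hypernym_id) pairs (a hypothetical input format).
    One cycle is reported per back edge found; this does not enumerate
    every elementary cycle, but any non-empty result signals an error.
    """
    graph: Dict[str, List[str]] = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)
        graph.setdefault(parent, [])  # make sure top-level synsets are nodes too

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current path / finished
    colour = {node: WHITE for node in graph}
    cycles: List[List[str]] = []

    def visit(node: str, path: List[str]) -> None:
        colour[node] = GREY
        path.append(node)
        for parent in graph[node]:
            if colour[parent] == GREY:
                # Back edge: the path from `parent` down to `node` closes a cycle.
                cycles.append(path[path.index(parent):] + [parent])
            elif colour[parent] == WHITE:
                visit(parent, path)
        path.pop()
        colour[node] = BLACK

    for node in list(graph):
        if colour[node] == WHITE:
            visit(node, [])
    return cycles


if __name__ == "__main__":
    # Hypothetical synset ids; the last edge wrongly closes a loop.
    edges = [("dog.1", "mammal.1"), ("mammal.1", "animal.1"), ("animal.1", "dog.1")]
    print(find_hypernymy_cycles(edges))  # [['dog.1', 'mammal.1', 'animal.1', 'dog.1']]
```

Nodes on the current search path are marked "grey"; reaching a grey node again means the hierarchy loops back on itself. The recursion keeps the sketch short; an iterative traversal would be safer if hypernymy paths could be very long.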

Notes

  1. According to semanticists, lexical relations form a continuum [5, p. 143].

  2. The vast majority of dictionaries differ significantly from wordnets, so a comparison is difficult.

  3. In the plWordNet model, two LUs are synonymous if they share all constitutive relations to other LUs; for details, see [10].

  4. The application gives the synonymy relation a special position.

  5. Linguists sometimes create several LUs in advance and later forget to add them to synsets.

  6. It is unofficially called ‘plWordNet Big Brother’.

  7. plWordNet domains follow those of Princeton WordNet, which originated from the names of the lexicographer files.

  8. Four guidelines were created for the four parts of speech covered by plWordNet, and several more were written for specific tasks: applying register labels, recognizing multi-word LUs, differentiating gerunds from other deverbal nouns, describing adjectives derived from proper nouns, etc.

  9. Suggestions are not binding: the editors can choose a different place for the LUs of the given lemma.

  10. http://www.redmine.org/.

  11. In WordNetLoom 2.0, this problem has been eliminated at the editing level.

  12. For instance, there may also be relation instances in which at least one side was deleted without the relation itself being properly removed.

  13. In one case, erroneous manual modifications of the relation definitions had the same effect.

  14. Three documents describing the lexico-semantic systems are available on the site [2]: for nouns (31 pages), for verbs (66 pages) and for adjectives (32 pages).

  15. Here will be a link to a full description of the rules.

  16. Glosses have appeared in plWordNet since version 2.2 and became numerous in version 2.3, but they are still intended more as comments for users than as a tool for defining LU semantics. In a lexico-semantic network, it is relations that should be the primary means of definition. Constitutive relations are frequent and shared among groups of LUs, cf. [10].

  17. Cross-categorial synonymy was introduced into plWordNet following EuroWordNet [9].

  18. http://www.cs.waikato.ac.nz/ml/weka/index.html.

  19. http://sjp.pwn.pl/.

  20. https://pl.wiktionary.org/.
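Notes 5 and 12 describe two integrity problems: lexical units that were created but never added to any synset, and relation instances left behind after one of their ends was deleted. Both can be found with simple set operations. The sketch below assumes a hypothetical in-memory representation (a dict from synset id to member LU ids, and a `Relation` record); it is not the plWordNet database schema or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Relation:
    """One relation instance; source/target are synset or LU ids (hypothetical schema)."""
    rel_type: str
    source: str
    target: str


def find_unassigned_lus(lexical_units: Iterable[str],
                        synsets: Dict[str, List[str]]) -> List[str]:
    """Return LUs that belong to no synset (the situation described in note 5)."""
    assigned = {lu for members in synsets.values() for lu in members}
    return sorted(set(lexical_units) - assigned)


def find_dangling_relations(relations: Iterable[Relation],
                            existing_ids: Iterable[str]) -> List[Relation]:
    """Return relation instances whose source or target no longer exists (note 12)."""
    existing = set(existing_ids)
    return [r for r in relations
            if r.source not in existing or r.target not in existing]


if __name__ == "__main__":
    synsets = {"s1": ["lu1", "lu2"], "s2": ["lu3"]}
    lexical_units = ["lu1", "lu2", "lu3", "lu4"]
    relations = [Relation("hypernymy", "s1", "s2"),
                 Relation("hypernymy", "s2", "s9")]  # s9 was deleted
    print(find_unassigned_lus(lexical_units, synsets))  # ['lu4']
    # Iterating a dict yields its keys, i.e. the existing synset ids.
    print(find_dangling_relations(relations, synsets))
    # [Relation(rel_type='hypernymy', source='s2', target='s9')]
```

For relations between LUs rather than synsets, `existing_ids` would have to contain the LU ids as well.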

References

  1. Słownik Języka Polskiego. Wydawnictwo Naukowe PWN (2007)

  2. The site of Wroclaw University of Technology Language Technology Group G4.19 (2013). http://www.nlp.pwr.wroc.pl

  3. Broda, B., Maziarz, M., Piasecki, M.: Tools for plWordNet development. Presentation and perspectives. In: Calzolari, N., Choukri, K., Declerck, T., Dovgan, M., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), pp. 3647–3652. European Language Resources Association (ELRA), Istanbul, Turkey, May 2012

  4. Broda, B., Piasecki, M.: Evaluating LexCSD in a large scale experiment. Cont. Cybern. 40(2), 419–436 (2011)

  5. Cruse, A.: Meaning in Language. An Introduction to Semantics and Pragmatics. Oxford University Press, Oxford (2004)

  6. Huang, C.R., Calzolari, N., Gangemi, A., Oltramari, A., Prévot, L. (eds.): Ontology and the Lexicon. A Natural Language Processing Perspective. Studies in Natural Language Processing. Cambridge University Press, Cambridge (2010)

  7. Kubis, M.: A query language for WordNet-like lexical databases. In: Pan, J.-S., Chen, S.-M., Nguyen, N.T. (eds.) ACIIDS 2012, Part III. LNCS, vol. 7198, pp. 436–445. Springer, Heidelberg (2012)

  8. Lohk, A., Vare, K., Võhandu, L.: Visual study of Estonian wordnet using bipartite graphs and minimal crossing algorithm. In: Proceedings of 6th Global Wordnet Conference, Matsue, Japan, January 2012

  9. Maziarz, M., Piasecki, M., Rabiega-Wiśniewska, J., Szpakowicz, S.: Semantic relations among nouns in Polish wordnet grounded in lexicographic and semantic tradition. Cogn. Stud. 11, 161–181 (2011). http://www.eecs.uottawa.ca/~szpak/pub/Maziarz_et_al_CS2011a.pdf

  10. Maziarz, M., Piasecki, M., Szpakowicz, S.: The chicken-and-egg problem in WordNet design: synonymy, synsets and constitutive relations. Lang. Resour. Eval. 47(3), 769–796 (2013)

  11. Maziarz, M., Piasecki, M., Szpakowicz, S., Rabiega-Wiśniewska, J., Hojka, B.: Semantic relations between verbs in Polish WordNet 2.0. Cogn. Stud. 11, 183–200 (2011)

  12. Maziarz, M., Szpakowicz, S., Piasecki, M.: Semantic relations among adjectives in Polish WordNet 2.0: a new relation set, discussion and evaluation. Cogn. Stud. 12, 149–179 (2012)

  13. Miłkowski, M.: Open thesaurus - polski thesaurus (2007). http://www.synomix.pl/

  14. Piasecki, M., Marcińczuk, M., Ramocki, R., Maziarz, M.: WordNetLoom: a WordNet development system integrating form-based and graph-based perspectives. Int. J. Data Min. Model. Manage. 5(3), 210–232 (2013)

  15. Piasecki, M., Szpakowicz, S., Broda, B.: A WordNet from the Ground Up. University of Technology Press, Wrocław (2009)

  16. Rizov, B.: Hydra: a modal logic tool for wordnet development, validation and exploration. In: Calzolari, N., et al. (eds.) Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC 2008). European Language Resources Association (ELRA), Marrakech, Morocco, May 2008

  17. Zespół SJP.PL [SJP.PL team]: Słownik języka polskiego [A dictionary of the Polish language] (2015). http://sjp.pl/

  18. Smrž, P.: Quality control and checking for wordnet development: a case study of BalkaNet. Rom. J. Inf. Sci. Technol. 2004(1), 173–182 (2004)

  19. Witten, I., Frank, E., Hall, M.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, San Francisco (2011)


Acknowledgments

Work financed by the Polish Ministry of Science and Higher Education, a program in support of scientific units involved in the development of a European research infrastructure for the humanities and social sciences in the scope of the consortia CLARIN ERIC and ESS-ERIC, 2015–2016.

Author information

Correspondence to Maciej Piasecki.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Piasecki, M., Burdka, Ł., Maziarz, M., Kaliński, M. (2016). Diagnostic Tools in plWordNet Development Process. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_20

  • DOI: https://doi.org/10.1007/978-3-319-43808-5_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43807-8

  • Online ISBN: 978-3-319-43808-5

  • eBook Packages: Computer Science, Computer Science (R0)
