Abstract
This paper proposes a multifaceted approach to designing an algorithm for the automatic recognition of chemical compound names written as multiword expressions in Croatian. The algorithm, which we have named the Croatian Chemical Compounds Module, consists of three layers: (1) a NooJ dictionary serves as the basis for (2) a morphological grammar, and both (1) and (2) are used by (3) a syntactic grammar. The module supports not only single-unit words and homoatomic entities but also variations of chemical names that are recognized through a variety of suffixes, multiplicative prefixes, hyphens, Roman and Latin numerals, Greek letters, and round, square and curly brackets. Terminological diversity and inconsistent writing style are also discussed, as they pose a major problem for any such endeavor.
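The three-layer idea can be illustrated with a minimal sketch. This is not the module itself: NooJ uses its own dictionary and grammar formalisms, whereas the sketch below substitutes plain Python. The word lists, the function names (`is_possessive_adjective`, `is_anion_noun`, `find_compounds`) and the single adjective-plus-noun pattern are all assumptions chosen for illustration, and they cover only a handful of binary compound names.

```python
import re

# Layer 1 (assumed toy dictionary): a few element stems and anion roots.
ELEMENT_STEMS = {"natrij", "ugljik", "željez"}
ANION_ROOTS = {"klorid", "oksid", "sulfat"}

# Layer 2 (assumed morphological rules): affixes that build on Layer 1 entries.
MULTIPLICATIVE_PREFIXES = ("di", "tri", "tetra")
POSSESSIVE_SUFFIXES = ("ev", "ov")
# Oxidation state written as a Roman numeral in round brackets, e.g. "(III)".
OXIDATION_STATE = re.compile(r"\((?:I|II|III|IV|V|VI|VII|VIII)\)$")


def is_possessive_adjective(token):
    """Element stem + possessive suffix, optionally followed by '(III)' etc."""
    match = OXIDATION_STATE.search(token)
    base = (token[:match.start()] if match else token).lower()
    return any(
        base.endswith(suffix) and base[: -len(suffix)] in ELEMENT_STEMS
        for suffix in POSSESSIVE_SUFFIXES
    )


def is_anion_noun(token):
    """Anion root, optionally preceded by a multiplicative prefix."""
    word = token.lower()
    if word in ANION_ROOTS:
        return True
    return any(
        word.startswith(prefix) and word[len(prefix):] in ANION_ROOTS
        for prefix in MULTIPLICATIVE_PREFIXES
    )


def find_compounds(text):
    """Layer 3 (assumed syntactic rule): possessive adjective + anion noun."""
    tokens = [t.strip(".,;:") for t in text.split()]
    return [
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if is_possessive_adjective(first) and is_anion_noun(second)
    ]


sentence = "Uzorak sadrži natrijev klorid, željezov(III) oksid i ugljikov dioksid."
print(find_compounds(sentence))
# ['natrijev klorid', 'željezov(III) oksid', 'ugljikov dioksid']
```

Each layer only consults the one below it: the syntactic rule never looks at the word lists directly, which mirrors the layering described above in a very reduced form.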
Notes
1. Generally, suffixation is the most productive derivational process in Croatian. Babić [22] provides a list of 771 suffixes used for the derivation of all parts of speech (526 suffixes for nouns, 160 suffixes for adjectives, 61 suffixes for verbs and 24 suffixes for adverbs). On the other hand, there are only 77 prefixes used in the derivation of all major parts of speech.
References
Silberztein, M.: Formalizing Natural Languages: The NooJ Approach. Wiley-ISTE, London (2016)
Kocijan, K., Kurolt, S., Mijić, L.: Building Croatian medical dictionary from medical corpus. Rasprave Instituta za hrvatski jezik i jezikoslovlje 46(2), 765–782 (2020)
Kocijan, K., Šojat, K., Kurolt, S.: Multiword expressions in the medical domain: who carries the domain-specific meaning. In: Bekavac, B., Kocijan, K., Silberztein, M., Šojat, K. (eds.) Formalising Natural Languages: Applications to Natural Language Processing and Digital Humanities. 14th International Conference, NooJ 2020, Zagreb, Croatia, June 5–7, 2020, Revised Selected Papers, pp. 49–60. Springer, Cham (2021)
Kocijan, K., Šojat, K.: Formalizing the Recognition of Medical Domain Multiword Units. In: Dash, S., Parida, S., Tello, E., Acharya, B., Bojar, O. (eds.) Natural Language Processing in Healthcare: A Special Focus on Low Resource Languages, pp. 89–120. CRC Press, Boca Raton (2022)
Portada, T., Stilinović, V.: Što treba znati o hrvatskoj kemijskoj nomenklaturi? Kem. Ind. 56(4), 209–215 (2007)
Ball, D.W.: Elemental Etymology: What’s in a Name? J. Chem. Educ. 62, 787–788 (1985)
Ringnes, V.: Origin of the names of chemical elements. J. Chem. Educ. 66, 731–738 (1989)
Raos, N., Portada, T., Stilinović, V.: Anionic names of acids: an experiment in chemical nomenclature. Bull. Hist. Chem. 38, 61–66 (2013)
Dijkstra, A.J., Hellwich, K.-H., Hartshorn, R.M., Reedijk, J., Szabo, E.: End-of-Line Hyphenation of Chemical Names (IUPAC Recommendations 2020). Pure Appl. Chem. 93(1), 47–68 (2021)
Raos, N.: Kako definirati organsku kemiju? Kem. Ind. 71(7–8), 507–512 (2022)
Gotkova, T., Chepurnykh, N.: Public perception and usage of the term carbon: linguistic analysis in an environmental social media corpus. Psychol. Lang. Commun. 26(1), 297–312 (2022)
Giomini, C., Cardinali, M.E., Cardellini, L.: Simples and compounds: a proposal. Chem. Int. 27(1), 18 (2005)
Portada, T., Stilinović, V.: Simples and compounds: another opinion. Chem. Int. 27(5), 20 (2005)
Portada, T., Stilinović, V.: Prijedlog pridjevske funkcijsko-razredne nomenklature. Kem. Ind. 58(10), 461–464 (2009)
Portada, T.: Kako na hrvatskom jeziku reći entacapone? Kem. Ind. 61(3), 177–178 (2012)
Ingrosso, F., Polguère, A.: How terms meet in small-world lexical networks: the case of chemistry terminology. In: Poibeau, T., Faber, P. (eds.) Proceedings of the 11th International Conference on Terminology and Artificial Intelligence, pp. 167–171. Granada (2015)
Simeon, V.: Proslov hrvatskomu izdanju. In: Međunarodna unija za čistu i primijenjenu kemiju, Hrvatska nomenklatura anorganske kemije, preporuke HKD 1995, pp. IX–XVI. Školska knjiga, Zagreb (1996)
Strohal, D.: Prijedlog za izmjenu kemijskog nazivlja kiselina. Kemijski vjestnik 15(16), 126 (1941/1942)
Stojanov, T., Lewis, K., Portada, T.: Rad na Struni na primjeru hrvatskoga kemijskog nazivlja. In: Ledinek, N., Žagar Karer, M., Humar, M. (eds.) Terminologija in sodobna terminografija, pp. 181–194 (2009)
Grdinić, V.: Farmaceutski naslovi u Hrvatskoj farmakopeji. Farm. Glas. 63(1), 37–55 (2007)
Lowe, D.M., Corbett, P.T., Murray-Rust, P., Glen, R.C.: Chemical name to structure: OPSIN, an open source solution. J. Chem. Inf. Model. 51(3), 739–753 (2011)
Babić, S.: Tvorba riječi u hrvatskome književnome jeziku. HAZU i Nakladni zavod Globus, Zagreb (2002)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kocijan, K., Šojat, K., Portada, T. (2024). Deciphering the Nomenclature of Chemical Compounds in NooJ. In: Bartulović, A., Mijić, L., Silberztein, M. (eds) Formalizing Natural Languages: Applications to Natural Language Processing and Digital Humanities. NooJ 2023. Communications in Computer and Information Science, vol 1816. Springer, Cham. https://doi.org/10.1007/978-3-031-56646-2_2
DOI: https://doi.org/10.1007/978-3-031-56646-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56645-5
Online ISBN: 978-3-031-56646-2
eBook Packages: Computer Science (R0)