Abstract
In many documents, such as receipts or invoices, textual information is constrained by the available space and the layout of the document. The text has no natural-language context, and expressions are often abbreviated, at both the word level and the phrase level, to fit the graphical layout. To analyze the semantic content of such documents, we need to understand each phrase, and in particular the name of each product sold. In this paper, we propose an approach for finding the correct expansion of abbreviations and acronyms without context. First, we extract information about sold products from our receipt corpus and analyze the linguistic processes by which names are abbreviated. Then, we retrieve a list of the expanded names of products sold by the company that issued the receipts, and we propose an algorithm to pair the extracted product names with their corresponding expansions. We provide the research community with a unique document collection for abbreviation expansion.
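The abstract does not spell out the pairing algorithm, so the following is only a minimal sketch of one plausible way to pair an abbreviated phrase with a catalogue entry, not the authors' method. It assumes that each abbreviated token keeps the first letter of the word it stands for and is a subsequence of that word, and it breaks ties with the Levenshtein edit distance. The function names and the product names in the usage note are hypothetical, made up for illustration.

```python
from typing import List, Optional


def is_subsequence(short: str, full: str) -> bool:
    """Return True if the letters of `short` appear in `full` in the same order."""
    remaining = iter(full)
    return all(ch in remaining for ch in short)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def token_matches(abbr: str, word: str) -> bool:
    """Assumed rule: same first letter, and `abbr` is a subsequence of `word`."""
    return bool(abbr) and bool(word) and abbr[0] == word[0] and is_subsequence(abbr, word)


def phrase_matches(abbr_tokens: List[str], full_tokens: List[str]) -> bool:
    """Each abbreviated token must match a distinct word, in order.

    Words of the full name may be skipped, which models phrase-level
    abbreviation (dropped function words, for example).
    """
    pos = 0
    for abbr in abbr_tokens:
        while pos < len(full_tokens) and not token_matches(abbr, full_tokens[pos]):
            pos += 1
        if pos == len(full_tokens):
            return False
        pos += 1
    return True


def best_expansion(abbreviated: str, catalogue: List[str]) -> Optional[str]:
    """Return the compatible catalogue name closest in edit distance, or None."""
    abbr_tokens = abbreviated.lower().split()
    candidates = [
        name for name in catalogue
        if phrase_matches(abbr_tokens, name.lower().split())
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda name: levenshtein(abbreviated.lower(), name.lower()))
```

With a made-up catalogue `["chocolat noir", "chocolat au lait", "confiture de fraise"]`, calling `best_expansion("choc lait", catalogue)` keeps only the entry whose words can absorb both tokens in order. A real system would likely need additional rules for acronyms, digits, and units, which this sketch ignores.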
Notes
1. Medical Literature Analysis and Retrieval System Online, https://www.nlm.nih.gov/pubs/factsheets/medline.html.
2. Fraud Detection Contest: Find it!: http://findit.univ-lr.fr.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Artaud, C., Doucet, A., Poulain D’Andecy, V., Ogier, JM. (2023). Automatic Matching and Expansion of Abbreviated Phrases Without Context. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2018. Lecture Notes in Computer Science, vol 13396. Springer, Cham. https://doi.org/10.1007/978-3-031-23793-5_17
DOI: https://doi.org/10.1007/978-3-031-23793-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23792-8
Online ISBN: 978-3-031-23793-5
eBook Packages: Computer Science, Computer Science (R0)