(German) Language Processing for Lucene
This paper introduces an open-source Java package called German Language Processing for Lucene (glp4lucene). Although it was originally developed for German texts, it is to a large degree language-independent. It aims to facilitate four language processing steps when working with non-English texts in Apache Lucene/Solr: lemmatizing words, weighting terms based on their part of speech, adding synonyms, and decompounding nouns. Users do not need a thorough understanding of natural language processing to apply these steps.
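The four steps can be pictured with a deliberately tiny, self-contained Java sketch. Everything in it is an assumption for illustration. The class `ToyGermanPipeline`, its hard-coded lemma, synonym and vocabulary tables, the STTS-style tags and the numeric weights are hypothetical and are not glp4lucene's API or data. In an actual Lucene/Solr setup these steps would typically run as token filters inside an analyzer and draw on a real lemmatizer, tagger and lexical resource rather than lookup tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy sketch of the four steps; all tables and weights are hypothetical placeholders. */
public class ToyGermanPipeline {

    // Hypothetical full-form -> lemma lookup (stands in for a real lemmatizer).
    static final Map<String, String> LEMMAS = Map.of(
            "bibliotheken", "bibliothek",
            "karten", "karte",
            "gedruckten", "gedruckt");

    // Hypothetical weights per STTS-style part-of-speech tag: nouns count most.
    static final Map<String, Float> POS_WEIGHTS = Map.of(
            "NN", 1.0f, "ADJA", 0.5f, "VVFIN", 0.3f);

    // Hypothetical synonym sets keyed by lemma.
    static final Map<String, List<String>> SYNONYMS = Map.of(
            "karte", List.of("landkarte"));

    // Hypothetical vocabulary of simple nouns used to split compounds.
    static final Set<String> VOCAB = Set.of("bibliothek", "katalog", "land", "karte");

    /** Step 1: map an inflected form to its lemma, falling back to the lowercased token. */
    public static String lemmatize(String token) {
        String lower = token.toLowerCase(Locale.GERMAN);
        return LEMMAS.getOrDefault(lower, lower);
    }

    /** Step 2: look up a weight for the tag; unknown tags get a small default. */
    public static float weight(String posTag) {
        return POS_WEIGHTS.getOrDefault(posTag, 0.1f);
    }

    /** Step 3: return the lemma followed by any synonyms. */
    public static List<String> expand(String lemma) {
        List<String> terms = new ArrayList<>();
        terms.add(lemma);
        terms.addAll(SYNONYMS.getOrDefault(lemma, List.of()));
        return terms;
    }

    /** Step 4: split a compound into vocabulary words, or return the word itself. */
    public static List<String> decompound(String word) {
        String w = word.toLowerCase(Locale.GERMAN);
        List<String> parts = split(w);
        return parts.isEmpty() ? List.of(w) : parts;
    }

    // Longest known prefix first; an empty list means "no complete split found".
    private static List<String> split(String w) {
        if (VOCAB.contains(w)) {
            return List.of(w);
        }
        for (int i = w.length() - 1; i > 0; i--) {
            String head = w.substring(0, i);
            if (!VOCAB.contains(head)) {
                continue;
            }
            String rest = w.substring(i);
            List<String> tail = split(rest);
            if (tail.isEmpty() && rest.startsWith("s")) {
                tail = split(rest.substring(1)); // skip a linking "s" (Fugen-s)
            }
            if (!tail.isEmpty()) {
                List<String> result = new ArrayList<>();
                result.add(head);
                result.addAll(tail);
                return result;
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        // Hypothetical pre-tagged input: {token, STTS tag}.
        String[][] tagged = {{"Gedruckten", "ADJA"}, {"Karten", "NN"}, {"Bibliothekskatalog", "NN"}};
        for (String[] pair : tagged) {
            String lemma = lemmatize(pair[0]);
            List<String> terms = new ArrayList<>(expand(lemma));
            if (pair[1].equals("NN")) {
                for (String part : decompound(lemma)) {
                    if (!terms.contains(part)) {
                        terms.add(part);
                    }
                }
            }
            System.out.println(terms + " ^" + weight(pair[1]));
        }
    }
}
```

Running `main` prints each token's index terms together with its weight. For example, *Bibliothekskatalog* is kept as a whole and additionally split into *bibliothek* and *katalog* once the linking *s* is dropped. The longest-prefix dictionary search is only one of several possible decompounding strategies. It is used here because it is short, not because the package necessarily works this way.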
This package was created for and within the GeoBib project to facilitate searching the project's data set, and it will be used on the project's planned website. GeoBib is funded by the German Federal Ministry of Education and Research (grant no. 01UG1238A-B).