Abstract
Real-world ontologies, such as those for the medical domain, often represent highly specific, fine-grained concepts using complex labels that consist of a sequence of sublabels. In this paper, we investigate the problem of automatically detecting meaningful compounds in such complex class labels, in order to support methods that require an automatic understanding of their meaning, such as ontology matching, ontology learning and semantic search. We formulate compound identification as a supervised learning task and investigate a variety of heterogeneous features for it, both statistical (i.e., knowledge-lean) and knowledge-based. Our classifiers are trained and evaluated on a manually annotated dataset of about 300 complex labels taken from real-world ontologies, which we designed as a benchmarking gold standard for this task. Experimental results show that, by combining distributional and knowledge-based features, we reach an accuracy of more than 90% for compounds of length one and almost 80% for compounds of length two. Finally, we evaluate our method in an extrinsic setting: a use case highlighting the benefits of automatically identified compounds for the high-end semantic task of ontology matching.
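To make the task setup more concrete, the following is a minimal sketch of how candidate compounds in a label could be enumerated and scored with one knowledge-lean feature. It is not the method of the paper: the function names `candidate_compounds` and `pmi`, the example label, and all counts are hypothetical, and pointwise mutual information is used only as a familiar stand-in for a statistical feature.

```python
import math
from typing import Dict, List, Tuple

Span = Tuple[str, ...]


def candidate_compounds(label: str, min_len: int = 2) -> List[Span]:
    """Enumerate contiguous token spans of a label that are shorter than the label.

    Assumption: a candidate compound is any contiguous span of at least
    `min_len` tokens, excluding the full label itself.
    """
    tokens = label.lower().split()
    spans: List[Span] = []
    for length in range(min_len, len(tokens)):
        for start in range(len(tokens) - length + 1):
            spans.append(tuple(tokens[start:start + length]))
    return spans


def pmi(pair: Span,
        unigram_counts: Dict[str, int],
        bigram_counts: Dict[Span, int],
        total: int) -> float:
    """Pointwise mutual information (base 2) of a two-token candidate.

    Returns -inf when the bigram was never observed.
    """
    w1, w2 = pair
    joint = bigram_counts.get(pair, 0) / total
    if joint == 0:
        return float("-inf")
    p1 = unigram_counts[w1] / total
    p2 = unigram_counts[w2] / total
    return math.log2(joint / (p1 * p2))


# Hypothetical label and made-up corpus counts, for illustration only.
label = "abnormal heart rhythm"
unigrams = {"abnormal": 20, "heart": 10, "rhythm": 10}
bigrams = {("heart", "rhythm"): 5}

for span in candidate_compounds(label):
    print(span, pmi(span, unigrams, bigrams, total=1000))
```

In a supervised setting, scores of this kind would become entries of a feature vector for each candidate, alongside knowledge-based features, and a classifier would decide whether the candidate is a meaningful compound.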
Notes
- 1. In this work, we focus primarily on labels of length 3: however, our approach can be used in principle with labels of arbitrary lengths.
- 2.
- 3.
- 4. SUMO is originally published in the SUMO-KIF format [25]. In our work we use the OWL version available at http://www.ontologyportal.org/.
- 5. The gold standard is freely available at https://madata.bib.uni-mannheim.de/57/.
- 6. The head of a phrase is the word which is grammatically most important in the phrase, since it determines the nature of the overall phrase [28]. For basic non-recursive noun phrases, this typically corresponds to the rightmost noun.
References
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.: Gene ontology: tool for the unification of biology. The gene ontology consortium. Nature Genet. 25(1), 25–29 (2000)
Baldwin, T., Kim, S.N.: Multiword expressions. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, 2nd edn. CRC Press, Taylor and Francis Group, Boca Raton (2010)
Bergsma, S., Wang, Q.: Learning noun phrase query segmentation. In: Proceedings of EMNLP-CoNLL-07, pp. 819–826 (2007)
Brants, T., Franz, A.: Web 1T 5-gram version 1. LDC2006T13, Philadelphia, Penn.: Linguistic Data Consortium (2006)
Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)
d’Aquin, M., Motta, E., Sabou, M., Angeletou, S., Gridinoc, L., Lopez, V., Guidi, D.: Towards a new generation of semantic web applications. IEEE Intell. Syst. 23(3), 20–28 (2008)
Doan, A., Halevy, A.: Semantic-integration research in the database community. AI Mag. 26(1), 83–94 (2005)
Enslen, E., Hill, E., Pollock, L., Vijay-Shanker, K.: Mining source code to automatically split identifiers for software analysis. In: Proceedings of MSR-09, pp. 71–80 (2009)
Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam, M.: Open information extraction: the second generation. In: Proceedings of IJCAI-11, pp. 3–10. AAAI Press (2011)
Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., Trojahn, C.: Ontology alignment evaluation initiative: six years of experience. In: Spaccapietra, S. (ed.) Journal on Data Semantics XV. LNCS, vol. 6720, pp. 158–192. Springer, Heidelberg (2011)
Fernandez-Breis, J.T., Iannone, L., Palmisano, I., Rector, A.L., Stevens, R.: Enriching the gene ontology via the dissection of labels using the ontology pre-processor language. In: Cimiano, P., Pinto, H.S. (eds.) EKAW 2010. LNCS, vol. 6317, pp. 59–73. Springer, Heidelberg (2010)
Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378 (1971)
Giuliano, C., Gliozzo, A., Strapparava, C.: FBK-irst: lexical substitution task exploiting domain and syntagmatic coherence. In: Proceedings of SemEval-2007 (2007)
Hagen, M., Potthast, M., Stein, B., Bräutigam, C.: The power of naive query segmentation. In: Crestani, F., Marchand-Maillet, S., Chen, H.H., Efthimiadis, E., Savoy, J. (eds.) Proceedings of SIGIR-10, pp. 797–798 (2010)
Hagen, M., Potthast, M., Stein, B., Bräutigam, C.: Query segmentation revisited. In: Proceedings of WWW-11, pp. 97–106 (2011)
Hovy, E., Navigli, R., Ponzetto, S.P.: Collaboratively built semi-structured content and Artificial Intelligence: the story so far. Artif. Intell. 194, 2–27 (2013)
Kolb, P.: DISCO: a multilingual database of distributionally similar words. In: Proceedings of KONVENS-08 (2008)
Kudoh, T., Matsumoto, Y.: Chunking with support vector machines. In: Proceedings of NAACL-01, pp. 1–8 (2001)
Kwiatkowski, T., Choi, E., Artzi, Y., Zettlemoyer, L.: Scaling semantic parsers with on-the-fly ontology matching. In: Proceedings of EMNLP-13, pp. 1545–1556 (2013)
Leopold, H., Smirnov, S., Mendling, J.: Recognising activity labeling styles in business process models. Enterp. Model. Inf. Syst. Architectures 6(1), 16–29 (2011)
Manaf, N.A.A., Bechhofer, S., Stevens, R.: A survey of identifiers and labels in OWL ontologies. In: Proceedings of OWLED 2010 (2010)
Mendling, J., Reijers, H., Recker, J.: Activity labeling in process modeling: empirical insights and recommendations. Inf. Syst. 35(4), 467–482 (2010)
Mierswa, I.: Rapid miner. Künstliche Intelligenz 23(2), 5–11 (2009)
Nakov, P., Hearst, M.: Search engine statistics beyond the n-gram: application to noun compound bracketing. In: Proceedings of CoNLL-05, pp. 17–24 (2005)
Pease, A.: Ontology: A Practical Guide. Articulate Software Press, Angwin (2011)
Ponzetto, S.P., Strube, M.: Taxonomy induction based on a collaboratively built knowledge repository. Artif. Intell. 175, 1737–1756 (2011)
Quesada-Martínez, M., Fernández-Breis, J.T., Stevens, R.: Lexical characterization and analysis of the bioportal ontologies. In: Peek, N., Marín Morales, R., Peleg, M. (eds.) AIME 2013. LNCS, vol. 7885, pp. 206–215. Springer, Heidelberg (2013)
Radford, A.: Syntax: A Minimalist Introduction. Cambridge University Press, Cambridge (1997)
Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Proceedings of the Third Workshop on Very Large Corpora, pp. 82–94 (1995)
Ritze, D., Meilicke, C., Šváb Zamazal, O., Stuckenschmidt, H.: A pattern-based ontology matching approach for detecting complex correspondences. In: Proceedings of OM-2009 (2009)
Ritze, D., Völker, J., Meilicke, C., Šváb Zamazal, O.: Linguistic analysis for complex ontology matching. In: Proceedings of OM-2010 (2010)
Sabou, M., d’Aquin, M., Motta, E.: Exploring the semantic web as background knowledge for ontology matching. J. Data Semant. 11, 156–190 (2000)
Sang, E., Buchholz, S.: Introduction to the CoNLL 2000 shared task: chunking. In: Proceedings of CoNLL-00, pp. 127–132 (2000)
Schütze, H.: Automatic word sense discrimination. Comput. Linguist. 24(1), 97–124 (1998)
Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of HLT-NAACL-03, pp. 134–141 (2003)
Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25, 158–176 (2012)
Stuckenschmidt, H., Predoiu, L., Meilicke, C.: Learning complex ontology mappings - a challenge for ILP research. In: Proceedings of ILP-08 - Late Breaking Papers (2008)
Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)
Vadas, D., Curran, J.R.: Parsing noun phrase structure with CCG. In: Proceedings of ACL-08: HLT, pp. 335–343 (2008)
Zesch, T., Müller, C., Gurevych, I.: Extracting lexical semantic knowledge from wikipedia and wiktionary. In: Proceedings of LREC-08 (2008)
Zhang, W., Liu, S., Yu, C., Sun, C., Liu, F., Meng, W.: Recognition and classification of noun phrases in queries for effective retrieval. In: Proceedings of CIKM-07, pp. 711–720 (2007)
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Stuckenschmidt, H., Ponzetto, S.P., Meilicke, C. (2016). Detecting Meaningful Compounds in Complex Class Labels. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_40
DOI: https://doi.org/10.1007/978-3-319-49004-5_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49003-8
Online ISBN: 978-3-319-49004-5
eBook Packages: Computer Science, Computer Science (R0)