Detecting Meaningful Compounds in Complex Class Labels

  • Heiner Stuckenschmidt
  • Simone Paolo Ponzetto
  • Christian Meilicke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10024)


Real-world ontologies, such as those for the medical domain, often represent highly specific, fine-grained concepts using complex labels that consist of a sequence of sublabels. In this paper, we investigate the problem of automatically detecting meaningful compounds in such complex class labels, in order to support methods that require an automatic understanding of label meaning, such as ontology matching, ontology learning and semantic search. We formulate compound identification as a supervised learning task and investigate a variety of heterogeneous features, both statistical (i.e., knowledge-lean) and knowledge-based. Our classifiers are trained and evaluated on a manually annotated dataset of about 300 complex labels taken from real-world ontologies, which we designed to serve as a benchmarking gold standard for this task. Experimental results show that, by combining distributional and knowledge-based features, we reach an accuracy of more than 90% for compounds of length one and almost 80% for compounds of length two. Finally, we evaluate our method in an extrinsic setting: a use case highlighting the benefits of automatically identified compounds for the high-end semantic task of ontology matching.
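As a rough illustration of what a statistical (knowledge-lean) feature for this task might look like, the sketch below enumerates contiguous token spans of a complex label as candidate compounds and scores two-token candidates with pointwise mutual information (PMI). The abstract does not name the features actually used, so PMI, the function names `candidate_spans` and `pmi`, and the toy counts are all assumptions chosen for illustration, not the authors' method.

```python
import math
from typing import Dict, List, Tuple


def candidate_spans(tokens: List[str], max_len: int = 3) -> List[Tuple[str, ...]]:
    """Enumerate contiguous sub-sequences of 2..max_len tokens as candidate compounds."""
    spans: List[Tuple[str, ...]] = []
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            spans.append(tuple(tokens[i:i + n]))
    return spans


def pmi(
    bigram: Tuple[str, str],
    unigram_counts: Dict[str, int],
    bigram_counts: Dict[Tuple[str, str], int],
    total: int,
) -> float:
    """Pointwise mutual information (base 2) of a two-token candidate.

    Returns -inf when any of the required counts is zero.
    """
    p_xy = bigram_counts.get(bigram, 0) / total
    p_x = unigram_counts.get(bigram[0], 0) / total
    p_y = unigram_counts.get(bigram[1], 0) / total
    if p_xy == 0 or p_x == 0 or p_y == 0:
        return float("-inf")
    return math.log2(p_xy / (p_x * p_y))


# Toy corpus statistics: invented for this example, not taken from the paper.
TOTAL = 100_000
UNIGRAMS = {"rift": 50, "valley": 200, "fever": 400}
BIGRAMS = {("rift", "valley"): 40, ("valley", "fever"): 45}

label = ["rift", "valley", "fever"]
candidates = candidate_spans(label)

# One feature value per two-token candidate; a classifier would consume these
# alongside other (e.g. knowledge-based) features.
features = {
    span: pmi(span, UNIGRAMS, BIGRAMS, TOTAL)
    for span in candidates
    if len(span) == 2
}
```

Under these invented counts, "rift valley" receives a higher PMI than "valley fever", which is the kind of signal a supervised classifier could combine with knowledge-based evidence when deciding which spans form meaningful compounds.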


Keywords: Noun Phrase, Head Noun, Rift Valley Fever, Concept Hierarchy, Ontology Match
These keywords were added by machine and not by the authors.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Heiner Stuckenschmidt (1)
  • Simone Paolo Ponzetto (1)
  • Christian Meilicke (1)
  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
