Detecting Meaningful Compounds in Complex Class Labels

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10024)

Abstract

Real-world ontologies, such as those for the medical domain, often represent highly specific, fine-grained concepts using complex labels that consist of a sequence of sublabels. In this paper, we investigate the problem of automatically detecting meaningful compounds in such complex class labels, in order to support methods that require an automatic understanding of their meaning, such as ontology matching, ontology learning and semantic search. We formulate compound identification as a supervised learning task and investigate a variety of heterogeneous features for it, both statistical (i.e., knowledge-lean) and knowledge-based. Our classifiers are trained and evaluated on a manually annotated dataset of about 300 complex labels taken from real-world ontologies, which we designed as a benchmarking gold standard for this task. Experimental results show that, by combining distributional and knowledge-based features, we reach an accuracy of more than 90% for compounds of length one and almost 80% for compounds of length two. Finally, we evaluate our method in an extrinsic setting: a use case highlighting the benefits of automatically identified compounds for the high-end semantic task of ontology matching.
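
As a rough illustration of the statistical (knowledge-lean) side of this setup, the sketch below enumerates contiguous sub-sequences of a three-token label as candidate compounds and scores the two-token candidates with pointwise mutual information (PMI) computed from n-gram counts. Everything here is an assumption made for illustration: the abstract does not say which statistical features are used, the function names `candidate_spans` and `pmi` are hypothetical, and the counts are invented toy values rather than figures from any corpus.

```python
from math import log
from typing import Dict, List, Tuple


def candidate_spans(tokens: List[str], min_len: int = 2) -> List[Tuple[str, ...]]:
    """Enumerate contiguous token sub-sequences of a label as candidate compounds."""
    n = len(tokens)
    return [tuple(tokens[i:j]) for i in range(n) for j in range(i + min_len, n + 1)]


def pmi(pair: Tuple[str, str],
        unigram: Dict[str, int],
        bigram: Dict[Tuple[str, str], int],
        total: int) -> float:
    """Pointwise mutual information of a two-token candidate, from raw counts."""
    joint = bigram.get(pair, 0)
    if joint == 0:
        return float("-inf")  # the pair was never observed together
    p_xy = joint / total
    p_x = unigram[pair[0]] / total
    p_y = unigram[pair[1]] / total
    return log(p_xy / (p_x * p_y))


# Invented toy counts, for illustration only (not taken from any real corpus).
UNIGRAM = {"acute": 50, "renal": 40, "failure": 100}
BIGRAM = {("renal", "failure"): 30, ("acute", "renal"): 5}
TOTAL = 10_000

label = ["acute", "renal", "failure"]
spans = candidate_spans(label)

# One statistical feature value per two-token candidate.
features = {s: pmi(s, UNIGRAM, BIGRAM, TOTAL) for s in spans if len(s) == 2}
```

In a supervised setting, a score of this kind would be only one column of a feature vector, alongside knowledge-based features, with the manually annotated labels providing the training targets.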

Notes

  1. In this work, we focus primarily on labels of length 3; however, our approach can be used in principle with labels of arbitrary lengths.

  2. http://openie.cs.washington.edu/.

  3. http://sourceforge.net/projects/jwordnet/.

  4. SUMO is originally published in the SUMO-KIF format [25]. In our work we use the OWL version available at http://www.ontologyportal.org/.

  5. The gold standard is freely available at https://madata.bib.uni-mannheim.de/57/.

  6. The head of a phrase is the word which is grammatically most important in the phrase, since it determines the nature of the overall phrase [28]. For basic non-recursive noun phrases, this typically corresponds to the rightmost noun.
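
The rightmost-noun heuristic for the head of a basic noun phrase can be sketched as follows. This is a minimal illustration, not the procedure used in the paper: the function name `head_of_basic_np` is hypothetical, the input is assumed to be already part-of-speech tagged, and noun tags are assumed to start with `NN`, as in Penn Treebank-style tagsets.

```python
from typing import List, Optional, Tuple


def head_of_basic_np(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Return the rightmost noun of a tagged phrase, or None if there is no noun.

    Assumes Penn Treebank-style tags, where noun tags begin with 'NN'.
    """
    for token, tag in reversed(tagged):
        if tag.startswith("NN"):
            return token
    return None


# Hand-tagged example label; the tags are supplied manually for illustration.
example = [("acute", "JJ"), ("renal", "JJ"), ("failure", "NN")]
head = head_of_basic_np(example)
```

As the note says, the heuristic only typically holds for basic non-recursive noun phrases, so phrases with post-modifiers would need a different treatment.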

References

  1. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., Harris, M.A., Hill, D.P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J.C., Richardson, J.E., Ringwald, M., Rubin, G.M., Sherlock, G.: Gene ontology: tool for the unification of biology. The gene ontology consortium. Nature Genet. 25(1), 25–29 (2000)

  2. Baldwin, T., Kim, S.N.: Multiword expressions. In: Indurkhya, N., Damerau, F.J. (eds.) Handbook of Natural Language Processing, 2nd edn. CRC Press, Taylor and Francis Group, Boca Raton (2010)

  3. Bergsma, S., Wang, Q.: Learning noun phrase query segmentation. In: Proceedings of EMNLP-CoNLL-07, pp. 819–826 (2007)

  4. Brants, T., Franz, A.: Web 1T 5-gram version 1. LDC2006T13, Philadelphia, Penn.: Linguistic Data Consortium (2006)

  5. Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)

  6. d’Aquin, M., Motta, E., Sabou, M., Angeletou, S., Gridinoc, L., Lopez, V., Guidi, D.: Towards a new generation of semantic web applications. IEEE Intell. Syst. 23(3), 20–28 (2008)

  7. Doan, A., Halevy, A.: Semantic-integration research in the database community. AI Mag. 26(1), 83–94 (2005)

  8. Enslen, E., Hill, E., Pollock, L., Vijay-Shanker, K.: Mining source code to automatically split identifiers for software analysis. In: Proceedings of MSR-09, pp. 71–80 (2009)

  9. Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam, M.: Open information extraction: the second generation. In: Proceedings of IJCAI-11, pp. 3–10. AAAI Press (2011)

  10. Euzenat, J., Meilicke, C., Stuckenschmidt, H., Shvaiko, P., Trojahn, C.: Ontology alignment evaluation initiative: six years of experience. In: Spaccapietra, S. (ed.) Journal on Data Semantics XV. LNCS, vol. 6720, pp. 158–192. Springer, Heidelberg (2011)

  11. Fernandez-Breis, J.T., Iannone, L., Palmisano, I., Rector, A.L., Stevens, R.: Enriching the gene ontology via the dissection of labels using the ontology pre-processor language. In: Cimiano, P., Pinto, H.S. (eds.) EKAW 2010. LNCS, vol. 6317, pp. 59–73. Springer, Heidelberg (2010)

  12. Fleiss, J.L.: Measuring nominal scale agreement among many raters. Psychol. Bull. 76(5), 378 (1971)

  13. Giuliano, C., Gliozzo, A., Strapparava, C.: FBK-irst: lexical substitution task exploiting domain and syntagmatic coherence. In: Proceedings of SemEval-2007 (2007)

  14. Hagen, M., Potthast, M., Stein, B., Bräutigam, C.: The power of naive query segmentation. In: Crestani, F., Marchand-Maillet, S., Chen, H.H., Efthimiadis, E., Savoy, J. (eds.) Proceedings of SIGIR-10, pp. 797–798 (2010)

  15. Hagen, M., Potthast, M., Stein, B., Bräutigam, C.: Query segmentation revisited. In: Proceedings of WWW-11, pp. 97–106 (2011)

  16. Hovy, E., Navigli, R., Ponzetto, S.P.: Collaboratively built semi-structured content and Artificial Intelligence: the story so far. Artif. Intell. 194, 2–27 (2013)

  17. Kolb, P.: DISCO: a multilingual database of distributionally similar words. In: Proceedings of KONVENS-08 (2008)

  18. Kudoh, T., Matsumoto, Y.: Chunking with support vector machines. In: Proceedings of NAACL-01, pp. 1–8 (2001)

  19. Kwiatkowski, T., Choi, E., Artzi, Y., Zettlemoyer, L.: Scaling semantic parsers with on-the-fly ontology matching. In: Proceedings of EMNLP-13, pp. 1545–1556 (2013)

  20. Leopold, H., Smirnov, S., Mendling, J.: Recognising activity labeling styles in business process models. Enterp. Model. Inf. Syst. Architectures 6(1), 16–29 (2011)

  21. Manaf, N.A.A., Bechhofer, S., Stevens, R.: A survey of identifiers and labels in OWL ontologies. In: Proceedings of OWLED 2010 (2010)

  22. Mendling, J., Reijers, H., Recker, J.: Activity labeling in process modeling: empirical insights and recommendations. Inf. Syst. 35(4), 467–482 (2010)

  23. Mierswa, I.: Rapid miner. Künstliche Intelligenz 23(2), 5–11 (2009)

  24. Nakov, P., Hearst, M.: Search engine statistics beyond the n-gram: application to noun compound bracketing. In: Proceedings of CoNLL-05, pp. 17–24 (2005)

  25. Pease, A.: Ontology: A Practical Guide. Articulate Software Press, Angwin (2011)

  26. Ponzetto, S.P., Strube, M.: Taxonomy induction based on a collaboratively built knowledge repository. Artif. Intell. 175, 1737–1756 (2011)

  27. Quesada-Martínez, M., Fernández-Breis, J.T., Stevens, R.: Lexical characterization and analysis of the bioportal ontologies. In: Peek, N., Marín Morales, R., Peleg, M. (eds.) AIME 2013. LNCS, vol. 7885, pp. 206–215. Springer, Heidelberg (2013)

  28. Radford, A.: Syntax: A Minimalist Introduction. Cambridge University Press, Cambridge (1997)

  29. Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Proceedings of the Third Workshop on Very Large Corpora, pp. 82–94 (1995)

  30. Ritze, D., Meilicke, C., Šváb Zamazal, O., Stuckenschmidt, H.: A pattern-based ontology matching approach for detecting complex correspondences. In: Proceedings of OM-2009 (2009)

  31. Ritze, D., Völker, J., Meilicke, C., Šváb Zamazal, O.: Linguistic analysis for complex ontology matching. In: Proceedings of OM-2010 (2010)

  32. Sabou, M., d’Aquin, M., Motta, E.: Exploring the semantic web as background knowledge for ontology matching. J. Data Semant. 11, 156–190 (2000)

  33. Sang, E., Buchholz, S.: Introduction to the CoNLL 2000 shared task: chunking. In: Proceedings of CoNLL-00, pp. 127–132 (2000)

  34. Schütze, H.: Automatic word sense discrimination. Comput. Linguist. 24(1), 97–124 (1998)

  35. Sha, F., Pereira, F.: Shallow parsing with conditional random fields. In: Proceedings of HLT-NAACL-03, pp. 134–141 (2003)

  36. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25, 158–176 (2012)

  37. Stuckenschmidt, H., Predoiu, L., Meilicke, C.: Learning complex ontology mappings - a challenge for ILP research. In: Proceedings of ILP-08 - Late Breaking Papers (2008)

  38. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)

  39. Vadas, D., Curran, J.R.: Parsing noun phrase structure with CCG. In: Proceedings of ACL-08: HLT, pp. 335–343 (2008)

  40. Zesch, T., Müller, C., Gurevych, I.: Extracting lexical semantic knowledge from wikipedia and wiktionary. In: Proceedings of LREC-08 (2008)

  41. Zhang, W., Liu, S., Yu, C., Sun, C., Liu, F., Meng, W.: Recognition and classification of noun phrases in queries for effective retrieval. In: Proceedings of CIKM-07, pp. 711–720 (2007)

Author information

Corresponding author

Correspondence to Simone Paolo Ponzetto.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Stuckenschmidt, H., Ponzetto, S.P., Meilicke, C. (2016). Detecting Meaningful Compounds in Complex Class Labels. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_40

  • DOI: https://doi.org/10.1007/978-3-319-49004-5_40

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49003-8

  • Online ISBN: 978-3-319-49004-5

  • eBook Packages: Computer Science, Computer Science (R0)
