Text Mining, pp 41-62

Simple, Fast and Accurate Taxonomy Learning

Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

Although many algorithms have been developed to extract lexical resources, few organize the mined terms into taxonomies. We propose (1) a semi-supervised algorithm that uses a root term, a seed example and lexico-syntactic patterns to automatically learn from the Web the hyponyms and hypernyms subordinated to the root; (2) a Web-based concept positioning test to validate the learned terms and is-a relations; (3) a graph algorithm that induces from scratch the taxonomy structure of all terms; and (4) a pattern-based procedure for enriching the learned taxonomies with verb-based relations. We conduct exhaustive empirical evaluations on four different domains and show that our algorithm quickly and accurately acquires and taxonomizes the knowledge. We conduct comparative studies against WordNet and existing knowledge repositories and show that our algorithm finds many additional terms and relations missing from these resources. We also conduct an evaluation against other taxonomization algorithms and show how our algorithm can further enrich the taxonomies with verb-based relations.
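
To make steps (1) and (3) concrete, the following is a minimal Python sketch and not the chapter's implementation. It assumes a Hearst-style pattern of the form "<root> such as <seed> and X" for harvesting, and it assumes that the taxonomy position of a term is given by the longest is-a chain from the root; the abstract states neither detail. The names `harvest_hyponyms` and `longest_paths` are hypothetical, and plain sentences stand in for Web search results.

```python
import re
from collections import defaultdict


def harvest_hyponyms(root, seed, sentences):
    """Collect terms filling the open slot of '<root> such as <seed> and X'.

    The pattern shape is an illustrative assumption, not the chapter's.
    """
    pattern = re.compile(
        rf"\b{re.escape(root)} such as {re.escape(seed)} and (\w+)",
        re.IGNORECASE,
    )
    found = set()
    for sentence in sentences:
        found.update(match.lower() for match in pattern.findall(sentence))
    return found


def longest_paths(edges, root):
    """Map each term reachable from root to its longest is-a chain.

    edges holds (hypernym, hyponym) pairs; the graph is assumed acyclic.
    Using the longest chain as the taxonomy position is an assumption.
    """
    children = defaultdict(list)
    for hypernym, hyponym in edges:
        children[hypernym].append(hyponym)

    best = {root: [root]}

    def visit(node, path):
        for child in children[node]:
            if child in path:  # skip anything that would close a cycle
                continue
            candidate = path + [child]
            if len(candidate) > len(best.get(child, [])):
                best[child] = candidate
                visit(child, candidate)

    visit(root, [root])
    return best


if __name__ == "__main__":
    sentences = ["Animals such as lions and tigers are dangerous."]
    print(harvest_hyponyms("animals", "lions", sentences))

    edges = [("animal", "mammal"), ("mammal", "lion"), ("animal", "lion")]
    print(longest_paths(edges, "animal")["lion"])
```

In this toy input the direct edge from "animal" to "lion" is superseded by the longer chain through "mammal", which is how a longest-path criterion would place a term under its most specific hypernym.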


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Computer Science Department, Marina del Rey, USA
