Concept Set Expansion

Automated Taxonomy Discovery and Exploration

Part of the book series: Synthesis Lectures on Data Mining and Knowledge Discovery (SLDMKD)

Abstract

The first step toward automated taxonomy discovery is to identify important concepts. In this chapter, we discuss how to obtain high-quality concepts via concept set expansion techniques. Specifically, we first lay out general approaches to the concept set expansion task and then present an iterative expansion framework with a thorough experimental analysis. After that, we discuss how to extend this framework by exploiting automatically discovered negative sets and incorporating signals from pre-trained language models. Finally, we conclude the chapter with promising directions for future research.

Notes

  1. Results of SEISA on PubMed-CVD are omitted due to the scalability issue.

References

  1. Balasubramanyan, R., Dalvi, B.B., Cohen, W.W.: From topic models to semi-supervised learning: Biasing mixed-membership models to exploit topic-indicative features in entity clustering. In: Proceedings of 2013 Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2013)

  2. Chen, Z., Cafarella, M., Jagadish, H.: Long-tail vocabulary dictionary extraction from the web. In: Proceedings of the 9th ACM International Conference on Web Search and Data Mining (2016)

  3. Chierichetti, F., Kumar, R., Pandey, S., Vassilvitskii, S.: Finding the Jaccard median. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (2010)

  4. Curran, J.R., Murphy, T., Scholz, B.: Minimising semantic drift with mutual exclusion bootstrapping. In: Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics (2007)

  5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2019)

  6. Ghahramani, Z., Heller, K.A.: Bayesian sets. In: Proceedings of the 19th Conference on Neural Information Processing Systems (2005)

  7. Gupta, S., MacLean, D.L., Heer, J., Manning, C.D.: Research and applications: induced lexico-syntactic patterns improve information extraction from online medical forums. J Amer Med Inform Assoc (2014)

  8. Gupta, S., Manning, C.D.: Improved pattern learning for bootstrapped entity extraction. In: Proceedings of the 18th Conference on Computational Natural Language Learning (2014)

  9. Gupta, S., Manning, C.D.: Distributed representations of words to guide bootstrapped entity classifiers. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2015)

  10. He, Y., Xin, D.: SEISA: set expansion by iterative similarity aggregation. In: Proceedings of the 20th International Conference on World Wide Web (2011)

  11. Huang, J., Xie, Y., Meng, Y., Shen, J., Zhang, Y., Han, J.: Guiding corpus-based set expansion by auxiliary sets generation and co-expansion. In: Proceedings of the 2020 Web Conference (2020)

  12. Jindal, P., Roth, D.: Learning from negative examples in set-expansion. In: Proceedings of IEEE 11th International Conference on Data Mining (2011)

  13. Lin, D., Wu, X.: Phrase clustering for discriminative learning. In: Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (2009)

  14. Lin, W., Yangarber, R., Grishman, R.: Bootstrapped learning of semantic classes from positive and negative examples. In: Proceedings of ICML-2003 Workshop on The Continuum from Labeled to Unlabeled Data (2003)

  15. Ling, X., Weld, D.S.: Fine-grained entity recognition. In: Proceedings of the 2012 AAAI Conference on Artificial Intelligence (2012)

  16. Liu, J., Shang, J., Wang, C., Ren, X., Han, J.: Mining quality phrases from massive text corpora. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (2015)

  17. Mamou, J., Pereg, O., Wasserblat, M., Eirew, A., Green, Y., Guskin, S., Izsak, P., Korat, D.: Term set expansion based NLP Architect by Intel AI Lab. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (2018)

  18. McIntosh, T., Curran, J.R.: Weighted mutual exclusion bootstrapping for domain independent lexicon and template acquisition. In: Proceedings of the Australasian Language Technology Association Workshop 2008 (2008)

  19. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Proceedings of the 27th Conference on Neural Information Processing Systems (2013)

  20. Pantel, P., Crestan, E., Borkovsky, A., Popescu, A.M., Vyas, V.: Web-scale distributional similarity and entity set expansion. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (2009)

  21. Ren, X., El-Kishky, A., Wang, C., Tao, F., Voss, C.R., Han, J.: ClusType: effective entity recognition and typing by relation phrase-based clustering. In: Proceedings of the 24th International Conference on World Wide Web (2015)

  22. Ren, X., Lv, Y., Wang, K., Han, J.: Comparative document analysis for large text corpora. In: Proceedings of the 10th ACM International Conference on Web Search and Data Mining (2017)

  23. Riloff, E.: Automatically generating extraction patterns from untagged text. In: Proceedings of the 1996 AAAI Conference on Artificial Intelligence (1996)

  24. Rong, X., Chen, Z., Mei, Q., Adar, E.: EgoSet: exploiting word ego-networks and user-generated ontology for multifaceted set expansion. In: Proceedings of the 9th ACM International Conference on Web Search and Data Mining (2016)

  25. Shen, J., Wu, Z., Lei, D., Shang, J., Ren, X., Han, J.: SetExpan: corpus-based set expansion via context feature selection and rank ensemble. In: Proceedings of the 2017 Joint European Conference on Machine Learning and Knowledge Discovery in Databases (2017)

  26. Shi, B., Zhang, Z., Sun, L., Han, X.: A probabilistic co-bootstrapping method for entity set expansion. In: Proceedings of the 25th International Conference on Computational Linguistics (2014)

  27. Shi, S., Zhang, H., Yuan, X., Wen, J.R.: Corpus-based semantic class mining: distributional vs. pattern-based approaches. In: Proceedings of the 23rd International Conference on Computational Linguistics (2010)

  28. Talukdar, P.P., Reisinger, J., Pasca, M., Ravichandran, D., Bhagat, R., Pereira, F.: Weakly-supervised acquisition of labeled class instances using graph random walks. In: Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (2008)

  29. Tang, J., Qu, M., Mei, Q.: PTE: predictive text embedding through large-scale heterogeneous text networks. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015)

  30. Thelen, M., Riloff, E.: A bootstrapping method for learning semantic lexicons using extraction pattern contexts. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (2002)

  31. Tong, S., Dean, J.: System and methods for automatically creating lists (2008). US Patent 7,350,187

  32. Velardi, P., Faralli, S., Navigli, R.: OntoLearn reloaded: a graph-based algorithm for taxonomy induction. In: Computational Linguistics (2013)

  33. Wang, C., Chakrabarti, K., He, Y., Ganjam, K., Chen, Z., Bernstein, P.A.: Concept expansion using web tables. In: Proceedings of the 24th International Conference on World Wide Web (2015)

  34. Wang, R.C., Cohen, W.W.: Language-independent set expansion of named entities using the web. In: Proceedings of the 7th IEEE International Conference on Data Mining (2007)

  35. Wang, Y.Y., Hoffmann, R., Li, X., Szymanski, J.: Semi-supervised learning of semantic classes for query understanding: from the web and for the web. In: Proceedings of the 18th ACM International Conference on Information and Knowledge Management (2009)

  36. Yan, L., Han, X., Sun, L., He, B.: Learning to bootstrap for entity set expansion. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing (2019)

  37. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XLNet: generalized autoregressive pretraining for language understanding. In: Proceedings of the 33rd Conference on Neural Information Processing Systems (2019)

  38. Yu, P., Huang, Z., Rahimi, R., Allan, J.D.: Corpus-based set expansion with lexical features and distributed representations. In: Proceedings of the 42nd International ACM SIGIR Conference on Research & Development in Information Retrieval (2019)

  39. Zhang, Y., Shen, J., Shang, J., Han, J.: Empower entity set expansion via language model probing. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020)

Author information

Corresponding author

Correspondence to Jiaming Shen.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Shen, J., Han, J. (2022). Concept Set Expansion. In: Automated Taxonomy Discovery and Exploration. Synthesis Lectures on Data Mining and Knowledge Discovery. Springer, Cham. https://doi.org/10.1007/978-3-031-11405-2_2
