Self-adjusting Bootstrapping

Fujiwara, Shoji; Sekine, Satoshi

doi:10.1007/978-3-642-19437-5_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6609)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Bootstrapping has been used as a very efficient method to extract a group of items similar to a given set of seeds. However, the bootstrapping method intrinsically has several parameters whose optimal values differ from task to task and from target to target. In this paper, we first demonstrate that this is indeed the case and that it is a serious problem. We then propose self-adjusting bootstrapping, in which the original seed set is split into a real seed and validation data. We first bootstrap from the real seed under alternative parameter settings and use the validation data to identify the optimal setting. This is repeated with alternative splits in typical cross-validation fashion. The final bootstrapping is then performed using the best parameter setting and the entire original seed set to create the final output. We conducted experiments to collect sets of company names in different categories. Self-adjusting bootstrapping substantially outperformed a baseline using a uniform parameter setting.
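
The procedure outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the tuned parameters, the bootstrapping algorithm, or the scoring function, so `toy_bootstrap`, the `iterations` and `min_support` parameters, the toy corpus, and the F1-style score against held-out seeds are all hypothetical stand-ins.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Set, Tuple

Params = Dict[str, int]
Bootstrapper = Callable[[Set[str], Params], List[str]]

# Toy (context, item) corpus; purely illustrative, not data from the paper.
CORPUS = [
    ("shares of X rose", "Toyota"), ("shares of X rose", "Honda"),
    ("shares of X rose", "Nissan"), ("shares of X rose", "Mazda"),
    ("X recalled vehicles", "Toyota"), ("X recalled vehicles", "Honda"),
    ("X recalled vehicles", "Subaru"),
    ("visited X", "Toyota"), ("visited X", "Paris"), ("visited X", "Kyoto"),
]


def toy_bootstrap(seed: Set[str], params: Params) -> List[str]:
    """Hypothetical stand-in bootstrapper: alternate pattern and item induction."""
    items = set(seed)
    contexts = {context for context, _ in CORPUS}
    for _ in range(params["iterations"]):
        # Accept contexts that co-occur with at least `min_support` known items.
        patterns = {
            c for c in contexts
            if len({i for ctx, i in CORPUS if ctx == c} & items) >= params["min_support"]
        }
        # Add every item that appears in an accepted context.
        items |= {i for ctx, i in CORPUS if ctx in patterns}
    return sorted(items - seed)


def f1_against_held_out(output: List[str], held_out: Set[str]) -> float:
    """Assumed score: F1 of the output measured against the held-out seeds only."""
    hits = len(held_out & set(output))
    if hits == 0:
        return 0.0
    precision = hits / len(output)
    recall = hits / len(held_out)
    return 2 * precision * recall / (precision + recall)


def self_adjusting_bootstrap(
    seeds: Sequence[str],
    bootstrap: Bootstrapper,
    grid: Dict[str, List[int]],
    k: int = 3,
) -> Tuple[Params, List[str]]:
    """Choose parameters by cross-validation over seed splits, then rerun on all seeds."""
    folds = [set(seeds[i::k]) for i in range(k)]  # disjoint validation folds
    keys = sorted(grid)
    best_params: Params = {}
    best_score = -1.0
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for held_out in folds:
            real_seed = set(seeds) - held_out  # bootstrap only from the real seed
            scores.append(f1_against_held_out(bootstrap(real_seed, params), held_out))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    # Final run: best setting, entire original seed set.
    return best_params, bootstrap(set(seeds), best_params)


best_params, companies = self_adjusting_bootstrap(
    ["Toyota", "Honda", "Nissan", "Mazda"],
    toy_bootstrap,
    {"iterations": [1, 2], "min_support": [1, 2]},
    k=2,
)
print(best_params, companies)
```

Scoring against held-out seeds alone counts every other extracted item as an error, so this particular score tends to favor conservative settings; a real system would likely need a more forgiving measure.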

Author information

Authors and Affiliations

Nikkei Digital Media, Inc., Japan
Shoji Fujiwara
Computer Science Department, New York University, USA
Satoshi Sekine

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Fujiwara, S., Sekine, S. (2011). Self-adjusting Bootstrapping. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_15

DOI: https://doi.org/10.1007/978-3-642-19437-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)