Abstract
Extracting knowledge from text has long been a goal of AI. Initial approaches were purely logical and brittle. More recently, the availability of large quantities of text on the Web has led to the development of machine learning approaches. However, to date these have mainly extracted ground facts, as opposed to general knowledge. Other learning approaches can extract logical forms, but require supervision and do not scale. In this paper we present an unsupervised approach to extracting semantic networks from large volumes of text. We use the TextRunner system [1] to extract tuples from text, and then induce general concepts and relations from them by jointly clustering the objects and relational strings in the tuples. Our approach is defined in Markov logic using four simple rules. Experiments on a dataset of two million tuples show that it outperforms three other relational clustering approaches, and extracts meaningful semantic networks.
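The abstract describes building a semantic network from extracted tuples by clustering both the objects and the relational strings. The following is a minimal sketch that assumes the clusters have already been found. The cluster maps are hypothetical and hand-made, not the output of the paper's Markov logic model. It shows how such a clustering collapses ground tuples into edges between concepts. The names `induce_network`, `relation_cluster`, and `object_cluster` are illustrative only and do not come from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# A TextRunner-style extraction: (relation string, first argument, second argument).
Triple = Tuple[str, str, str]


def induce_network(
    tuples: Iterable[Triple],
    relation_cluster: Dict[str, str],
    object_cluster: Dict[str, str],
) -> Counter:
    """Collapse ground tuples into edges between clusters.

    Each tuple (r, x, y) becomes the edge (cluster(r), cluster(x), cluster(y)).
    The counter records how many extracted tuples support each edge.
    Strings missing from a cluster map are kept as singleton clusters.
    """
    edges: Counter = Counter()
    for r, x, y in tuples:
        edge = (
            relation_cluster.get(r, r),
            object_cluster.get(x, x),
            object_cluster.get(y, y),
        )
        edges[edge] += 1
    return edges


if __name__ == "__main__":
    tuples = [
        ("is capital of", "Paris", "France"),
        ("is the capital of", "Berlin", "Germany"),
        ("is capital of", "Rome", "Italy"),
        ("borders", "France", "Germany"),
    ]
    # Hypothetical clusters, standing in for what a clustering step would output.
    relation_cluster = {"is capital of": "R_capital", "is the capital of": "R_capital"}
    object_cluster = {
        "Paris": "C_city", "Berlin": "C_city", "Rome": "C_city",
        "France": "C_country", "Germany": "C_country", "Italy": "C_country",
    }
    # Print each cluster-level edge with its number of supporting tuples.
    for edge, support in induce_network(tuples, relation_cluster, object_cluster).most_common():
        print(edge, support)
```

Running the sketch prints the edge `('R_capital', 'C_city', 'C_country')` with a support of 3. It also prints the unclustered edge `('borders', 'C_country', 'C_country')` with a support of 1. The sketch does not attempt the clustering itself, which the paper performs jointly over objects and relations.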
References
Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open information extraction from the web. In: Proc. IJCAI 2007, Hyderabad, India. AAAI Press, Menlo Park (2007)
Banko, M., Etzioni, O.: Strategies for lifelong knowledge extraction from the web. In: Proc. K-CAP 2007, British Columbia, Canada (2007)
Charniak, E.: Toward a Model of Children’s Story Comprehension. PhD thesis, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA (1972)
Craven, M.W., DiPasquo, D., Freitag, D., McCallum, A., Mitchell, T., Nigam, K., Slattery, S.: Learning to extract symbolic knowledge from the World Wide Web. In: Proc. AAAI 1998, Madison, WI, pp. 509–516. AAAI Press, Menlo Park (1998)
Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretic co-clustering. In: Proc. KDD 2003, Washington, DC (2003)
Dyer, M.G.: In-Depth Understanding. MIT Press, Cambridge (1983)
Etzioni, O., Banko, M., Cafarella, M.J.: Machine reading. In: Proc. 2007 AAAI Spring Symposium on Machine Reading, Palo Alto, CA. AAAI Press, Menlo Park (2007)
Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
Genesereth, M.R., Nilsson, N.J.: Logical Foundations of Artificial Intelligence. Morgan Kaufmann, San Mateo (1987)
Hasegawa, T., Sekine, S., Grishman, R.: Discovering relations among named entities from large corpora. In: Proc. ACL 2004, Barcelona, Spain (2004)
Kemp, C., Tenenbaum, J.B., Griffiths, T.L., Yamada, T., Ueda, N.: Learning systems of concepts with an infinite relational model. In: Proc. AAAI 2006, Boston, MA. AAAI Press, Menlo Park (2006)
Kok, S., Domingos, P.: Statistical predicate invention. In: Proc. ICML 2007, Corvallis, Oregon, pp. 440–443. ACM Press, New York (2007)
Lehnert, W.G.: The Process of Question Answering. Erlbaum, Hillsdale (1978)
McCallum, A., Jensen, D.: A note on the unification of information extraction and data mining using conditional-probability, relational models. In: Proc. IJCAI 2003 Workshop on Learning Statistical Models from Relational Data, Acapulco, Mexico, pp. 79–86. IJCAII (2003)
McCallum, A., Nigam, K., Ungar, L.: Efficient clustering of high-dimensional data sets with application to reference matching. In: Proc. KDD 2000, pp. 169–178 (2000)
Mitchell, T.: Reading the web: A breakthrough goal for AI. AI Magazine 26(3), 12–16 (2005)
Mooney, R.J.: Learning for semantic parsing. In: Gelbukh, A. (ed.) CICLing 2007. LNCS, vol. 4394, pp. 311–324. Springer, Heidelberg (2007)
Pasca, M., Lin, D., Bigham, J., Lifchits, A., Jain, A.: Names and similarities on the web: Fact extraction in the fast lane. In: Proc. ACL/COLING 2006 (2006)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco (1988)
Quillian, M.R.: Semantic memory. In: Minsky, M.L. (ed.) Semantic Information Processing, pp. 216–270. MIT Press, Cambridge (1968)
Rajaraman, K., Tan, A.-H.: Mining semantic networks for knowledge discovery. In: Proc. ICDM 2003 (2003)
Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62, 107–136 (2006)
Schank, R.C., Riesbeck, C.K.: Inside Computer Understanding. Erlbaum, Hillsdale (1981)
Schwarz, G.: Estimating the dimension of a model. Annals of Statistics 6, 461–464 (1978)
Shinyama, Y., Sekine, S.: Preemptive information extraction using unrestricted relation discovery. In: Proc. HLT-NAACL 2006, New York (2006)
Wong, Y.W., Mooney, R.J.: Learning synchronous grammars for semantic parsing with lambda calculus. In: Proc. ACL 2007, Prague, Czech Republic (2007)
Xu, Z., Tresp, V., Yu, K., Kriegel, H.-P.: Infinite hidden relational models. In: Proc. UAI 2006, Cambridge, MA (2006)
Yates, A., Etzioni, O.: Unsupervised resolution of objects and relations on the web. In: Proc. NAACL-HLT 2007, Rochester, NY (2007)
Zettlemoyer, L.S., Collins, M.: Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In: Proc. UAI 2005, Edinburgh, Scotland (2005)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kok, S., Domingos, P. (2008). Extracting Semantic Networks from Text Via Relational Clustering. In: Daelemans, W., Goethals, B., Morik, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008. Lecture Notes in Computer Science, vol 5211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87479-9_59
DOI: https://doi.org/10.1007/978-3-540-87479-9_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87478-2
Online ISBN: 978-3-540-87479-9
eBook Packages: Computer Science, Computer Science (R0)