Machine Learning and Data Mining in Pattern Recognition
Volume 5632 of the series Lecture Notes in Computer Science, pp 689-703
An Approach to Web-Scale Named-Entity Disambiguation
- Luís Sarmento, affiliated with Faculdade de Engenharia da Universidade do Porto - DEI - LIACC
- Alexander Kehlenbeck, affiliated with Google Inc
- Eugénio Oliveira, affiliated with Faculdade de Engenharia da Universidade do Porto - DEI - LIACC
- Lyle Ungar, affiliated with University of Pennsylvania - CS
Abstract
We present a multi-pass clustering approach to large-scale, wide-scope named-entity disambiguation (NED) on collections of web pages. Our approach uses name co-occurrence information to cluster, and hence disambiguate, entities, and is designed to handle NED on the entire web. We show that on web collections NED becomes increasingly difficult as corpus size increases, not only because of the challenge of scaling the NED algorithm, but also because new and surprising facets of entities become visible in the data. This effect limits the benefit that data-driven approaches can gain from processing larger data sets, and suggests that efficient clustering-based disambiguation methods for the web will require extracting more specialized information from documents.
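As a rough illustration of the idea summarized in the abstract, the sketch below groups mentions of one ambiguous name by the overlap of the other names that co-occur with each mention, and repeats the merging pass until nothing changes. It is not the authors' algorithm: the Jaccard similarity measure, the greedy merge rule, the threshold value, and the names `jaccard`, `cluster_pass` and `disambiguate` are all assumptions made for this example, and the toy documents in the usage note are invented.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets of co-occurring names, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pass(clusters: List[dict], threshold: float) -> List[dict]:
    """One greedy pass: fold each cluster into the first similar enough one.

    The greedy rule and the threshold are assumptions of this sketch.
    """
    merged: List[dict] = []
    for cluster in clusters:
        for target in merged:
            if jaccard(cluster["names"], target["names"]) >= threshold:
                target["docs"].extend(cluster["docs"])
                target["names"] |= cluster["names"]
                break
        else:
            # No similar cluster found: start a new one (copied, so the
            # input clusters are never mutated).
            merged.append({"docs": list(cluster["docs"]),
                           "names": set(cluster["names"])})
    return merged


def disambiguate(mentions: Dict[str, Set[str]],
                 threshold: float = 0.2,
                 max_passes: int = 5) -> List[List[str]]:
    """Group documents mentioning one ambiguous name into candidate entities.

    `mentions` maps a document id to the set of other names that co-occur
    with the ambiguous name in that document.
    """
    clusters = [{"docs": [doc], "names": set(names)}
                for doc, names in mentions.items()]
    for _ in range(max_passes):
        merged = cluster_pass(clusters, threshold)
        converged = len(merged) == len(clusters)
        clusters = merged
        if converged:
            break
    return [sorted(c["docs"]) for c in clusters]
```

For instance, with four invented documents that all mention "Paris", calling `disambiguate({"doc1": {"France", "Seine", "Eiffel Tower"}, "doc2": {"France", "Seine", "Louvre"}, "doc3": {"Hilton", "Hollywood", "Nicole Richie"}, "doc4": {"Hilton", "Hollywood", "reality TV"}})` separates the first two documents from the last two. A sketch like this also hints at the difficulty the abstract describes: as more documents are added, the same entity shows up with co-occurring names that share nothing with its earlier contexts, so overlap alone stops being enough to merge them.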
- Title: An Approach to Web-Scale Named-Entity Disambiguation
- Book Title: Machine Learning and Data Mining in Pattern Recognition
- Book Subtitle: 6th International Conference, MLDM 2009, Leipzig, Germany, July 23-25, 2009. Proceedings
- Pages: pp 689-703
- Copyright: 2009
- DOI: 10.1007/978-3-642-03070-3_52
- Print ISBN: 978-3-642-03069-7
- Online ISBN: 978-3-642-03070-3
- Series Title: Lecture Notes in Computer Science
- Series Volume: 5632
- Series ISSN: 0302-9743
- Publisher: Springer Berlin Heidelberg
- Copyright Holder: Springer-Verlag Berlin Heidelberg
- Editors: Petra Perner (19)
- Editor Affiliations: 19. Institut für Bildverarbeitung und angewandte Informatik
- Authors: Luís Sarmento (20), Alexander Kehlenbeck (21), Eugénio Oliveira (20), Lyle Ungar (22)
- Author Affiliations:
  - 20. Faculdade de Engenharia da Universidade do Porto - DEI - LIACC, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
  - 21. Google Inc, New York, NY, USA
  - 22. University of Pennsylvania - CS, 504 Levine, 200 S. 33rd St, Philadelphia, PA, USA