Data Mining and Knowledge Discovery, Volume 27, Issue 3, pp 372–395

Growing a list

  • Benjamin Letham
  • Cynthia Rudin
  • Katherine A. Heller

DOI: 10.1007/s10618-013-0329-7

Cite this article as:
Letham, B., Rudin, C. & Heller, K.A. Data Min Knowl Disc (2013) 27: 372. doi:10.1007/s10618-013-0329-7

Abstract

It is easy to find expert knowledge on the Internet on almost any topic, but obtaining a complete overview of a given topic is not always easy: information can be scattered across many sources and must be aggregated to be useful. We introduce a method for intelligently growing a list of relevant items, starting from a small seed of examples. Our algorithm takes advantage of the wisdom of the crowd, in the sense that there are many experts who post lists of things on the Internet. We use a collection of simple machine learning components to find these experts and aggregate their lists to produce a single complete and meaningful list. We use experiments with gold standards and open-ended experiments without gold standards to show that our method significantly outperforms the state of the art. Our method uses the ranking algorithm Bayesian Sets even when its underlying independence assumption is violated, and we provide a theoretical generalization bound to motivate its use.
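To make the ranking step concrete, the following is a minimal sketch of the original Bayesian Sets log score for binary features under a Beta-Bernoulli model, which assumes the features are independent. It is an illustration only: the function name, the toy feature matrix, and the scalar hyperparameters `alpha` and `beta` are assumptions made here, and the exact scoring variant and feature construction used in the article may differ.

```python
import numpy as np

def bayesian_sets_scores(X, seed_idx, alpha=1.0, beta=1.0):
    """Log Bayesian Sets score of every row of a binary matrix X.

    X        : (n_items, n_features) 0/1 matrix (hypothetical features,
               e.g. "item appears on candidate list j").
    seed_idx : row indices of the seed items.
    alpha, beta : Beta prior hyperparameters (assumed scalar here).
    """
    X = np.asarray(X, dtype=float)
    n_seed = len(seed_idx)
    n_features = X.shape[1]

    # Per-feature counts of how many seed items have the feature.
    seed_counts = X[seed_idx].sum(axis=0)

    # Posterior Beta parameters after observing the seed.
    alpha_post = alpha + seed_counts
    beta_post = beta + n_seed - seed_counts

    # The log score is linear in the item's features: c + X @ q.
    c = n_features * (np.log(alpha + beta) - np.log(alpha + beta + n_seed))
    c += np.sum(np.log(beta_post) - np.log(beta))
    q = np.log(alpha_post) - np.log(alpha) - np.log(beta_post) + np.log(beta)
    return c + X @ q

# Toy data: items 0 and 1 are the seed; item 2 shares one feature
# with them, items 3 and 4 share none.
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
])
scores = bayesian_sets_scores(X, seed_idx=[0, 1])
ranking = np.argsort(-scores)  # candidate items, best first
```

In this toy example, item 2 scores higher than items 3 and 4 because it shares a feature with the seed. Growing a list would then amount to taking the highest-scoring non-seed items.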

Keywords

Set completion · Ranking · Internet data mining · Collective intelligence

Supplementary material

10618_2013_329_MOESM1_ESM.pdf (467 kb)
Supplementary material 1 (pdf 466 KB)

Copyright information

© The Author(s) 2013

Authors and Affiliations

  • Benjamin Letham (1)
  • Cynthia Rudin (2)
  • Katherine A. Heller (3)

  1. Operations Research Center, Massachusetts Institute of Technology, Cambridge, USA
  2. MIT Sloan School of Management, Massachusetts Institute of Technology, Cambridge, USA
  3. Center for Cognitive Neuroscience, Statistical Science, Duke University, Durham, USA
