Data Mining pp 335-350

Using Web Text Mining to Predict Future Events: A Test of the Wisdom of Crowds Hypothesis

Part of the Annals of Information Systems book series (AOIS, volume 8)


This chapter describes an algorithm that predicts events by mining Internet data. A number of specialized Internet search engine queries were designed to summarize results from relevant web pages. At the core of these queries was a set of algorithms that embody the wisdom of crowds hypothesis, which states that, under the proper conditions, the aggregated opinion of a number of nonexperts is more accurate than the opinion of a set of experts. Natural language processing techniques were used to summarize the opinions expressed on all relevant web pages. The specialized queries predicted event results at a statistically significant level. It was hypothesized that predictions from the entire Internet would outperform the predictions of a smaller number of highly ranked web pages; this hypothesis was not confirmed. These data replicated results from an earlier study and indicated that the Internet can make accurate predictions of future events. Evidence that the Internet can function as a wise crowd, as predicted by the wisdom of crowds hypothesis, was mixed.
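The aggregation idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's algorithm: the cue phrase `"will win"`, the one-vote-per-page rule, and the function names `page_vote` and `crowd_prediction` are all assumptions made for illustration, and simple substring matching stands in for the natural language processing the chapter refers to. The `top_k` parameter mimics the comparison between all retrieved pages and a smaller set of highly ranked pages, assuming the input list is already in search-engine rank order.

```python
from collections import Counter


def page_vote(text, candidates, cue="will win"):
    """Return the single candidate a page appears to favor, or None.

    Hypothetical rule: a page counts as a vote only if it contains the
    cue phrase and mentions exactly one candidate by name.
    """
    lowered = text.lower()
    if cue not in lowered:
        return None
    mentioned = [c for c in candidates if c.lower() in lowered]
    return mentioned[0] if len(mentioned) == 1 else None


def crowd_prediction(pages, candidates, top_k=None):
    """Aggregate page votes and return (predicted_winner, vote_counts).

    `pages` is assumed to be ordered by search rank; `top_k=None` uses
    every page, while an integer restricts the crowd to the top pages.
    """
    selected = pages if top_k is None else pages[:top_k]
    votes = Counter()
    for text in selected:
        vote = page_vote(text, candidates)
        if vote is not None:
            votes[vote] += 1
    if not votes:
        return None, votes
    winner = votes.most_common(1)[0][0]
    return winner, votes


if __name__ == "__main__":
    # Toy page snippets, invented purely for illustration.
    pages = [
        "Bob will win the final, no question.",
        "My bet: Alice will win this season.",
        "Alice will win easily.",
        "Alice and Bob are both strong contenders.",
    ]
    print(crowd_prediction(pages, ["Alice", "Bob"]))
    print(crowd_prediction(pages, ["Alice", "Bob"], top_k=1))
```

In this toy input the full set of pages and the single top-ranked page disagree, which is the kind of contrast the chapter's second hypothesis examines; real data would also require a significance test, which this sketch omits.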


Keywords: Proper Noun, Reality Television, Super Bowl, Gubernatorial Election, Music Album





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. University of Rhode Island, Kingston, USA
