Part of the book series: Texts in Computer Science (TCS)

Abstract

In this chapter, we briefly touch on topics that may grow in importance for text mining but are not yet central to prediction. These include summarization, active learning, learning with unlabeled data, learning with multiple samples or models, online learning, cost-sensitive learning, unbalanced samples and rare events, distributed text mining, rank learning, and question answering.

References

  • X. Bao, L. Bergman, and R. Thompson. Stacking recommendation engines with additional meta-features. In RecSys’09: Proceedings of the Third ACM Conference on Recommender Systems, pages 109–116. ACM, New York, 2009.

  • R. Bell, J. Bennett, Y. Koren, and C. Volinsky. The million dollar programming prize. IEEE Spectrum, pages 28–33, 2009.

  • A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 92–100. ACM, New York, 1998.

  • C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML’05, 2005.

  • M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of EMNLP’02. ACL, East Stroudsburg, 2002.

  • D. Cossock and T. Zhang. Statistical analysis of Bayes optimal subset ranking. IEEE Transactions on Information Theory, 54(11):5140–5154, 2008.

  • F. Damerau. Problems and some solutions in customization of natural language database front ends. ACM Transactions on Information Systems, 3(2):165–184, 1985.

  • J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.

  • S. Džeroski and B. Ženko. Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54(3):255–273, 2004.

  • Y. Freund, R. Iyer, R. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. JMLR, 4:933–969, 2003.

  • S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. SIGOPS Operating Systems Review, 37(5):29–43, 2003.

  • R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In B. Schölkopf, A. Smola, P. Bartlett and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 115–132. MIT Press, Cambridge, 2000.

  • V. Iyengar, C. Apté, and T. Zhang. Active learning using adaptive resampling. In The Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 91–98. ACM, New York, 2000.

  • K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In SIGIR’00, pages 41–48, 2000.

  • D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 148–156. Morgan Kaufmann, San Francisco, 1994.

  • R. Liere and P. Tadepalli. Active learning with committees for text categorization. In Proceedings of the 14th National Conference on Artificial Intelligence, pages 591–596. AAAI Press, Menlo Park, 1997.

  • N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.

  • H. Luhn. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159–165, 1958.

  • K. Nigam. Using unlabeled data to improve text classification. Ph.D. thesis, Carnegie Mellon University, 2001.

  • D. Radev and S. Teufel, editors. Proceedings of the HLT NAACL 2003 Workshop on Text Summarization. ACL, East Stroudsburg, 2003.

  • D. Radev, M. Topper, and A. Winkel. Multi-document centroid-based text summarization. In Proceedings of ACL-02 Demo Session, pages 112–113. ACL, East Stroudsburg, 2002.

  • F. Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan, New York, 1962.

  • R. Schapire and Y. Singer. BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135–168, 2000.

  • S. Seshasai. Winston, Katz sue Ask Jeeves: AI lab researchers attempt to enforce natural language patent. The Tech (MIT), 2000. http://www-tech.mit.edu/V119/N66/.

  • E. Voorhees and L. Buckland, editors. NIST Special Publication 500-251: The Eleventh Text Retrieval Conference (TREC 2002), Gaithersburg, Maryland, 19–22 November 2002. NIST Press, Washington, 2002. Co-sponsored by DARPA and ARDA.

  • S. Weiss, C. Apté, F. Damerau, D. Johnson, F. Oles, T. Goetz, and T. Hampp. Maximizing text-mining performance. IEEE Intelligent Systems, 14(4):63–69, 1999.

  • T. White. Hadoop: The Definitive Guide. O’Reilly Media, Sebastopol, 2009.

  • T. Zhang and F. Oles. A probability analysis on the value of unlabeled data for classification problems. In Proceedings of ICML-00, pages 1191–1198. Morgan Kaufmann, San Francisco, 2000.

Author information

Correspondence to Sholom M. Weiss.

Copyright information

© 2010 Springer-Verlag London Limited

About this chapter

Cite this chapter

Weiss, S.M., Indurkhya, N., Zhang, T. (2010). Emerging Directions. In: Fundamentals of Predictive Text Mining. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84996-226-1_9

  • DOI: https://doi.org/10.1007/978-1-84996-226-1_9

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84996-225-4

  • Online ISBN: 978-1-84996-226-1

  • eBook Packages: Computer Science, Computer Science (R0)
