Emerging Directions

Weiss, Sholom M.; Indurkhya, Nitin; Zhang, Tong

doi:10.1007/978-1-84996-226-1_9

Part of the book series: Texts in Computer Science (TCS)

Abstract

In this chapter, we briefly touch on topics that may increase in importance for text mining, but are not yet central to prediction. These include summarization, active learning, learning with unlabeled data, learning with multiple samples or models, online learning, cost-sensitive learning, unbalanced samples and rare events, distributed text mining, rank learning and question answering.

Author information

Authors and Affiliations

T.J. Watson Research Center, IBM Corporation, Kitchawan Road 1101, Yorktown Heights, 10598, NY, USA
Sholom M. Weiss
School of Computer Science & Engg., University of New South Wales, Sydney, 2052, NSW, Australia
Nitin Indurkhya
Dept. Statistics, Hill Center, Rutgers University, Piscataway, 08854-8019, NJ, USA
Tong Zhang

Corresponding author

Correspondence to Sholom M. Weiss.

About this chapter

Cite this chapter

Weiss, S.M., Indurkhya, N., Zhang, T. (2010). Emerging Directions. In: Fundamentals of Predictive Text Mining. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84996-226-1_9

DOI: https://doi.org/10.1007/978-1-84996-226-1_9
Publisher Name: Springer, London
Print ISBN: 978-1-84996-225-4
Online ISBN: 978-1-84996-226-1
eBook Packages: Computer Science (R0)