M.W. Berry and M. Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval. SIAM, Philadelphia, 1999.
P. Berkhin and J.D. Becher. Learning simple relations: Theory and applications. In Proceedings of the Second SIAM International Conference on Data Mining, Arlington, VA, pages 410–436, April 2002.
D. Boley, M. Gini, R. Gross, E.-H. Han, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore. Document categorization and query generation on the World Wide Web using WebACE. AI Review, 13 (5,6): 365–391, 1999.
D. Boley, M. Gini, R. Gross, E.-H. Han, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore. Partitioning-based clustering for Web document categorization. Decision Support Systems, 27 (3): 329–341, 1999.
D.L. Boley. Principal direction divisive partitioning. Data Mining and Knowledge Discovery, 2 (4): 325–344, 1998.
M. Damashek. Gauging similarity with n-grams: Language-independent categorization of text. Science, 267: 843–848, 1995.
I.S. Dhillon, Y. Guan, and J. Kogan. Refining clusters in high-dimensional text data. In Proceedings of the Workshop on Clustering High Dimensional Data and Its Applications at the Second SIAM International Conference on Data Mining, I.S. Dhillon and J. Kogan, eds., pages 71–82. SIAM, Philadelphia, 2002.
I.S. Dhillon and D.S. Modha. Concept decompositions for large sparse text data using clustering. Machine Learning, 42 (1): 143–175, Jan 2001. Also appears as IBM Research Report RJ 10147, Jul 1999.
I.S. Dhillon, S. Mallela, and R. Kumar. Enhanced word clustering for hierarchical text classification. In KDD-2002, 2002.
R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification, second edition. Wiley, New York, 2001.
E. Forgy. Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications. Biometrics, 21 (3): 768, 1965.
E. Gendler and J. Kogan. Index terms selection for clustering large text data. In Proceedings of the Workshop on Text Mining at the Second SIAM International Conference on Data Mining, M.W. Berry, ed., pages 87–94, 2002.
M. Ganapathiraju, J. Klein-Seetharaman, R. Rosenfeld, J. Carbonell, and R. Reddy. Rare and frequent n-grams in whole-genome protein sequences. In Proceedings of RECOMB’02: The Sixth Annual International Conference on Research in Computational Molecular Biology, 2002.
G. Grefenstette. Explorations in Automatic Thesaurus Discovery. Kluwer Academic, Boston, 1994.
J. Kogan. Clustering large unstructured document sets. In Computational Information Retrieval, M.W. Berry, ed., pages 107–117, SIAM, Philadelphia, 2001.
J. Kogan. Means clustering for text data. In Proceedings of the Workshop on Text Mining at the First SIAM International Conference on Data Mining, M.W. Berry, ed., pages 47–57, 2001.
J. Kogan. Computational information retrieval. Springer-Verlag Lecture Notes in Contributions to Statistics, H.R. Lerche, ed., 2002. To appear.
C. Pearce and C. Nicholas. TELLTALE: Experiments in a dynamic hypertext environment for degraded and multilingual data. Journal of the American Society for Information Science, 47: 263–275, 1996.
M.F. Porter. An algorithm for suffix stripping. Program, 14: 130–137, 1980.
G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
H. Schütze and J. Pedersen. Information retrieval based on word senses. In Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, pages 161–175, 1995.
R. Shamir and R. Sharan. Algorithmic approaches to clustering gene expression data. In Current Topics in Computational Molecular Biology, T. Jiang, T. Smith, Y. Xu, and M. Q. Zhang, eds., pages 269–300, MIT Press, Cambridge, MA, 2002.
N. Slonim and N. Tishby. The power of word clusters for text classification. In Proceedings of the 23rd European Colloquium on Information Retrieval Research (ECIR), Darmstadt, 2001.
Y. Zhao and G. Karypis. Comparison of agglomerative and partitional document clustering algorithms. In Proceedings of the Workshop on Clustering High Dimensional Data and Its Applications at the Second SIAM International Conference on Data Mining, I.S. Dhillon and J. Kogan, eds., pages 83–93. SIAM, Philadelphia, 2002.