Constructing Feature Set by Using Temporal Clustering of Term Usages in Document Categorization

Abe, Hidenao; Tsumoto, Shusaku

doi:10.1007/978-3-642-30114-8_14

Part of the book series: Studies in Computational Intelligence (SCI, volume 423)

Abstract

To discover chances in documents with temporal context, it is important to handle their contents, represented as words and phrases called “keywords”. In conventional methods, however, keywords are selected by their frequency and/or by a particular importance index, such as tf-idf, computed over the whole observed period. In this chapter, we describe a method for characterizing a large number of documents that takes the temporal features of their terms into account. The method obtains document clusters from the similarities between documents, where each document is characterized by the temporal patterns of an importance index, so that temporal differences in term usage are reflected. As an experiment, we performed document clustering on four sets of bibliographical documents using two feature sets: the appearances of popular feature terms and the appearances of temporal patterns in each document. We then compared the time dependencies of the two document clustering results. Our feature construction method succeeded in representing the time differences among the documents using features based on temporal patterns.
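The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`period_tfidf`, `kmeans`, `document_features`), the smoothed idf, the choice of k-means with farthest-point initialisation, and the toy corpus are all assumptions made for illustration. The sketch computes a per-period tf-idf series for each term, groups terms whose series look alike into temporal patterns, and then describes each document by how many of its terms fall into each pattern.

```python
import math
from collections import Counter, defaultdict

def period_tfidf(docs):
    """Return ({term: [score per period]}, sorted periods) for (period, terms) pairs."""
    periods = sorted({period for period, _ in docs})
    grouped = defaultdict(list)
    for period, terms in docs:
        grouped[period].append(terms)
    series = defaultdict(lambda: [0.0] * len(periods))
    for i, period in enumerate(periods):
        group = grouped[period]
        tf = Counter(t for terms in group for t in terms)
        df = Counter(t for terms in group for t in set(terms))
        for term, freq in tf.items():
            # Smoothed idf (an assumption) keeps scores positive in small periods.
            series[term][i] = freq * math.log((len(group) + 1) / df[term])
    return dict(series), periods

def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        centroids.append(list(max(
            vectors, key=lambda v: min(math.dist(v, c) for c in centroids))))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def document_features(docs, term_cluster, k):
    """Count, per document, how many of its terms follow each temporal pattern."""
    features = []
    for _, terms in docs:
        row = [0] * k
        for term in terms:
            row[term_cluster[term]] += 1
        features.append(row)
    return features

# Hypothetical toy corpus: (publication year, extracted terms).
docs = [
    (2001, ["rule", "tree"]),
    (2001, ["rule", "tree", "data"]),
    (2002, ["rule", "kernel", "data"]),
    (2002, ["tree", "svm", "data"]),
    (2003, ["kernel", "svm", "data"]),
    (2003, ["kernel", "svm"]),
]
series, periods = period_tfidf(docs)
terms = sorted(series)
labels = kmeans([series[t] for t in terms], k=3)
term_cluster = dict(zip(terms, labels))
features = document_features(docs, term_cluster, k=3)
```

In this toy corpus, terms that fade after 2002 ("rule", "tree") share one pattern and terms that appear from 2002 onward ("kernel", "svm") share another, so the rows of `features` separate early documents from late ones. The same `kmeans` function could then be applied to those rows to obtain document clusters.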

Author information

Authors and Affiliations

Faculty of Information and Communications, Bunkyo University, 1100 Namegaya, Chigasaki, Kanagawa, 2538550, Japan
Hidenao Abe
Department of Medical Informatics, Shimane University, School of Medicine, 89-1 Enya-cho, Izumo, Shimane, 693-8501, Japan
Shusaku Tsumoto

Corresponding author

Correspondence to Hidenao Abe.

Editor information

Editors and Affiliations

The University of Tokyo, Hongo 7-3-1, Tokyo, 113-8656, Japan
Yukio Ohsawa
Faculty of Letters, Chiba University, Chiba, Japan
Akinori Abe

About this chapter

Cite this chapter

Abe, H., Tsumoto, S. (2013). Constructing Feature Set by Using Temporal Clustering of Term Usages in Document Categorization. In: Ohsawa, Y., Abe, A. (eds) Advances in Chance Discovery. Studies in Computational Intelligence, vol 423. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30114-8_14

DOI: https://doi.org/10.1007/978-3-642-30114-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30113-1
Online ISBN: 978-3-642-30114-8
eBook Packages: Engineering, Engineering (R0)
