Abstract
Transductive classification is a useful way to classify texts when only a few labeled examples are available. Transductive classification algorithms rely on term frequency either to classify texts represented in the vector space model directly or to build networks and perform label propagation. Related terms tend to belong to the same class, and this information can be used to assign relevance scores of terms for classes and, consequently, labels to documents. In this paper we propose the use of term networks to model term relations and perform transductive classification. To do so, we propose (i) different ways to generate term networks, (ii) how to assign initial relevance scores to terms, (iii) how to propagate the relevance scores among terms, and (iv) how to use the relevance scores of terms to classify documents. We demonstrate that transductive classification based on term networks can surpass the accuracies obtained by transductive classification with texts represented in other types of networks or in the vector space model, and even the accuracies obtained by inductive classification. We also demonstrate that the size of term networks can be reduced through feature selection while preserving classification accuracy and decreasing computational complexity.
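The four steps listed in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how the networks are generated, how terms are scored, or how scores are propagated. Every choice below is an assumption made for illustration: edges weighted by document co-occurrence, initial scores taken from class shares in the labeled documents, a simple damped neighbor-averaging update with a hypothetical parameter `alpha`, and classification by summing term scores. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_term_network(docs):
    """(i) Assumed network: edge weight = number of documents containing both terms."""
    weights = defaultdict(float)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            weights[(a, b)] += 1.0
            weights[(b, a)] += 1.0
    return weights


def initial_scores(docs, labels, classes):
    """(ii) Assumed seeds: share of a term's labeled occurrences that fall in each class."""
    counts = defaultdict(lambda: {c: 0.0 for c in classes})
    for terms, label in zip(docs, labels):
        if label is None:  # unlabeled documents contribute no seed information
            continue
        for term in terms:
            counts[term][label] += 1.0
    return {
        term: {c: v / sum(per_class.values()) for c, v in per_class.items()}
        for term, per_class in counts.items()
    }


def propagate(weights, seeds, vocab, classes, alpha=0.5, iterations=20):
    """(iii) Assumed update: mix the weighted neighbor average with the seed score."""
    neighbors = defaultdict(list)
    degree = defaultdict(float)
    for (a, b), w in weights.items():
        neighbors[a].append((b, w))
        degree[a] += w
    zero = {c: 0.0 for c in classes}
    scores = {t: dict(seeds.get(t, zero)) for t in vocab}
    for _ in range(iterations):
        updated = {}
        for t in vocab:
            updated[t] = {}
            for c in classes:
                spread = 0.0
                if degree[t] > 0:
                    spread = sum(w * scores[n][c] for n, w in neighbors[t]) / degree[t]
                updated[t][c] = alpha * spread + (1 - alpha) * seeds.get(t, zero)[c]
        scores = updated
    return scores


def classify(terms, scores, classes):
    """(iv) Assumed rule: pick the class with the largest summed term relevance."""
    totals = {c: sum(scores[t][c] for t in terms if t in scores) for c in classes}
    return max(classes, key=lambda c: totals[c])


# Toy collection: two labeled and two unlabeled documents (made-up data).
docs = [
    ["goal", "match", "team"],
    ["vote", "party", "election"],
    ["team", "coach", "league"],
    ["party", "senate", "ballot"],
]
labels = ["sport", "politics", None, None]
classes = ["sport", "politics"]

vocab = sorted({t for terms in docs for t in terms})
network = build_term_network(docs)
seeds = initial_scores(docs, labels, classes)
scores = propagate(network, seeds, vocab, classes)
predictions = [classify(d, scores, classes) for d, y in zip(docs, labels) if y is None]
print(predictions)
```

In this toy collection the unlabeled terms "coach" and "league" receive relevance for a class only through their link to the seeded term "team", which is the intuition the abstract states: related terms tend to belong to the same class.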
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Rossi, R.G., Rezende, S.O., de Andrade Lopes, A. (2015). Term Network Approach for Transductive Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_37
DOI: https://doi.org/10.1007/978-3-319-18117-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science (R0)