Mining Text Enriched Heterogeneous Citation Networks

Kralj, Jan; Valmarska, Anita; Robnik-Šikonja, Marko; Lavrač, Nada

doi:10.1007/978-3-319-18038-0_52

Jan Kralj^10,11,
Anita Valmarska^10,11,
Marko Robnik-Šikonja¹² &
…
Nada Lavrač^10,11,13

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9077))

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

3543 Accesses
3 Citations

Abstract

The paper presents an approach to mining text enriched heterogeneous information networks, applied to a task of categorizing papers from a large citation network of scientific publications in the field of psychology. The methodology performs network propositionalization by calculating structural context vectors from homogeneous networks, extracted from the original network. The classifier is constructed from a table of structural context vectors, enriched with the bag-of-words vectors calculated from individual paper abstracts. A series of experiments was performed to examine the impact of increasing the number of publications in the network, and adding different types of structural context vectors. The results indicate that increasing the network size and combining both types of information is beneficial for improving the accuracy of paper categorization.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Burt, R., Minor, M.: Applied Network Analysis: A Methodological Introduction. Sage Publications (1983)
Google Scholar
Davis, D., Lichtenwalter, R., Chawla, N.V.: Multi-relational link prediction in heterogeneous information networks. In: Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, pp. 281–288 (2011)
Google Scholar
D’Orazio, V., Landis, S.T., Palmer, G., Schrodt, P.: Separating the wheat from the chaff: Applications of automated document classification using support vector machines. Polytical Analysis 22(2), 224–242 (2014)
Article Google Scholar
Dutkowski, J., Ideker, T.: Protein networks as logic functions in development and cancer. PLoS Computational Biology 7(9), (2011)
Google Scholar
Grčar, M., Trdin, N., Lavrač, N.: A methodology for mining document-enriched heterogeneous information networks. The Computer Journal 56(3), 321–335 (2013)
Article Google Scholar
Han, E.-H.S., Karypis, G.: Centroid-based document classification: analysis and experimental results. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 424–431. Springer, Heidelberg (2000)
Chapter Google Scholar
Haveliwala, T., Kamvar, S.: The second eigenvalue of the Google matrix. Technical report, Stanford InfoLab (2003)
Google Scholar
Hofree, M., Shen, J.P., Carter, H., Gross, A., Ideker, T.: Network-based stratification of tumor mutations. Nature Methods 10(11), 1108–1115 (2013)
Article Google Scholar
Hwang, T., Kuang, R.: A heterogeneous label propagation algorithm for disease gene discovery. In: Proceedings of SIAM International Conference on Data Mining, pp. 583–594 (2010)
Google Scholar
Jeh, G., Widom, J.: SimRank: A measure of structural-context similarity. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543. ACM (2002)
Google Scholar
Ji, M., Sun, Y., Danilevsky, M., Han, J., Gao, J.: Graph regularized transductive classification on heterogeneous information networks. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part I. LNCS, vol. 6321, pp. 570–586. Springer, Heidelberg (2010)
Chapter Google Scholar
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5), 604–632 (1999)
Article MATH MathSciNet Google Scholar
Kondor, R.I., Lafferty, J.D.: Diffusion kernels on graphs and other discrete input spaces. In: Proceedings of the 19th International Conference on Machine Learning, pp. 315–322 (2002)
Google Scholar
Kwok, J.T.-Y.: Automated text categorization using support vector machine. In: Proceedings of the 5th International Conference on Neural Information Processing, pp. 347–351 (1998)
Google Scholar
Manevitz, L.M., Yousef, M.: One-class SVMs for document classification. Journal of Machine Learning Research 2, 139–154 (2002)
MATH Google Scholar
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999)
Google Scholar
Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y.: SimpleMKL. Journal of Machine Learning Research 9, 2491–2521 (2008)
MATH MathSciNet Google Scholar
Storn, R., Price, K.: Differential evolution; a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4), 341–359 (1997)
Article MATH MathSciNet Google Scholar
Sun, Y., Han, J.: Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool Publishers (2012)
Google Scholar
Sun, Y., Yu, Y., Han, J.: Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2009)
Google Scholar
Tan, S.: An effective refinement strategy for KNN text classifier. Expert Syst. Appl. 30(2), 290–298 (2006)
Article Google Scholar
Vanunu, O., Magger, O., Ruppin, E., Shlomi, T., Sharan, R.: Associating genes and protein complexes with disease via network propagation. PLoS Computational Biology 6(1) (2010)
Google Scholar
Wong, T.-T.: Generalized Dirichlet priors for Naïve Bayesian classifiers with multinomial models in document classification. Data Mining and Knowledge Discovery 28(1), 123–144 (2014)
Article MATH MathSciNet Google Scholar
Zachary, W.: An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 452–473 (1977)
Google Scholar
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. Advances in Neural Information Processing Systems 16(16), 321–328 (2004)
Google Scholar

Download references

Author information

Authors and Affiliations

Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Jan Kralj, Anita Valmarska & Nada Lavrač
Jožef Stefan International Postgraduate School, Jamova 39, 1000, Ljubljana, Slovenia
Jan Kralj, Anita Valmarska & Nada Lavrač
Faculty of Computer and Information Science, Večna pot 113, 1000, Ljubljana, Slovenia
Marko Robnik-Šikonja
University of Nova Gorica, Vipavska 13, 5000, Nova Gorica, Slovenia
Nada Lavrač

Authors

Jan Kralj
View author publications
You can also search for this author in PubMed Google Scholar
Anita Valmarska
View author publications
You can also search for this author in PubMed Google Scholar
Marko Robnik-Šikonja
View author publications
You can also search for this author in PubMed Google Scholar
Nada Lavrač
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jan Kralj .

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tru Cao
Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Nanjing University, Nanjing, China
Zhi-Hua Zhou
Japan Advanced Institute of Science and Technology, Nomi City, Japan
Tu-Bao Ho
University of Hong Kong, Hong Kong, Hong Kong SAR
David Cheung
Osaka University, Osaka, Japan
Hiroshi Motoda

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Kralj, J., Valmarska, A., Robnik-Šikonja, M., Lavrač, N. (2015). Mining Text Enriched Heterogeneous Citation Networks. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science(), vol 9077. Springer, Cham. https://doi.org/10.1007/978-3-319-18038-0_52

Download citation

DOI: https://doi.org/10.1007/978-3-319-18038-0_52
Published: 17 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18037-3
Online ISBN: 978-3-319-18038-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics