Skip to main content

Mining Text Enriched Heterogeneous Citation Networks

  • Conference paper
  • First Online:
Advances in Knowledge Discovery and Data Mining (PAKDD 2015)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9077))

Included in the following conference series:

Abstract

The paper presents an approach to mining text enriched heterogeneous information networks, applied to a task of categorizing papers from a large citation network of scientific publications in the field of psychology. The methodology performs network propositionalization by calculating structural context vectors from homogeneous networks, extracted from the original network. The classifier is constructed from a table of structural context vectors, enriched with the bag-of-words vectors calculated from individual paper abstracts. A series of experiments was performed to examine the impact of increasing the number of publications in the network, and adding different types of structural context vectors. The results indicate that increasing the network size and combining both types of information is beneficial for improving the accuracy of paper categorization.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  • Burt, R., Minor, M.: Applied Network Analysis: A Methodological Introduction. Sage Publications (1983)

    Google Scholar 

  • Davis, D., Lichtenwalter, R., Chawla, N.V.: Multi-relational link prediction in heterogeneous information networks. In: Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, pp. 281–288 (2011)

    Google Scholar 

  • D’Orazio, V., Landis, S.T., Palmer, G., Schrodt, P.: Separating the wheat from the chaff: Applications of automated document classification using support vector machines. Polytical Analysis 22(2), 224–242 (2014)

    Article  Google Scholar 

  • Dutkowski, J., Ideker, T.: Protein networks as logic functions in development and cancer. PLoS Computational Biology 7(9), (2011)

    Google Scholar 

  • Grčar, M., Trdin, N., Lavrač, N.: A methodology for mining document-enriched heterogeneous information networks. The Computer Journal 56(3), 321–335 (2013)

    Article  Google Scholar 

  • Han, E.-H.S., Karypis, G.: Centroid-based document classification: analysis and experimental results. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 424–431. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  • Haveliwala, T., Kamvar, S.: The second eigenvalue of the Google matrix. Technical report, Stanford InfoLab (2003)

    Google Scholar 

  • Hofree, M., Shen, J.P., Carter, H., Gross, A., Ideker, T.: Network-based stratification of tumor mutations. Nature Methods 10(11), 1108–1115 (2013)

    Article  Google Scholar 

  • Hwang, T., Kuang, R.: A heterogeneous label propagation algorithm for disease gene discovery. In: Proceedings of SIAM International Conference on Data Mining, pp. 583–594 (2010)

    Google Scholar 

  • Jeh, G., Widom, J.: SimRank: A measure of structural-context similarity. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543. ACM (2002)

    Google Scholar 

  • Ji, M., Sun, Y., Danilevsky, M., Han, J., Gao, J.: Graph regularized transductive classification on heterogeneous information networks. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part I. LNCS, vol. 6321, pp. 570–586. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  • Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46(5), 604–632 (1999)

    Article  MATH  MathSciNet  Google Scholar 

  • Kondor, R.I., Lafferty, J.D.: Diffusion kernels on graphs and other discrete input spaces. In: Proceedings of the 19th International Conference on Machine Learning, pp. 315–322 (2002)

    Google Scholar 

  • Kwok, J.T.-Y.: Automated text categorization using support vector machine. In: Proceedings of the 5th International Conference on Neural Information Processing, pp. 347–351 (1998)

    Google Scholar 

  • Manevitz, L.M., Yousef, M.: One-class SVMs for document classification. Journal of Machine Learning Research 2, 139–154 (2002)

    MATH  Google Scholar 

  • Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999)

    Google Scholar 

  • Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y.: SimpleMKL. Journal of Machine Learning Research 9, 2491–2521 (2008)

    MATH  MathSciNet  Google Scholar 

  • Storn, R., Price, K.: Differential evolution; a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4), 341–359 (1997)

    Article  MATH  MathSciNet  Google Scholar 

  • Sun, Y., Han, J.: Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool Publishers (2012)

    Google Scholar 

  • Sun, Y., Yu, Y., Han, J.: Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2009)

    Google Scholar 

  • Tan, S.: An effective refinement strategy for KNN text classifier. Expert Syst. Appl. 30(2), 290–298 (2006)

    Article  Google Scholar 

  • Vanunu, O., Magger, O., Ruppin, E., Shlomi, T., Sharan, R.: Associating genes and protein complexes with disease via network propagation. PLoS Computational Biology 6(1) (2010)

    Google Scholar 

  • Wong, T.-T.: Generalized Dirichlet priors for Naïve Bayesian classifiers with multinomial models in document classification. Data Mining and Knowledge Discovery 28(1), 123–144 (2014)

    Article  MATH  MathSciNet  Google Scholar 

  • Zachary, W.: An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, 452–473 (1977)

    Google Scholar 

  • Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. Advances in Neural Information Processing Systems 16(16), 321–328 (2004)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jan Kralj .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kralj, J., Valmarska, A., Robnik-Šikonja, M., Lavrač, N. (2015). Mining Text Enriched Heterogeneous Citation Networks. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science(), vol 9077. Springer, Cham. https://doi.org/10.1007/978-3-319-18038-0_52

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-18038-0_52

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18037-3

  • Online ISBN: 978-3-319-18038-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics