Analysis of Text-Enriched Heterogeneous Information Networks

Big Data Analysis: New Algorithms for a New Society

Part of the book series: Studies in Big Data (SBD, volume 16)

Abstract

This chapter addresses the analysis of information networks, focusing on heterogeneous information networks, which contain more than one type of node and arc. After an overview of tasks and approaches in mining heterogeneous information networks, the chapter turns to text-enriched heterogeneous information networks, whose distinguishing property is that certain nodes are enriched with text. It presents an approach to mining such networks that combines text mining and network mining. The approach first decomposes the heterogeneous network into separate homogeneous networks, then concatenates the structural context vectors calculated from each homogeneous network with the bag-of-words vectors obtained from the text attached to certain network nodes. The approach is showcased on the analysis of two real-life text-enriched heterogeneous citation networks.
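
The decompose-then-concatenate pipeline described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not say how the structural context vectors are computed, so Personalized PageRank is assumed here purely for illustration, and the toy papers, authors, venues, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def decompose(links):
    """Build one homogeneous paper-paper network from (paper, other) links:
    two papers are connected if they share an intermediate node."""
    by_mid = {}
    for paper, mid in links:
        by_mid.setdefault(mid, set()).add(paper)
    adj = {}
    for papers in by_mid.values():
        for a, b in combinations(sorted(papers), 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def personalized_pagerank(adj, nodes, start, damping=0.85, iters=50):
    """Assumed structural context vector: a random walk that restarts at `start`."""
    rank = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            nbrs = adj.get(n, ())
            if nbrs:
                share = damping * rank[n] / len(nbrs)
                for m in nbrs:
                    nxt[m] += share
            else:
                # Node without neighbours: its walk mass returns to the start node.
                nxt[start] += damping * rank[n]
        # Restart mass; the ranks sum to 1, so this keeps the total at 1.
        nxt[start] += 1.0 - damping
        rank = nxt
    return [rank[n] for n in nodes]


def bag_of_words(texts, nodes):
    """Plain term-count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for t in texts.values() for w in t.lower().split()})
    vectors = {}
    for n in nodes:
        counts = Counter(texts[n].lower().split())
        vectors[n] = [float(counts[w]) for w in vocab]
    return vectors


def feature_vectors(nodes, link_sets, texts):
    """Concatenate one structural vector per homogeneous network with the text vector."""
    networks = [decompose(links) for links in link_sets]
    bow = bag_of_words(texts, nodes)
    features = {}
    for n in nodes:
        vec = []
        for adj in networks:
            vec.extend(personalized_pagerank(adj, nodes, n))
        vec.extend(bow[n])
        features[n] = vec
    return features


# Hypothetical toy citation data: papers linked to authors and to venues.
nodes = ["p1", "p2", "p3"]
paper_author = [("p1", "alice"), ("p2", "alice"), ("p2", "bob"), ("p3", "bob")]
paper_venue = [("p1", "kdd"), ("p3", "kdd")]
texts = {"p1": "network mining", "p2": "text mining", "p3": "network text analysis"}

features = feature_vectors(nodes, [paper_author, paper_venue], texts)
```

Each paper ends up with a single vector made of one block per homogeneous network followed by its term counts, which a standard classifier could consume. A real pipeline would likely also weight or normalize the blocks before concatenation; that step is omitted here.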

Acknowledgments

The presented work was partially supported by the European Commission through the Human Brain Project (Grant number 604102) and by the Slovenian Research Agency project “Development and applications of new semantic data mining methods in life sciences” (Grant number J2-5478).

Author information

Corresponding author

Correspondence to Jan Kralj.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Kralj, J., Valmarska, A., Grčar, M., Robnik-Šikonja, M., Lavrač, N. (2016). Analysis of Text-Enriched Heterogeneous Information Networks. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_5

  • DOI: https://doi.org/10.1007/978-3-319-26989-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26987-0

  • Online ISBN: 978-3-319-26989-4

  • eBook Packages: Engineering, Engineering (R0)