Abstract
This chapter addresses the analysis of information networks, focusing on heterogeneous information networks with more than one type of nodes and arcs. After an overview of tasks and approaches to mining heterogeneous information networks, the presentation focuses on text-enriched heterogeneous information networks whose distinguishing property is that certain nodes are enriched with text information. A particular approach to mining text-enriched heterogeneous information networks is presented that combines text mining and network mining approaches. The approach decomposes a heterogeneous network into separate homogeneous networks, followed by concatenating the structural context vectors calculated from separate homogeneous networks with the bag-of-words vectors obtained from textual information contained in certain network nodes. The approach is show-cased on the analysis of two real-life text-enriched heterogeneous citation networks.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Adamic, L.A., Adar, E.: Friends and neighbors on the web. Soc. Netw. 25(3), 211–230 (2003)
Barabási, A.L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., Vicsek, T.: Evolution of the social network of scientific collaborations. Phys. A: Stat. Mech. Appl. 311(3–4), 590–614 (2002)
Bilmes, J.: A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report TR-97-021, ICSI (1997)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Burt, R., Minor, M.: Applied Network Analysis: a Methodological Introduction. Sage Publications
Chen, B., Ding, Y., Wild, D.J.: Assessing drug target association using semantic linked data. PLoS Comput. Biol. 8(7), (2012)
Chen, H., Sharp, B.M.: Content-rich biological network constructed by mining pubmed abstracts. BMC Bioinf. 5, 147 (2004)
Cichocki, A.: Era of big data processing: a new approach via tensor networks and tensor decompositions (2014)
Consortium. Gene ontology: tool for the unification of biology. The gene ontology consortium. Nat. Genet. 25(1), 25–29 (2000)
Crestani, F.: Application of spreading activation techniques in information retrieval. Artif. Intell. Rev. 11(6), 453–482 (1997)
Davis, D., Lichtenwalter, R., Chawla, N.V.: Multi-relational link prediction in heterogeneous information networks. In: Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, pp. 281–288 (2011)
Dutkowski, J., Ideker, T.: Protein networks as logic functions in development and cancer. PLoS Comput. Biol. 7(9), (2011)
Grcar, M., Trdin, N., and Lavrac, N. A methodology for mining document-enriched heterogeneous information networks. The Computer Journal, 56(3), 321–335 (2013)
Hofree, M., Shen, J.P., Carter, H., Gross, A., Ideker, T.: Network-based stratification of tumor mutations. Nat. Meth. 10(11), 1108–1115 (2013)
Hwang, T., Kuang, R.: A heterogeneous label propagation algorithm for disease gene discovery. In: Proceedings of SIAM International Conference on Data Mining, pp. 583–594 (2010)
Jeh, G., Widom, J.: SimRank: a measure of structural-context similarity. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543 (2002). ACM
Jenssen, T.-K., Laegreid, A., Komorowski, J., Hovig, E.: A literature network of human genes for high-throughput analysis of gene expression. Nat. Genet. 28(1), 21–28 (2001)
Ji, M., Sun, Y., Danilevsky, M., Han, J., Gao, J.: Graph regularized transductive classification on heterogeneous information networks. In: Proceedings of the 25th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 570–586 (2010)
Joachims, T., Finley, T., Yu, C.-N.J.: Cutting-plane training of structural SVMs. Mach. Learn. 77(1), 27–59 (2009)
Kanehisa, M., Goto, S.: KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28(1), 27–30 (2000)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
Kok, S., Domingos, P.: Extracting semantic networks from text via relational clustering. In: Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases—Part I, ECML PKDD ’08, pp. 624–639. Springer, Heidelberg (2008)
Kondor, R.I., Lafferty, J.D.: Diffusion kernels on graphs and other discrete input spaces. In: Proceedings of the 19th International Conference on Machine Learning, pp. 315–322 (2002)
Kralj, J., Valmarska, A., Robnik Šikonja, M., Lavrač, N.: Mining text enriched heterogeneous citation networks. In: Proceedings of the 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining (2015)
Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)
Lytras, M., Sheth, A.: Progressive Concepts for Semantic Web Evolution: Applications and Developments. IGI Global (2010)
Newman, M.: Clustering and preferential attachment in growing networks. Phys. Rev. E 64(2), 025102 (2001a)
Newman, M.E.J.: The structure of scientific collaboration networks. Proc. Natl Acad. Sci. USA 98(2), 404–409 (2001b)
Nickel, M.: Tensor Factorization for Relational Learning. PhD thesis, Ludwig–Maximilians–Universitaet Muenchen (2013)
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing Order to the web. Technical report, Stanford InfoLab (1999)
Plantie, , M., Crampes, M.: Survey on social community detection. In: Ramzan, N., Zwol, R., Lee, J.-S., Cluver, K., Hua, X.-S. (eds) Social Media Retrieval, Computer Communications and Networks, pp. 65–85. Springer, London (2013)
Rakotomamonjy, A., Bach, F., Canu, S., Grandvalet, Y.: SimpleMKL. J. Mach. Learn. Res. 9, 2491–2521 (2008)
Storn, R., Price, K.: Differential evolution; a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341–359 (1997)
Sun, Y., Han, J.: Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan and Claypool Publishers (2012)
Sun, Y., Han, J., Zhao, P., Yin, Z., Cheng, H., Wu, T.: RankClus: integrating clustering with ranking for heterogeneous information network analysis. In: Proceedings of the International Conference on Extending Data Base Technology, pp. 565–576 (2009a)
Sun, Y., Yu, Y., Han, J.: Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806 (2009b)
Van Landeghem, S., De Bodt, S., Drebert, Z.J., Inze, D., Van de Peer, Y.: The potential of text mining in data integration and network biology for plant research: a case study on arabidopsis. Plant Cell 25(3), 794–807 (2013)
Vanunu, O., Magger, O., Ruppin, E., Shlomi, T., Sharan, R.: Associating genes and protein complexes with disease via network propagation. PLoS Comput. Biol. 6(1), (2010)
Vervliet, N., Debals, O., Sorber, L., De Lathauwer, L.: Breaking the curse of dimensionality using decompositions of incomplete tensors: tensor-based scientific computing in big data analysis. Sign. Process. Mag. IEEE 31(5), 71–79 (2014)
Watts, D.J., Strogatz, S.H.: Collective dynamics of ’small-world’ networks. Nature 393(6684), 440–442 (1998)
Yang, B., Liu, D., Liu, J.: Discovering communities from social networks: methodologies and applications. In: Handbook of Social Network Technologies and Applications, pp. 331–346. Springer, Heidelberg (2010)
Zachary, W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977)
Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. Adv. Neural Inf. Process. Syst. 16(16), 321–328 (2004)
Acknowledgments
The presented work was partially supported by the European Commission through the Human Brain Project (Grant number 604102) and by the Slovenian Research Agency project “Development and applications of new semantic data mining methods in life sciences” (Grant number J2-5478).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Kralj, J., Valmarska, A., Grčar, M., Robnik-Šikonja, M., Lavrač, N. (2016). Analysis of Text-Enriched Heterogeneous Information Networks. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_5
Download citation
DOI: https://doi.org/10.1007/978-3-319-26989-4_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26987-0
Online ISBN: 978-3-319-26989-4
eBook Packages: EngineeringEngineering (R0)