Abstract
Literature-based discovery aims to uncover hidden connections between previously disconnected research areas. A heterogeneous bibliographic information network (HBIN) provides a latent, semi-structured bibliographic information model that signals potential connections between scientific papers. This paper introduces a novel literature-based discovery method that builds meta path features from an HBIN to predict co-citation links between previously disconnected literatures. We evaluated the method on predicting future co-citation links between fish oil and Raynaud’s syndrome papers. Experimental results showed that HBIN meta path features predict future co-citation links between these papers with high accuracy (F-measure 0.851; precision 0.845; recall 0.857), outperforming existing document similarity algorithms such as LDA, TF-IDF, and bibliographic coupling.
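To make the idea of a meta path feature concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy network in which papers connect to authors and terms, and it uses a plain path count (the number of shared intermediate nodes) as the feature for a paper pair. The paper IDs, author IDs, terms, and the function names `path_count`, `meta_path_features`, and `f_measure` are all invented for illustration. The last function only checks that the reported F-measure matches the harmonic mean of the reported precision and recall.

```python
# Toy heterogeneous bibliographic network (all IDs and terms are invented).
# For each edge type, map a paper to the set of typed neighbours it links to.
EDGES = {
    "author": {"p1": {"a1", "a2"}, "p2": {"a2"}, "p3": {"a3"}},
    "term": {
        "p1": {"fish oil", "blood viscosity"},
        "p2": {"blood viscosity", "raynaud"},
        "p3": {"raynaud"},
    },
}


def path_count(edge_type, src, dst):
    """Count instances of the symmetric meta path Paper - X - Paper,
    where X is a node of the given type shared by both papers."""
    neighbours = EDGES[edge_type]
    return len(neighbours.get(src, set()) & neighbours.get(dst, set()))


def meta_path_features(src, dst):
    """Build one feature per edge type for a candidate paper pair."""
    return {f"P-{t}-P": path_count(t, src, dst) for t in EDGES}


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(meta_path_features("p1", "p2"))
    print(round(f_measure(0.845, 0.857), 3))
```

In a setup like this, the feature vectors for paper pairs would be passed to a supervised classifier that labels each pair as a future co-citation link or not. The choice of classifier and of meta paths is left open here because the abstract does not specify them.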
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sebastian, Y., Siew, EG., Orimaye, S.O. (2015). Predicting Future Links Between Disjoint Research Areas Using Heterogeneous Bibliographic Information Network. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_48
DOI: https://doi.org/10.1007/978-3-319-18032-8_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science, Computer Science (R0)