Abstract
Every year, researchers in many fields publish thousands of papers in journals and conferences. These papers are an important guide for other researchers. However, as information technologies develop, the growing volume of digital data makes it difficult to find the desired information. Recommendation systems play an important role in helping researchers reach studies on their subjects, giving faster and easier access to relevant papers. Such systems are typically built around either the user profile or the subject. In this paper, a novel hybrid paper recommendation system based on deep learning is proposed. The method combines document similarity, hierarchical clustering, and keyword extraction. Our aim is to group papers by subject, whether they span different fields such as computer science, economics, and medicine or belong to a single field, and to present the user with papers that have high semantic similarity to the entered query. The approach has been applied to a real dataset containing computer science papers from categories such as machine learning, artificial intelligence, and human–computer interaction. The success of each stage has been evaluated separately; taken as a whole, the proposed approach achieves an overall performance of 80%. Papers with high similarity to the queries have been recommended to users, making access to studies on a desired subject within a very large collection of papers faster and easier.
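To make the three stages named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say which models or algorithms are used, so everything here is an assumption chosen for illustration: TF-IDF vectors stand in for the deep learning document representation, a simple average-link agglomerative procedure stands in for the hierarchical clustering, and top-weighted TF-IDF terms stand in for keyword extraction. All function names, the toy documents, and the threshold value are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenizer; drops very short tokens.
    return [w for w in text.lower().split() if len(w) > 2]


def build_idf(token_lists):
    # Smoothed inverse document frequency over the paper collection.
    n = len(token_lists)
    df = Counter(t for toks in token_lists for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(tokens, idf):
    # Sparse TF-IDF vector (term -> weight); unknown terms are ignored.
    tf = Counter(t for t in tokens if t in idf)
    total = sum(tf.values())
    return {t: (c / total) * idf[t] for t, c in tf.items()} if total else {}


def cosine(a, b):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, threshold):
    # Average-link agglomerative clustering (assumed stand-in): keep merging
    # the most similar pair of clusters until no pair reaches the threshold.
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (x, y) = max(
            (link(a, b), (x, y))
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best < threshold:
            break
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return clusters


def keywords(vector, k=2):
    # Top-k terms by weight; ties are broken alphabetically.
    return [t for t, _ in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def recommend(query, docs, threshold=0.1, top_n=2):
    # Group the papers, pick the group closest to the query,
    # then rank that group's papers by similarity to the query.
    token_lists = [tokenize(d) for d in docs]
    idf = build_idf(token_lists)
    vectors = [vectorize(t, idf) for t in token_lists]
    groups = cluster(vectors, threshold)
    qv = vectorize(tokenize(query), idf)
    best = max(groups, key=lambda g: sum(cosine(qv, vectors[i]) for i in g) / len(g))
    return sorted(best, key=lambda i: -cosine(qv, vectors[i]))[:top_n]


# Hypothetical toy collection, not the dataset used in the paper.
docs = [
    "neural network deep learning image classification",
    "deep learning neural network training optimization",
    "stock market price inflation economics",
    "inflation monetary policy economics market",
]

print(recommend("deep learning image classification", docs))
```

The returned values are indices into `docs`. In a system closer to the one described, the TF-IDF step would presumably be replaced by learned document embeddings, with the grouping and ranking steps left structurally the same.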
Acknowledgements
This work was supported by the Scientific Research Projects Coordination Unit of Fırat University under Grant No. MF.20.09.
Cite this article
Gündoğan, E., Kaya, M. A novel hybrid paper recommendation system using deep learning. Scientometrics 127, 3837–3855 (2022). https://doi.org/10.1007/s11192-022-04420-8
DOI: https://doi.org/10.1007/s11192-022-04420-8