Abstract
This study evaluates the feasibility of using the k-means clustering algorithm to improve the effectiveness of the results recommended by the RICEST Journal Finder System. More than 15,000 papers published in engineering journals during 2013–2017 were collected from the journals' websites. Their titles, abstracts and keywords were extracted, normalized and processed to form the test corpus. Based on the number of papers collected and using Cochran's formula, 400 papers that were completely relevant to the subject of their journal were selected randomly and proportionally. These papers were entered into the system as queries, and the journals recommended by the system before and after applying the k-means clustering algorithm were recorded. The effectiveness of the system results was then determined at each stage by the leave-one-out cross-validation method, based on precision at the K top-ranked results. In addition, the opinions of subject reviewers on the relevance of the recommended journals were collected through a questionnaire. The results showed that before data clustering, the target journal was recommended within the first 3 ranks in only 40% of searches. After applying the k-means clustering algorithm, the target journal was retrieved within the first 3 ranks in more than 80% of searches. Furthermore, according to 210 subject reviewers, more than 80% of the journals recommended after k-means clustering were completely relevant to the given paper. According to these results, data clustering can significantly increase the effectiveness of the results recommended by journal recommender systems.
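The sampling and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 95% confidence level (z = 1.96), maximum variability (p = 0.5) and 5% margin of error are assumptions that the abstract does not state, the function names `cochran_sample_size` and `hit_rate_at_k` are hypothetical, and the ranked lists are invented toy data.

```python
import math


def cochran_sample_size(population, z=1.96, p=0.5, e=0.05):
    """Cochran's sample size with the finite population correction.

    z, p and e are assumed defaults, not values taken from the study.
    """
    n0 = (z ** 2) * p * (1 - p) / (e ** 2)
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def hit_rate_at_k(ranked_lists, targets, k=3):
    """Share of queries whose target journal appears in the top k results."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


# Toy data: each query paper is held out and its own journal is the target.
ranked = [
    ["J1", "J2", "J3", "J4"],
    ["J5", "J2", "J1", "J3"],
    ["J4", "J3", "J2", "J1"],
]
targets = ["J2", "J3", "J1"]

sample = cochran_sample_size(15000)
rate = hit_rate_at_k(ranked, targets, k=3)
print(sample, rate)
```

Under these assumed parameters the formula gives a sample in the high 300s for a population of 15,000, which is of the same order as the 400 queries reported. Computing the hit rate once on rankings produced before clustering and once on rankings produced after clustering gives the kind of before/after comparison the abstract reports.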
Notes
- Regional Information Center for Science and Technology.
Cite this article
Vara, N., Mirzabeigi, M., Sotudeh, H. et al. Application of k-means clustering algorithm to improve effectiveness of the results recommended by journal recommender system. Scientometrics 127, 3237–3252 (2022). https://doi.org/10.1007/s11192-022-04397-4
DOI: https://doi.org/10.1007/s11192-022-04397-4