Abstract
In this paper, the authors present an approach to benchmarking collections of scientific journals based on the analysis of co-authorship graphs and a text model. The main methodological result is the Comparative Topic Modeling (CTM) technique. Applying time series to the metrics of co-authorship graphs made it possible to analyze trends in the development of author collaborations in scientific journals. A text model was created using machine learning methods. The content of the journals was classified to determine the degree of authenticity both across journals and across their issues. Experiments were conducted on the archives of two journals in the field of rheumatology. The authors used public data sets from the SNAP research laboratory at Stanford University to benchmark the co-authorship network metrics. The research results can be applied to improve editorial strategies for developing co-authorship collaborations and the excellence of scientific content.
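As an illustration of what a time series of co-authorship graph metrics could look like, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical input of (year, author list) pairs, builds one undirected co-authorship graph per year, and reports the number of authors, the number of co-authorship links, and the graph density. The function name, the input layout, and the choice of metrics are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import combinations


def coauthorship_metrics_by_year(papers):
    """Return {year: {"authors": n, "links": m, "density": d}}.

    `papers` is an iterable of (year, [author, ...]) pairs. This input
    layout is hypothetical and is not taken from the paper.
    """
    authors = defaultdict(set)
    links = defaultdict(set)
    for year, names in papers:
        # Sort so that each undirected edge has one canonical form.
        unique = sorted(set(names))
        authors[year].update(unique)
        # Every pair of co-authors on one paper becomes an undirected edge.
        links[year].update(combinations(unique, 2))

    metrics = {}
    for year in sorted(authors):
        n = len(authors[year])
        m = len(links[year])
        # Density of an undirected simple graph: 2m / (n * (n - 1)).
        density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
        metrics[year] = {"authors": n, "links": m, "density": density}
    return metrics


# Toy data, invented for illustration only.
papers = [
    (2017, ["A", "B", "C"]),
    (2017, ["C", "D"]),
    (2018, ["A", "B"]),
    (2018, ["E"]),
]
metrics = coauthorship_metrics_by_year(papers)
for year, values in metrics.items():
    print(year, values)
```

Reading the per-year values in order gives a simple series for each metric, which could then be compared against the same metrics computed on a reference co-authorship network.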
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Krasnov, F., Dimentov, A., Shvartsman, M. (2019). Comparative Analysis of Scientific Papers Collections via Topic Modeling and Co-authorship Networks. In: Ustalov, D., Filchenkov, A., Pivovarova, L. (eds) Artificial Intelligence and Natural Language. AINL 2019. Communications in Computer and Information Science, vol 1119. Springer, Cham. https://doi.org/10.1007/978-3-030-34518-1_6
DOI: https://doi.org/10.1007/978-3-030-34518-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34517-4
Online ISBN: 978-3-030-34518-1
eBook Packages: Computer Science, Computer Science (R0)