Abstract
Visualizing large data sets from Polish digital libraries requires first preparing a comprehensive, consolidated data set. Differences in how digital resources are organized, together with other factors that make distributed data and metadata heterogeneous, call for the use of clustering algorithms. To this end, the authors applied principal component analysis (PCA) and compared its results with those of k-means. PCA reduces the dimensionality of multidimensional data efficiently but is highly sensitive to deviations and to differences in stochastic distributions. To address the problem of noise in the input data, a deterministic model in the form of the Langevin function was applied first, which “flattens” the distribution of the factors influencing the data structure. With this approach, the categories most relevant to information systems were identified and Polish digital libraries were visualized.
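The pipeline described above (a Langevin-function transform followed by PCA) can be illustrated with a minimal sketch. The abstract does not say how the Langevin function is applied to the data, so the per-column standardization followed by L(z) = coth(z) − 1/z used here is an assumption, as are the function names `langevin`, `flatten` and `first_component` and the made-up `counts` table. The leading principal axis is found by plain power iteration instead of a library routine, and the k-means comparison is not shown.

```python
import math


def langevin(x: float) -> float:
    """Langevin function L(x) = coth(x) - 1/x, with L(0) = 0."""
    if abs(x) < 1e-4:
        # Leading Taylor term x/3 avoids dividing by zero near the origin.
        return x / 3.0
    return 1.0 / math.tanh(x) - 1.0 / x


def flatten(rows):
    """Standardize each column, then squash it with the Langevin function.

    Assumption: the abstract does not say how the function is applied, so
    this per-column z-score followed by L(z) is only one plausible reading.
    """
    squashed = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        squashed.append([langevin((v - mean) / std) for v in col])
    return [list(r) for r in zip(*squashed)]


def first_component(rows, iterations=500):
    """Leading principal axis via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Deterministic start vector; it must not be orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Project every centred row onto the principal axis.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


# Hypothetical counts of three object types in five libraries (made-up data).
counts = [
    [120, 30, 5],
    [150, 45, 8],
    [90, 20, 3],
    [4000, 900, 60],  # one very large library acting as an outlier
    [110, 35, 6],
]
axis, scores = first_component(flatten(counts))
print("principal axis:", axis)
print("library scores:", scores)
```

Because the Langevin function is bounded between −1 and 1, the outlying library no longer dominates the covariance matrix to the same degree as it would with raw counts, which is one way to read the “flattening” mentioned in the abstract.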
Keywords
- Digital libraries
- Object type
- PCA
- k-means clustering
This research is sponsored by the National Science Center (NCN) under grant 2013/11/B/HS2/03048, “Digital knowledge structure and dynamics analysis by means of visualisation”.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Osinski, G., Osinska, V., Malak, P. (2019). PCA Algorithms in the Visualization of Big Data from Polish Digital Libraries. In: Choroś, K., Kopel, M., Kukla, E., Siemiński, A. (eds) Multimedia and Network Information Systems. MISSI 2018. Advances in Intelligent Systems and Computing, vol 833. Springer, Cham. https://doi.org/10.1007/978-3-319-98678-4_52
DOI: https://doi.org/10.1007/978-3-319-98678-4_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98677-7
Online ISBN: 978-3-319-98678-4
eBook Packages: Intelligent Technologies and Robotics (R0)