The rivalry between Bernini and Borromini from a scientometric perspective

From a historical point of view, Rome, and especially the University of La Sapienza, is closely linked to two geniuses of Baroque art: Bernini and Borromini. In this study, we analyze the rivalry between them from a scientometric perspective. This study also serves as a basis for exploring which data sources may be appropriate for broad impact assessment of individuals and/or celebrities. We pay special attention to encyclopaedias, library catalogues and other databases or types of publications that are not normally used for this purpose. The results show that some sources such as Wikipedia are not exploited to their full potential, especially with regard to different languages and cultures. Moreover, analyses are often reduced to a minimum number of data sources, which can distort the relevance of the outcome. Our results show that other sources normally not considered for this purpose, like JSTOR, PQDT, Google Scholar, catalogue holdings, etc., can provide more relevant or abundant information than the typically used Web of Science Core Collection and Scopus. Finally, we also contrast the opportunities and limitations of old and new (YouTube, Twitter) data sources, particularly with regard to the quality and accuracy of the search methods. Much room for improvement has been identified in order to use data sources more efficiently and with higher accuracy.


Background
In 2019 the International Society for Scientometrics and Informetrics (ISSI) conference took place in Rome, the Eternal City, and was hosted by Sapienza University of Rome, one of the oldest universities, founded in 1303 by papal bull. On this occasion, we presented a study showing the rivalry between Gian Lorenzo Bernini and

Introduction
The aim of this work is to estimate the resonance and prestige of these two geniuses several centuries after their death and to compare the footprints they have left in scholarly communication and beyond. Similar studies have already been carried out in the past (Marx 2011; Gorraiz et al. 2011, 2015) and serve as a basis and inspiration. For this purpose, several types of bibliographic sources have been selected, and both their capacity and their suitability for broader impact assessment are discussed in detail.
In his most recent book Moed (2017a, b) describes which indicators can be used to measure different aspects of scientific communication in evaluative informetrics, and discusses their limitations as well as considerations that must be taken into account for a correct and responsible use. In this study we focus on data sources. We not only discuss the opportunities and limitations of old and new bibliographical and citation data sources, but also analyse their search features and suggest the most relevant information and indicators that can be extracted from them.
Among the sources considered are encyclopaedias, such as Wikipedia and Encyclopaedia Britannica, library catalogues, and several databases, including the ones that are commonly used for bibliometric analyses, like Web of Science Core Collection and Scopus (Moed 2017a, b). Databases with a more disciplinary scope (e.g. JSTOR), databases dealing with other publication types such as dissertations (e.g. PQDT), a document type of high potential relevance that is usually not considered in this kind of analysis, and sources with higher coverage like Google Scholar, Microsoft Academic and CrossRef all tend to provide more relevant or more abundant information than classic sources like WoS or Scopus (Hug et al. 2017; Moed 2017a, b). Last but not least, we also compare these results with the ones obtained from other web sources like Twitter and YouTube (Hammarfelt 2014).
Some authors have already demonstrated the incomplete use of some data sources such as Wikipedia and the danger of using these data to compile new global and composite indicators, for example the Altmetric Attention Score. The incompleteness of data can seriously distort the obtained results and the significance of their interpretation. There is a bias towards the included sources, whereas missing sources are disadvantaged.
Language, data availability, completeness and accuracy of the sources, and availability of indicators, are the issues to be tackled in this study.
An additional purpose of this article is to broaden the reach of scientometric studies beyond the behaviour of current science. This paper is an extended version of the proceedings paper presented at the 2019 ISSI (International Society for Scientometrics and Informetrics) conference (Wieland and Gorraiz 2019). A new analysis considering library catalogues has been included. All the sections, especially methodology, results and conclusions, have been enhanced accordingly.

Data sources and methodology
A wide range of data sources is used in the study: encyclopaedias and biographical dictionaries, library catalogues, citation databases, subject-specific databases, and some alternative web sources. They are summarized in four groups below.

Encyclopaedias, biographical dictionaries and reference systems
1. Wikipedia is our main data source among the encyclopaedias, because it is already a major source for the most common systems that trace new metrics and altmetrics (see e.g. Altmetric.com or PlumX). Owned and supported by the Wikimedia Foundation, a non-profit organization that operates on money it receives from donors, Wikipedia is a multilingual, web-based, free encyclopaedia based on a model of openly editable and viewable content, a wiki (see https://en.wikipedia.org/wiki/Wikipedia). Time magazine stated that the open-door policy of allowing anyone to edit had made Wikipedia the biggest, most popular and possibly the best encyclopaedia in the world, and was a testament to the vision of Jimmy Wales. One of the characteristics of Wikipedia is its language diversity, which does not seem to have been used enough for scientometric applications so far. There are currently 301 language editions of Wikipedia (also called language versions, or simply Wikipedias). Fifteen of these have over one million articles each, another four have over 500,000 articles, another 40 have over 100,000 articles, and another 78 have over 10,000 articles. Wikipedia provides very abundant and detailed information for each page, comprising basic page information, page protection and properties, edit history, as well as page view statistics and WikiChecker (Katz and Rokach 2017). In our study, we focus on the following parameters or indicators: (1) number of language editions, which should inform about the degree of internationalisation, (2) page length (in bytes), (3) number of page watchers, (4) number of page watchers who visited recent edits, (5) number of redirects to this page, (6) number of literature items included (references, links, biographies, etc.), (7) number of "What links here" entries (hyperlinks or web citations attracted) and (8) number of page views and daily average (Wikipedia Contributors 2018).
Analyses have been performed not only in the English edition, but also in other main languages, namely German, French, Italian, Russian and Spanish. The results were then compared. Furthermore, the Chinese edition has also been considered in order to provide a comparison with an emerging language (Xing 2006).
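The page-view indicator (8) can be gathered per language edition along the following lines. This is a minimal sketch and not the procedure used in the study: it assumes the public Wikimedia Pageviews REST endpoint and its JSON layout (an `items` list whose entries carry a `views` field). The function names `pageviews_url` and `daily_average` and the sample payload are hypothetical.

```python
from urllib.parse import quote

# Assumed endpoint of the public Wikimedia Pageviews REST API (not named in the study).
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(lang, title, start, end):
    """Build the request URL for daily views of one article in one language edition.

    start and end are dates in YYYYMMDD form.
    """
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{lang}.wikipedia/all-access/all-agents/{article}/daily/{start}/{end}"


def daily_average(payload):
    """Return (total views, daily average) from a decoded JSON response."""
    views = [item["views"] for item in payload.get("items", [])]
    total = sum(views)
    return total, (total / len(views) if views else 0.0)


# Hypothetical payload with made-up numbers, only to show the expected shape.
sample = {"items": [{"views": 120}, {"views": 80}, {"views": 100}]}
total, average = daily_average(sample)
print(total, average)
print(pageviews_url("it", "Francesco Borromini", "20181201", "20181231"))
```

Calling `pageviews_url` once per language code (for example `en`, `de`, `fr`, `it`, `ru`, `es`, `zh`) would yield one series per edition, which can then be compared side by side.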
According to prior studies comparing science articles from Wikipedia and Encyclopaedia Britannica (Giles 2005), Wikipedia's level of accuracy approached that of Britannica. Therefore we also compared the results from the English Wikipedia with the ones resulting from Encyclopaedia Britannica online.
2. The Encyclopaedia Britannica (Latin for "British Encyclopaedia") is the oldest English-language encyclopaedia still in production. It is written by about 100 full-time editors and more than 4000 contributors. The 2010 version of the 15th edition, which spans 32 volumes and 32,640 pages, was the last printed edition. Digital content and distribution have continued since then. In 1933, the Britannica became the first encyclopaedia to adopt "continuous revision", in which the encyclopaedia is continually reprinted, with every

Library catalogues
The importance of catalogue entries from a bibliometric point of view has already been studied (Torres-Salinas and Moed 2009; Zuccala and Guns 2013; Torres-Salinas et al. 2017). For this study, we used WorldCat, the world's largest network of library-based content and services (Turner 2010; Bertot et al. 2012). WorldCat is a union catalog that itemizes the collections of 17,900 libraries in 123 countries and territories that participate in the Online Computer Library Center (OCLC) global cooperative (https://en.wikipedia.org/wiki/WorldCat). The advanced search enables searching by title, keyword, subject and author, and the results can be refined by year, audience, content, format and/or language.

Databases
In order to find the most relevant subject-specific databases for the individuals Borromini and Bernini, we also used the Primo search engine hosted at the University of Vienna and named u:search. The two main bibliometric data sources, Web of Science Core Collection and Scopus, are well known within the scientometric community. Besides them, we have included the following databases:
1. JSTOR, a trusted full-text digital archive of over one thousand academic journals across the humanities, social sciences, and sciences, as well as select monographs and other scholarly content. JSTOR provides access to more than 12 million academic journal articles, books, and primary sources in 75 disciplines (see https://about.jstor.org/).
2. ARTbibliographies Modern (ABM) provides full abstracts of journal articles, books, essays, exhibition catalogs, PhD dissertations, and exhibition reviews on all forms of modern and contemporary art, with more than 13,000 new entries being added each year. Entries date back as far as the late 1960s. ABM is the premier source of information on modern and contemporary arts dating from the late nineteenth century onwards, including photography since its invention. It includes abstracts of English and foreign-language material on famous and lesser-known artists, movements, and trends.
3. ProQuest Dissertations and Theses Global (PQDT) enables the inclusion of theses and dissertations as important document types in our analyses (Andersen and Hammarfelt 2011; Gorraiz et al. 2011, 2015). This data source is advertised as being the world's most comprehensive collection of dissertations and theses and includes more than 2.7 million searchable citations to dissertations and theses from around the world from 1861 to the present day, together with 1.2 million full-text dissertations that are available for download in PDF format.
The results obtained from these three databases are compared with the ones resulting from WoS Core Collection (WoS CC) and Scopus. WoS CC included all the indexes (Proceedings, Books) and the Emerging Sources Citation Index since 2015.
Furthermore, Google Scholar, Microsoft Academic and CrossRef have been consulted via Publish or Perish (Harzing 2007).

Web tools
Finally, YouTube, an American video-sharing website headquartered in San Bruno, California and now operating as one of Google's subsidiaries, and Twitter, an American online news and social networking service on which users post and interact with messages known as "tweets", have also been used in an explorative way.
All searches were carried out in December 2018 according to the search options and syntaxes available in each data source. Search strategy and manual disambiguation were similar to the procedures described in previous studies (Gorraiz et al. 2011). Searches were performed separately in the title, descriptor and abstract fields, in the topic field, which comprises all of these, and in the full text if available. Different document and publication types were differentiated. The search for documents related either to Borromini alone or to both artists together did not present many difficulties, since the name Borromini is not very common. However, the search concerning Bernini did pose serious difficulties, since it is a fairly common Italian name. Manual disambiguation was necessary to clean the data. This was done as long as the number of retrieved items was not too high to allow it. Otherwise, other approaches were used, such as excluding the publications of authors with that name (as long as the author was not a Bernini biographer like his son Diego) or refining the search to certain fields or topics and excluding those of the natural sciences. In these cases, normally related to the search in full text, we have opted for a compromise between recall and precision, as discussed in detail for each data source in the results section (Buckland and Gey 1994).
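The trade-off between recall and precision mentioned above can be made concrete with a small calculation. The following is an illustrative sketch only: the record identifiers and both result sets are invented, and the helper `precision_recall` is a hypothetical name, not part of any tool used in the study.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a set of retrieved record identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers: r1-r4 concern the artist Bernini,
# x1-x6 are noise such as papers written by authors named Bernini.
relevant = {"r1", "r2", "r3", "r4"}
title_search = {"r1", "r2"}  # narrow query
full_text_search = relevant | {"x1", "x2", "x3", "x4", "x5", "x6"}  # broad query

print(precision_recall(title_search, relevant))      # (1.0, 0.5)
print(precision_recall(full_text_search, relevant))  # (0.4, 1.0)
```

In this toy case the narrow query returns only relevant items but misses half of them, whereas the broad query finds all of them at the cost of more noise than signal.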
Citation analyses were conducted using the "Cited reference search" feature in WoS Core Collection. The number of citing documents and all citations to documents containing Borromini and/or Bernini in the search fields "Work" or "Author" were retrieved. Although Bernini did not publish anything, his works and exhibits are cited under his author's name. Borromini published only memoirs and notes, but his architectural works are also cited under his name. That is why we have included the two types of analysis in WoS CC. In Scopus the search was carried out with the help of the search option in "secondary documents".
Searches in Google Scholar, Microsoft Academic and CrossRef were performed with Harzing's tool "Publish or Perish", using the most meaningful search fields for each data source. Even more difficult were the searches in YouTube and Twitter.
These tools do not indicate the total number of items retrieved, and as the waiting time increases, their number also increases until a "potential saturation" is reached. Due to the volatility and low reliability of the retrieved data, we have limited ourselves in this case to the search for both geniuses at the same time, which is already very time-consuming but feasible. The results could then be checked manually in order to remove incorrect items.
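One way to decide when the "potential saturation" described above has been reached is to note the number of loaded items at regular intervals and stop once it no longer grows. The sketch below is an assumption about how such a stopping rule could look, not the authors' procedure; `saturation_point`, its `patience` parameter and the readings are hypothetical.

```python
def saturation_point(counts, patience=3):
    """Return the index of the first reading after which the count stays
    unchanged for `patience` further readings, or None if that never happens."""
    for i in range(len(counts) - patience):
        window = counts[i:i + patience + 1]
        if all(value == window[0] for value in window):
            return i
    return None


# Hypothetical item counts read off at regular intervals while results keep loading.
readings = [20, 40, 75, 110, 118, 120, 120, 120, 120]
print(saturation_point(readings))  # 5
```

A larger `patience` makes the rule more conservative, which may matter for sources whose result lists grow in irregular bursts.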
The search difficulties and peculiarities found in each data source will also be discussed in detail in the next section.

Results
The results are presented in four groups according to the classification mentioned in the "Data sources and methodology" section. The results show a higher degree of internationalisation for Bernini. He is more popular than Borromini according to the number of language editions (66 vs. 51) as well as according to the number of page views. For Bernini, the most viewed edition is the English one, but not for Borromini, for whom the national interest seems to be higher than the international one. The only possible explanation we have found for that maximum peak is Bernini's short film that premiered in that year (https://www.imdb.com/title/tt6289758/). This peak is also visible in Bernini's Italian edition and in Borromini's Italian edition. Bernini's December 2016 peak seems to be much more pronounced in the English version of Wikipedia than in the Italian version.
The trends of both artists correspond quite well and corroborate that their stories are strongly connected, even if the Borromini peak in the Italian version seems to take place not in December 2016 but somewhat earlier. Table 2 gives an in-depth view of the Wikipedia results for both artists in the six major language editions (see "Data sources and methodology" section) and the Chinese edition. Table 2 shows that there are notable differences between the analysed language editions, and that each one provides only a partial view according to the language. It should be noted that the pages that link to a selected language edition were also different for each considered language and that the majority of them originate from the same edition. The number of hyperlinks attracted by Borromini's page is also higher in the Italian version than in the English one. The German, Spanish and French editions also contribute a considerably high number of hyperlinks for both Bernini and Borromini. The results also reveal particularities according to the language editions, although they are certainly very closely associated with the interests and individual bibliographic habits of their creators. The results obtained from the online version of Encyclopaedia Britannica are summarised in Table 3.
In agreement with the English edition of Wikipedia, Bernini's entry is larger and more complete than Borromini's. On the other hand, Britannica focuses on providing a selection of the most reliable information, for example, only the "web's best sites". Under "additional readings" the user can also find a selection of recommended bibliographies, all of them annotated. Furthermore, the identity of the creators of this information and their affiliation is provided clearly and transparently. Table 4 shows the results from WBIS Online and Oxford Reference. It should be noted that WBIS lists the number of biographical entries in the language archives, while all searches in Oxford Reference are sent to the full text. This explains the large difference between the number of items retrieved in both sources.
Excluding the Italian Archive, Bernini appears in two additional archives (BAChr and ABF), Borromini only in one. It seems paradoxical that Bernini appears in the archive of Christianity when Borromini does not, since the latter was the more fervent and devout Christian. Borromini's entry in the German Archive is due to his high popularity in Switzerland, where he was featured on the 6th series of the 100 Swiss Franc banknote, which was in circulation from 1976 until 2000. On the other hand, Bernini's entry in the French Archive is explained by his politically forced visit to France, where he was working for King Louis XIV, who required an architect to complete the royal palace of the Louvre (Gould 1982; Morrissey 2006). The results from the Oxford Reference System shown in Table 5 originate only from Oxford University Press's dictionaries, companions and encyclopaedias. Therefore, they should be considered of high relevance and reliability. The same applies to the results collected across Festschriften (see Table 6).
All these sources show the same trend: Bernini attracts many more mentions and rubrics than Borromini.

Library catalogues
The results from WorldCat for Bernini, Borromini and both appearing simultaneously are summarised in Tables 7 and 8. WorldCat is a very interesting data source because it enables a differentiation of the retrieved results according to their content (biography, fiction and non-fiction) as well as their format. Table 7 illustrates the rich variety of publication types or formats retrieved for both artists. They hint at the high importance of books and monographs. They also show how many formats remain neglected in the usual assessment of the broad impact of scientists and artists, although they are undoubtedly closely associated with their societal impact.
Furthermore, WorldCat also offers a differentiation according to language (see Table 8).

Databases
The results obtained from the databases selected for this study, PQDT, JSTOR and ABM (Table 9), are compared with the ones obtained in the classical bibliometric sources, WoS CC and Scopus (Table 10), as well as with the ones obtained from Google Scholar, Microsoft Academic and CrossRef via Publish or Perish (Table 11). Considering the number of citing articles, the ratio between Bernini and Borromini scores seems to be higher in WoS than in Scopus (see Table 10). This is probably due to a language effect, because Scopus indexes many more Italian and regional sources.

[Table fragment; column headers not recoverable]
… (94)          French (280)     French (13)
Latin (83)      Spanish (221)    Spanish (6)
Spanish (49)    Dutch (53)       Polish (2)
Japanese (22)   Japanese (37)    Portuguese (2)
Dutch (8)       Latin (36)       Dutch (1)
Polish (8)      Polish (26)      Hebrew (1)

Table 9
Results from PQDT, JSTOR and ABM. All the values in italics were not corrected manually.

In the case of Google Scholar and CrossRef the most appropriate search option was the search in Title, because the search in "All the words" also included publications from authors named Bernini that do not refer to the artist. Excluding them manually would require exhaustive work that would not be justified by its value or relevance for this study. A maximum of 1000 results can be retrieved per search. In the case of the search for Bernini in Title (> 1000), the results were downloaded in four tranches of around 400 items. The searches for Borromini and "Borromini AND Bernini" could also be performed successfully in the field "All the words". For Borromini in "All the words", 14 downloads were necessary. The corresponding results for Google Scholar in Table 11 clearly show the difference between high "precision" and relevance (search in Title) and high "recall" but minor relevance, as with a simple mention in the full text of a document. In Microsoft Academic the search for all the words did not include the author field and could therefore be applied more easily.
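When a query returns more hits than the 1000-result cap, one way to download everything is to split it into consecutive ranges that each stay below the cap. The study does not state how its tranches were delimited, so splitting by publication year is an assumption here, and the function `split_into_tranches` and the counts are hypothetical.

```python
def split_into_tranches(counts_per_year, cap):
    """Greedily group consecutive years so that each group stays within cap.

    Returns a list of (first_year, last_year, total_hits) tuples.
    """
    tranches = []
    current_years, current_total = [], 0
    for year in sorted(counts_per_year):
        n = counts_per_year[year]
        if n > cap:
            raise ValueError(f"{year} alone exceeds the cap of {cap}")
        if current_years and current_total + n > cap:
            tranches.append((current_years[0], current_years[-1], current_total))
            current_years, current_total = [], 0
        current_years.append(year)
        current_total += n
    if current_years:
        tranches.append((current_years[0], current_years[-1], current_total))
    return tranches


# Hypothetical hit counts per publication year (made-up numbers).
hits = {2014: 300, 2015: 350, 2016: 420, 2017: 380, 2018: 150}
print(split_into_tranches(hits, cap=1000))
# [(2014, 2015, 650), (2016, 2018, 950)]
```

Each tuple can then be run as a separate query restricted to that year range, and the downloads merged and de-duplicated afterwards.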
All the results show the same trend: Bernini attracts many more mentions than Borromini, whether in Title, in Topic or in the full text. The results confirm the lower coverage of Scopus and WoS CC, especially in comparison with JSTOR and Google Scholar. Google Scholar is the data source providing the highest scores (papers and citations) for both the search in Title and in full text. The results from Microsoft Academic are considerably lower than the ones from Google Scholar (Hug et al. 2017). The results from CrossRef are very similar to the ones resulting from Scopus. All these results hint at the urgent need to include these data sources in analyses aiming to assess broad impact.

Web tools
The results gained from YouTube and Twitter are summarised in Table 12. Compared with the data resulting from Wikipedia and limited to the same period (2015-2018), the views in YouTube also reach a peak in 2016. While the information originating from YouTube was found to be of high interest, the information originating from Twitter was almost entirely reduced to visitors' likes or displays of admiration in front of the artists' works. The number of replies in Twitter was extremely low, but these few replies contained interesting background information.

Conclusions
Beyond doubt, historical celebrities like Bernini and Borromini are a good choice for a bibliometric study in order to reveal appropriate data sources for broad impact assessment in the scholarly community and beyond the "scholarly realm". Both artists and architects have left a rich legacy for posterity. Our study corroborates that their works have lost none of their timeliness and relevance throughout the centuries and continue to be obligatory references in the world of the Arts and Humanities.
Our results also clearly show that sources normally not considered for comparable bibliometric analyses, like JSTOR, PQDT, Google Scholar, etc. in fact provide more relevant or abundant information than the usual suspects, like the Web of Science Core Collection and Scopus.
Today we are forced to respond to the manifold challenges of the digital and virtual eras, and we therefore constantly struggle to expand our data universe in order to paint a more complete picture of the broad impact assessment of individuals. It is therefore crucial to identify the most essential and appropriate data sources for each discipline, and always critically challenge their completeness, suitability and efficiency.
It is definitely insufficient to only count mentions and other signals collected in blogs, social media and further tools obtained from Web 2.0. Citations and/or mentions derived from databases are still important, but here we should not only rely on traditional bibliographic data sources like WoS or Scopus. Particularly for subjects in the Arts and Humanities it is important to broaden the scope of data sources with regard to subject specificity and coverage of document types other than research articles. Our typical data sources for citations and mentions still tend to be extremely reduced and do not consider other document types, like theses or monographs. These publication types should not be missing in any attempt to estimate the broad impact generated, especially in the disciplines of social sciences, arts and humanities.
Moreover, our study reveals the enormous potential of Wikipedia. In the limited way in which tools like Altmetric.com or PlumX already exploit these data sources, we actually lose many of the new opportunities and much of the rich information they could provide. Many possibilities, especially those related to different languages and cultures, are currently ignored. Our study shows that languages play an important role particularly in the social sciences and the humanities, where linguistic and/or regional cultural factors are key. The differences observed between the English and Italian Wikipedia editions for Bernini and Borromini are certainly not coincidental. In an additional analysis, we compared the English and German versions for 20 German and Austrian personalities, and in almost all of them the German edition is more complete and receives more attention, except for those that have achieved great international fame.
Reducing the counts to the English version of Wikipedia cannot be regarded as best practice and potentially hampers the reliability of the assessment. Page views for each language edition in Wikipedia should be regarded as at least as significant as, or even more significant than, the number of tweets or likes in social media for the assessment of the attention a subject has received on the web (see also Katz and Rokach 2017).
It is very interesting that the results from WorldCat are in complete agreement with the ones reported for Wikipedia as far as the language differences are concerned. Previous studies have already shown that fame scales based on Wikipedia coverage and on libcitation counts are very significantly associated (e.g. White and Zuccala 2018).
Italian, and not English as for Bernini, is the predominant language for Borromini as well as for their rivalry (Borromini and Bernini). These results suggest that differentiating between international and national impact is advisable.
An analysis of the reasons for Borromini's low international impact can also be of great interest. It is obvious that some personalities may not have achieved international recognition because their works did not surpass the required threshold. But some others, perhaps, have not yet been recognized for their merits although they deserve it. They are like a sort of "sleeping celebrities" waiting for their worldwide recognition. In this case, their extremely great national impact could help us to identify or rediscover them.
For example, in our case study and according to Morrissey, "Borromini marked the end of an extraordinary career, one that would have made him the undisputed architect of Rome and the founder of the era known as the Baroque had it not been his fortune, or misfortune, to have lived during the lifetime of an artist whose acknowledged talent, worldwide reputation, and enormous success bedeviled Borromini to the very end: Gianlorenzo Bernini."
New data sources and metrics (Thelwall and Kousha 2015) can of course be considered in a complementary way, but their significance should always be challenged and checked. According to our results, YouTube can be a very rich and promising data source in addition to the world of publications. Nevertheless, there are some issues to be tackled, especially the instability of the data and the poor syntax, which does not allow a precise search to be performed. The significance of Twitter, however, turned out to be very low.
"Publish or Perish" has proven to be a valuable tool for tracking mentions, but its syntax does not yet allow complex searches.
Both the classic and the new data sources show much room for improvement in order to be used more efficiently, and further studies of this kind are necessary to make more well-grounded statements.
The process of citing or mentioning is a process among equals, since it is one publication that cites or mentions another one, and the two are comparable. The situation is quite different in the realm of new metrics. Here it is a user, sometimes not even the author of anything, who views, downloads, comments on or discusses a publication. For this reason, there is also an essential difference in the effort required in both processes. Citations are based on a creative act, such as publishing (or writing a publication), while the other indicators are based, instead, on a mere reaction, such as a comment on something that has been seen or read, or on a mere action, such as pressing or activating a button or icon (Gorraiz 2018). Therefore, there is a danger that these new metrics open the door to a radical change in the sciences and turn them into a marketing rather than a merit game.
On the other hand, this whole new internet universe has also caused an explosion in the number of indicators that we can collect quickly and easily. Being aware that nowadays a new publication appears every second, and that this in turn generates an endless number of visits, downloads, comments, likes and tweets, and many other types of comments, discussions, or mere reactions, we face, without exaggeration, a new danger. It has been called the "Tower of Babel" effect, to give it a biblical accent (Gorraiz 2018). Curiously, the lantern of Sant'Ivo is topped with a spiral shape and brings to mind this very Tower of Babel, another ancient (if counterintuitive) symbol of wisdom (Morrissey 2006), as if Borromini also wanted to warn us that both concepts are closely linked to each other.
Finally, the winner between the two Baroque opponents may be Bernini (and his footsteps and shadows may be more numerous and international than Borromini's). However, it is noteworthy that the rivalry between these two geniuses has already become legendary, as is well reflected in one of the tweets analysed during this study: "Ronaldo and Messi are this generation's Bernini and Borromini".