Telling the story of solar energy meteorology into the satellite era by applying (co-citation) reference publication year spectroscopy

Studying the history of research fields by analyzing publication records and topical and/or keyword searches with reference publication year spectroscopy (RPYS) has been introduced as a powerful tool to identify the corresponding root publications. However, for some research fields (e.g., rather new and interdisciplinary ones) such as solar energy meteorology, a keyword- or topic-based search query cannot deliver a reasonably exhaustive publication set. Therefore, we apply the variant RPYS-CO to all publications co-cited with two highly important marker papers, using the cited references explorer to inspect the RPYS-CO results. We obtain two lists of seminal papers that adequately tell the story of solar energy meteorology up to the 1990s and, for its subfield using satellite-based methods for solar irradiance estimation, even up to very recent years. Consequently, we recommend this method for gaining valuable insights into (new) research fields.


Introduction
Solar Energy Meteorology (SEM) studies how solar radiation can be utilized for solar energy conversion to provide heat or electricity to energy systems and how the performance of these conversion processes is affected by meteorological influences. The most relevant properties of solar radiation in this respect are its availability in time (e.g., time of day, year) and space (e.g., geographical location, angular orientation). Especially the latter is decisive for its exploitation capacity in different devices of energy conversion: concentrating devices such as parabolic trough power plants need direct radiation to work, whereas non-concentrating devices such as photovoltaic solar cells or flat-plate solar thermal collectors can also utilize the diffuse fraction of sunlight, usually scattered in the atmosphere. The total sum of both parts is called global irradiance. The major fields of investigation in SEM are therefore (1) measurements and their evaluation over different time scales and (2) modeling of radiation and its components, depending on physical (e.g., available solar radiation due to Planck's law and extinction processes in the atmosphere), geometrical (e.g., position of the sun, orientation of the converter) and meteorological (e.g., cloud coverage, aerosol concentration) parameters. Both fields also involve a large amount of statistical treatment.
The main instruments of solar radiation measurement had for a long time been ground-based pyranometers, integrating the incident sunlight from the hemisphere above a planar surface to the global irradiance, or pyrheliometers, detecting the direct sunlight within a narrow solid angle centered around the source of radiation (the direct beam irradiance). The instruments as well as their maintenance are costly and time consuming, and therefore high-quality data were only available for a limited number of locations. With the availability of geostationary weather satellites, for example the GOES generation, operated by the US American NOAA from 1975 onwards (https://www.goes-r.gov), and the METEOSAT series, operated by EUMETSAT from 1977 onwards (https://www.eumetsat.int/), solar radiation data in the visible and infrared part of the electromagnetic spectrum in a high temporal resolution (15 to 30 min) and with an almost continuous spatial coverage became accessible. Now the main task was the assessment of the radiation reaching the earth's surface from the on-board radiometer's digital counts, i.e. the digitally coded values of the measured radiation, always including a certain offset.
Two main methodologies to tackle this task have been developed from the late 1970s on, usually subdivided into the statistical and the physical approach. Statistical models perform a regression between the radiometer counts for a target area on earth and the available ground measurements, which are still today seen as "ground truth". The physical approach is largely based on radiative transfer models, which explicitly describe the extinction processes of scattering and absorption in the atmosphere and on the ground and need no ground measurements, but complementary meteorological data to estimate the relevant quantities. Physical models are more general and need no adaptation to target areas as the statistical ones do, but the latter are, due to their simplicity, better suited for time-critical, practical calculations and are therefore extensively used in near real-time operational services and in building up databases of long-term time series with a large spatial coverage.
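The statistical approach described above can be illustrated in its simplest conceivable form: an ordinary least-squares line relating satellite radiometer counts to ground-measured irradiance. The following is only a minimal sketch under stated assumptions. The function name `fit_line` and all numbers are hypothetical and synthetic, and published statistical models use more elaborate predictors than a single raw count.

```python
def fit_line(counts, ground):
    """Ordinary least-squares fit ground = intercept + slope * count.

    counts: satellite radiometer counts for a target area (synthetic here)
    ground: matching ground-measured irradiance values (synthetic here)
    """
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(ground) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, ground))
    sxx = sum((x - mean_x) ** 2 for x in counts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic illustration: brighter (cloudier) scenes give less irradiance.
counts = [50, 100, 150, 200]
ground = [800.0, 600.0, 400.0, 200.0]
slope, intercept = fit_line(counts, ground)

# Estimate the irradiance for a new, hypothetical count value.
estimate = intercept + slope * 120
```

Such a fit has to be repeated for each target area, which is the adaptation step that physical models avoid.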
One of the authors was involved in this work as a member of Oldenburg University's research group on Energy Meteorology for some time, so it came naturally to investigate this field of research with bibliometric methods in order to identify seminal and landmark papers that led up to and through the satellite era in SEM. But due to its intrinsic interdisciplinarity, potential search terms tend to have multiple meanings, which leads to answer sets from title or topical searches with low precision (i.e., fraction of retrieved relevant papers out of all of the retrieved papers) and/or recall (i.e., fraction of the retrieved relevant papers out of the total number of relevant papers). To illustrate this, we compare search results for the two subjects "Climate Change" and "Energy Meteorology" in the WoS, having searched (on 9 April 2020) in title, abstract and keywords. With ts = "climate change" in the WoS advanced search mode, more than 200,000 papers that definitely deal with the very subject of climate change are found. Approximately half of the research area is covered by this simple WoS search (Haunschild et al. 2016b, 2019c). However, ts = "energy meteorology" yields only 36 papers with absolute thematic precision but very low recall. A broader search with proximity operators would significantly decrease the precision, but not sufficiently increase the recall. Therefore, we use a certain variant of the bibliometric method "Reference Publication Year Spectroscopy" (RPYS), where there is no need for an exhaustive paper set covering most of the research field. RPYS was introduced by Marx et al. (2014) in order to identify the historical roots of research fields. For this purpose, the citation frequency of references in the publication corpus of a specific research field is analyzed in terms of the publication years of these cited references.
A plot of the reference publication years (RPY) versus the number of cited references (NCR) usually shows peaks. The more or less pronounced peaks will be primarily caused by important papers, which stand out against others from the same year by a relatively high citation count inside the whole of the analyzed publication corpus. Previously, RPYS analyses were applied to a scientific field (e.g., Higgs boson research, dentistry and neurosciences (Yeung 2017; Yeung et al. 2019), health equity (Yao et al. 2019), or density functional theory (Haunschild et al. 2016a, 2019a; Haunschild and Marx 2019)), a journal (e.g., the journal Ecological Economics (Ballandonne 2019), the journal FEMS Microbiology Letters (Haunschild et al. 2019b), and the journal MDPI Information (Haunschild 2019)), or the oeuvre of a researcher (e.g., the oeuvre of Eugene Garfield (Bornmann et al. 2018)).
RPYS has also successfully been applied to identify the root publications of climate change research by Marx et al. (2017). In that case, a set of more than 200,000 papers was used. In a subsequent approach in the same paper, Marx et al. (2017) refined this large set to the greenhouse effect by keeping only cited references that are co-cited with Arrhenius (1896). They were able to retrieve the results of the RPYS on the full publication set regarding the greenhouse effect, but also lesser known works of relevance. They named this RPYS variant RPYS-CO (for co-citation), because here the analyzed publication set is defined by all publications co-cited with at least one marker paper. In another very recent study, Haunschild and Marx (2019, 2020) compared their own results of an RPYS on density functional theory, a very frequently applied method in computational chemistry (Haunschild et al. 2016a), with an RPYS-CO using one single seminal paper with a high citation count and strong relevance in the field as a marker paper. They found a striking similarity with the results of the analysis based on a search in a controlled vocabulary.
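The counting step behind RPYS-CO, as described above, can be sketched in a few lines. This is a minimal illustration and not the CRExplorer implementation. The function name `rpys_co`, the data layout (each paper as a list of reference identifier and year pairs), and the toy records are assumptions made for this example only.

```python
from collections import Counter


def rpys_co(papers, marker):
    """Number of cited references (NCR) per reference publication year (RPY),
    restricted to papers whose reference list contains the marker paper.

    papers: list of reference lists; each reference is (reference_id, year)
    marker: reference_id of the marker paper (hypothetical identifier)
    """
    ncr_per_year = Counter()
    for cited_refs in papers:
        ref_ids = {ref_id for ref_id, _ in cited_refs}
        if marker not in ref_ids:
            continue  # this paper does not cite the marker paper
        for _, year in cited_refs:
            # The marker paper itself is counted too, so it shows up
            # as a peak in its own publication year.
            ncr_per_year[year] += 1
    return dict(sorted(ncr_per_year.items()))


# Toy records with hypothetical identifiers.
papers = [
    [("Liu1960", 1960), ("Angstrom1924", 1924), ("Hottel1942", 1942)],
    [("Liu1960", 1960), ("Angstrom1924", 1924)],
    [("Other2000", 2000), ("Angstrom1924", 1924)],  # ignored: no marker
]
spectrum = rpys_co(papers, "Liu1960")
```

Plotting the resulting counts over the years gives the spectrogram in which peak years are then inspected.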
Encouraged by these results, we set out here to investigate all publications co-cited with one highly-cited marker paper, a choice discussed with and corroborated by the long term leader of the Oldenburg group: The interrelationship and characteristic distribution of direct, diffuse and total solar radiation (Liu and Jordan 1960). In order to indicate the importance of this marker paper we quote its complete abstract, emphasizing in italics all those concepts and terms that proved to be prevalent in SEM for its whole history: "Based upon the data now available, this paper presents relationships permitting the determination on a horizontal surface of the instantaneous intensity of diffuse radiation on clear days, the long term average hourly and daily sums of diffuse radiation, and the daily sums of diffuse radiation for various categories of days of differing degrees of cloudiness. For these determinations, it is necessary to have, either from actual measurements or estimates, a knowledge of the total (direct plus diffuse) radiation on a horizontal surface-its measurement is now regularly made at 98 localities in the United States and Canada. For localities where only an estimate of the long term average total radiation is available, relationships presented in this paper can be utilized to determine the statistical distribution of the daily total radiation at these localities." (Liu and Jordan 1960, abstract).
Satellite-based studies often view this paper as textbook knowledge. Therefore, it is affected by obliteration by incorporation (McCain 2014) and rarely cited in this area of SEM, which gained traction in the 1980s and early 1990s. The latter are consequently a natural end date for our study of cited references co-cited with Liu and Jordan (1960). On the other hand, due to the recency of the research field, we do not expect decisive contributions leading to SEM before 1900. The more recently flourishing satellite-based publications need to be studied using another marker paper, for which an appropriate choice is: A method for the determination of the global solar radiation from meteorological satellite data (Cano et al. 1986). This paper had also been recommended by the long-term expert as a reference point for all the work of his group and of other groups in the world-wide community.
There are some studies with a very different time frame, focus or methodology, e.g.: Du et al. (2014) analyzed the solar energy literature from 1992 to 2011, but without special consideration of energy meteorology topics. A bibliometric analysis on solar power research between 1991 and 2010, again after the period of the first part of our study, has been performed by Dong et al. (2012) using terms as, e.g., "solar radiation" in a topic search in the WoS. Their goal was to identify research trends for the twenty-first century and not to explore historical roots. In the same vein, Yang et al. (2018) tried to identify key innovations for the future of research in "solar radiation and PV power forecasting", a field mainly emerging at the turn of the millennium. They based their work on the first 1000 hits of a keyword search in Google Scholar and applied machine learning and text mining methods to full texts in order to complement conventional topical reviews.
To the best of our knowledge, the present study is one of the first to use the method RPYS-CO in order to identify seminal papers for a research field, thereby complementing the qualitative knowledge of experts by a quantitative evaluation of the citation counts (i.e., the reference counts within the topic-related literature). Using this method, we are confident of finding those important contributors and their papers which tell the story of the emergence of solar energy meteorology from around 1900 up to the beginning of the 1990s, and of how the availability of huge amounts of satellite-derived data was taken up by the SEM community through the development of new scientific calculation schemes and operational services driven by research groups as well as public and private institutions. We thus support the suggestion of Haunschild and Marx (2019) that this method can help researchers to explore their field of study in a way complementary to a usual topic or keyword-based literature search.
The present study is an extended version of a paper presented at the 17th International Conference on Scientometrics and Informetrics, Rome (Italy), 2-5 September 2019 (Scheidsteger and Haunschild 2019). The new material comprises the second focus on satellite-based methods, including the related explanations in the "Introduction", the bibliometric characterization of the second marker paper in "Method and data set" and its investigation and presentation as sub-section "RPYS-CO using the second marker paper" in the "Results" section, including  Table 2, and parts of the "Discussion" section. Moreover Fig. 6 displays a further confirmation of the results in the sub-section "RPYS-CO using the first marker paper".

Method and data set
As of 8 January 2019, the marker paper Liu and Jordan (1960) had 1031 citing papers in the WoS until the end of 2018. One fourth of these papers (n = 257, 25%) as well as the marker paper itself have been published in a single journal, Solar Energy. The four most important WoS subject categories in the data set used in this study are Energy Fuels (n = 673, 65%), Green Sustainable Science Technology (n = 151, 15%), Meteorology and Atmospheric Sciences (n = 131, 13%), and Thermodynamics (n = 114, 11%), thereby reflecting the multiple foci of SEM.
We downloaded the bibliographic data of the 1032 papers including 36,635 cited references (CRs) from the WoS (selecting "Save to Other File Formats" and "Other Reference Software") and imported them into the CRExplorer. (The Java-based software can be downloaded for free from http://crexplorer.net and a comprehensive handbook explaining all functions is also available.) It provides a graphical display of the NCR over the RPY and a tabular presentation of the NCR of all CRs. In our case there were only single occurrences of CRs before 1900.
Much of the processing can comprehensively and reproducibly be done by using the CRExplorer scripting language: With the script in Fig. 1 we imported the WoS file and got 8383 unique reference variants for the reference publication years 1900 to 1995. After that, clustering and merging of equivalent CR variants was done with Levenshtein threshold 0.75 and taking volume and (starting) page number into account, thereby reducing the number of CR variants by 109. Then we removed all publications with only one citation, in order to reduce noise. In the end, we retained 1566 CRs. The results including the NCR and other indicators were exported to CSV files for further inspection and plotting of the spectrogram, which can be done by using the R package BibPlots (see: https://cran.r-project.org/web/packages/BibPlots/index.html and https://tinyurl.com/y97bb54z).
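The clustering, merging, and noise-reduction steps described above can be approximated as follows. This is a sketch under stated assumptions and not the CRExplorer algorithm. In particular, the normalized similarity (one minus the edit distance divided by the longer string length), the greedy comparison against the first member of each cluster, the record layout, and the function names are all choices made for this example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for disjoint."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def merge_variants(variants, threshold=0.75):
    """Greedily merge variants that share volume and page and whose
    reference strings reach the similarity threshold."""
    clusters = []
    for v in variants:
        for cluster in clusters:
            rep = cluster[0]
            same_source = (v["volume"], v["page"]) == (rep["volume"], rep["page"])
            if same_source and similarity(v["text"].lower(), rep["text"].lower()) >= threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return [{"text": c[0]["text"], "count": sum(x["count"] for x in c)}
            for c in clusters]


def drop_singly_cited(merged):
    """Keep only references cited at least twice."""
    return [m for m in merged if m["count"] >= 2]


# Hypothetical reference variants for illustration.
variants = [
    {"text": "Liu BYH, 1960, Solar Energy", "volume": "4", "page": "1", "count": 5},
    {"text": "Liu B Y H, 1960, Sol Energy", "volume": "4", "page": "1", "count": 1},
    {"text": "Page JK, 1964, P UN C NEW SOURCES", "volume": "4", "page": "378", "count": 1},
]
retained = drop_singly_cited(merge_variants(variants))
```

Requiring matching volume and page before comparing strings keeps distinct papers by the same author from being merged merely because their abbreviated reference strings look alike.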
In the spectrogram, we looked for publication years with significantly higher NCR than other years, aided by the deviation of NCR from the 5-year-median of NCR (taking into account the two preceding and the two following years). For the papers that, by applying this methodology, seemed primarily responsible for the peaks a manual merging was done, if needed.
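The 5-year-median deviation used for spotting peaks can be written down directly from the description above. The sketch below is a minimal illustration. Treating years without any cited reference, and years beyond the ends of the series, as NCR = 0 is an assumption of this example, and the function name and toy values are hypothetical.

```python
from statistics import median


def median_deviation(ncr_by_year, half_window=2):
    """Deviation of each year's NCR from the median NCR of the window made of
    the two preceding years, the year itself, and the two following years."""
    deviations = {}
    for year in range(min(ncr_by_year), max(ncr_by_year) + 1):
        window = [ncr_by_year.get(y, 0)  # assumption: missing years count as 0
                  for y in range(year - half_window, year + half_window + 1)]
        deviations[year] = ncr_by_year.get(year, 0) - median(window)
    return deviations


# Toy spectrum with one pronounced peak year.
ncr = {1958: 10, 1959: 12, 1960: 90, 1961: 15, 1962: 11}
dev = median_deviation(ncr)
```

A year with a large positive deviation stands out against its neighbours even when the overall NCR level rises over time, which is why the deviation is a useful companion to the raw counts.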
For the subfield of satellite-based SEM we took Cano et al. (1986) as marker paper.
As of 30 January 2019, it had 293 citing papers in the WoS until the end of 2018. This seems rather low compared to the marker papers chosen elsewhere in the literature. However, since it represents only a small substream of SEM, which as a whole we treat with just over 1000 citing papers, we expect no serious disadvantage.
One fifth of the papers citing Cano et al. (1986) (n = 60, 20%) as well as this marker paper itself have been published in the journal Solar Energy. The three most important WoS subject categories in the data set used in the second part of this study are Energy Fuels (n = 126, 43%), Remote Sensing (n = 78, 27%), and Meteorology and Atmospheric Sciences (n = 58, 20%), covering 87% of all publications.

Fig. 1 CRExplorer script to perform RPYS on the WoS papers citing Liu and Jordan (1960)

We downloaded the bibliographic data of the 293 papers citing Cano et al. (1986) including 11,549 CR variants coming from 6723 unique CRs from the WoS, imported them into the CRExplorer and treated them analogously to the marker paper for the entire SEM field. We retained 1350 unique CRs between 1922 and 2017, cited at least twice.

Results
Figure 2 shows the spectrogram of the RPYS-CO for the SEM marker paper (Liu and Jordan 1960) in terms of the NCR and their 5-year-median deviation for the whole analyzed time period, and Table 1 lists all publications contributing substantially to the peaks of NCR and identified as relevant.

RPYS-CO using the first marker paper
The overall RPYS-CO picture can easily be divided by the maximum NCR per RPY into two periods with regard to the reference publication years, which we are going to discuss separately: the first one from 1915 leading to, but excluding, 1960, the publication year of the marker paper, containing peaks with at most NCR = 100; the second one from 1960 to 1995, with peaks between NCR = 100 and NCR = 900 (apart from the marker paper itself). Figure 3 shows the RPYS-CO spectrogram for the marker paper Liu and Jordan (1960) for the relevant time period before it was published.

Time period 1915 to 1959
In this time period, we were able to identify 9 peaks with relevant papers for the following RPYs: 1919, 1922, 1924, 1929, 1940, 1942, 1945/46, 1948, and 1953. Because of the generally low NCR values in this time period, we did not want to lose reference variants of possibly relevant papers and therefore additionally looked at the CRs before the removal of papers referenced only once. This did not reveal new relevant papers, however; it only confirmed the results from the reduced set. Now we can follow the path of SEM by looking at the peak papers and drawing partially from explanations given in the citing papers. In this first period, two independent streams of research flowed together: meteorology and engineering.
Meteorologists tried to develop a climatology of irradiance, emphasizing daily mean values, with little or no application to solar energy in mind. Solar irradiance varies deterministically with the sun's position on the sky dome (hemisphere) and irregularly with changing cloudiness. The relation of the latter with sunshine was initially measured by Kimball (CR1) and later subjected to statistical analysis by Angström (CR2, CR4, and CR5), leading to a linear relation between the duration of bright sunshine and the average solar energy available on a horizontal surface at ground level, the so-called Angström equation. This has been generalized to the Angström-Prescott (CR6) equation by introducing a daily clearness index, quantifying all stochastic meteorological influences, as a measure of the atmospheric transparency (Paulescu et al. 2016). Linke (CR3) in 1922 published his turbidity factor for the attenuation of the sun's radiation by water vapor and aerosols in the atmosphere.
Two engineers, Hottel and Woertz (CR7), came up with the first serious study on solar energy in 1942: the fundamental relationships given in their classic paper have since then been used for decades to model solar collectors. Hottel and Whillier (CR13) evaluated them concerning the flat-plate solar collector performance (Stanciu et al. 2016) and formulated the Hottel-Whillier-Bliss equation on its heat flow and available heat balance, based on considerations of the thermal usability of solar irradiation, coming from Whillier's PhD thesis (CR11) under Hottel's supervision at MIT. This latter work also concerned the relation between radiation on different time scales, showing a close interdependence of the frequency distributions of the so-called clearness index on a monthly, daily, and hourly basis (Vijayakumar et al. 2005).
Because information on daily sunshine duration was no longer sufficient, Whillier later proceeded to "The determination of hourly values of total solar radiation from daily summations" (CR14) by statistical means, a subject still of great importance for SEM, where an ever more time-resolved knowledge of solar irradiance is needed.
The modeling of solar irradiance components through parameterization of atmospheric phenomena is an equally important area of work in SEM. It was begun in the 1940s by Haurwitz (CR8-CR10), focusing on cloudiness, cloud density, and cloud type (Chowdhury 1990). (We do not include a publication from the 1948 peak by Penman with NCR = 8, i.e. more than CR10, because it is only concerned with evaporation by solar radiation.)

Time period 1960 to 1995
Figure 4 shows the RPYS-CO spectrogram for the marker paper (Liu and Jordan 1960) after its publication year. Ten further peaks could be identified from Fig. 4 for the following RPYs in the second time period: 1960, 1963/1964, 1966, 1969, 1971, 1976/1977, 1979/1980, 1981-1983, 1986-1988, 1990/1991.
The first two peaks are mainly caused by engineers: After the marker paper itself (CR16) the same authors gave generalized curves to predict the "Long-Term Average Performance of Flat-Plate Solar-Energy Collectors", making use of Hottel's and Whillier's methods (CR13) and building upon the knowledge of two parameters only: (1) the monthly-average daily total radiation on a horizontal surface and (2) the monthly average day-time ambient temperature (CR17). In 1961, J. K. Page presented "The estimation of monthly mean values of daily total short-wave radiation on vertical and inclined surfaces from sunshine records for latitudes 40°N-40°S" (CR19) at a "UN Conference on New Sources of Energy" in Rome, but the conference proceedings were not published until 1964. Much later, he advised advanced publicly funded projects like HELIOSAT-3 (Mueller et al. 2004).
The meteorologist F. Kasten (CR22) developed turbidity models as one essential ingredient for radiation model calculations and also functioned as an advisor in later solar energy projects.
Attempts to check and confirm the diffuse-to-total radiation correlation by Liu and Jordan (1960) against measurements have been undertaken for several locations in the world: Southern Israel by Stanhill (CR20), New Delhi by Choudhury (CR18), and Canada by Ruth and Chant (CR27) and Tuller (CR29).
Cooper (CR23) and Spencer (CR25) are the only representatives in Table 1 of those researchers concerned with solar geometry, i.e. sun-earth angle values over time, which is essential for all modeling of radiation. In this respect, our method could only capture these first works, but not later standard works like Michalsky (1988).
Robinson's "Solar Radiation" (CR21) was a meteorological standard publication, but not as focused as Kondratyev's monograph "Radiation in the Atmosphere" (CR24), whose influence lasted until Iqbal's standard work "Introduction to Solar Radiation" (CR41) from 1983.
In the 1970s and early 1980s, one focus of the research literature was on time-resolved diffuse radiation models from the scale of months down to hours, mostly from the viewpoint of engineering, like Duffie and Beckman's volumes "Solar Energy Thermal Processes" in 1974 (CR26) and "Solar engineering of thermal processes" in two editions in 1980 (CR37) and 1991 (CR49), but also Hottel (CR28), Orgill and Hollands (CR31), Klein (CR32), and Erbs, Klein and Duffie (CR40). Only Hay (CR30) and Iqbal (CR38) represent geography and meteorology, respectively. Empirical radiation modeling, in particular with respect to tilted surfaces (e.g., of solar panels), was done by Hay (CR36) and by meteorologists such as Temps & Coulson (CR33) and Klucher (CR35).
At the end of the 1970s, the focus switched also to stochastic modeling, outstandingly covered by Collares-Pereira and Rabl (CR34) with their time series analysis and production of the first synthetic time series, that were widely used in the community. In the same vein, Bendt presented his "frequency distribution of daily insolation values" (CR39). The timescale was later even narrowed down to minute data by Suehrcke and McCormick (CR44), and Graham, Hollands & Unny (CR45) were able to simulate daily values of the clearness index from monthly mean values by using probability distribution functions.
In 1987, Skartveit and Olseth (CR42) presented a diffuse fraction model that became an essential part of later works, e.g. in HELIOSAT (Dürr and Zelenka 2009). CR43, i.e., Perez et al. (1987), also focused on the diffuse part of total irradiance and accomplished a major improvement in its error-prone computation, in order to "estimate short time step (hourly or less) irradiance on tilted planes" (Perez et al. 1987), which has received high recognition in the community. (See Discussion and Conclusion for considerations to use CR43 as a second marker paper.) Duffie and Beckman, together with their coauthor Reindl, were mainly responsible for the last high peak taken into account in our RPYS-CO analysis, in 1990: they evaluated statistical models for hourly radiation on the tilted surface (CR47) and could significantly improve on the time resolution of statistical diffuse radiation models in CR46, whose abstract we now quote (with our emphases in italics): "The influence of climatic and geometric variables on the hourly diffuse fraction has been studied, based on a data set with 22,000 hourly measurements from five European and North American locations. The goal is to determine if other predictor variables, in addition to the clearness index, will significantly reduce the standard error of Liu- and Jordan-type correlations (…). Stepwise regression is used to reduce a set of 28 potential predictor variables down to four significant predictors: the clearness index, solar altitude, ambient temperature, and relative humidity." (Reindl et al. 1990, abstract).
We can in a sense close the circle to our first marker paper after exactly 30 years by mentioning CR48, i.e., Perez et al. (1990), as a successful attempt to apply diffuse fraction models to questions of daylighting in buildings, transferring irradiance to illumination, and again connecting the fields of meteorology and radiation to energy and engineering, the twofold focus of SEM.

RPYS-CO using the second marker paper
Figure 5 shows the spectrogram of the RPYS-CO for the second marker paper Cano et al. (1986) in terms of the NCR and their 5-year-median deviation for the whole analyzed time period, and Table 2 lists all publications contributing substantially to the peaks of NCR and identified as relevant, which had not already been listed in Table 1.

Fig. 5 Overall RPYS-CO graph for Cano et al. (1986) with NCR (grey bars) and 5-year-median deviation (blue line). (Color figure online)

Again, the overall RPYS-CO picture can easily be divided into two time periods by the NCR values per RPY, which we are going to discuss separately: the first one from the 1920s leading to the mid-1970s with peaks below NCR = 50 or even NCR = 20, and the second one from 1978 to 2018, with peaks increasing from NCR = 100 to NCR = 350, which are going to be the focus of the following examination.

Time period 1922 to 1977
This period shows many peak papers we already discussed for the earlier SEM: CRs 3, 4, 6, 8, 12, 15, 16, 17, 21, 24. The most prominent peak papers are CR4 (NCR = 31) and CR6 (NCR = 18) by Angström and Prescott, respectively, as could be expected for these foundational papers. On the other hand, the SEM marker paper CR16 (Liu and Jordan 1960) is referenced only 10 times, confirming the aforementioned more specialized focus of the literature on satellite methods, which makes a different marker paper necessary. After CR16 four new peak years emerge with associated papers: In 1964, Fritz et al. (CR101) from the U.S. Weather Bureau, with "Satellite Measurements of Reflected Solar Energy and the Energy Received at the Ground", gave in a way the overall title for the whole satellite era of SEM. Interestingly, they described in advance a method for comparing satellite with pyrheliometer measurements, even before "data from accurate well-calibrated satellite instruments become available" (Fritz et al. 1964, abstract), i.e., data better than the then-current TIROS III data of questionable quality. In 1972, Priestley and Taylor (CR102) provided a radiation-based formula to assess heat flux and evaporation from the earth's surface, which, e.g., was much later used to evaluate satellite-derived databases such as HelioClim-2 (Bois et al. 2008). In 1974, Lacis and Hansen (CR103) described a parametric method for computing the solar energy absorption on the earth's surface and in the atmosphere varying with the altitude, an important ingredient for generating solar radiation maps from satellite data (cf. CR135). In 1976, Paltridge (CR104) introduced one of the most important topics of the satellite-based methods, the short-wave albedo (diffuse reflection) of water clouds, in his case not yet measured by satellites, but by instrumented aircraft (cf. CR135).
The only two peaks before the second marker paper introduce the two main methodological streams, the statistical and the physical approach, which we are able to follow through much of the total time period, assisted by the long review by Schmetz (1989) (CR115) and two shorter reviews by Noia et al. (1993a, b) (CR120, CR122).
Hay and Hanson (CR105) did a simple empirical regression between ground global radiance and satellite visible radiance, whereas Tarpley (CR106) from NOAA, the GOES operating agency, in addition took into account the cloudiness of the sky through a parametrization of the cloud index and atmospheric transmittance. The cloud index is proportional to the reflectivity of the atmosphere, and ranges from 0 at clear sky conditions to 1 at overcast (total cloud coverage). On the other hand, Gautier et al. (1980) (CR107) introduced their well-known "simple physical model to estimate incident solar-radiation at the surface from GOES satellite data" and evaluated it with ground measurements from Quebec.
The University of Cologne meteorologists Möser and Raschke (CR108, CR109) developed a more complex radiative transfer model for use with METEOSAT data. They showed that cloud coverage has a more prominent influence on solar irradiance at ground level than absorption by aerosols, water vapor, and ozone. The same group later applied the model with encouraging results to data of the International Satellite Cloud Climatology Project ISCCP (CR112) and was able to improve the method in 1990 (CR118). A physical model, proposed in 1987 in CR111, uses a unique expression for both clear and cloudy conditions, and moreover the formula of CR103 for atmospheric components.
The last peak paper on physical modeling was published in 1992 as CR119. Pinker, a meteorologist of the University of Maryland, applied her improved model from the 1980s to ISCCP data on a global scale.
Now we come to one of the most important statistical methods, presented in the second marker paper Cano et al. (1986) (CR110), and slightly modified by CR113, CR114, and CR116: it forms the basis for the HELIOSAT project of the Ecole des Mines de Paris, Sophia Antipolis (France), which aimed at efficiently exploiting METEOSAT images in the visible channel for solar energy applications. In so far it shares the two-fold motivation of the marker paper for SEM. The authors first deduced a ground albedo map from a sequence of satellite images and then estimated the cloud index for individual images.
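The two steps named above, a ground albedo map from an image sequence followed by a cloud index per image, can be sketched as follows. This is a simplified illustration and not the published procedure. Taking the per-pixel minimum over the sequence as the ground albedo, using a single fixed cloud albedo, and the normalization n = (ρ - ρ_ground) / (ρ_cloud - ρ_ground), which is a form commonly given in the later literature on this method, are assumptions of this example, as are all names and numbers.

```python
def ground_albedo_map(image_sequence):
    """Per-pixel ground albedo, assumed here to be the minimum reflectance
    observed over the sequence (the darkest, presumably cloud-free scene).

    image_sequence: list of images, each a flat list of pixel reflectances
    """
    n_pixels = len(image_sequence[0])
    return [min(image[p] for image in image_sequence) for p in range(n_pixels)]


def cloud_index(reflectance, ground, cloud):
    """Cloud index in [0, 1]: 0 for clear sky, 1 for overcast."""
    if cloud <= ground:
        raise ValueError("cloud albedo must exceed ground albedo")
    n = (reflectance - ground) / (cloud - ground)
    return min(1.0, max(0.0, n))  # clip noise outside the physical range


# Synthetic sequence of three images with two pixels each.
images = [
    [0.12, 0.30],
    [0.10, 0.75],
    [0.45, 0.28],
]
ground = ground_albedo_map(images)
CLOUD_ALBEDO = 0.80  # hypothetical value for a thick cloud deck

# Cloud index of every pixel in the last image of the sequence.
indices = [cloud_index(r, g, CLOUD_ALBEDO) for r, g in zip(images[-1], ground)]
```

Because the ground albedo is taken from the images themselves, no absolute calibration of the radiometer is needed for this step, which fits the simplicity attributed to the statistical approach.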
Both approaches, the statistical and the physical, were in fruitful competition, which led to more and more improvements along the way. In 1993, Hay (CR121) reviewed various modeling approaches for satellite-based estimates of solar radiation at the earth's surface, including operational techniques, which to some extent were built upon the "HELIOSAT method", before Beyer et al. (1996) (CR123) from the University of Oldenburg group were able to implement significant improvements. They accounted for some intrinsic nonlinearities of the atmosphere by using a modeled clear-sky radiation (in the absence of clouds) and provided a geometric correction for the sun-satellite geometry, which is especially important for the handling of clouds at higher latitudes.
In the late 1990s, two international and wide-ranging databases for evaluation purposes were launched: the Baseline Surface Radiation Network (BSRN), a new radiometric network of quality-controlled data initiated by the World Climate Research Programme (WCRP) (CR125), and the "AERONET-A Federated Instrument Network and Data Archive for Aerosol Characterization" (CR126), initiated by NASA and using the revised optical air mass tables and approximation formula of CR117. At the same time, the satellite-based methods proposed so far were evaluated against different data from earth-based measurements: Perez from the University of Albany, NY, Zelenka from the Swiss Meteorological Institute, and co-workers compared GOES-8-derived radiation data with a network of 12 "ground-truth" stations in the North-eastern US with respect to the site- and time-specific irradiance (CR124) and investigated the "Effective Accuracy of Satellite-Derived Hourly Irradiances" (Zelenka et al. 1999) (CR127), showing that the latter are the most accurate option beyond 25 km from a ground station. Perez and Ineichen from the University of Geneva (CR128) investigated the relationship between satellite-borne radiometer counts (from GOES and METEOSAT) and ground measurements, studying the influence of different clear-sky models, especially for high-latitude regions with low sun elevations. Building on CR127 and CR128, they proposed in CR131 an improved and simpler version of the original model of the marker paper CR110, which is in a sense a counterpart to HELIOSAT.
From around the turn of the century, most of the peak papers have a clear bias towards European authors working on enhancing and operationalizing the HELIOSAT method: The Oldenburg group used a statistical method to determine cloud motion vector fields for the purpose of short-term forecasting of solar irradiance (CR129), which later served as one example of the application of satellite data, another being the input for the simulation of solar energy systems (CR132). The Sophia Antipolis group presented an improved clear-sky model in CR130, which was developed in the framework of the new digital European Solar Radiation Atlas (ESRA) and became an important ingredient of the method "HELIOSAT-2", which no longer depended on empirically defined parameters tuned by pyranometer measurements (CR133). Authors from the European project "HELIOSAT-3", aiming at the efficient planning and operation of solar energy systems, published a new type of solar irradiance scheme, more specifically a clear-sky module called SOLIS, based on radiative transfer models using atmospheric parameter information retrieved from the current METEOSAT generation and other satellites (CR134).
The significant papers of the last three peaks in 2009, 2011, and 2013 are concerned with the benefit of satellite-based methods for climate monitoring: In CR136, Müller, now at the German Weather Service, and co-authors brought the physical and the statistical approaches together: the solar surface irradiance retrieval in EUMETSAT's Climate Monitoring Satellite Application Facility is based on radiative transfer calculations using satellite-derived parameters as input, mostly calculated using the statistical HELIOSAT method. In CR137, the Sophia Antipolis group reported on the HelioClim databases containing daily and monthly means of solar surface irradiance, covering Europe, Africa, and the Atlantic Ocean, that had been calculated with HELIOSAT data. Finally, CR138 gave a validation and stability assessment of radiation for the dataset of CR128 over Europe for the years 1983-2005.

Discussion
We studied the early history of SEM and the growing influence of satellite-based methods in more recent years by applying RPYS-CO to two highly cited and relevant marker papers, Liu and Jordan (1960) and Cano et al. (1986), which match the recommendations of a long-term expert. We inspected RPYs with peak citation numbers in the corresponding graphs and tables as calculated by the CRExplorer. In this way, we were able to identify many important papers before and after the respective marker papers. They give an adequate view of most of the essential contributing streams of research in SEM, e.g., measurement and empirical and statistical modeling of direct and diffuse solar irradiance on the horizontal and the tilted plane on time scales from years to minutes. The topics of solar geometry, radiative transfer calculations through the atmosphere, and spectrally resolved treatment of sunlight are underrepresented in the results of the first RPYS-CO. However, the last of these three topics gained interest only in later years and should be studied with other marker papers, while the first two are included in the results of the second RPYS-CO: solar geometry in the necessary treatment of sun-satellite geometry, and radiative transfer in the physical models.
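To make the notion of peak RPYs concrete, the following sketch counts cited references per reference publication year among papers that cite a marker paper and compares each year with the median of a five-year window. It is not the CRExplorer implementation: the data layout, the function names, the exclusion of the marker paper from the counts, and the window of two years on either side are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import median

def rpys_co_counts(citing_papers, marker_id):
    """Count cited references per publication year.

    citing_papers: list of reference lists, each a list of
    (reference_id, publication_year) tuples (hypothetical layout).
    Only papers that cite marker_id contribute; the marker itself
    is left out of the counts (an illustrative choice).
    """
    counts = Counter()
    for refs in citing_papers:
        if any(ref_id == marker_id for ref_id, _ in refs):
            for ref_id, year in refs:
                if ref_id != marker_id:
                    counts[year] += 1
    return counts

def median_deviation(counts, window=2):
    """Deviation of each year's count from the median of the
    surrounding years (year - window ... year + window)."""
    years = range(min(counts), max(counts) + 1)
    return {
        y: counts.get(y, 0)
        - median(counts.get(y + k, 0) for k in range(-window, window + 1))
        for y in years
    }

# Toy reference lists (made-up identifiers and years)
papers = [
    [("M", 1960), ("A", 1955), ("B", 1955)],
    [("M", 1960), ("A", 1955), ("C", 1957)],
    [("X", 1950), ("A", 1955)],  # does not cite the marker, so it is ignored
]
counts = rpys_co_counts(papers, marker_id="M")
deviation = median_deviation(counts)
```

Years with a clearly positive deviation are the candidates for peaks; which cited references stand behind a peak still has to be inspected by hand, as done in this study.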
Moreover, the second RPYS-CO has been able to trace the two methodological streams of satellite-based surface radiation retrieval from the beginning and to identify the main contributors and their competing models through the years. Finally, the operational application of the calculation schemes in near-real-time services and long-term databases, important for solar energy systems, is covered, focusing on the very recent or more distant past. The currently hot topic of forecasting solar irradiance, especially for photovoltaic power production, is treated only once in the peak papers (CR129), but the required time frame of several days in advance cannot be realized using satellite data only. Studying solar forecasting with RPYS-CO would only be promising with yet another marker paper.
It could be argued that an RPYS-CO on only one or a few marker papers produces a bias by favoring a limited number of research groups, possibly even reinforced by the effect of self-citations. But in the given case, the first marker paper is obviously so well chosen as to unearth a diverse set of methodologies and approaches in the world-wide SEM community, coming from the two main domains of meteorology and solar engineering and covering measurement, modeling, and evaluation. Moreover, self-citations of the repeatedly occurring (co-)authors among the peak papers play no role in our study because of their sporadic appearance.
In the case of the second marker paper, which stems from a European research group heavily involved in method development, a bias towards the HELIOSAT method applied to European satellites can be observed. This bias could be expected and may be influenced to a small extent by self-citations. But as our discussion of Table 2 makes clear, it by no means suppresses the detection of publications of competing research groups or approaches.
CR43 (Perez et al. 1987) has been suggested as an alternative SEM marker paper by the expert, but as it turned out, the RPYS-CO on both papers only confirms the results for Liu and Jordan (1960) alone (Scheidsteger and Haunschild 2019). An even stronger confirmation results from another RPYS-CO conducted for the two most-cited papers in our list of CRs together, representing the two pillars of SEM, meteorology and engineering, respectively, i.e., CR34 (Collares-Pereira and Rabl 1979) with 238 citations and CR40 (Erbs et al. 1982) with 208 citations, as can be seen from Fig. 6: all peaks and peak papers are reproduced, except for one in 1948 (CR10). This is even true for the later years after 1991 until 2013, thereby encouraging the detailed study of this period, based on these two more recent papers. But for the purpose of this paper we did not investigate the later peak structure in depth. (As a side note, the citation counts of 672 (CR34) and 515 (CR40) in the whole WoS compared to 1169 for CR16 (all numbers as of 9 April 2020) underline the good choice of the latter as marker paper.) In total, the outcome of our study meets our expectations: All relevant historical roots of SEM research were disclosed by the RPYS-CO analysis of the first marker paper, and with the second one we are guided consistently through roughly the first 30 years of the satellite era by visiting the main contributions in terms of methodology, evaluation, and operational application. Therefore, we recommend RPYS-CO for similar investigations by researchers who want to gain valuable insights into the development of their field of work, or as a tool for studies in the history of science.

Fig. 6 Comparison of the RPYS-CO for the SEM marker paper Liu and Jordan (1960) (solid line) and the RPYS-CO for the two top cited papers (dotted line) in the corpus of papers citing Liu and Jordan (1960)