Cardiovascular and cancer risk factors analysis for 2001–2020 from the global research output and European newspapers

Cancer and cardiovascular disease (CVD) are now two of the leading components of the global burden of disease, especially in high- and upper-middle-income countries. Causes of the diseases that are amenable to intervention are multiple: tobacco control closely followed by obesity treatment, including promotion of a healthy diet and physical exercise, remain the global priorities. We interrogated the Web of Science (WoS) from 2001 to 2020 to determine the numbers of papers describing research into 14 different possible risk factors causing the two diseases. These ranged in relative importance from tobacco and being overweight to the consumption of excessively hot drinks (linked to oesophageal cancer), pollution (linked to lung cancer particularly) and also non-interventional genetic risks. The risks varied between different continental regions, and obesity has increased as a risk factor for CVD in some of these regions. Because many of these factors are subject to human behavioural choices, we also investigated how such research was being presented to the European public through newspaper reportage. About 40% of the factors that influence the cancer burden can be attributed to particular causes, and more than 85% of those factors influencing CVD can also be so attributed. They are led by tobacco use as a risk factor for cancer, but this is slowly declining in most high-income settings. For CVD, the major risks are metabolic, such as high systolic blood pressure and high body-mass index, but also from tobacco use. Research outputs on some of these different factors in the continental regions correlated positively with their influence on the disease burdens. The selection of European newspaper stories was biased towards those risk factors that could be considered as being under the control of their readers. 
Reports of research in the mass media have an important role in the control of both cancer and CVD, and should be regarded by public health authorities as a useful means to promulgate health education. This paper is based on one presented at the ISSI conference in Leuven in July 2021 (Pallari and Lewison, in: Glänzel et al (eds) Proceedings of the 18th international conference on scientometrics and informetrics, 2021), but has been extended to cover CVD as well as cancer. The geographical analysis of risk factors and research publications has also been modified.


Introduction
Contributors of different non-communicable diseases (NCDs; World Health Organization group II) to the total global disease burden, measured in Disability-Adjusted Life Years (DALYs), 1990-2019. CVD: cardiovascular diseases. Data from the Institute for Health Metrics and Evaluation (IHME), University of Washington, Seattle.
There is an abundance of research on the epidemiology of both CVD and cancer, and many papers have looked at the risk factors associated with particular cancers, such as breast (Zhang et al., 2011), cervical, colorectal (Ma et al., 2014), gastric (Klingelhofer et al., 2021), lung (Yu et al., 2015) and melanoma (Mirza et al., 2021). However, there is a paucity of bibliometric studies on the epidemiology of cancer. The only one that we could find that related to all cancers was published 15 years ago (Ugolini et al., 2007), and was reviewed later that year (Boffetta, 2007). There the matter rested until November of last year, when a team from Xiamen University examined Chinese publications, but only ones in the Surveillance, Epidemiology, and End Results (SEER) database, so their coverage was inevitably very incomplete. In CVD, there has been a series of studies on the genetics of cardiovascular disease, beginning in 2009 (Irvin et al., 2016; Merino et al., 2019; Morrison et al., 2014; Psaty et al., 2009; Smith et al., 2010). However, we could find only the paper by Levois and Layard (1995), which discussed possible publication bias in the environmental tobacco-smoke and coronary heart disease literature. It therefore seems that no comprehensive and up-to-date bibliometric examination of risk factors in cancer and cardiovascular research has yet been published. Our study attempts to fill this gap.
In this study, we examined the world output of papers on CVD and cancer research that involved epidemiology as applied to risk factors, and classified them into 14 subject areas, some of which correspond to the modifiable cancer risk factors listed in Table 1. The subject areas are listed in Table 2. Most of the papers concerned an increase in the risks, but some described means of lowering them, such as taking more exercise or eating a good diet (e.g., one rich in fruit and vegetables). These papers were classified by the continental regions of their authors, see Table 3. The regions are based on the continental and regional divisions used by the IHME.
We also analysed the research papers behind the stories in 31 European newspapers reporting the epidemiology of CVD and cancer risk factors. These were all classified by inspection of the reported story. We classified the stories by the countries of the citing newspapers rather than those of their researchers, and by the risk(s) identified. The mass media are important as a way to provide public health information, and may draw attention to lesser-known causative factors for CVD and cancer. Several of these differ from the usual suspects, such as age (Macdonald et al., 2018), alcohol (Christensen et al., 2019; Pratt et al., 2014), and meat (Kowall & Stand, 2019). Thus the media have been used on several occasions to warn the public about the risks of cancer (de Vito et al., 2014; Donell et al., 2004; Lee et al., 2001; Stryker et al., 2008, 2009). However, there do not seem to be any equivalent studies for heart disease or stroke.

Methodology
The papers about the risks of CVD and cancer were taken from the Web of Science (WoS, © Clarivate Analytics) for the 20 years 2001-2020. They were limited to articles and reviews, but with no language restriction. First, we selected all papers on CVD and cancer research by means of two complex filters, which had been developed for two earlier projects. The one for CVD included 68 specialist cardiology journals, and 137 title words or phrases, but with an exclusion of the following title words: Alzheimer* or antimalaria* or dementia or Falciparum or malaria* or optic* or retina*. This filter had a precision, p, of 0.95 and a recall, r, of 0.90 (Huffman et al., 2013). The filter for cancer was based on 323 title words and 185 specialist cancer journals, and had a precision, p, of 0.95 and a recall, r, of 0.98 (Begum et al., 2018). We then applied an epidemiology filter, consisting of 25 title words and a set of 35 epidemiology journals. The journals were selected from the list of all the journals used for CVD and cancer research where the papers had the selected title words, and whose names contained the strings EPIDEM or PREVENTION or RISK. This third filter was calibrated with reference to the outputs of researchers in epidemiology departments (Lewison, 1999), and had a precision of 0.78 and a recall of 0.89, so it over-estimated the numbers of epidemiology papers by 12%. The actual number of CVD epidemiology papers identified in the WoS was 168,344; these formed the database, and were the ones whose characteristics were determined. After correction for the over-estimate, the true total was 150,307 papers, or 13.2% of the total in CVD research. For cancer, the true total was 171,366 epidemiology papers, or 10.5% of all cancer research papers in 2001-2020.
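The calibration of a filter can be illustrated with a short sketch. It assumes the usual correction, in which the number of papers retrieved is multiplied by the precision and divided by the recall to estimate the true total; the function name is ours and is not part of the original analysis. The rounded values of p and r quoted above give a correction factor of about 0.88, slightly below the factor of about 0.89 implied by the reported totals, so the published figures were presumably derived from unrounded values.

```python
def estimate_true_total(retrieved: int, precision: float, recall: float) -> float:
    """Estimate the true number of relevant papers from a filtered count.

    relevant papers retrieved = retrieved * precision
    true total of relevant papers = relevant papers retrieved / recall
    """
    if not (0 < precision <= 1 and 0 < recall <= 1):
        raise ValueError("precision and recall must lie in (0, 1]")
    return retrieved * precision / recall


# Figures quoted in the text for the CVD epidemiology filter.
retrieved_cvd = 168_344
estimate = estimate_true_total(retrieved_cvd, precision=0.78, recall=0.89)
overestimate = retrieved_cvd / estimate - 1  # fractional over-count

print(f"estimated true total: {estimate:,.0f}")
print(f"over-estimate: {overestimate:.1%}")
```

The same function can be applied to the cancer file, or to any subject-area filter for which precision and recall have been measured.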
In order to select the main causes, or risk factors, we used the "compare" tab tool on the IHME website (IHME, 2020) to identify the ones that were responsible for many CVD or cancer DALYs (Disability-Adjusted Life Years) or could be clearly defined by means of title words in their papers. [The settings were as follows: Display = Risk; Location = Global; Year = 2010; Age = All ages; Sex = Both; Level = 2]. We then inspected a large number of the two sets of epidemiology papers and made a provisional list of the distinctive title words of the papers that could be used to identify those on each selected cause. This process was iterative, as the lists of title words were inevitably initially incomplete, and so it was repeated until we had created a comprehensive list of the title words to be used, some of which are reported in Table 4, below. We proceeded to download the long list of journals that had been used to publish the epidemiology papers. Journals whose names indicated that they were in one of the 14 selected subject areas (see Table 2) were then listed and used to select a second set of papers. We examined those papers that were so identified, but which were not identified by the provisional list of title words. This helped us in two ways. First, it suggested title words that could be added to the provisional list. And second, it showed which of these specialist journals were not appropriate because many of the papers that they generated were not relevant.
[They were often about the increased risks of other diseases because the patients already had CVD or cancer. However, we retained papers where the risk of disease was increased because the patients were suffering from other diseases]. The list of title words was then finalised for each subject area, complemented by the specialist journals, and applied to the two epidemiology files so as to generate sets of papers, both in the world and in nine continental regions, see Table 3. Table 4 lists the title words for five of the selected subject areas. They were designed to be used with the WoS software, and the hyphen indicates that the words must be next to each other in the titles. The WoS automatically includes plurals as well as singular forms, e.g., vitamins as well as vitamin, but wild cards (*, which denotes any group of characters) are needed to cover different forms such as infected, infection, infects. Some papers would have been classed in more than one of the 14 causes, and others in none of them.
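The way such title-word filters behave can be sketched as follows. This is a simplified imitation written for illustration, not the WoS search engine itself: it assumes that a hyphen means adjacent words, that a trailing * matches any further characters, and that a plain word also matches its plural in "s". The terms and the mapping to subject-area codes are examples only; the real lists are those in Table 4.

```python
import re


def term_to_regex(term: str) -> re.Pattern:
    """Convert a simplified WoS-style title term into a compiled regex."""
    parts = []
    for word in term.lower().split("-"):
        if word.endswith("*"):
            # Wild card: the stem followed by any further word characters.
            parts.append(re.escape(word[:-1]) + r"\w*")
        else:
            # Plain word: accept an optional plural "s".
            parts.append(re.escape(word) + r"s?")
    # Hyphenated terms must appear as adjacent words in the title.
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def classify(title: str, filters: dict[str, list[str]]) -> list[str]:
    """Return every subject area whose terms match the title (possibly none)."""
    return [
        area
        for area, terms in filters.items()
        if any(term_to_regex(t).search(title) for t in terms)
    ]


# Illustrative terms only, not the filters used in the study.
filters = {
    "INFE": ["infect*"],
    "DIET": ["vitamin"],
    "TOBA": ["tobacco-smoke"],
}

print(classify("Vitamins and infection risk in gastric cancer", filters))
print(classify("Environmental tobacco smoke and coronary heart disease", filters))
print(classify("Surgical outcomes in colorectal cancer", filters))
```

The first title falls into two subject areas and the third into none, which mirrors the observation that some papers are classed under more than one cause and others under none.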

Research papers on epidemiology
Epidemiology has increased markedly with time as a percentage of CVD research papers, see Fig. 2. It has almost doubled, from 8.5% in 2001 to 15.5% in 2020. However, the corresponding percentage for cancer research has hardly changed. It increased only from 8.9% in 2001 to 10.2% in 2020, although it reached 12.4% in 2014. [The subsequent drop was probably an artefact, because the WoS increased its journal coverage in 2015 to include more journals from countries outside Western Europe and North America].
There was a big variation in the relative commitment to epidemiology as a subject for research within CVD and cancer. Figure 3 shows that, among the 17 countries with at least 20,000 CVD research papers, Sweden ranked highest and Germany lowest. However, if we extend the coverage to countries with > 300 CVD research papers, then Iceland becomes the highest-ranked, with 38% in both disease areas, followed by some African countries: Uganda with 33% and 25% in CVD and cancer, Ethiopia with 32% and 23%, and Cameroon with 32% and 12%. Bangladesh is also prominent in the epidemiology of CVD, which accounts for 32% of its total CVD research output. At the lower end of the scale is Russia, with percentages just below those of Germany. Figure 4 shows the numbers of papers in each of the 14 subject areas, plotted on a logarithmic scale. There is a difference of almost three orders of magnitude between the largest (genetics) and the smallest (hot drinks and heat, and ultraviolet radiation). Hot drinks seem to be locally or regionally important (mainly for oesophageal cancer) but do not feature among the main causes in Table 1. UV radiation (UVRA) is primarily a problem in Australia and New Zealand, see Tables 5 and 6, and attracts research on its effects on cancer, but not on CVD. On the other hand, the effects of obesity (OBES) on CVD are more frequently researched than its effects on cancer, despite claims by Cancer Research UK, a leading funder of cancer research, that obesity is the second most important risk factor (after smoking) (Cancer Research UK, 2021).
The next two tables show the relative commitment of the nine continental regions to research on the different risk factors for the two diseases. Australia and New Zealand (ANZ) are primarily concerned with the effects of UV radiation on both diseases, of heat on CVD, and of lifestyle choices on cancer. Radiation (RADI) is of major interest in Eastern Europe for its effects on cancer. There is relatively little research in East Asia (EAS) on lifestyle choices (mainly the effects of exercise, or lack of it). Their effects on CVD, but not on cancer, are however a focus for research in Latin America (LAC). Sub-Saharan Africa's research output is quite low, but is focussed on infection (INFE), especially the link between AIDS and cancer, and the effects of heat on cancer. The effects of smoking (TOBA) on both CVD and cancer are well-established, and the heaviest relative concentration of research is in South Asia (SAS), principally India. This region is also researching the effects of dental problems (DENT), and their treatment, on cancer. Although this is a relatively small subject area, it is growing rapidly (at over 10% per year over the study period), and is also of research interest in Latin America, especially in Brazil. The effects of poverty and economic inequality (ECON) are primarily researched in Australia and New Zealand, and in North America (NAM), both on CVD and on cancer.

Newspaper reportage on epidemiology of CVD and cancer
The database of these newspaper stories was compiled for an earlier study (Pallari & Lewison, 2017) as part of the evaluation of European research in these disease areas. The database was the product of a consistent and systematic search query that was set up in English, translated into 19 languages, and applied to 32 newspapers from 21 European countries. There were no language restrictions on the news stories, although most of the identified research studies were in English. For more information on the methodology, please see Pallari et al. (2019). Using this database, we examined the subset of news stories that described epidemiological studies for these two diseases, and classified them by inspection of their titles and synopses into one (or occasionally more) of the 14 risk factors listed in Table 2. [A few stories were about other ones]. Figures 5 and 6 show the distributions for CVD and cancer, respectively, and for the two geographical groups of newspapers. Lifestyle changes, particularly exercise and being sedentary, together with dietary choices, are the main subjects. They are comparable in prominence for CVD, but diet has a bigger role in stories about cancer. The effects of drugs, both positive and negative, are important for CVD in Western Europe, but not in Eastern Europe, where the influence of infections occupies more space. Perhaps surprisingly, pollution gets more attention in Western than in Eastern Europe; for CVD it is noise from aircraft and motor traffic that dominates. Many of the stories give contradictory messages, for example on the effects Under the heading of genetics, comparisons were sometimes made between the relative risks suffered by men and women. Obesity as a risk for both diseases was the subject of relatively more stories in Western Europe than in Eastern Europe. The risks from socioeconomic deprivation were featured for CVD in Western Europe, but for cancer in Eastern Europe.

IHME data on the risk factors in the nine world regions
The IHME data on the causes of CVD and cancer in different continental regions are available, and some results for 2010 are shown in Table 7. They are based on the selection of Advanced Settings > DALYs > 2010 > All ages > Both sexes > Level 2 (CVD and Neoplasms). For each selected geographical area (Location) and risk factor, the percentage of risk factor attribution (RFA) was read from the diagram. For example, for Location = Eastern Europe and Risk = Tobacco, the RFA for CVD = 25.6% and for Neoplasms = 27.2%. These data show that there are some big differences between regions in which causes are the most important with regard to the development (or the reduction) of CVD and cancer.
The individual causes (from tobacco use to pollution and occupational causes) add up to more than 100% because of double counting, with some causes combining to increase the risk of the two diseases. Tobacco as a cause of cancer varies greatly, from 31% in North America to only 8% in sub-Saharan Africa. Alcohol has the biggest effects in Australia & New Zealand and Europe (CEE and WEU), but much smaller ones in the Middle East & North Africa (MEM), whose countries are all Muslim-majority and where alcohol is not freely available. Obesity has the largest effect in North America, and for CVD in the Middle East & North Africa, and the least in Asia (EAS and SAS). Within EAS, China suffers mainly from its diet, and from pollution. So these data tend to confirm some subjective views of the world.
This table shows that genetics plays a much larger part in cancer than in CVD, where its role is quite small apart from in Sub-Saharan Africa. For CVD, the main causes are diet (especially in South Asia and Eastern Europe), followed by pollution (mainly in the same two regions). Tobacco is the main causative factor for cancer, greatest in North America, and least in Sub-Saharan Africa. It also has a noticeable effect on CVD, especially in Eastern Europe (notably in Montenegro, 34.6%) and East Asia.
It is possible to compare these risk factors in the nine continental regions with research data (taken from Table 4) for four of the subject areas listed in Table 1: tobacco, alcohol, diet and obesity. This can show whether the researchers are focusing their attention on the areas deemed to be of importance to the control of cancer in the nine continental regions. The correlation is positive for all four; it is highest for obesity and lowest for diet, probably because dietary data are not so readily available, or indeed so reliable, as data for the other three risk factors.
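A comparison of this kind can be sketched as a correlation across the nine regions between the risk factor attribution and the share of research devoted to that risk factor. The sketch below assumes a Pearson coefficient, which the text does not specify, and the numbers are placeholders invented for illustration; they are not values from the tables.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER values for nine regions (not data from the study):
# risk factor attribution (%) and share of research output (%) for one risk.
rfa_percent = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
research_percent = [1.0, 2.5, 2.0, 4.0, 3.5, 5.0, 6.5, 6.0, 8.0]

r = pearson_r(rfa_percent, research_percent)
print(f"r = {r:.2f}")
```

With only nine data points per risk factor, a rank-based coefficient would be a reasonable alternative, and any coefficient should be read as indicating an association rather than a causal link.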

Discussion
In this work, we report (what we think is for the first time) a comparison between the causes of CVD and cancer, the amount of research on each of the main risk factors, and the extent of their news coverage in Europe. News coverage is important because several of the causes of these diseases are to some extent under readers' control, and therefore may be influenced by the stories that they read. It was clear from these data that the journalists had selected research stories not as an unbiased sample, but rather as ones that they thought would interest their readers. Thus, of the 74 stories about genetics and cancer, the largest number (n = 27, or 36%) were about breast cancer, which is always a popular subject (Lewison et al., 2008). The research emphasis on the 14 risk factors differs somewhat between the continental regions. Genetics (GENE) is the dominant subject area in most regions, but infection (INFE) is top in both Africa and Latin America because of the papilloma virus and its effect on cervical cancer, which is less well screened and treated there compared with Western Europe and North America (Vaccarella et al., 2017). Pollution (POLL) is evidently a subject for a lot of research in Eastern Europe, as is smoking (TOBA). There is relatively little research on smoking in China; this may reflect the policy of the Chinese government, which derives very substantial tax revenues from cigarettes (Li & Lewison, 2020). The interest of the countries of Eastern Europe in radiation (RADI) is probably attributable to the long-term health effects from the Chernobyl nuclear accident in April 1986 (Ivanov et al., 2012; Rivkind et al., 2020). It is also notable that obesity (OBES) is of particular interest in Australia & New Zealand and the rich countries of North America and Western Europe. Australia is also the most concerned with research on ultraviolet radiation (UVRA) because of its high incidence of solar-induced skin cancer (McLoone et al., 2014; McMeniman et al., 2020).
We added the subject areas of dental issues and hot drinks because of the belief that dental health has wider effects on bodily health and the risk of various manifestations of cancer (Chen et al., 2020; Kawasaki et al., 2020; Wu et al., 2021), and the problems, particularly in China, with oesophageal cancer which can result from the drinking of very hot tea (Li & Lewison, 2020; Yu et al., 2018).
The impact of such modifiable risk factors and the disease burden of CVD and cancer, however, may not be entirely understood at the population level until there is a clear communication and awareness strategy from governments. Such a distinction between policy and action was made in a recent study that showed that government strategic plans focused on the treatment of diabetes rather than its prevention in Cyprus, Iceland, Luxembourg, Malta and Montenegro (Cuschieri et al., 2021). Furthermore, the importance of translating burden of disease studies for use in policy-making was recently raised, with efforts concentrated on producing a knowledge translation framework (Pallari et al., 2020). To support these efforts, the European COST Action burden of disease network was set up. It is intended to integrate technical initiatives in the development and use of these metrics into findings relevant to stakeholders. It will also strengthen the capacity for cross-country collaboration, as reported by Devleesschauwer (2020). To this end, and to effect such a change in health communication and policy-making, data visualisation tools may enhance the engagement of decision-makers and relevant stakeholders, as a disconnect between research and public health has been reported (Lundkvist et al., 2021).

Strengths and limitations
The study involved the analysis of voluminous research outputs at the global level, a comparative assessment using burden of disease data from the IHME, and an examination of news-media reports. We were fortunate to have developed a database of newspaper stories from so many different countries, albeit confined to Europe. Additionally, we have compared the research outputs and newspaper coverage of the countries concerned with data on the causes of CVD and cancer. This is part of an effort to link these data, to identify gaps in public health communication, and to inform policy-makers on how to reduce the burden caused by these diseases in these countries.
The study has several limitations. Because of time constraints, the two files of epidemiology papers were "virtual" ones, and the analysis was based only on the WoS software. [Our analysis was confined to this database as it provided a good coverage of research on the two diseases, with careful curation of journals, and our own analytical software has been developed for it over many decades]. The details of the papers were not actually downloaded to file, which would have allowed us to examine the fractional counts of countries, which provide a fairer assessment of their individual contributions to the research. It would also have allowed us to link the various causes, e.g., diet, alcohol use, to the different manifestations of cancer that are particularly affected, which might help researchers to focus their work better. We have done this only for two examples, but there must be many others besides the link between smoking and lung cancer, which is one of the best-established. The allocation of papers to causes that we performed has similarities with the topic modelling process, in which documents are classified into multiple subject areas based on text mining. However, instead of using natural language processing techniques for this paper, we visually inspected samples of the papers and used our judgment to classify them through a complex iterative process.
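The fractional counting mentioned above was not carried out in this study, but the idea can be sketched briefly. The sketch assumes one common variant, in which each paper is shared equally among the distinct countries in its address list; other variants weight by the number of addresses. The records and country codes are hypothetical.

```python
from collections import defaultdict


def fractional_counts(papers):
    """Share each paper equally among the distinct countries of its authors.

    `papers` is an iterable of country lists, one list per paper.
    Returns a dict mapping each country to its fractional paper count.
    """
    totals = defaultdict(float)
    for countries in papers:
        unique = set(countries)
        if not unique:
            continue  # skip records with no address information
        share = 1.0 / len(unique)
        for country in unique:
            totals[country] += share
    return dict(totals)


# Hypothetical address data for three papers.
papers = [
    ["UK"],               # single-country paper: UK gets 1.0
    ["UK", "CY"],         # two countries: 0.5 each
    ["UK", "US", "US"],   # repeated addresses count once: 0.5 each
]

counts = fractional_counts(papers)
for country, value in sorted(counts.items()):
    print(country, round(value, 2))
```

Under this scheme the fractional counts sum to the number of papers, whereas integer (whole) counts credit every collaborating country with a full paper and so inflate the totals of countries with many international collaborations.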

Implications for further research
There is some evidence that CVD and cancer researchers in different continental regions are studying the subjects that are the most important for their influence on the risks. This is only an association, and it would be necessary to carry out further studies, such as a survey of some of the leading researchers, to establish whether they are aware of the relative importance of the different causes, and whether this knowledge has influenced their choice of research topics. It would also be useful to ask funders of such research whether they try to support the research that is most likely to reduce the disease burden in their countries, or whether their allocation of funds is based only on their perceptions of the potential quality of the work. Additionally, mapping out the factors on a country-by-country basis may provide results that are of value to local policy-makers.

Conclusions
Epidemiological research into the causes and prevention of CVD and cancer represents only a small percentage of the total research on these diseases, despite prevention being more cost-effective than treatment once disease is established in a patient. However, such research appears to be targeted at the most important causes in different continental regions. Although the prevention of these two diseases depends heavily on governmental public health campaigns (notably to control smoking), the lifestyle choices of members of the public also make a difference. These can be influenced by information about new research findings that are reported in the news media. Within Europe, it seems that the stories are chosen to reflect the topics that are likely to be of most interest to readers, which is helpful.