Introduction

Potentially inappropriate prescribing (PIP) in older people is associated with increased morbidity, lower quality of life, increased use of health care services, and increased health care costs1,2,3. PIP, therefore, poses a clinical, humanistic and economic problem for older adults, their carers and health care systems. Furthermore, it is a prevalent global health issue in all settings of care that is likely to grow as the world population ages4. Although PIP is considered a highly prevalent problem worldwide, its prevalence varies widely due to differences in country contexts, health care settings, populations and measurement tools5. Additional information about the magnitude of the problem from relevant systematic reviews is presented in the Discussion section, illustrating that PIP is a global issue of major concern.

PIP encompasses the prescribing of potentially inappropriate medications (PIMs) and potential prescribing omissions (PPOs)6. PIM use refers to the prescribing of ineffective medications or medicines with higher risks than benefits (especially when safer therapeutic alternatives exist) and the prescribing of medications without a clinical indication or at the wrong dose, frequency or duration of treatment7. A PPO involves the omission of a clinically indicated medication6. The appropriateness of prescribing can be assessed using criterion-based (explicit) or judgment-based (implicit) tools8. Explicit tools are easily applied, reliable and reproducible but do not consider individual patient characteristics. On the other hand, implicit tools are time-consuming to use and have low reliability and reproducibility as they depend on clinician judgment but are person-specific and consider patient preferences9. The Beers criteria7,10,11,12,13,14 and Screening Tool of Older Person's Prescriptions (STOPP)15,16, which are explicit tools, and the Medication Appropriateness Index (MAI)17, which is an implicit tool, are among the most commonly used criteria to quantify prescribing appropriateness18. Furthermore, the Beers7,10,11,12,13,14 and STOPP15,16 criteria served as a basis for the development of most other validated tools19.

The clinical and economic consequences of PIP can be more devastating for older adults residing in regions and countries with fewer financial resources and worse health status, which contributes to deepening health inequalities globally. One such region is Central and Eastern Europe (CEE), which encompasses the following countries: Albania, Bosnia and Herzegovina, Bulgaria, Croatia, Czechia, Estonia, Hungary, Latvia, Lithuania, Montenegro, North Macedonia, Poland, Romania, Serbia, Slovakia, Slovenia, and the territory of Kosovo. In all countries in CEE, healthy life expectancy (HALE) at age 60 is lower than in other European Union (EU) countries and in other more developed countries, such as Australia, New Zealand, Canada, the Republic of Korea, Singapore and Japan (range 14.9–17.8 versus 18.2–20.4 years, respectively), with the exception of the United States of America (16.4 years)20. Additionally, all countries in CEE have a lower standard of living, expressed as gross domestic product (GDP) per capita, purchasing power parity (PPP) (current international $), than the EU average, the Organisation for Economic Co-operation and Development (OECD) average and the high-income economies' average (48436.3, 48482.1 and 54602.9, respectively)21. According to the World Bank country classification, all non-EU countries in CEE and Bulgaria are upper-middle-income economies, whereas the remaining EU countries in CEE are high-income economies22.

PIP has been extensively explored over the past three decades, and a number of systematic reviews have been published on this topic. However, only a few systematic reviews have investigated the prevalence of PIP. Additionally, some of these systematic reviews have focused only on single countries (two systematic reviews by Bhagavathula et al.23,24), specific measurement tools (the systematic reviews by Hill-Taylor et al., Opondo et al., Praxedes et al., Thomas et al., and Storms et al.25,26,27,28,29), or specific sources of data (the systematic review by Guaraldo et al.30). Only three systematic reviews had similar inclusion and exclusion criteria but focused only on specific settings: community (Tommelein et al.31), primary care (Liew et al.32) and long-term care (LTC) (Morin et al.33). Furthermore, these reviews included only a few studies conducted in countries in CEE. Morin et al.33 included no studies from CEE, Liew et al.32 included only one study from CEE, and Tommelein et al.31 included five studies from CEE. Thus, whether the findings from these reviews, which focused on wealthier countries, can be generalized to the CEE region that encompasses former communist states with less developed medication safety programs is uncertain because of differences in country contexts, the availability of resources, and health systems.

We believe that carrying out an up-to-date comprehensive systematic review across a range of settings can inform policy-makers about the issue of PIP in the CEE region and subsequently reduce global health disparities and accelerate the development of medication safety measures in this region. Therefore, we aimed to systematically review the PIP prevalence in older adults in all care settings in countries in CEE.

Methods

We conducted the review according to the registered protocol (PROSPERO: CRD42020152713; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=152713)34 and reported it according to the Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline35 and the Preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidance36,37,38 (see Supplementary Tables S1–S4). At all stages of the review process, we contacted the study authors via email to obtain or confirm relevant information. We did not use automation tools in our review.

Search strategy and selection criteria

We searched Embase (Embase.com; 14 June 2019) and MEDLINE (Ovid; 16 June 2019). Search strategies were adapted from two Cochrane systematic reviews39,40 and tailored to each database and specific interface (full search strategies are provided in Supplementary Tables S5 and S6). No filters or limits were used. Additionally, two authors independently checked the reference lists of the included studies and the reviews on similar topics. Duplicate records were removed using EndNote 20 and manually. We conducted a 'top-up' search in August 2022, and we listed the potentially eligible studies that were not incorporated into the review in the 'Studies awaiting classification' table.

We included studies that used validated explicit or implicit tools to measure the PIP prevalence in older adults aged 60 years and over (the United Nations standard)41 in all care settings in countries in CEE. We excluded studies focused on a single disease or condition, terminally ill patients and specific medications/classes of medications (because their results are not applicable to the older population as a whole). If a study reported that some participants were younger than 60 years, we attempted to contact the authors to obtain separate data for older adults (post hoc decision; see Differences between the protocol and review in Supplementary Table S7). All study designs were eligible except for case‒control studies and case series. Regarding interventional studies, participants could not be selected based on the presence of PIMs/PPOs, and only the PIP prevalence before the intervention was considered. Only primary studies published as full papers in peer-reviewed journals were included. We did not apply any language or date restrictions.

Data collection and analysis

Two review authors independently screened the titles and abstracts to exclude clearly ineligible studies. The same two authors then independently screened the full texts of the remaining potentially relevant studies. All disagreements were resolved by discussion without the need for a third reviewer. The reviewers were not blinded to the names of the authors, their institutions or the journal of publication. Multiple reports of the same study were linked together. We used a software program when abstracts and/or articles required translation into English.

Two reviewers independently extracted data using a standardized data extraction form created and piloted specifically for this review. A third reviewer read all records in detail to check the collected data for accuracy and ensure that no relevant information was missed. This author also resolved all errors and inconsistencies, contacted the study authors and mediated consensus on disagreements. When necessary, we consulted a fourth reviewer. We collected data on the following: record details (authors, year, journal, funding sources, conflicts of interest, aims, conclusions), study characteristics (study design, sampling, recruitment, response rate, setting, country and location, number of study centers, study period, methods of data collection, sources of data, ethical approval, informed consent), participants (number, age, sex, inclusion/exclusion criteria, comorbidities, medication use), outcomes (measurement instrument, measurement instrument adaptation, timing of outcome measurements), and miscellaneous information (contact information, correspondence required and responses, comments from the reviewers). We presented key characteristics and findings of individual studies in a 'Characteristics of included studies' table, in which studies were grouped by setting and country.

The risk of bias was assessed using the Joanna Briggs Institute (JBI) Prevalence Critical Appraisal Tool42, which contains nine items: representativeness of the sample; appropriateness of recruitment; adequacy of the sample size; appropriateness of the description of the study subjects and setting; coverage bias; validity of the measurements; reliability of the measurements; appropriateness of statistical analysis; and adequacy of the response rate. Two authors independently applied the tool to each included study and resolved all disagreements by discussion without the need for a third reviewer. The overall risk of bias was judged as high if at least one domain was at high risk or if three domains were at unclear risk. Regarding nonreporting biases, two authors independently assessed study-level selective reporting by comparing the outcomes reported in the results to those previously specified in the aims and methods sections; protocols were not available for any of the included studies. We resolved disagreements by discussion.
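The overall judgement rule described above can be sketched in a few lines of Python. This is an illustration only: the function name and the judgement labels are hypothetical, 'three domains at unclear risk' is read here as three or more (an assumption), and the sketch does not separate low risk from some concerns because that rule is not stated in the text.

```python
from collections import Counter


def overall_risk_of_bias(domain_judgements):
    """Classify a study from its per-domain judgements.

    domain_judgements: iterable of labels such as "low", "high",
    "unclear" or "not applicable" (hypothetical labels).
    Assumption: "three domains at unclear risk" means three or more.
    """
    counts = Counter(domain_judgements)
    if counts["high"] >= 1 or counts["unclear"] >= 3:
        return "high"
    # The text does not say how low risk and some concerns are told apart.
    return "low or some concerns"


# Hypothetical nine-item appraisals
one_high = overall_risk_of_bias(["low"] * 8 + ["high"])
three_unclear = overall_risk_of_bias(["low"] * 6 + ["unclear"] * 3)
two_unclear = overall_risk_of_bias(["low"] * 7 + ["unclear"] * 2)
```

Under these assumptions, a single high-risk item is enough to place a study in the high-risk group, which is one reason why few studies fall outside it.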

The eligible outcome was the PIP prevalence measured by validated explicit or implicit tools. The PIP prevalence was defined as the proportion of persons with one or more PIMs and/or PPOs at a specified point or period in time. The PIP prevalence expressed as a proportion of prescriptions was only reported in the text of the review and excluded from the data synthesis and certainty of evidence rating. When multiple outcomes within a study were available for inclusion (the same outcome measured by different tools and/or at different time points), we reported all of them in the 'Results of individual studies' table but selected the median estimate for data synthesis. Missing prevalence estimates and confidence intervals were computed from the data collected from the studies.
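As an illustration of how a missing estimate might be recomputed from reported counts, the sketch below derives a prevalence and a 95% confidence interval from the number of persons with at least one PIM and/or PPO and the sample size. The Wilson score interval is an assumption made for this example, since the interval method is not stated in the text; the function name and the input numbers are hypothetical.

```python
import math


def prevalence_with_ci(cases, n, z=1.96):
    """Return (prevalence, lower, upper) as proportions.

    Uses the Wilson score interval (an assumption for this sketch).
    cases: persons with one or more PIMs and/or PPOs; n: persons assessed.
    """
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need 0 <= cases <= n and n > 0")
    p = cases / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, centre - half_width, centre + half_width


# Hypothetical study: 30 of 100 participants with at least one PIM/PPO
p, lower, upper = prevalence_with_ci(30, 100)
```

The Wilson interval stays within 0 and 1 even for small samples or extreme proportions, which is why it was chosen for this sketch over the simpler normal approximation.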

We grouped all outcomes in a single analysis (not prespecified in the protocol). We did not restrict the synthesis to a subset of studies, and we did not prioritize the reporting of some study findings over other findings.

Meta-analysis was not appropriate because the measurement tools were too dissimilar across studies. Therefore, to provide a quantitative assessment, we used a statistical synthesis without meta-analysis approach, namely the summarizing effect estimates method. In this method, each included study is represented by one outcome (in our review, the median estimate was used when a study had multiple outcomes), and the median, interquartile range and range are calculated across studies. A limitation of this synthesis method is that equal weight is given to all studies, without accounting for differences in sample size. We provide a visual display of the PIP prevalence distribution using box-and-whisker plots. We also present the results of the synthesis in the 'Summary of findings' table. Statistical synthesis was performed using IBM SPSS Statistics 27. In addition, to assess the medications that are most frequently involved in PIP, we extracted the three most frequent criteria of PIP from each study and provided a brief narrative summary.
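The summarizing effect estimates method can be sketched as follows. This is a minimal Python illustration with hypothetical study labels and prevalence values, not the SPSS procedure used for the review; quartile definitions differ between software packages, so the interquartile range produced by this sketch (Python's 'inclusive' method) may not match SPSS output for the same data.

```python
import statistics


def summarize_effect_estimates(outcomes_by_study):
    """Reduce each study to one estimate, then summarize across studies.

    outcomes_by_study: dict mapping a study label to a list of
    prevalence estimates (%) reported by that study.
    """
    # One outcome per study: the median of its reported estimates
    per_study = {s: statistics.median(v) for s, v in outcomes_by_study.items()}
    values = sorted(per_study.values())
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "per_study": per_study,
        "median": statistics.median(values),
        "iqr": (q1, q3),
        "range": (values[0], values[-1]),
    }


# Hypothetical data: five studies with one to three outcomes each
summary = summarize_effect_estimates({
    "A": [10.0, 20.0, 30.0],
    "B": [40.0],
    "C": [50.0, 70.0],
    "D": [25.0],
    "E": [35.0],
})
```

Every study contributes exactly one value regardless of its sample size, which makes the equal-weighting limitation noted above visible in the code.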

We investigated heterogeneity visually using box-and-whisker plots. We explored the following potential sources of heterogeneity using study-level variables: study quality (studies at low risk of bias and with some concerns, and studies at high risk of bias; post hoc), study setting (acute, community, LTC, and outpatient (which includes both community-dwelling and LTC residents)), and study period (before 2010 and from 2010 onward; post hoc). We could not assess the influence of several prespecified potential modifiers, including age due to differences in reporting (mean, median, missing information), country due to a small number of studies, and measurement tools due to substantial diversity.

We decided post hoc to assess the quality of evidence related to the studies included in the data synthesis using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach43,44,45 and created a 'Summary of findings' table. Two review authors independently judged the certainty of the evidence, with disagreements resolved by discussion. The GRADE approach specifies four categories of quality of evidence: high, moderate, low, and very low. We started at high quality because cross-sectional studies are the most appropriate research design to assess prevalence42. We considered downgrading the quality of evidence for each of the five GRADE domains (risk of bias, inconsistency, imprecision, indirectness, and publication bias) by one level or two levels in cases of severe problems. All our decisions are provided in the Discussion section and the footnotes of the 'Summary of findings' table.

We performed post hoc sensitivity analyses to investigate whether our decisions changed the results: 1) using the smallest and the largest outcomes instead of the median outcome for each study, and 2) excluding one study with a subset of eligible participants.
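These sensitivity analyses amount to changing which outcome represents each study, or which studies enter the synthesis, and recomputing the cross-study median. The sketch below illustrates this with hypothetical study labels and values; the function and parameter names are assumptions made for the example.

```python
import statistics


def cross_study_median(outcomes_by_study, pick=statistics.median, exclude=()):
    """Median across studies after reducing each study with `pick`.

    pick: function choosing one value per study (median, min or max).
    exclude: study labels to leave out of the synthesis.
    """
    chosen = [
        pick(values)
        for study, values in outcomes_by_study.items()
        if study not in exclude
    ]
    return statistics.median(chosen)


# Hypothetical prevalence estimates (%) per study
outcomes = {
    "A": [10.0, 20.0, 30.0],
    "B": [40.0, 60.0],
    "C": [50.0, 70.0],
    "D": [25.0],
    "E": [35.0, 45.0],
}

primary = cross_study_median(outcomes)                  # median outcome per study
smallest = cross_study_median(outcomes, pick=min)       # smallest outcome per study
largest = cross_study_median(outcomes, pick=max)        # largest outcome per study
without_d = cross_study_median(outcomes, exclude={"D"}) # one study removed
```

Comparing `primary` with the other three values shows how much the summary depends on the choice of outcome or on a single study.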

Results

Search results

We identified 1890 records from our search up to 16 June 2019: 1440 by searching electronic databases (1000 from Embase and 440 from MEDLINE) and 450 by browsing reference lists. After removing 398 duplicates, we screened 1492 records for eligibility and excluded 1412 records based on the title or abstract. We assessed the full text of the remaining 80 records and listed the studies excluded at this stage in the 'Characteristics of excluded studies' table (see Supplementary Table S8). Although we successfully contacted the authors of two studies, they could not provide the data on PIP prevalence (study eligibility criterion); thus, we excluded them46,47. Ultimately, 27 studies (28 records) met the inclusion criteria. One article reported two studies48. On the other hand, three papers described one study49,50,51. See the PRISMA flow diagram (shown in Fig. 1).
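The record counts reported above can be checked with simple arithmetic. The sketch below is only a consistency check using the figures from the text; the function and key names are hypothetical.

```python
def record_flow(identified_by_source, duplicates, excluded_on_title_abstract):
    """Recompute the screening counts from the reported inputs."""
    identified = sum(identified_by_source.values())
    screened = identified - duplicates
    full_text_assessed = screened - excluded_on_title_abstract
    return {
        "identified": identified,
        "screened": screened,
        "full_text_assessed": full_text_assessed,
    }


# Figures taken from the search results reported in the text
flow = record_flow(
    {"Embase": 1000, "MEDLINE": 440, "reference lists": 450},
    duplicates=398,
    excluded_on_title_abstract=1412,
)
```

The recomputed totals agree with the 1890 identified, 1492 screened and 80 full-text records stated in the text.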

Figure 1 PRISMA flow diagram.

Our 'top-up' search yielded 637 records, of which 477 remained after duplicates were removed. Eight of the 477 screened records were potentially eligible and are listed in the 'Studies awaiting classification' table (see Supplementary Table S9).

Characteristics of included studies

Study characteristics are summarized in the 'Characteristics of included studies' table (see Table 1). All studies were cross-sectional, except that of Stuhec et al.52, which was an uncontrolled before-after study. The care settings were almost equally represented across studies: acute49,50,51,53,54,55,56,57,58, community48,59,60,61,62,63,64 and outpatient52,65,66,67,68,69,70 settings in seven studies each, and the LTC setting in six studies48,71,72,73,74,75. Only three studies were conducted in upper-middle-income countries: Serbia (2)62,75 and Albania (1)54. The rest of the studies were conducted in high-income countries: Croatia (5)56,57,58,68,70, Slovenia (5)52,60,66,67,74, Czechia (4)53,59,69,72, Poland (3)61,63,64, Slovakia (3)49,50,51,55,73, Romania (2)48, Hungary (1)71 and Lithuania (1)65. There were no studies from Bosnia and Herzegovina, Bulgaria, Estonia, Latvia, Montenegro, North Macedonia or the territory of Kosovo. Two studies were conducted internationally53,59, but only the data from a country in CEE were included in the review. Twelve studies49,50,51,53,55,58,59,61,63,64,67,69,70,74 were conducted before 2010, and fifteen studies48,52,54,56,57,60,62,65,66,68,71,72,73,75 were conducted from 2010 onward.

Table 1 Characteristics of included studies.

The 26 studies included a total of 1,139,693 participants, with sample sizes ranging from 58 to 431,625. One study provided results only on prescriptions (5086)61, and one study applied a part of the tool to patients and the other part to prescriptions (1,315,624)68. In studies that reported sex, the majority of participants were female (range 52.1–83.7%)48,49,50,51,52,53,55,56,58,59,60,61,62,63,64,65,66,68,71,72,73,74,75 except in two studies (43.6 and 49.3%)54,57. One study each included participants aged over 60 years54, over 70 years70 and over 100 years64, and the remaining studies included participants aged over 65 years. One study included participants aged over 50 years, but the authors provided separate data for participants aged 65 years and older71, which enabled us to include this study in the review.

Data were collected from different sources/combinations of sources—medical records, claim databases, pharmacy databases, prescriptions, medication review documentation, interviews, questionnaires, and patient assessments. The use of a standardized data collection form was reported in only four studies57,59,62,74. In most studies, only prescription medications were considered48,52,53,54,55,58,60,61,62,65,66,67,68,69,70,73,75.

Polypharmacy, or the use of multiple medications, was defined differently across the studies: in most studies, it was defined as more than four or five medicines49,50,51,53,55,56,57,58,59,60,62,64,66,72,73,74, and in a few studies, it was defined as more than three65,71 or six medicines54,63. The polypharmacy prevalence ranged from 32.6 to 91.7%49,50,51,53,54,55,56,57,58,59,60,62,63,64,65,66,71,72,73,74. It was not reported in seven studies48,61,67,69,70,75, and only adults with polypharmacy were included in two studies52,68.

Overall, the prevalence of PIP was reported 52 times in 27 studies, with between one and four outcomes per study. There were differences in the concepts measured (PIMs, PPOs, and both), the measurement tools used (different domains; different versions; different adaptations; combinations), and the measurement time points (admission, discharge, admission/discharge). The predominantly measured concept was PIM use; PPOs were assessed only seven times, four times separately and three times together with PIMs. Only explicit tools were used to detect PIP, namely, the Austrian consensus panel list78, 1997 Beers criteria10, 2003 Beers criteria11, American Geriatrics Society (AGS) 2012 Beers criteria12, AGS 2015 Beers criteria13, Comprehensive protocol79, 2012 CZ expert consensus criteria80, EU(7)-PIM list81, French consensus panel list82, Ghent Older People's Prescriptions community Pharmacy Screening (GheOP3S) tool83, McLeod criteria84, PRISCUS list85, Screening Tool to Alert doctors to Right Treatment (START)15, START criteria version 216, STOPP15, and STOPP criteria version 216. Additionally, composite tools, i.e., combinations of two or more criteria, were used in the included studies (five times). The tools used most often were different versions of the Beers criteria (21 times; and four times as a part of the composite criteria; version 2003 was used most often, 14 times and two times as a part of the composite criteria) and different versions of the STOPP criteria (seven times and two times as a part of the composite criteria). Thirteen studies used only one tool49,50,51,52,58,60,61,63,64,68,69,70,72,73,74, three used only a composite tool48,71, and the remaining eleven used more than one tool and, in some cases, more than one version of the same tool53,54,55,56,57,59,62,65,66,67,75. The full versions of the tools were used in only six studies52,53,60,62,73,75. In the remaining studies, the tools were adapted. Certain sections of the tools or individual items were excluded, most often medications that were not available on the pharmaceutical market and criteria requiring some clinical or therapeutic information (such as diagnosis, dose, dosage, and duration of treatment).

Risk of bias in included studies

Only six studies were at low risk of bias or with some concerns59,62,65,66,68,74 (shown in Fig. 2, Supplementary Fig. S1 and Table 2). In over half of the studies48,49,50,51,52,53,54,55,56,57,58,61,63,71,72,73,75, the sample frame was not appropriate to address the target population (the country's older population), as it included only persons from one or several organizations. In contrast, in most studies48,49,50,51,53,54,55,56,57,58,59,61,62,65,66,67,68,69,70,71,72,73,74,75, participants were recruited appropriately by including everyone from the sampling frame or using random probabilistic sampling; convenience sampling was used in only a small number of studies52,63. The sample size was inadequate in almost half of the studies48,52,53,54,57,58,64,71,72,73 (determined by following the JBI Prevalence Critical Appraisal Tool recommendations). Approximately two-thirds of the studies48,49,50,51,52,53,54,55,56,57,59,62,65,66,68,71,72,73,74 described the study sample and setting in sufficient detail. All studies48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75 used valid methods (i.e., validated instruments) to assess the outcomes because this was part of the inclusion criteria. In most studies49,50,51,55,56,58,63,64,65,66,67,68,69,70,71,72,73,74,75, it was not clear if the condition was measured in the same, standard, reliable way for all participants. The statistical analysis, i.e., prevalence reporting, was appropriate in almost all studies48,49,50,51,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75. Most studies48,49,50,51,53,54,55,56,58,61,65,66,67,68,69,70,71,73,75 used claims databases and medical records of all patients, and therefore the response rate and coverage bias assessment were not applicable to them. Furthermore, a small number of studies59,62,72,74 with an adequate response rate (80% or higher) had an unclear risk of coverage bias.

Figure 2 Risk of bias: reviewers' judgements about each risk of bias item across all included studies, presented as percentages.

Nonreporting bias

Reported outcomes were consistent with the stated aims and methods in all studies, except for the study of Kosinska et al.61. In this study, an additional measurement tool10 was mentioned in the abstract, but the outcome value was not clearly reported. We attempted to contact the study authors to clarify this, without success.

Prevalence findings

The results of individual studies that measured the PIP prevalence in patients are presented in Table 2. The two studies that measured the PIP prevalence for prescriptions instead of patients (whose results were not used in data synthesis) reported prevalence rates of 7.4%61 and 2.0%68. The results of the data synthesis, in which 26 studies were included, showed that the median PIP prevalence in older adults residing in the CEE region was 34.6% (minimum 6.5%, maximum 95.8%, interquartile range 25.9–63.2%; 1,139,693 participants; very low certainty of evidence)48,49,50,51,52,53,54,55,56,57,58,59,60,62,63,64,65,66,67,68,69,70,71,72,73,74,75 (see Table 3 and Fig. 3a).

Table 2 Results of individual studies.
Table 3 Summary of findings.
Figure 3 Box-and-whisker plots of prevalence of potentially inappropriate prescribing (a) for all outcomes, (b) separately by the overall risk of bias, (c) separately by the setting, (d) separately by the study period. LTC, long-term care.

Benzodiazepines were among the top three most frequently used PIMs in almost all studies and across all tools. The omission of statins for primary prevention in diabetes mellitus was among the top three PPOs in all studies48,53,60,62 that used the START criteria version 115. However, this item was removed from the revised START criteria version 216 due to the lack of evidence. Only one study (Stojanovic et al.75) used other tools to assess PPOs, the START criteria version 216 and the GheOP3S tool83, both of which detected a lack of vaccination as the biggest issue.

Heterogeneity assessment

An informal visual examination of heterogeneity suggested that the PIP prevalence is similar between studies at high risk of bias (20 studies)48,49,50,51,52,53,54,55,56,57,58,60,63,64,67,69,70,71,72,73,75 and those at low risk of bias or with some concerns (six studies)59,62,65,66,68,74. Furthermore, visual examination of the box-and-whisker plots showed that the PIP prevalence may be higher in LTC (six studies)48,71,72,73,74,75 and outpatient settings (seven studies)52,65,66,67,68,69,70 than in acute (seven studies)49,50,51,53,54,55,56,57,58 and community care settings (six studies)48,59,60,62,63,64. Finally, when informally exploring heterogeneity, we found that the prevalence might be higher in studies from 2010 onward (15 studies)48,52,54,56,57,60,62,65,66,68,71,72,73,75 than before 2010 (11 studies)49,50,51,53,55,58,59,63,64,67,69,70,74 (see Fig. 3).

Sensitivity analyses

The PIP prevalence remained consistent with the primary analysis when we reanalyzed the data using the smallest outcome from each study. However, when we used the largest outcome from each study, the PIP prevalence increased. Furthermore, the PIP prevalence remained almost unchanged when we excluded the study with a subset of relevant participants71 (see Supplementary Table S10).

Discussion

This systematic review is the first to estimate the PIP prevalence in older adults across all settings and medications in a single region, namely CEE. We found that PIP in older adults has not been comprehensively studied in the CEE region, particularly in upper-middle-income countries. Across 26 studies48,49,50,51,52,53,54,55,56,57,58,59,60,62,63,64,65,66,67,68,69,70,71,72,73,74,75, the median prevalence of PIP in older adults in the CEE region was 34.6% (interquartile range 25.9–63.2%, 1,139,693 participants, very low certainty of evidence), determined by data synthesis using the summarizing effect estimates method. Thus, our findings suggest that PIP in older adults is a highly prevalent problem, and our informal visual examination of heterogeneity suggested that the prevalence of PIP may be higher in LTC48,71,72,73,74,75 and outpatient settings52,65,66,67,68,69,70 than in acute49,50,51,53,54,55,56,57,58 and community care settings48,59,60,62,63,64.

Our results are in agreement with those obtained in reviews that used similar inclusion/exclusion criteria and showed PIP prevalences of 22.6% in community-dwelling older persons from Europe31, 33.3% in older persons in primary care settings worldwide32, and 43.2% in older persons residing in LTC settings worldwide33. The results of the review by Morin et al. showed that the prevalence of PIP in LTC residents varied across regions: 49.0% in Europe, 26.8% in North America and 29.8% in other countries33. The variance of the PIP prevalence across countries was also described in the review by Liew et al.: the United Kingdom, Belgium, Australia, and New Zealand had higher PIP prevalences (35.9–59.2%) than the United States, Canada, the Netherlands, and middle-income countries (23.2–29.9%)32. Furthermore, we observed a large variation in the prevalence of PIP across studies (from 6.5 to 95.8%), which is consistent with previous reviews; in the review by Morin et al., the PIP prevalence ranged from 5.4 to 95%33, and in the review by Tommelein et al., it ranged from 0.0 to 98.8%31. We found that benzodiazepines were the most frequently prescribed PIMs, which is in agreement with the findings of Morin et al.33 and Tommelein et al.31. Only the systematic review by Tommelein et al.31 discussed the most prevalent PPOs and found, as we did, that the omission of statins for primary prevention in diabetes mellitus was the most prevalent. The systematic review by Liew et al.32 did not report which medications were most frequently involved in PIP.

Our systematic review is more comprehensive than the above-stated reviews31,32,33 regarding the CEE region because we included 22 studies from CEE that were not reported in these reviews. However, we excluded a study by Primejdie et al.46 reported in the review by Tommelein et al.31 because the author could not provide complete outcome data. Furthermore, our review differs from these reviews in several important methodological aspects: (1) the multiplicity of outcomes: when multiple outcomes per study were available, Tommelein et al.31 and Morin et al.33 did not select one outcome or use a statistical method that accounted for the dependency; on the other hand, Liew et al.32 used multilevel modeling to address the dependency among multiple prevalence estimates from each study; (2) data synthesis: they pooled data using a random-effects method, which we considered inappropriate in our review; (3) risk of bias: they assessed risk of bias with an adapted version of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies87 (Morin et al.33), 'a slightly adapted quality assessment scale from the Cochrane Collaboration group' (Tommelein et al.31) and the Newcastle‒Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses88 (Liew et al.32); and (4) certainty of the evidence: they did not rate the certainty of evidence (although Liew et al.32 stated in their protocol89 that they would use the GRADE approach43,44,45). However, despite the differences between our review and these reviews, we agree with their conclusions that the PIP prevalence in older adults is high. Additionally, our systematic review suggested an increasing trend of PIP over the years, which is in line with the findings obtained by Liew et al.32 and Morin et al.33. We agree with these authors that the increasing prevalence of PIP over time might be due to the increased comprehensiveness of measurement tools, which are able to identify more prescribing problems.

Our review supports findings from the other two systematic reviews31,33 that the long-term use of benzodiazepines and the use of long-acting benzodiazepines are still highly prevalent among older adults. Benzodiazepine use in older adults is associated with cognitive impairment, sedation, delirium, dependence, withdrawal syndrome, and psychomotor impairment that increases the risk of motor vehicle accidents and falls90,91. Two especially important negative outcomes of benzodiazepine use in older adults are falls and fall-related fractures because they are common and important causes of morbidity, mortality, hospitalization, and admission to LTC facilities. Therefore, greater and continued efforts are needed to rationalize benzodiazepine prescribing.

PIP prevalence estimates vary widely across studies for several reasons. The included studies were heterogeneous regarding the inclusion criteria, participants and contexts. Regarding health status, participants varied across studies due to the different inclusion and exclusion criteria that were applied: some studies included higher-risk individuals (e.g., persons with polypharmacy), and some included healthier individuals (e.g., without cognitive impairment). Additionally, countries in CEE differ in the following aspects, which might have changed over time: health and social care systems; legislation and regulations; pharmaceutical pricing and reimbursement models; prescribing practices; the availability of medications considered PIMs in pharmaceutical markets; the availability of medication safety policies, strategies and practices; the availability of medication review and deprescribing services; the availability of interdisciplinary care models; and the availability of health care professionals who are educated and trained in various aspects of geriatrics and geriatric pharmacotherapy. We also noted variation in the types of medication regimens to which the instruments were applied, with most studies using only prescription medicines, which may also impact the PIP prevalence. However, the most important difference between the studies was the considerable variation in outcome measurements, which precluded meta-analysis. Thus, we suggest using validated measurement tools with all their items, which would enable more meaningful comparisons between studies and meta-analyses.

Studies on PIP in older patients residing in the CEE region were conducted across care settings, increasing the generalizability of our findings. However, our findings may be more applicable to high-income countries in CEE because we identified only three studies from upper-middle-income countries. In addition, none of the included studies used implicit tools to measure the PIP prevalence. Thus, the results of this review cannot be generalized to PIP measured with implicit tools.

We downgraded the certainty of evidence from high to very low for several reasons. First, most studies were at high or unclear risk of bias in one or more risk of bias domains; thus, we downgraded the certainty of evidence by one level. Second, although the appropriateness of the sample size was part of the risk of bias assessment, we decided to downgrade the certainty of evidence by one level for imprecision. Third, variation in the prevalence estimates across studies was considerable, and consequently, we downgraded the certainty of evidence by one level for inconsistency. Finally, we did not downgrade the certainty of evidence for indirectness, where the issues were minor (most studies were from high-income countries, and only explicit tools were used), or for the possibility of publication bias.
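The arithmetic behind this rating can be illustrated with a minimal sketch. It assumes the standard four-level GRADE scale (high, moderate, low, very low); the function and variable names are hypothetical and serve only to show how three one-level downgrades lead from high to very low certainty.

```python
# Illustrative sketch only: the four GRADE certainty levels, highest first.
LEVELS = ["high", "moderate", "low", "very low"]


def downgrade(start, steps):
    """Return the certainty level reached by moving `steps` levels down from `start`.

    The result is capped at the lowest level ("very low").
    """
    index = LEVELS.index(start) + steps
    return LEVELS[min(index, len(LEVELS) - 1)]


# Number of levels downgraded per domain, as described in the text
# (the dictionary name and layout are assumptions made for illustration).
downgrades = {
    "risk of bias": 1,
    "imprecision": 1,
    "inconsistency": 1,
    "indirectness": 0,       # minor issues only, not downgraded
    "publication bias": 0,   # possible, but not downgraded
}

final_certainty = downgrade("high", sum(downgrades.values()))
print(final_certainty)
```

Under these assumptions, the three one-level downgrades sum to three steps, which takes the rating from high to very low.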

A strength of this review is that we followed the methods outlined in the Cochrane Handbook for Systematic Reviews of Interventions version 6.1 (updated September 2020)92. Furthermore, when potential conflicts of interest existed because review authors were involved in studies we considered for inclusion, we excluded these authors from screening, data extraction and the risk of bias assessment for those studies.

Although we tried to limit bias at every stage of the review, some limitations remain. First, the risk of publication bias may be considerable due to our decision to include only studies published as full papers in peer-reviewed journals. We considered this the most reproducible and transparent approach given the large volume of gray literature of unverified quality in this area and the absence of study registers and protocols. Second, the authors of two studies could not provide the necessary outcome data, so we excluded these studies46,47. Finally, we did not fully incorporate the studies from our 'top-up' search into the review.

Conclusions

These results suggest that PIP in older adults is a prevalent problem throughout the CEE region. However, our findings must be interpreted with caution due to the very low certainty of the evidence.

Our review's findings could be used to raise awareness among policymakers, health care professionals, and the general public about the prevalent issue of PIP in older adults, which should be addressed in the near future at the national and international levels. Public health authorities should bring together all stakeholders to tackle this problem, primarily by raising awareness and educating health care professionals and the public about the problem of PIP in older adults and about the validated tools that should be used to minimize this issue and its negative consequences.

More research is needed to strengthen the existing evidence and increase the generalizability of the findings. Further studies should be of high quality, i.e., where applicable, the sample size should be calculated, probability sampling should be used, a representative sample should be obtained, the response rate should be calculated, and the differences between responders and non-responders should be examined. Additionally, studies should be conducted in different care settings and countries, particularly in upper-middle-income countries where the evidence is scarce. Finally, studies should be clearly reported using appropriate guidelines.