Live kidney donation: are concerns about long-term safety justified?—A methodological review

Live kidney donors are exhaustively screened pre-donation, creating a cohort inherently healthier at baseline than the general population. In recent years, three renowned research groups reported unfavourable outcomes for live kidney donors post-donation that contradicted their previous studies. Here, we compared the study design and analysis of the most recent and previous studies to determine whether the different outcomes were due to methodological design or reflect a real potential disadvantage for living kidney donors. All six studies on long-term risk after live kidney donation were thoroughly screened for the selection of study population, controls, data quality, and statistical analysis. Our detailed review of the methodology revealed key differences with respect to selection of donors and comparison non-donors, data quality, follow-up duration, and statistical analysis. In all studies, the comparison group of non-donors was healthier than the donors due to more extensive exclusion criteria for non-donors. Five of the studies used both restriction and matching to address potential confounding. The more recent studies used different matching strategies and statistical analyses than the previous studies, and their follow-up was longer. Recently published papers still face bias. Their strong points compared to the initial analyses are the extended follow-up time, large sample sizes, and better analysis, which increase the reliability of estimates of potential long-term risks for living kidney donors. Future studies should focus on equal selection criteria for donors and non-donors, and in the analysis, follow-up duration, matched sets, and low absolute risks among donors should be accounted for when choosing the statistical technique.


Introduction
Live donor kidney transplantation is the treatment of choice for patients with end-stage renal disease (ESRD). The benefits of this treatment include pre-emptive transplantation, superior organ quality, and increased graft survival [1] and have led to an increase in live kidney donations and consecutive transplants. Despite this increase, the growing demand for donor kidneys cannot be matched, which has led to an increase in the number of extended-criteria live donors with minor comorbidities, such as well-regulated hypertension or higher body mass index (BMI) [2]. As a result, more than 20,000 transplants from live kidney donors are performed annually worldwide, and this number has remained stable over the past decade [3,4].
Live kidney donors are individuals who willingly undergo major surgery to improve the well-being of someone else. It is of the utmost importance to minimize risks, such as the intra-operative risk of bleeding [5,6] and mortality [7], and maximize donor safety during and after donation as well as in the long-term. Live kidney donors are exhaustively screened by a multidisciplinary team of transplant professionals and anaesthesiologists prior to donation, resulting in a cohort that is inherently healthier at baseline than the general population. Therefore, selecting non-donors with baseline health similar to accepted donors is difficult and may affect estimates of any potential risks attributable to donation. In addition to the problem of adequate selection, an extended follow-up period for live kidney donors is important for revealing the risks of donation on their long-term health [8,9].
Three renowned research groups recently uncovered unfavourable outcomes for live kidney donors following donation compared to non-donors, including an increased risk of cardiovascular and overall mortality [8], increased risk of ESRD [8,9], and increased risk of gestational hypertension and preeclampsia [10]. The number of events and absolute risks are low. Previous publications from these research groups [Oslo University Hospital, Johns Hopkins Medical Institutions, and the Donor Nephrectomy Outcomes Research (DONOR) Network] did not demonstrate unfavourable outcomes for live kidney donors; instead, they reported a lower risk of long-term cardiovascular [11] and overall mortality [11][12][13] and a lower risk of cardiovascular events [13]. This is remarkable because studies from the same research group largely included the same donor population. We compared the study design and analysis of the three most recent studies and the previous studies from the same research groups.

Literature search
We searched for studies that reported negative outcomes following live kidney donation using MEDLINE, Embase, CENTRAL (the Cochrane Library 2013), OvidSP, and Google Scholar.

Literature screening
We selected studies published in the last 5 years with an impact factor >15 or high citation rate >20. We found three studies by three different research groups [8][9][10]. Previously, studies from these research groups reported favourable outcomes following live kidney donation [11][12][13] in the same donor cohort. The discrepancies in outcomes of these studies have been highly debated within the transplant community.

Outcome
In light of the impact of these studies on the transplant community, we compared the methodology used in the studies and the likely impact on outcomes. The six studies were thoroughly screened by two authors (SJ and JNMI) in regard to the selection of the study population, data quality, and statistical analysis.

Outcome and selection of study population
The Norwegian studies by Mjoen et al. were published in 2012 [11] and 2014 [8] and report on a single centre experience with contradictory results ( . However, data on kidney function were not available for non-donors, while donors with low renal function were excluded from the analysis. Though similar donor and non-donor groups were studied, the other two research groups from the US and Canada reported on different outcomes, including long-term mortality [12], ESRD [9], death and major cardiovascular events [13], and gestational hypertension and preeclampsia [10]. The previous US analysis selected a donor cohort from the mandated national Organ Procurement and Transplantation Network (OPTN) registry. A total of 80,347 donors between 1994 and 2009 with a median follow-up of 6.3 years (maximum 12 years) were included in this study, excluding 36 donors whose age was not recorded or who were <18 years old. For the more recent US analysis the selection period was extended to 2011, increasing the donor cohort by 15,870 donors to a total of 96,217 donors with a median follow-up of 7.6 years (maximum 15 years). Both studies derived their comparison group of non-donors from National Health and Nutrition Examination Survey (NHANES) III participants [12]. NHANES participants were matched 1:1 to live donors with replacement, within a predetermined maximum permissible radius. If information on the live donor's BMI or systolic blood pressure was not available, a match was selected with healthy BMI (20-30 kg/m²) or systolic blood pressure (100-140 mmHg). Sampling with replacement was performed when a matched participant was the only fit despite ideal and radius matching. In the 2012 Canadian analysis, a donor cohort was selected from live kidney donors who donated between 1992 and 2009 in Ontario, Canada, and were permanent residents of Ontario [13].
The 2015 study included female live kidney donors who   Table 3, women with a previous diagnosis of gestational hypertension or preeclampsia were excluded from the analysis. Furthermore, the index date was extended to ±2 years to account for era effects. Each non-donor could be selected only once, resulting in 380,955 potential female non-donors (52 % of the original sample), though matched sets could be found for only 85 donors.
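The difference between matching with replacement (as described for the US studies) and allowing each non-donor to be selected only once (as in the 2015 Canadian study) can be illustrated with a minimal sketch. It assumes a single matching covariate and nearest-neighbour selection within a radius; the function name `radius_match`, the ages, and the radius are hypothetical and are not taken from any of the reviewed studies, which matched on several characteristics.

```python
from collections import Counter


def radius_match(donors, pool, radius, with_replacement=True):
    """Match each donor to the nearest pool member within `radius`.

    donors, pool: lists holding one numeric covariate (e.g. age in years).
    Returns one pool index per donor, or None when no eligible match exists.
    """
    used = set()
    matches = []
    for value in donors:
        # Eligible candidates: inside the radius and, when sampling
        # without replacement, not already assigned to another donor.
        candidates = [
            (abs(value - other), idx)
            for idx, other in enumerate(pool)
            if abs(value - other) <= radius
            and (with_replacement or idx not in used)
        ]
        if candidates:
            _, best = min(candidates)  # smallest distance, then lowest index
            used.add(best)
            matches.append(best)
        else:
            matches.append(None)
    return matches


# Hypothetical ages, for illustration only.
donor_ages = [40, 41, 42, 70]
pool_ages = [41, 55]

reused = radius_match(donor_ages, pool_ages, radius=2)
single_use = radius_match(donor_ages, pool_ages, radius=2, with_replacement=False)

# How often each non-donor was reused when replacement is allowed.
reuse_counts = Counter(i for i in reused if i is not None)
print(reused, single_use, dict(reuse_counts))
```

In this toy example one non-donor is matched to three donors when replacement is allowed, whereas without replacement two of those donors remain unmatched. This is the trade-off between retaining donors and reusing the same non-donor's follow-up several times.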

Data quality
Data for donors and non-donors were collected from preexisting registries or databases (Table 2). Data were collected prospectively in national registries for live kidney donors in Norway, the US, and Ontario. In addition, the

Statistical analysis
All studies used both restriction and matching to address potential confounding except for the 2012 analysis by Mjoen et al. [11] (Table 3). The Cox regression analyses for all outcomes including ESRD were adjusted for six confounders: age, gender, year of inclusion, blood pressure, BMI, and smoking. A second adjusted model was created after multiple imputation of blood pressure, BMI, and smoking. This latter model was used for the primary analyses. In contrast, the US and Canadian research groups did not alter the restriction and matching methods for their recent analyses. All outcomes were, however, reported differently: as percentages, hazard ratios, or odds ratios, depending on the statistical methods used. The US research group performed a Kaplan-Meier analysis in both studies but used a bootstrap method to properly estimate the variance of repeated sampling of non-donors in their most recent study [9]. The crude incidence of ESRD was 9 out of 1901 donors and 17 out of 32,621 non-donors, resulting in 36 cases of ESRD in the non-donor group after matching with replacement. Persons aged ≥65 years, African Americans, and Mexican Americans had an increased risk of ESRD, whereas Caucasian non-donors had no risk of ESRD. In the 2012 study by the Canadian research group [13], differences in baseline characteristics between donors and non-donors were assessed using standardized differences. Differences >10 % were considered to reflect a meaningful imbalance. A two-sided log-rank test stratified on matched sets was used to compare differences in death and cardiovascular outcomes between donors and non-donors. Furthermore, a Cox regression stratified on matched sets was used to estimate hazard ratios with 95 % confidence intervals. In the 2015 study by Garg et al. [10], generalized linear models with generalized linear estimating equations were used to compare the characteristics of donors and non-donors at the index date, and generalized linear mixed models with a random intercept and random effects logistic regression models were used to compare pregnancy characteristics and outcomes. These methods account for the correlation structure within matched sets and in women with more than one pregnancy during follow-up.
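As an illustration of the standardized-difference criterion, the sketch below applies the commonly used pooled-standard-deviation formula for a continuous variable. Whether the Canadian group used exactly this form is an assumption. The input values are the unmatched mean ages quoted in the Discussion for the 2014 Norwegian study (46.0 ± 11.5 versus 37.6 ± 11.7 years) and are used here only as a worked example; the function name is hypothetical.

```python
import math


def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Standardized difference for a continuous baseline variable.

    Uses the square root of the average of the two group variances as
    the denominator (an assumed, commonly used definition).
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Unmatched mean age (years) of donors versus non-donors, as quoted in
# the Discussion for the 2014 Norwegian study.
d_age = standardized_difference(46.0, 11.5, 37.6, 11.7)

# Criterion from the text: more than 10 % indicates meaningful imbalance.
imbalanced = abs(d_age) > 0.10
print(f"standardized difference: {100 * d_age:.1f} %, imbalanced: {imbalanced}")
```

Under these assumptions the unmatched age difference is roughly 72 %, far above the 10 % threshold, which is consistent with the need for matching or adjustment on age.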

Discussion
Our detailed review of the methodology of the different studies on long-term risk after live kidney donation revealed key differences with respect to the comparability of donors and non-donors in regard to selection, data quality, follow-up, and statistical analysis (Table 4).

Selection of the study population
Donors are a pre-screened healthy selection of the population. This is a key issue to account for when selecting the comparison group of non-donors. Furthermore, the extended donor selection criteria during the past decade [2] complicate restriction rules when including non-donors. The two Norwegian studies are a good example of choosing a more appropriate comparison group when studying the same donor population. In the 2012 study by Mjoen et al. [11], the full Norwegian background population served as the comparison group, without restriction according to the live kidney donor selection criteria. Therefore, the risk attributable to donation could be underestimated despite matching 1:3 on age, gender, and year of birth to account for confounding. In their 2014 study, Mjoen et al. [8] used the healthiest donors from the earlier study. In addition, healthier non-donors were derived from a Norwegian population-based cohort study [14]. The restriction rules for donors and non-donors did not entirely lead to a match on renal function, cardiovascular disease, and subjective perception of health, leading to the possible overestimation of risk detrimental to donors because of healthier non-donors. The US studies used more extensive restriction rules and matching for NHANES III participants compared to the healthier donors. NHANES III participants were derived from 81 counties in the US based on geography and the proportions of minority populations using probability proportionate to size sampling. Young children, persons aged ≥65 years, African Americans, and Mexican Americans were subgroups that were oversampled and were not representative of the donor population, the majority of which is Caucasian (75 %). Both studies used a similar restriction and matching strategy. The entire NHANES III cohort comprised 20,024 adult participants.
The excluded group (n = 10,660) also contained participants who would be eligible for living donation, presumably making the non-donor group somewhat healthier than the donor population. The 9364 eligible NHANES III participants were significantly younger, more educated, had a higher proportion of women and Caucasians, and had a lower proportion of smokers than the donor population. This difference may have led to an overestimation of risk attributable to donation, which was, however, not demonstrated in the study by Segev et al. The 2014 study by Mjoen et al. did demonstrate an increased mortality risk for donors. In the more recent US study, the strict selection of healthier non-donors made them less likely to develop ESRD. The donor population had significantly higher systolic blood pressure, BMI, and fraction of smokers at baseline, which are all factors associated with an increased risk of ESRD [15]. Thus, the risk attributable to donation was likely overestimated. In a recent study by Grams et al. [16], a proportion of the same aforementioned US donor population consisting of 52,998 live kidney donors was analysed based on their 15-year projected risk of ESRD, which was previously reported by Muzaale et al. [9]. In this recent study, the risk of ESRD among live kidney donors was compared to a meta-analysis of 4,933,314 participants in seven general population cohorts who would be eligible for living kidney donation according to 10 demographic and health characteristics. The average follow-up for these cohorts was 6.4 years, and their 15-year risk projections for ESRD were compared with those of US live kidney donors. The donors had a 3.5-5.3-times higher projected 15-year risk than non-donors. As pointed out by Steiner [17], the previous US study by Muzaale et al. [9] reported an 8-times higher incidence of ESRD among donors than non-donors. This finding supports the notion that the risk attributable to donation was overestimated in that study.
Both Canadian studies used a similar restriction and matching strategy. The extended live donor eligibility criteria over the years have caused the broad exclusion criteria to encompass participants who would be eligible for living donation, making the non-donor group healthier. Furthermore, any person who failed to see a physician at least once in the 2 years before the index date was not included in the analysis in order to ensure that everyone who was included in the analysis had access to health care. This restriction could have led to the exclusion of the healthiest non-donors who did not require any medical attention in the past years and who would be highly eligible for living donation. Nevertheless, this exclusion criterion of healthier nondonors in the Garg et al. [10] study had no effect on the study results in a sensitivity analysis.

Data quality
The strength of the data collection in all studies was that data were collected mostly from national prospective registries. The Canadian studies even verified donor data with the donors' medical records. However, there were some limitations in the data collection in regard to donor and non-donor medical outcomes and missing additional information on outcomes. Donors could be more aware of their health than non-donors, leading to differential misclassification because all outcomes except for death could have been registered earlier. This could have led to more registered outcomes among donors and an overestimated risk attributable to donation. Non-donor data from the population-based studies (HUNT 1 and NHANES III) included data from surveys, giving a subjective rating of study participants' health. These non-donor data were not verified with medical records, but were used for restriction, which could have led to an underestimation of risk among non-donors. The 2014 US study prioritized live kidney donors who developed ESRD on the deceased donor transplant waiting list [18]. Pre-emptively placing live kidney donors on the deceased waiting list possibly resulted in more registered donors with ESRD. This was seen in the higher crude incidence of ESRD among donors (99 out of 96,217) compared to non-donors (crude incidence 17 out of 9364). Non-donors who registered pre-emptively on the deceased waiting list were not identified as having ESRD, which caused a delay in the registration of ESRD for non-donors. However, their follow-up was longer than that of donors, and most non-donors would either receive a transplant or initiate dialysis shortly thereafter. Errors in the estimation of outcomes occurred in donors who emigrated; given the large sample sizes in both US studies, this is assumed to have had no material effect on the outcomes of the studies. Moreover, it will not affect the other studies given the high donor participation.
In the Garg et al. 2012 study, data on blood pressure are lacking [19], though previously the same authors demonstrated an increase in blood pressure [20], which increases the risk of cardiovascular events and mortality [21]. Lely et al. [22] pointed out that the severity and gestational age at which preeclampsia and gestational hypertension were diagnosed were not provided in Garg et al.'s 2015 study. Given that the rate of premature birth was not increased, only mild or at-term preeclampsia likely occurred [22]. Although there is an increased risk of preeclampsia and gestational hypertension in donors, the absolute risk is low and the severity of the complications, such as premature birth, is less than expected from a gynaecological point of view.

Follow-up and statistical analysis
Differences between donors and non-donors in regard to comparability and follow-up should be accounted for during the analysis to overcome confounding. Restriction and matching are the first step, but matched sets and comparability should also be taken into consideration during the analysis. In Mjoen et al.'s 2014 study, the starting date of the follow-up for donors occurred decades earlier, causing an increased duration of follow-up, as pointed out by Boudville et al. [23], leading to a maximum follow-up time of 43.9 years for donors compared to a maximum of 24.9 years for non-donors. Boudville et al. suggested that secular changes in individuals' health and health care made the baseline characteristics not fully comparable between the groups and could have resulted in a higher incidence of ESRD among donors. The authors tried to correct for this bias by adjusting for year of inclusion. Furthermore, Boudville et al. [23] raised some concerns about statistical overfitting of the models used. For Cox proportional hazard models, a rule of thumb is to have at least 10 events per added confounder [24]. For the outcome ESRD, there were 31 reported events, but the primary analysis adjusted for six confounders. Both factors could have led to an overestimated risk attributable to donation. Furthermore, what stands out in the baseline characteristics of the donors and non-donors before any matching or adjustments were performed, as pointed out by Kaplan et al. [25], was the mean age difference of 46.0 ± 11.5 versus 37.6 ± 11.7 years, respectively. The higher age of donors could have been a plausible explanation for their increased risk of mortality. The Norwegian authors later replied that this difference was corrected by using coarsened exact matching in the survival analysis, which created strata of the potential confounders: age, gender, year of inclusion, blood pressure, BMI, and smoking.
Donors and non-donors were matched based on these strata, after which the analysis was performed on non-coarsened data. After this matching the mean age of donors and non-donors was 46.0 versus 45.7 years, respectively [26]. For both US studies, Matuchansky [27] pointed out that a different NHANES cohort should have been selected instead of the participants from NHANES III, proposing that participants in the "continuous NHANES" cohort beyond 1994, up to 2006, would have been a better chronological fit for their study cohort [27]. The US authors replied that the strength of NHANES III lies in its larger sample size, greater number of geographic areas, and availability of mortality linkage beyond 10 years. Furthermore, a limitation of "continuous NHANES" is that it cannot be used for survival comparisons [27,28]. The authors stated that their specific bootstrap technique does not lead to bias, and that differences in follow-up were accounted for by their use of survival analysis [29]. As pointed out by Gill et al. [30], in an editorial accompanying the study by Muzaale et al. [9], the crude incidence of ESRD was extremely low for NHANES participants: 17 out of 9364. Taken together with the longer follow-up of non-donors, replacement of non-donors with long event-free survival in matched analysis may have underestimated the risk of ESRD in non-donors [30]. The matching technique was also discussed by Matas et al. [15], who stated that matching with replacement could magnify any differences between donors and non-donors [15]. Furthermore, how many times each control was used was not stated. The authors replied that this technique has been established and that a specifically designed bootstrap was created to estimate the variance [31].
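The overfitting concern raised above can be checked with simple arithmetic. The sketch below applies the rule of thumb of at least 10 events per confounder to the figures quoted for the ESRD outcome in the 2014 Norwegian study (31 events, six confounders). The function names are hypothetical, and the threshold of 10 is the rule of thumb cited in the text rather than a strict requirement.

```python
def events_per_variable(n_events, n_covariates):
    """Number of outcome events available per adjusted covariate."""
    return n_events / n_covariates


def max_covariates(n_events, min_epv=10):
    """Largest number of covariates that satisfies the rule of thumb."""
    return n_events // min_epv


# Figures quoted in the text for the ESRD outcome.
epv = events_per_variable(n_events=31, n_covariates=6)
limit = max_covariates(n_events=31)

print(f"events per variable: {epv:.1f}; covariates supported by the rule: {limit}")
```

With about five events per confounder, the primary model adjusts for roughly twice as many covariates as the rule of thumb would support, which is the basis for the concern about overfitting.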

Future perspectives
Live kidney donors are individuals who are not patients themselves, and submitting them to a surgical procedure stretches the Hippocratic oath taken by physicians. Although the absolute risks for donors following donation are very low, increased risks seem to exist among live kidney donors compared to non-donors. Risks both during and after donation are taken for granted by live kidney donors to help patients with ESRD [32]. Reduced risk of life-time dialysis, improved quality of life, and prolonged survival are gained by the recipients [1]. Furthermore, transplantation is far more cost-effective than dialysis [33,34]. Nevertheless, these benefits for kidney transplant recipients should not outweigh the risks for live donors after donation. Therefore, future studies should focus on long-term outcomes following donation in which the risks for donors are taken into consideration against the risks for comparable non-donors.

Conclusions
We conclude that recently published papers still face bias that could have led to a potential overestimation of risk attributable to donation. Even if risks are elevated among live kidney donors compared to non-donors, the absolute risks for donors following donation are very low and should therefore not discourage potential donors. Strong points of recent analyses compared to initial analyses are the extended follow-up time after donation, large sample sizes, and better analysis, which increase the reliability of estimates of potential long-term risks for living kidney donors. Key problems remain: donors are a pre-screened healthy selection of the general population, making it difficult to find an equally healthy unscreened comparison group. Specifically, not all required clinically relevant data are available for potential comparison groups. Selecting a healthier comparison group overestimates the risk attributable to donation. Future studies should focus on equal inclusion criteria for donors and non-donors, and in the analysis, follow-up duration, matched sets, and low absolute risks among donors should be accounted for when choosing the statistical technique. Ideally, long-term outcomes should uncover risk estimates for potential donors and how these risks would change if an individual becomes a live kidney donor.

Compliance with ethical standards
Conflict of interest The authors of this manuscript have no conflict of interest to disclose.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.