Ethnogeographic and inter-individual variability of human ABC transporters

ATP-binding cassette (ABC) transporters constitute a superfamily of 48 structurally similar membrane transporters that mediate the ATP-dependent cellular export of a plethora of endogenous and xenobiotic substances. Importantly, genetic variants in ABC genes that affect gene function have clinically important effects on drug disposition and can be predictors of the risk of adverse drug reactions and efficacy of chemotherapeutics, calcium channel blockers, and protease inhibitors. Furthermore, loss-of-function of ABC transporters is associated with a variety of congenital disorders. Despite their clinical importance, information about the frequencies and global distribution of functionally relevant ABC variants is limited and little is known about the overall genetic complexity of this important gene family. Here, we systematically mapped the genetic landscape of the entire human ABC superfamily using Next-Generation Sequencing data from 138,632 individuals across seven major populations. Overall, we identified 62,793 exonic variants, 98.5% of which were rare. By integrating five computational prediction algorithms with structural mapping approaches using experimentally determined crystal structures, we found that the functional ABC variability is extensive and highly population-specific. Every individual harbored between 9.3 and 13.9 deleterious ABC variants, 76% of which were found only in a single population. Carrier rates of pathogenic variants in ABC transporter genes associated with autosomal recessive congenital diseases, such as cystic fibrosis or pseudoxanthoma elasticum, closely mirrored the corresponding population-specific disease prevalence, thus providing a novel resource for rare disease epidemiology. Combined, we provide the most comprehensive, systematic, and consolidated overview of ethnogeographic ABC transporter variability with important implications for personalized medicine, clinical genetics, and precision public health.


Introduction
ATP-binding cassette (ABC) transporters are a superfamily of membrane proteins that, in humans, is encoded by 48 genes. ABC transporters catalyse the translocation of a wide spectrum of endogenous substrates across biological membranes, including amino acids, sugars, nucleosides, vitamins, lipids, bile acids, leukotrienes, prostaglandins, uric acid, antioxidants, as well as a multitude of natural toxins. In addition, ABC transporters mediate the export of a plethora of drug substrates, including calcium channel blockers, HIV protease inhibitors, vinca alkaloids, topoisomerase inhibitors, methotrexate, anthracyclines, and taxanes, into the extracellular space and are thus key modulators of drug resistance, particularly in oncology (Robey et al. 2018). Hence, ABC transporters are of specific clinical and regulatory interest for their involvement in drug-drug interactions (König et al. 2013; Marquez and Van Bambeke 2011). Genetic variants in ABC transporters contribute to the interindividual variability in the risk of adverse drug reactions and in treatment efficacy. Arguably, the most studied are polymorphisms in ABCB1 (encoding MDR1, P-gp), which have been associated with methotrexate clearance (Kim et al. 2012a), response to antiretroviral protease inhibitors (Coelho et al. 2013), as well as with pharmacokinetics, response, and toxicity of imatinib (Dulucq et al. 2008; Ma et al. 2017). Similarly, variants in ABCG2 (encoding BCRP) were reproducibly associated with exposure and response to statins (Bailey et al. 2010; Chasman et al. 2012; Hu et al. 2011) and allopurinol (Roberts et al. 2017; Wen et al. 2015). In addition to their pharmacogenetic importance, genetic variation in 21 ABC transporters can cause congenital diseases, the most common of which is cystic fibrosis (OMIM 219700) caused by variants in ABCC7 (CFTR).
While many studies have provided critical data about the clinical importance of ABC polymorphisms (Bosch et al. 2005; Fukushima-Uesaka et al. 2007; Honjo et al. 2002; Leschziner et al. 2006; Pramanik et al. 2014; Saito et al. 2002; Słomka et al. 2015), information about their population frequencies is limited and mostly derived from relatively small, heterogeneous cohorts. Furthermore, most studies only interrogated a few selected candidate variants and did not map the entire landscape of rare genetic variability that is characteristic of pharmacogenes (Bush et al. 2016; Fujikura et al. 2015; Gordon et al. 2014; Ingelman-Sundberg et al. 2018; Kozyra et al. 2017; Wright et al. 2018; Zhou and Lauschke 2018). Importantly, the increasing availability of population-scale Next-Generation Sequencing (NGS) projects now makes it possible, for the first time, to systematically parse the inter-individual and inter-population variability in the ABC transporter superfamily.
In the current study, we systematically parsed the inter-individual and inter-population variability in the ABC transporter superfamily by analyzing whole-exome and whole-genome sequencing (WES and WGS, respectively) data from 138,632 individuals across seven major human populations. Using this large data set, we provide frequencies of 51 ABC variants and haplotypes with demonstrated clinical relevance. In addition to these well-characterized variations, we identified 62,793 exonic variants, the vast majority of which were rare and have not been characterized. Computational analyses using five partly orthogonal algorithms predicted that 19,309 of these (31%) resulted in functional alterations of the respective transporter protein. To substantiate these estimates, we mapped the identified genetic variability onto experimentally determined or homology-modeled transporter structures and found multiple amino acid exchanges in residues important for substrate binding and transporter function. The present study constitutes the most comprehensive analysis of genetic variation in the ABC superfamily published to date, and the identified genetic complexity might have important implications for the evaluation of drug transporter variability during drug development and the personalized prediction of drug disposition, response, and toxicity.

Data collection and definitions
Single-nucleotide variant (SNV) and indel frequency data across 48 human ABC transporters were collected from WES and WGS data from 138,632 individuals (12,020 Africans, 17,210 Latinos, 5076 Ashkenazi Jews, 9435 East Asians, 15,391 South Asians, 12,897 Finns, 63,369 Europeans, and 3234 from other ethnic groups) acquired from the Genome Aggregation Database (Lek et al. 2016). Variants with MAF < 1% or MAF < 0.1% were defined as rare and very rare, respectively. Copy-number variation (CNV) data were extracted from the Exome Aggregation Consortium database using genomic information from 59,451 individuals and analyzed as previously described (Santos et al. 2018). Linkage disequilibria were computed from 1000 Genomes Project data using LDlink (Machiela and Chanock 2015). The Online Mendelian Inheritance in Man (OMIM) database was used to identify ABC genes associated with Mendelian disease, as well as their mode of inheritance (Amberger et al. 2015). One-way ANOVA was used to compare variant numbers across ABC subfamilies.
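The frequency classes defined above can be illustrated with a minimal sketch. The function name `classify_by_maf` and the dictionary layout are illustrative assumptions and not part of the study's pipeline. The first part recomputes the size of the European group as the cohort total minus the other groups listed in the text.

```python
# Cohort sizes as given in the text (number of individuals per group).
TOTAL_INDIVIDUALS = 138_632
OTHER_GROUPS = {
    "African": 12_020,
    "Latino": 17_210,
    "Ashkenazi Jewish": 5_076,
    "East Asian": 9_435,
    "South Asian": 15_391,
    "Finnish": 12_897,
    "Other": 3_234,
}

# Size of the European group, recovered as the remainder of the total.
european = TOTAL_INDIVIDUALS - sum(OTHER_GROUPS.values())


def classify_by_maf(maf: float) -> str:
    """Return the most specific frequency class for a minor allele frequency.

    Thresholds follow the text: MAF < 0.1% is very rare, MAF < 1% is rare.
    Note that every very rare variant is also rare by definition.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("MAF must be a fraction between 0 and 1")
    if maf < 0.001:
        return "very rare"
    if maf < 0.01:
        return "rare"
    return "common"


if __name__ == "__main__":
    print(european)
    for maf in (0.0005, 0.005, 0.02):
        print(maf, classify_by_maf(maf))
```

Frequencies are passed as fractions rather than percentages, so a MAF of 1% corresponds to 0.01 and falls into the common class.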

Variant effect predictions
To predict the functional consequences of missense variants, we used a panel of computational algorithms that analyze sequence conservation, as well as variant effects on physicochemical amino acid properties, solvent accessibility, and structural features. Specifically, we selected SIFT (Ng and Henikoff 2001), PolyPhen-2 (Adzhubei et al. 2010), MutationAssessor (Reva et al. 2011), VEST3 (Carter et al. 2013), and Eigen (Ionita-Laza et al. 2016), as they showed the best predictive performance in three independent benchmarking data sets (Li et al. 2018a). Variants were categorized as deleterious when ≥ 50% of algorithms predicted effects on transporter function. In addition, all frameshifts, in-frame deletions or insertions, start-lost, stop-gained, or canonical splice site variants were regarded as putatively deleterious. For Mendelian disease analyses, ClinVar (Landrum et al. 2014) was used to remove benign variants from disease-associated ABC genes.
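The classification rule described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the consequence labels, the function name `is_deleterious`, and the idea that each tool's score has already been converted into a yes/no call are all hypothetical, and the tool-specific score cutoffs are not given in the text.

```python
from typing import Mapping

# Variant classes that the text treats as putatively deleterious outright.
# The label strings are hypothetical; real annotation tools use their own terms.
ALWAYS_DELETERIOUS = {
    "frameshift",
    "inframe_deletion",
    "inframe_insertion",
    "start_lost",
    "stop_gained",
    "splice_site",
}


def is_deleterious(consequence: str, calls: Mapping[str, bool]) -> bool:
    """Apply the >= 50% consensus rule to per-algorithm calls for one variant.

    `calls` maps an algorithm name to True if that algorithm predicts a
    functional effect (assumed to be thresholded beforehand).
    """
    if consequence in ALWAYS_DELETERIOUS:
        return True
    if consequence != "missense" or not calls:
        return False
    # Integer comparison avoids floating-point issues: votes / n >= 0.5.
    return 2 * sum(calls.values()) >= len(calls)


if __name__ == "__main__":
    example = {"SIFT": True, "PolyPhen-2": True, "MutationAssessor": False,
               "VEST3": True, "Eigen": False}
    print(is_deleterious("missense", example))
```

With five algorithms, the 50% threshold means that at least three must agree before a missense variant is labelled deleterious.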

Structural analysis
We analyzed the impact of genetic variation on ABC transporter structures for the entire ABCA, ABCB, and ABCC transporter families (35 proteins in total). Experimentally determined crystal structures were available for 18 ABC transporter proteins and were extracted from PDB (Berman et al. 2000) and the available literature. The remaining 16 transporter structures were modeled based on homology using Phyre2 (Kelley et al. 2015). The structure of ABCA13 could not be modeled reliably and was thus excluded. PyMOL (version 2.1.1) was used to map the genetic variability data onto the corresponding transporter structures.

Genetic variability of the human ABC transporter superfamily
We systematically analyzed the genetic variability profiles of all 48 members of the human ABC transporter gene superfamily using NGS data from 138,632 individuals. In total, we identified 62,793 variants in exons, the majority of which were missense (n = 33,340; 53%), followed by synonymous (n = 14,503; 23%) and UTR variations (n = 10,495; 17%; Fig. 1a). Importantly, the vast majority of variations (n = 61,876; 98.5%) were rare with minor allele frequencies (MAF) < 1%, whereas only 917 (1.5%) variations were common (Fig. 1b). In addition, we found 1003 deletions or duplications spanning at least one ABC exon, jointly referred to as CNVs, as well as 32,333 intronic variants. The latter were, however, not systematically covered and thus excluded from further analyses.
Notably, the number of genetic variations differed considerably between ABC subfamilies and genes. Overall, the number of variants in the ABCA family of lipid transporters was significantly higher than in other ABC subfamilies (p = 0.002; fold difference = 1.9; Fig. 1c). Of all members of the human ABC superfamily of genes, the lipid transporters ABCA13 (n = 4310), ABCA7 (n = 274), and ABCA4 (n = 2224) harbored the highest number of variants, whereas > 10-fold fewer variations were found in ABCD1 (n = 496), ABCE1 (n = 407), and ABCB7 (n = 271; Fig. 1d). However, when the number of variants was normalized by gene length, no significant differences were identified between the subfamilies (Supplementary Figure 1A). In contrast, variability varied more than sevenfold between different ABC genes with ABCB9 (n = 802.4 variants/kb) and ABCB8 (n = 537.4 variants/kb) being most polymorphic, whereas ABCB7 was most invariant (n = 120.1 variants/kb; Supplementary Figure 1B). To directly compare the evolutionary constraint, we compared the observed number of missense and loss-of-function variants in ABC genes with the expected numbers based on the genetic background variability. Missense variations in ABCC9, ABCA2, and ABCE1 were most depleted, whereas, surprisingly, CFTR was least conserved and harbored 30% more missense variations than expected by chance (Supplementary Figure 2A; Supplementary Table 1). Based on genetic constraints on loss-of-function variations, four genes (ABCA2, ABCE1, ABCB7, and ABCD1) were considered haploinsufficient, whereas little constraint on loss-of-function variations was detected in the remaining 44 ABC transporters (Supplementary Figure 2B; Supplementary Table 1).
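The two normalizations used in this paragraph, variant density per kilobase and the ratio of observed to expected variant counts, can be written down in a few lines. The sketch below uses hypothetical function names and made-up input numbers; gene lengths and expected counts are not reported in the text.

```python
def variants_per_kb(n_variants: int, length_bp: int) -> float:
    """Variant density: number of variants per 1000 bp of analyzed sequence."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return n_variants / (length_bp / 1000)


def observed_expected_ratio(observed: float, expected: float) -> float:
    """Constraint proxy: values below 1 indicate depletion of variants,
    values above 1 indicate more variants than expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


if __name__ == "__main__":
    # Hypothetical gene: 300 variants across 2000 bp of exonic sequence.
    print(variants_per_kb(300, 2000))
    # Hypothetical counts for a gene with 30% more missense variants than expected.
    print(observed_expected_ratio(130, 100))
```

A gene described as harboring 30% more missense variants than expected corresponds to an observed/expected ratio of about 1.3.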
In addition to SNVs, 46 of the 48 ABC transporter genes (96%) harbored CNVs, in which multiple exons up to the entire gene were deleted or duplicated (Fig. 1e). Overall, most CNVs were detected for ABCC6 (230 CNVs), ABCC1 (178 CNVs), and ABCA6 (81 CNVs), whereas no CNVs were identified in ABCB7 and ABCD1. While these CNVs are very likely to result in functional alterations, all deletions and duplications were found to be very rare with minor allele frequencies < 0.1%.

Worldwide frequencies of human ABC transporter polymorphisms with putative clinical relevance
Next, we systematically analyzed the global and population-specific frequencies of clinically important variants in ABC transporters linked to drug response or ADR risk. Specifically, we considered all variants as putatively clinically relevant for which an association with drug-response phenotypes or related traits, such as overall or disease-specific survival upon chemotherapy, has been reported. In ABCB1, we assessed the population frequencies of 10 SNPs (Table 1). The missense variant rs2032582 and the synonymous polymorphism rs1045642 constitute arguably the most extensively studied ABCB1 variants and have been associated with risk of adverse reactions upon fluoropyrimidine therapy (Gonzalez-Haba et al. 2010) as well as toxicity to taxanes (Kim et al. 2012b) and anthracyclines (Ji et al. 2012; Wu et al. 2012). These variants are in strong linkage disequilibrium (Horinouchi et al. 2002) and have been shown to be associated with altered mRNA levels and protein folding (Cascorbi 2006). Rs2032582 constitutes a triallelic variant of amino acid position 893 with the reference sequence encoding an alanine and variants giving rise to a serine or threonine, respectively (Supplementary Figure 3). Ala893 is the predominant allele in Africans and East Asians, whereas in South Asians, Ser893 is most abundant (frequency 60.9% compared to 34.8% for Ala893). Thr893 is less prevalent, with frequencies ranging between 0.4% in Africans and 13.3% in East Asians. Further ABCB1 variants of clinical relevance are the missense variants rs2229109 and rs9282564, which are associated with increased risk of relapse of acute lymphoblastic leukemia (Gregers et al. 2015) and paclitaxel toxicity (Bergmann et al. 2012), respectively. Both variants are most frequently found in Europeans (MAF = 4.3% and 10.8%) and least prevalent in Africans (MAF = 0.7% and 1.6%) and East Asians (MAF = 0 and < 0.1%).
Linkage analyses revealed one haplotype block of four SNPs (rs1128503, rs4148737, rs12720066, and rs1045642) with moderate linkage disequilibrium, which could have potentially important implications for clinical associations of these variants (Supplementary Figure 4A).
In the ABCC subfamily, we analyzed the population-specific frequencies of 25 SNVs that were correlated with chemotherapy outcomes or toxicity (Table 2). Interestingly, frequencies of risk variants for anthracycline-induced cardiotoxicity (ACT) were highly population-specific and differed > 100-fold between populations. The cardioprotective synonymous variant rs246221 in ABCC1 (Semsei et al. 2012) was most common with frequencies between 20.3% and 65.2% in South Asians and Africans, respectively. By contrast, East Asians did not harbor the risk variants rs8187710 (ABCC2) and rs45511401 (ABCC1), which are common in all other populations with frequencies up to 5.6% and 15.7%, respectively. Notably, rs45511401 is in linkage disequilibrium with the intronic ACT risk variant rs4148350 (R² = 0.153; Supplementary Figure 4B), indicating that both associations might to some extent be traced back to the same genetic signal.
Fig. 1 Overview of the genetic germline variability in the human ABC transporter family. a In total, 62,793 exonic variants and 1003 copy-number variations (CNVs) were identified across all 48 human ABC genes in 138,632 individuals. b The vast majority of exonic ABC variants were rare with 98.5% occurring in less than 1% of alleles worldwide. In addition, 51.1% of all variants were only found in a single individual. c ABCA genes harbour significantly more variations than members of other ABC subfamilies (p = 0.002; ANOVA). These differences were mostly related to gene length (compare Supplementary Figure 1). d Stacked column plot depicting the number of variants across variant classes for all 48 ABC genes. e The number of CNVs that affect at least one exon is shown

Multiple ABCC variants associated with irinotecan (rs3740066 in ABCC2, rs4148405 in ABCC3 as well as rs3749438 and rs10937158 in ABCC5) or taxane (rs12762549 in ABCC2 as well as rs2238472 and rs2125739 in ABCC6) toxicity or response were overall less population-specific and differed only by < 3-fold across populations with the exception of rs17501331 in ABCC1, which was not identified in East Asians (MAF = 0%) but reached frequencies of 11.7% and 10.3% in Ashkenazim and Europeans, respectively. By contrast, variants associated with response to platinum-based therapy differed substantially between ethnicities, including rs717620 (MAF between 21.4% in East Asians and 5.8% in Africans), rs17222723 (MAF between 12.8% in Ashkenazi Jews and < 0.1% in East Asians), and rs1051640 (MAF between 18.7% in Ashkenazi Jews and 5.6% in East Asians). MRP8, encoded by ABCC11, is an export pump for nucleotide analogues (Oguri et al. 2007) and is associated with pemetrexed resistance (Uemura et al. 2010). The frequency of the variant rs17822931, which results in proteasomal degradation of MRP8 (Toyoda et al. 2009), differs > 30-fold between populations: it is relatively low in Africans (2.8%), whereas the variant constitutes the predominant allele in East Asian populations (87%).
The ABCG2 gene, encoding the BCRP transporter, harbors two important missense polymorphisms, which have been consistently implicated in response and toxicity of TKIs (Table 3). Rs2231142 results in increased risk of gefitinib toxicity (Cusatis et al. 2006) and increased rates of major molecular response to imatinib (Jiang et al. 2017). Similar effects on response and overall survival were found for rs2231137 (Chen et al. 2015b; Kim et al. 2009a), which is not linked with rs2231142 (Supplementary Figure 4C). Notably, both variants were most prevalent in East Asians and Latinos, whereas their frequencies were substantially lower in all other populations analyzed. Only a few associations of pharmacological or toxicological phenotypes with genetic variants in ABC transporters beyond ABCB1, ABCG2, and the ABCC subfamily have been presented to date (Supplementary Table 2).

Functional consequences of rare genetic variation in human ABC transporters
Next, we aimed to estimate the functional importance of rare ABC variations for which no experimental analyses or clinical association data were available. To this end, we used five partly orthogonal algorithms to predict the functional consequences. Of all 37,467 variants affecting the amino acid sequence of the encoded polypeptide, 19,309 variants (51.5%) were predicted to result in functional alterations of the respective ABC transporter (Fig. 2a; see Methods). While functional effects can comprise both variations that result in increased and variations that result in decreased transporter function, previous studies showed that computational algorithms are significantly better at predicting loss-of-function effects than gain-of-function effects (Flanagan et al. 2010). We thus refer to variants with putative functional impacts as "deleterious" throughout this manuscript; however, we would like to alert the reader that the inclusion of some variants that result in increased transporter function cannot be excluded. Most deleterious variants were found in ABCA13 (n = 1183), ABCA7 (n = 953), and ABCA4 (n = 865), whereas ABCE1 (n = 60) and ABCB7 (n = 43) harbored the fewest (Fig. 2b). The multi-drug resistance transporters ABCB1 (n = 344), ABCC1 (n = 453), and ABCG2 (n = 315) harbored medium numbers of variants with functional consequences.
Notably, only 14.8% (30 of 203) of common ABC missense variants with MAF > 1% were putatively deleterious, compared to 45.7% (15,152 of 33,137) for rare variations. The burden of functional genetic variability differed drastically between genes: a diploid human genome harbored on average 1.8 and 1.2 variants with functional effects in ABCB5 and ABCB1, respectively, whereas 29 transporters were highly conserved with < 0.1 functional variants per individual genome (Fig. 2c). In some transporters, including ABCB1 and ABCG2, rare variations explained less than 10% of the genetically encoded functional variability. In contrast, rare variants are estimated to account for all variants with functional consequences in half (24 out of 48) of all human ABC transporter genes. Interestingly, the fraction of genetically encoded functional variability correlated significantly with the genetic constraint on the respective genes (r = 0.4; p = 0.005), suggesting that high evolutionary pressure tends to select against common variations that alter ABC transporter function. Overall, each individual was found to harbour 9.8 variants in the ABC gene family that entail functional alterations, of which 21% were attributable to rare genetic variants (Fig. 2d).
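Per-individual burden estimates of this kind are commonly derived from allele frequencies alone. The sketch below assumes the standard estimate that a diploid genome carries on average twice the summed allele frequency of the variants considered. Whether the study used exactly this calculation is not stated in the text, and the function names and input frequencies are hypothetical.

```python
from typing import Iterable


def expected_variants_per_individual(allele_freqs: Iterable[float]) -> float:
    """Expected number of variant alleles per diploid genome: 2 * sum(AF)."""
    return 2 * sum(allele_freqs)


def rare_fraction(allele_freqs: Iterable[float], threshold: float = 0.01) -> float:
    """Share of the summed allele frequency contributed by rare variants.

    `threshold` is the rare-variant cutoff as a fraction (0.01 = MAF < 1%).
    """
    freqs = list(allele_freqs)
    total = sum(freqs)
    if total == 0:
        return 0.0
    return sum(f for f in freqs if f < threshold) / total


if __name__ == "__main__":
    # Hypothetical gene with one common and two rare deleterious variants.
    freqs = [0.25, 0.004, 0.006]
    print(expected_variants_per_individual(freqs))
    print(rare_fraction(freqs))
```

In this toy example a single common variant dominates the burden, which mirrors the situation described for transporters such as ABCB1 and ABCG2, where rare variants explain only a small share of the functional variability.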

Genetic ABC transporter variability is highly population-specific
The genetic landscape of the ABC transporter superfamily differed considerably between human populations. Of the putatively deleterious variants, only 24% were shared between two or more ethnicities, whereas 76% were population-specific (Fig. 3a). Most population-specific variants were found in Europeans (6815), whereas the fewest were found in Ashkenazim (136). These differences are likely, at least in part, due to the unequal distribution of available sequencing data and the differences in genetic heterogeneity between the populations (Fig. 3b). The ratios of population-specific variants differed between ABC genes from 70% in ABCA7 to  (Fig. 3c). The observed population specificity is estimated to translate into inter-ethnic differences in ABC transporter function. The largest differences in variants with putative functional impacts across populations were identified for ABCA10, where Africans harbor 2.4 putatively functional variations per individual compared to 0.3 in Europeans (Fig. 3d). Similar differences were observed for the breast cancer risk gene ABCC11 (1.8 in East Asians compared to 0.5 in Africans), as well as the multi-drug resistance genes ABCB1 (1.4 in South Asians compared to 0.2 in Africans) and ABCG2 (1.3 in East Asians compared to 0.1 in Europeans). In contrast, inter-ethnic variability in ABCC1 was less pronounced (0.16 in Europeans compared to 0.02 in East Asians). Overall, across the entire ABC transporter family, Africans harbored the most variations with putative functional impacts (13.9 deleterious variants per individual), whereas the fewest were observed in South Asians (9.3 deleterious variants per individual; Fig. 3e).
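The split into shared and population-specific variants can be illustrated with a short sketch. It assumes that a variant counts as population-specific when it is observed (allele count above zero) in exactly one population. The data layout, variant identifiers, and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple


def population_specificity(
    allele_counts: Mapping[str, Mapping[str, int]],
) -> Tuple[float, Dict[str, int]]:
    """Return the fraction of population-specific variants and their
    number per population.

    `allele_counts` maps a variant identifier to per-population allele counts.
    Variants not observed in any population are ignored.
    """
    specific: Counter = Counter()
    shared = 0
    for counts in allele_counts.values():
        observed_in = [pop for pop, n in counts.items() if n > 0]
        if len(observed_in) == 1:
            specific[observed_in[0]] += 1
        elif len(observed_in) > 1:
            shared += 1
    n_specific = sum(specific.values())
    total = n_specific + shared
    fraction = n_specific / total if total else 0.0
    return fraction, dict(specific)


if __name__ == "__main__":
    # Toy data: four observed variants, three of them seen in one population only.
    toy = {
        "var1": {"European": 3, "African": 0, "East Asian": 0},
        "var2": {"European": 0, "African": 1, "East Asian": 0},
        "var3": {"European": 2, "African": 5, "East Asian": 1},
        "var4": {"European": 1, "African": 0, "East Asian": 0},
    }
    print(population_specificity(toy))
```

Because rare variants are more likely to be detected in larger cohorts, counts obtained this way depend on the number of sequenced individuals per population, which is the caveat raised in the text for the European and Ashkenazi groups.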

Structural consequences of genetic ABC variability
Next, we characterized the distribution of genetic variability across ABC transporter domains by mapping the identified genetic variants onto the tertiary structures of the respective transporters. We used experimentally determined crystal structures for all transporters of the ABCA, ABCB, and ABCC families for which such information was available (n = 18), while the remaining 16 structures were predicted using homology modeling. Typical ABC transporters consist of two α-helical transmembrane domains (TMDs) and two cytoplasmic nucleotide-binding domains (NBDs) that catalyse ATP hydrolysis (Fig. 4a). In addition to this backbone, some transporters have additional domains. ABCA transporters have two large extracellular domains (ECDs), while transporters of the ABCB and ABCC subfamilies contain an additional N-terminal TMD0 domain with unclear functional relevance. Furthermore, seven ABC genes of the ABCB subfamily encode only half-transporters (one NBD and one TMD domain) that require homo- or heterodimerization for transporter activity.
Fig. 3 The genetically encoded functional variability of ABC transporters is highly population-specific. a The majority of genetic variations (76%) with putative functional impacts on ABC transporter function are population-specific. b Most of these population-specific variations were identified in Europeans. Numbers in bold indicate the total number of identified population-specific variations, while numbers in brackets denote the number of sequenced individuals for the respective population. c Stacked column plot showing the fraction of putatively functional variants specific to Europeans (red), Africans (orange), East Asians (yellow), South Asians (light green), Ashkenazi Jews (dark green), Finns (blue), and Latinos (purple). The fraction of variations that are found in at least two populations is shown in grey. d The number of ABC variants with functional consequences per individual is shown across populations. e Column plot depicting the functional ABC transporter variability when all putatively deleterious ABC transporter variants are aggregated. Note that African individuals harbour most functionally relevant ABC variants per individual, whereas functional variability in South Asians was overall lowest
When stratifying by domains, we found that genetic variability differed substantially between transporters (Fig. 4b). The lowest numbers of variants per residue were found in the TMD0 domains of ABCB transporters with 0.21 variants/amino acid. In contrast, the NBD2 domains of ABCB and ABCC transporters are more variable (0.35 variants/amino acid). For individual genes, the TMD1 (0.05 variants/amino acid) and NBD1 domains (0.07 variants/amino acid) of ABCB7 were most conserved, while the TMD1 and TMD2 domains of ABCC7 (0.65 variants/amino acid) and ABCA7 (0.56 variants/amino acid), respectively, were > 10-fold more variable.
Finally, we aimed to corroborate our computational variant predictions using structural mapping approaches by focussing on the pharmacogenetically most important ABC transporter, MDR1 (also known as P-gp; encoded by ABCB1), for which high-resolution crystal structures are available (Kim and Chen 2018) (Fig. 4c). The clinically important missense variation A893S/T is located in the second intracellular loop of TMD2, which interacts with NBD1 and is necessary for structural stability. The S400N polymorphism is localized directly adjacent to the critical tyrosine at position 401, which coordinates the ATP in its binding pocket in NBD1 by direct van der Waals interactions with the adenine of the bound ATP molecule. Q1107P resides within the NBD2 Q-loop, which is necessary for ATPase activity and stabilizes the NBD dimer. No common variants were identified in any transmembrane helix or extracellular domain. However, we found a variety of rare variations in structurally important residues, including variants at the catalytic glutamate residue 556, which is required for ATP hydrolysis (Sauna et al. 2002), as well as various amino acid exchanges in the functionally critical NBD1 and NBD2 Q-loops (Zolnerciks et al. 2014).

Ethnogeographic distribution of pathogenic ABC alleles can inform about Mendelian disease epidemiology
We previously showed that the frequencies of loss-of-function variants in SLC transporter genes implicated in recessive Mendelian disorders are suitable proxies to estimate population-specific disease risk (Schaller and Lauschke 2019). Here, we analyzed whether similar associations could be identified for ABC transporter genes. To this end, we comparatively analyzed the frequencies of loss-of-function variants, defined as frameshifts, start-lost or stop-gain variations, or variants that affected critical splice site residues, in ABC transporter genes with or without implication in hereditary disease (Fig. 5).
Overall, 17 of 48 ABC genes are linked to autosomal recessive Mendelian disorders (Supplementary Table 3). Reduced CFTR (ABCC7) function is associated with cystic fibrosis (CF; OMIM 219700). We calculated homozygosity frequencies for ABCC7 loss-of-function variants of 1 in 1850 and 1 in 4300 in Ashkenazim and European individuals, respectively, whereas frequencies in individuals of African and Asian ancestry were 1 in 24,000 and < 1 in 40,000, respectively. Impaired function variants in ABCC6 are associated with pseudoxanthoma elasticum (PXE; OMIM 264800). In our data set, we find the highest aggregated ABCC6 loss-of-function frequency in individuals of East Asian ancestry (0.5%), resulting in estimates of affected individuals of 1 in 42,530. Similarly high carrier rates were identified in Europeans (0.4%; 1 in 52,000) and Finns (0.4%; 1 in 82,000), whereas risk allele prevalence was significantly lower in all other populations. Congenital generalized hypertrichosis (OMIM 135400) is a rare disease with varying presentations and comorbidities that is speculated to be, at least in part, caused by loss of ABCA5 function (DeStefano et al. 2014). While global prevalence rates have, to our knowledge, not been reported, the disease was originally described in individuals of Mexican ancestry (Pavone et al. 2015), aligning with our finding of highest ABCA5 loss-of-function frequencies in Latino populations (0.7%; 1 in 20,500).
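The relationship between an aggregated loss-of-function allele frequency and the expected share of affected individuals can be sketched under Hardy-Weinberg proportions, treating all loss-of-function alleles of a gene as a single allele class with frequency q. This is an assumption made for illustration; the function names are hypothetical, and the rounded percentages quoted in the text reproduce the reported "1 in N" figures only approximately.

```python
def affected_one_in(q: float) -> float:
    """Expected 'one in N' frequency of individuals carrying two
    loss-of-function alleles, i.e. N = 1 / q**2 under Hardy-Weinberg."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be a fraction in (0, 1]")
    return 1.0 / (q * q)


def carrier_frequency(q: float) -> float:
    """Expected frequency of heterozygous carriers, 2 * p * q with p = 1 - q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a fraction in [0, 1]")
    return 2.0 * (1.0 - q) * q


if __name__ == "__main__":
    for label, q in (("q = 0.7%", 0.007), ("q = 0.5%", 0.005)):
        print(label, round(affected_one_in(q)), round(carrier_frequency(q), 5))
```

For q = 0.7% this gives roughly 1 in 20,400, close to the 1 in 20,500 reported for ABCA5 in Latino populations. For q = 0.5% it gives 1 in 40,000, compared with the reported 1 in 42,530 for ABCC6 in East Asians, a difference consistent with rounding of the quoted frequency.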
In conclusion, these data provide an overview of the frequency of ABC loss-of-function variants in the general population that can be used to estimate population-specific Mendelian disease risk, thus providing valuable information for epidemiological rare disease research and clinical geneticists.
Fig. 4 Structural analysis of putatively deleterious genetic variants of the ABC transporter superfamily. a Illustration of the tertiary structures of ABCA, ABCB, and ABCC transporters. As representative examples, the structures of ABCA1 (PDB identifier 5XJY), ABCB10 (ABCB half transporter; PDB identifier 4AYT), ABCB11 (BSEP; ABCB full transporter), and ABCC7 (CFTR; PDB identifier 5UAK) are shown. Transmembrane domains (TMDs) are shown in red, nucleotide-binding domains (NBDs) are depicted in blue and turquoise, Walker motifs are colored in salmon, and the N-terminal Lasso motif is depicted in yellow. b Overview of the genetically encoded structural variability stratified by ABC subfamily and domain. c Schematic topology models as well as 3D protein structures of MDR1 encoded by ABCB1. Different domains in the topology models are shaded based on the identified number of deleterious variants per amino acid in the respective domain. MDR1 comprises two pseudo-symmetrical TMDs and NBDs encoded in a single polypeptide, colored in orange and blue, respectively. Detailed 3D structures of key protein domains with functionally relevant variants (sticks in cyan or magenta) and substrates (sticks in yellow) are shown as insets under the topology model. In the 3D model, all putatively deleterious variants with MAF > 0.1% are shown as light red spheres, whereas the corresponding part of the secondary structure motif is highlighted in salmon in case of variants with MAF < 0.1%. Note that N21D localizes to the lasso motif for which no crystallographic data were available and the variant is thus not shown. ECD extracellular domain, TMD transmembrane domain, NBD nucleotide-binding domain

Discussion
The ABC superfamily of transporters is of importance for drug response and toxicity, and for genetic rare disease research. ABC transporters translocate a wide spectrum of endogenous substrates and medications. Consequently, identification of ABC transporters that interact with a drug candidate constitutes a critical step in drug discovery and development (Benadiba and Maor 2016; Yee et al. 2018). Previous clinical studies implicated genetic germline polymorphisms in at least 12 ABC genes in the risk of adverse drug reactions or altered chemotherapy efficacy (Tables 1, 2, 3 and Supplementary Table 2). In addition, genetic variations in 21 ABC genes are causative for Mendelian disorders. Therefore, understanding the genetic landscape of ABC transporters constitutes a potentially important area for the personalization of oncological therapy and for epidemiological studies of risk alleles in relevant Mendelian diseases.
In this study, we detected a total of 62,793 exonic variants, the vast majority (98.5%) of which are rare and functionally poorly understood. In addition to these single-nucleotide variants and indels, we identified 1003 ABC alleles in which at least one exon was deleted or duplicated. Notably, somatic ABC gene CNVs have been implicated in acquired drug resistance. Studies of drug-resistant cell lines derived from human neoplasms identified amplifications of at least 13 ABC transporter genes, including ABCB1, ABCC1, and ABCC4 (Yasui et al. 2004). Conversely, deletions of the multi-drug resistance transporters predicted response to neoadjuvant therapy in breast cancer patients (Litviakov et al. 2016). Notably, while drug resistance is primarily characterized by somatic amplification events, the majority of CNVs in our data set were deletions, and it will be interesting to observe whether patients with germline deletions of pharmacologically important drug transporters are predisposed to favorable therapeutic responses to drugs that are substrates of the deleted transporter.
There is an increasing body of evidence describing differences in drug response, ADRs and clinical outcomes from chemotherapy based on genetic differences between ethnic groups (Phan et al. 2011). For instance, Caucasian colon cancer patients were at significantly higher risk of developing diarrhea, nausea, vomiting, and stomatitis during adjuvant 5-fluorouracil-based chemotherapy compared to African Americans (McCollum et al. 2002). Moreover, the risk of dose-limiting ADRs due to taxanes or platinum therapy was significantly lower in Caucasian lung cancer patients compared to patients of Asian descent, whereas response rates consistently showed inverse correlations (Lara et al. 2009, 2010). This variability is likely to be caused, at least in part, by differences in the allelic distribution of genes involved in the disposition of the respective chemotherapeutics.
Mounting evidence suggests that the targeted interrogation of candidate pharmacogenetic polymorphisms is not sufficient to accurately predict the drug response of a given patient (Ingelman-Sundberg 2016, 2018).

Fig. 5 Genetic variability in ABC genes associated with genetic disorders can inform about population-specific disease risk. The genewise aggregated frequencies of loss-of-function (LoF) variants (frameshifts, start-lost, stop-gain, and splice site variants) are shown for ABC genes with known associations with congenital diseases (a) as well as for non-disease-associated genes (b). Axis: loss-of-function frequency (in %).
... rather than allele status of specific ABC variants is a predictor of clinical outcomes, thus corroborating that NGS-based approaches can add value to personalized cancer prognostics (Xiao et al. 2020). One plausible interpretation of this observation is that multiple ABC variants with individually small effect sizes act in concert to modulate the bioavailability of orally administered substrates and/or intra-tumoral drug concentrations, thereby impacting treatment efficacy. These findings have important implications for cancer pharmacogenomics and incentivize studies into the underlying mechanisms.
Interestingly, mapping of clinically impactful variants onto the 3D structure of MDR1 revealed a preferential localization in NBDs. Generally, the NBDs in MDR1 are highly conserved compared to the substrate-binding domains, indicating that NBDs might be more sensitive to functional alterations, whereas impacts of variations in the substrate-binding domain or translocation channel seem to be less pronounced (Wolf et al. 2011). The two synonymous variants indicated here (G412G and I1145I), although not resulting in amino acid exchange, have been suggested to affect transporter function by disrupting the cotranslational folding process via introduction of rare codons (Kimchi-Sarfaty et al. 2007). The triallelic variation at position A893, which localizes to a less conserved transmembrane helix, has not been reported to affect transporter function in vitro (Kimchi-Sarfaty et al. 2002). Thus, functional effects associated with this variant might be due to the strong linkage with G412G and I1145I (Fung and Gottesman 2009).
Overall, we found that the genetic variability of the ABC transporter superfamily was highly population-specific and that inter-ethnic variability is commensurate with other genetically diverse pharmacogene families, including CYPs (Zhou et al. 2017), SLCOs (Zhang and Lauschke 2019) and UGTs (Kaniwa et al. 2005). In total, 74.9% of all variants that were predicted to affect the functionality of the respective ABC transporter were specific to a single population, and the overall load of functional genetic variability differed considerably between the analyzed populations. Inter-ethnic variability was furthermore reflected in differences in the population-specific prevalence of ABC-associated Mendelian diseases with autosomal recessive inheritance. For instance, frequencies of CF are around 1 in 2500-3500 newborns of Caucasian ancestry, whereas only 1 in 17,000 and 1 in 31,000 children of African and Asian ancestry are affected, which closely aligns with predictions based on loss-of-function carrier rates (1 in 1850 in Europeans, 1 in 24,000 in Africans, and < 1 in 40,000 in East Asians). Similarly, PXE has been reported to have a prevalence of around 1 in 50,000 Dutch individuals (Kranenburg et al. 2019), compared to our estimate of 1 in 52,000 in Europeans based on ABCC6 loss-of-function allele frequencies. Interestingly, ABCC6 was also the ABC gene that was found to harbour the most CNVs, which is consistent with previous studies describing genomic deletions in this locus in PXE patients (Costrop et al. 2010; Katona et al. 2005). Combined, these data suggest that population-scale sequencing data provide an important tool to predict Mendelian ABC disease risk. Notably, however, this approach is only suitable for diseases in which heterozygous loss of gene function is phenotypically silent, thus excluding autosomal dominant or X-linked modes of inheritance.
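The text does not spell out how aggregated loss-of-function frequencies translate into prevalence estimates. A minimal sketch of one common approach is given here, assuming Hardy-Weinberg equilibrium, full penetrance, and strictly autosomal recessive inheritance, so that the expected fraction of affected individuals is q^2 and the carrier rate is 2q(1 - q) for an aggregated LoF allele frequency q. The function names are hypothetical and serve only as illustration; they are not taken from the study's analysis pipeline.

```python
import math


def expected_prevalence(lof_allele_freq: float) -> float:
    """Expected fraction of affected individuals (q^2), assuming
    Hardy-Weinberg equilibrium and fully penetrant recessive inheritance."""
    if not 0.0 <= lof_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return lof_allele_freq ** 2


def carrier_rate(lof_allele_freq: float) -> float:
    """Expected fraction of heterozygous carriers, 2q(1 - q)."""
    if not 0.0 <= lof_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * lof_allele_freq * (1.0 - lof_allele_freq)


def allele_freq_from_prevalence(prevalence: float) -> float:
    """Invert q^2 to recover the aggregated allele frequency q."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return math.sqrt(prevalence)


if __name__ == "__main__":
    # Illustration with the PXE estimate quoted in the text (1 in 52,000).
    q = allele_freq_from_prevalence(1 / 52_000)
    print(f"aggregated LoF allele frequency q = {q:.5f}")
    print(f"carrier rate: about 1 in {1 / carrier_rate(q):.0f}")
    print(f"prevalence: about 1 in {1 / expected_prevalence(q):.0f}")
```

Under these assumptions, a disease that affects roughly 1 in 52,000 individuals corresponds to a carrier rate on the order of 1 in 100, which illustrates why carrier frequencies in population-scale data remain measurable even for very rare recessive disorders. Population substructure, consanguinity, or incomplete penetrance would shift these estimates.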
Taken together, our analyses revealed striking ethnogeographic differences in ABC variability profiles that might explain at least part of the observed variability in chemotherapy response and incidence of Mendelian disorders between populations. Furthermore, the population-scale genomic data set presented here promises to provide a powerful resource for the evaluation of genetic ABC disease epidemiology.
In summary, we comprehensively profiled the genetic variability of the human ABC transporter superfamily and revealed a surprising extent of rare and population-specific variations. Computational evaluations of their functional impacts indicate that these variants contribute considerably to the variability in ABC transporter function, with potentially important consequences for chemotherapeutic treatment regimens. Thus, these data incentivize the consideration of sequencing-based genotypes for patient stratification, particularly in the current era of clinical trial globalization. Furthermore, we expect that a deeper understanding of the functional consequences of ABC transporter variability might be useful to improve public health strategies and flag patients at risk of not responding appropriately to treatment with ABC substrates.