Prediction of cardiovascular risk by Lp(a) concentrations or genetic variants within the LPA gene region

In the mid-1990s, interest in Lp(a) faded after a few poorly performed studies almost erased Lp(a) from the map of biological targets. Over roughly the past 10 years, however, interest has grown again, mainly for two reasons. First, genetic studies using easily accessible, high-throughput techniques for genotyping single-nucleotide polymorphisms (SNPs) have allowed large studies in patients with cardiovascular disease and controls to be performed. These strengthened the earlier findings on a copy number variation in the LPA gene and its association with cardiovascular outcomes. Second, new therapies are on the horizon, raising strong and justified hope that drugs which substantially lower Lp(a) concentrations will become available within a few years. This review article provides an introduction to the genetic determination of Lp(a) concentrations and considers whether Lp(a) concentrations or genetic variants are important for the prediction of cardiovascular risk.


Introduction
An astonishing characteristic of Lp(a) is the more than 1000-fold range of concentrations between individuals, from less than 0.1 mg/dL to more than 300 mg/dL [1]. Lp(a) concentrations are not influenced much by age, sex, fasting state, inflammation [2,3] or lifestyle factors such as diet or physical activity. However, the concentrations are under strict genetic control by the LPA gene locus, especially by a size polymorphism of apo(a) caused by a variable number of so-called kringle IV (K-IV) repeats in the LPA gene [1,4]. Each of these repeats, of which there can be more than 40, has a size of 5.6 kb, which results in a highly polymorphic and informative copy number variation (CNV).
The substantial differences in Lp(a) concentrations between individuals are to a large extent genetically determined. Family studies revealed a heritability estimate of Lp(a) concentrations of about 90% [5,6]. Lp(a) is therefore the lipoprotein with the strongest genetic control. The discovery of the size polymorphism of apo(a) in serum [7] and the K-IV CNV in the LPA gene [4,8-10] resulted in the identification of the LPA gene as the major gene for Lp(a) levels.
This article is part of the special issue "Lp(a) - Update 2018". Florian Kronenberg (Florian.Kronenberg@i-med.ac.at), Division of Genetic Epidemiology, Department of Medical Genetics, Molecular and Clinical Pharmacology, Medical University of Innsbruck, Schöpfstr. 41, 6020 Innsbruck, Austria
Individuals expressing a low number of K-IV repeats resulting in so-called small apo(a) isoforms (up to 22 K-IV repeats) have on average markedly higher Lp(a) concentrations than individuals carrying only large apo(a) isoforms (more than 22 K-IV repeats) [1].
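The isoform categories described above can be written down as a small classification rule. The following is only a minimal sketch: the cutoff of 22 K-IV repeats is taken from the text, whereas the function names and the rule that one expressed small isoform is enough to count as a small-isoform carrier are assumptions made for illustration.

```python
from typing import Sequence

# Cutoff stated in the text: small isoforms have up to 22 K-IV repeats.
SMALL_ISOFORM_MAX_KIV = 22


def isoform_size_class(kiv_repeats: int) -> str:
    """Classify one apo(a) isoform by its number of K-IV repeats."""
    if kiv_repeats < 1:
        raise ValueError("number of K-IV repeats must be positive")
    return "small" if kiv_repeats <= SMALL_ISOFORM_MAX_KIV else "large"


def carrier_class(expressed_isoforms: Sequence[int]) -> str:
    """Classify a person by the isoforms expressed in serum.

    Assumption for this sketch: a person is a small-isoform carrier if
    at least one expressed isoform is small; otherwise the person
    carries only large isoforms.
    """
    if not expressed_isoforms:
        raise ValueError("at least one expressed isoform is required")
    classes = {isoform_size_class(k) for k in expressed_isoforms}
    return "small" if "small" in classes else "large"


if __name__ == "__main__":
    # Hypothetical individuals, given as K-IV repeats of expressed isoforms.
    print(carrier_class([18, 30]))
    print(carrier_class([27, 35]))
```

Working from expressed isoforms rather than from both alleles at the DNA level mirrors the distinction between western blot and DNA-based methods made later in the article.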

Genetic regulation of Lp(a) concentrations and risk for CHD
The evidence is quite strong that high Lp(a) concentrations are associated with an increased risk for coronary heart disease (CHD) [1]. The Copenhagen City Heart Study observed that individuals from a general population with concentrations between 30 and 76 mg/dL (corresponding to the 67th-90th percentile) had a 1.6-fold increased risk for incident myocardial infarction compared to individuals with Lp(a) concentrations below 5 mg/dL (corresponding to the lower 22% of the population). This risk increased to 1 [...] percentile) ([11]; Fig. 1a). The concentration threshold for an increased risk has been debated, and a European Atherosclerosis Society (EAS) consensus statement proposed 50 mg/dL [12]. Most importantly, such a threshold corresponds to the 80th percentile of the concentration distribution in a Caucasian population and means that 20% of the population probably have an increased risk for CVD due to elevated Lp(a) concentrations. From a public health standpoint, this makes Lp(a) a very important risk factor for CVD.
Fig. 1 [...] [11]. Panel b shows the association between the number of K-IV repeats in the LPA gene and Lp(a) concentrations: individuals with small apo(a) isoforms have markedly higher median Lp(a) concentrations than individuals with large apo(a) isoforms [23]. Panel c shows the preponderance of small apo(a) isoforms in patients with CVD when compared to controls [27]. Since a low number of K-IV copies (11-22 copies) is associated with high Lp(a) levels and high Lp(a) levels are associated with CVD, it follows that a low number of K-IV copies has to be associated with CVD if the association of Lp(a) with CVD is causal. Figure is taken and adapted with permission from reference [29]. Lp(a) lipoprotein(a), apo(a) apolipoprotein(a), LMW low molecular weight, HMW high molecular weight, KIV kringle IV, CVD cardiovascular disease, CHD coronary heart disease, OR odds ratio, 95%CI 95% confidence interval
Besides numerous studies on CHD, several studies in recent years have also found a strong association between high Lp(a) concentrations and stroke [13], aortic valve calcification and stenosis [14-21], heart failure [22], and peripheral arterial disease [23-25]. This makes Lp(a) a risk factor for many CVD endpoints, although the strength of the association might differ between the various endpoints.
When a biomarker is changed in diseased patients, the important discussion starts whether this biomarker is a risk factor or a risk marker. If it is a risk factor, this parameter is causally related to disease and it might become an interesting drug target. If it is a risk marker, this parameter might be interesting for diagnostic purposes because it is changed secondarily to the disease (often also called reverse causation): it draws attention to a disease, but it would not make sense to develop drugs which influence that parameter (see the recent discussion on this issue in reference [26]). This discussion took place for Lp(a) in the middle of the 1990s, and the Mendelian randomization studies illustrated in Fig. 1 provided strong support for causality. Genetic variants that are strongly associated with high Lp(a) concentrations (Fig. 1b) also show an increased risk for CVD (Fig. 1c), which underscores the causal link between high Lp(a) concentrations and CVD. The first study applying such an approach used data on the apo(a) isoforms determined by the number of K-IV repeats: small isoforms with up to 22 K-IV repeats (which were called F, B, S1 and S2 in earlier times) were associated with a significantly increased risk for CHD in 6 different populations [27]. A later meta-analysis including 7382 CHD cases and 8514 controls identified a 2.08-fold increased risk for carriers of small apo(a) isoforms [28]. This strong association probably makes Lp(a) the most important genetic risk factor for CVD if we keep the high frequency of small apo(a) isoforms in the population in mind [29].
Table 1 (recoverable fragment, entry for SNPs)
Advantage: easy to analyze in routine lab
Disadvantages: the two most widely used SNPs (rs10455872 and rs3798220) tag only half of the small apo(a) isoforms, resulting in a large frequency of false negatives; ignores expression status; less good for risk prediction; genetic counseling required (a)
Lp(a) lipoprotein(a), K-IV kringle IV, apo(a) apolipoprotein(a)
(a) Whether genetic counseling is required depends on the laws of the respective countries

What methods are available for CVD risk estimation?
Table 1 provides an overview of the methods available for investigating the association of Lp(a) concentrations and their genetic determinants with clinical outcomes, together with their advantages and disadvantages.

Measurement of Lp(a) concentrations
From a clinical standpoint, the measurement of Lp(a) concentrations is the method of choice to estimate the risk associated with this atherogenic lipoprotein since we assume that we are measuring the biologically active lipoprotein. However, the measurement of Lp(a) is trickier than that of other lipoproteins. A multitude of immunochemical Lp(a) assays have been developed over time, and the use of different calibrators and antibodies has resulted in a wide range of Lp(a) measurement values that are not readily comparable between assays [30]. The major problem is the repetitive K-IV structure which causes the size polymorphism with dozens of isoforms. Most of the antibodies used in the various assays are not exactly characterized and are probably directed against the repetitive K-IV type 2 structure, which ranges from 2 to more than 40 repeats. This may result in a measurement bias where serum concentrations of small isoforms, which are usually associated with elevated levels, are underestimated, while serum concentrations of large isoforms, usually associated with low levels, are overestimated. However, we have to distinguish between the relative and the absolute bias of the measurement in relation to the apo(a) isoforms. Thorough comparisons between an isoform-sensitive and an isoform-insensitive assay by Marcovina and colleagues [31] revealed that the relative bias can become quite high, with an overestimation of 25-35% in carriers of large isoforms. However, because these carriers usually have low concentrations, this translates to an absolute bias of only a few mg/dL in most of the samples. The relative bias for most of the carriers of small isoforms is around 10%, which also translates to only a few mg/dL. However, in the rare cases with up to 16 K-IV repeats, the relative and the absolute bias can become quite pronounced, with an underestimation of the concentration and therefore of the risk for outcomes.
The proportion of these individuals in the general population is rather low: only 1.9% of the population turned out to have such small isoforms when we investigated more than 24,000 individuals with western blot analyses (unpublished data). In summary, when interpreting the data from Marcovina and colleagues [31], we would expect that for most individuals this does not cause a major underestimation or overestimation of the risk. It might become an issue mainly in the grey zone around the proposed threshold of Lp(a) concentrations. However, we have to keep in mind that this threshold itself is still under discussion, namely whether it should be 30 or 50 mg/dL. Moreover, these data have been shown only for the assays that Marcovina and colleagues investigated [31]. Whether the absolute and relative biases are similar for other assays remains to be seen, although some of the assays have been compared to the assay from Marcovina and colleagues.
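The distinction between relative and absolute bias, and the grey-zone problem around the threshold, can be made concrete with a little arithmetic. The sketch below uses the approximate bias magnitudes quoted above (about +30% for large isoforms, about -10% for small isoforms); the concentrations, the function names and the convention that "elevated" means at or above the threshold are hypothetical choices for illustration only.

```python
def measured_concentration(true_mg_dl: float, relative_bias: float) -> float:
    """Value reported by an assay with the given relative bias.

    relative_bias = +0.30 means 30% overestimation,
    relative_bias = -0.10 means 10% underestimation.
    """
    return true_mg_dl * (1.0 + relative_bias)


def absolute_bias(true_mg_dl: float, relative_bias: float) -> float:
    """Absolute bias in mg/dL (measured minus true concentration)."""
    return measured_concentration(true_mg_dl, relative_bias) - true_mg_dl


def is_reclassified(true_mg_dl: float, relative_bias: float,
                    threshold_mg_dl: float) -> bool:
    """True if the bias moves a person across the risk threshold.

    Assumption: 'elevated' means at or above the threshold.
    """
    measured = measured_concentration(true_mg_dl, relative_bias)
    return (true_mg_dl >= threshold_mg_dl) != (measured >= threshold_mg_dl)


if __name__ == "__main__":
    # Hypothetical large-isoform carrier: low concentration, large relative bias.
    print(absolute_bias(8.0, 0.30))
    # Hypothetical small-isoform carrier just above a 50 mg/dL threshold.
    print(absolute_bias(52.0, -0.10))
    print(is_reclassified(52.0, -0.10, 50.0))
    print(is_reclassified(52.0, -0.10, 30.0))
```

In this toy example a 30% overestimation of a low concentration shifts the value by only a few mg/dL, whereas a 10% underestimation matters only for a person who sits close to whichever threshold is applied.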

Investigation of the K-IV repeat number by western blot and DNA analysis
The apo(a) isoform size can best be investigated in serum or plasma by western blot using SDS agarose gel electrophoresis. It has the advantage that only those isoforms which are indeed expressed will be visible. This is one of the peculiarities of Lp(a): although 95% of the subjects are heterozygous at the DNA level, only 50-70% of the individuals show two isoforms in serum. Of the remaining 30-50%, only a minor fraction is indeed homozygous (which cannot be visualized in the western blot), whereas the majority express only one isoform in serum. The exact number of K-IV repeats can be measured with a precision of ±1-2 K-IV repeats. However, the method is laborious and only available in a few laboratories. To this day, it is still the best method for Mendelian randomization studies to support causality between Lp(a) and various outcomes (Fig. 1). Three methods are available to investigate apo(a) isoforms in terms of the number of K-IV repeats at the DNA level (Table 1). Pulsed-field gel electrophoresis (PFGE)/Southern blotting of genomic DNA [9,10] is extremely laborious and requires a special DNA preparation to yield high-molecular-weight DNA, but it allows determination of the number of K-IV type 2 copies in separated alleles. Both alleles are also separated by fiber-fluorescence in situ hybridization (FISH), which allows the number of repeats to be counted under fluorescence microscopy [32]. In contrast to these two methods, quantitative polymerase chain reaction (qPCR) can be performed in high throughput [11] but provides only the total number (sum) of the K-IV type 2 copies of the two alleles of the investigated genome. This means that individuals with one short and one large allele and individuals with two medium-sized alleles end up in the same risk category, which might result in an underestimation of the risk for the person with one short and one large allele.
This is in line with the largest single-center case-control and prospective study to date, which reported that individuals in the lowest quartile of the sum of copy numbers in their genome had an adjusted hazard ratio for myocardial infarction of 1.50 compared to those in the highest quartile of copy number. This estimate is lower than that from studies performing the calculations based on the expressed isoforms in western blot, with a relative risk of 2.08 [28]. The K-IV type 2 CNV measured by qPCR explained only 25% of the variability in Lp(a) levels [11], which is also markedly lower than in other studies of European populations using methods such as apo(a) isoforms by western blot or separated alleles by pulsed-field gel electrophoresis. These observations might be related to the fact that qPCR is not easy to standardize and has a relatively high coefficient of variation, which makes comparison of results between laboratories difficult.
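The information lost when qPCR reports only the sum of both alleles can be illustrated with two hypothetical genotypes. This is a minimal sketch, not a description of any laboratory protocol: allele sizes are given as total K-IV repeats per allele for simplicity (qPCR quantifies K-IV type 2 copies, but the argument about identical sums is the same), and all names are made up for the example.

```python
from typing import Tuple

# Cutoff used in the text for a small isoform/allele (up to 22 K-IV repeats).
SMALL_ALLELE_MAX_KIV = 22


def qpcr_readout(alleles: Tuple[int, int]) -> int:
    """Sum of the repeat numbers of both alleles, as a sum-based method sees it."""
    return alleles[0] + alleles[1]


def has_small_allele(alleles: Tuple[int, int]) -> bool:
    """True if at least one of the two separated alleles is small."""
    return min(alleles) <= SMALL_ALLELE_MAX_KIV


if __name__ == "__main__":
    # Two hypothetical persons with the same allele sum.
    one_short_one_large = (18, 36)
    two_medium = (27, 27)
    for person in (one_short_one_large, two_medium):
        print(person, qpcr_readout(person), has_small_allele(person))
```

Both persons produce the same summed readout, yet only the first carries a small allele, which is the situation in which a sum-based category may underestimate the individual risk.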
The peculiarity that roughly 30 to 50% of all individuals express only one apo(a) isoform in serum, although they have two isoforms at the DNA level, points to some limitations of the DNA analysis. From a clinical standpoint, it raises the question of why one should be interested in an isoform that is not expressed. It also raises the question of whether it is better to simply consider Lp(a) concentrations instead of the genetic determinants of the concentrations for risk prediction. However, earlier data suggested that Lp(a) from small apo(a) isoforms has a higher atherogenic potential [33]. This finding needs to be further evaluated. Furthermore, certain frequent disease conditions such as chronic kidney disease result in pronounced changes of Lp(a) concentrations. In these patients, small apo(a) isoforms were demonstrated to be more predictive for CVD risk than Lp(a) concentrations [34,35] (for review see [1,36]).

Investigation of single nucleotide polymorphisms (SNPs)
One of the reasons for the revival of the Lp(a) field ten years ago was the identification of SNPs that are strongly associated with CHD risk. Clarke and colleagues identified two SNPs (rs10455872 and rs3798220) to be strongly associated not only with Lp(a) concentrations but also with CHD [37]. These SNPs are easy to analyze in a routine laboratory, which might have contributed to their popularity. However, these two SNPs have a minor allele frequency of 7 and 2%, respectively, and about 15% of a typical Caucasian population carry at least one minor allele of the two SNPs. They were claimed to identify carriers of small apo(a) isoforms [37]. However, investigations in a large general Caucasian population sample of 5999 subjects revealed that roughly half of all subjects expressing a small apo(a) isoform do not carry the minor allele of either of these SNPs [38]. This means that about half of the individuals who have a high risk for CHD based on an expressed small apo(a) isoform will not be detected when only these two SNPs are genotyped. These individuals might get the wrong message when only the SNP result, and not also the Lp(a) concentration, is considered for risk counselling (Fig. 2). Moreover, the situation might be different for various ethnicities. For example, rs3798220 was not found in Africans. Allele frequencies in East and Southeast Asians ranged from 2.9 to 11.6%, and were very low (0.15%) in CAD cases and controls from India. The variant was associated neither with small K-IV CNV alleles nor with elevated Lp(a) concentrations in Asians [39].
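The consequence of tagging only about half of the small-isoform carriers can be expressed as the sensitivity of a two-SNP test. The sketch below is illustrative only: the counts are hypothetical round numbers chosen to match the "roughly half" reported above, not data from reference [38], and the function names are assumptions.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of small-isoform carriers that the SNP test detects."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no small-isoform carriers in the sample")
    return true_positives / total


def false_negative_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of small-isoform carriers that the SNP test misses."""
    return 1.0 - sensitivity(true_positives, false_negatives)


if __name__ == "__main__":
    # Hypothetical sample: 1000 persons expressing a small apo(a) isoform,
    # of whom 500 carry a minor allele of rs10455872 or rs3798220.
    snp_positive = 500
    snp_negative = 500
    print(sensitivity(snp_positive, snp_negative))
    print(false_negative_rate(snp_positive, snp_negative))
```

A test with a sensitivity of about one half for the high-risk genotype is the reason why the SNP result alone should not be used for risk counselling without the Lp(a) concentration.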
In a recent genome-wide association analysis including 13,781 individuals, we observed 2001 SNPs in the wider LPA gene region which were significantly associated with Lp(a) concentrations [40]. Many of the SNPs were strongly correlated with each other, and an in-depth analysis of the region identified 48 SNPs to be independently associated with Lp(a) concentrations. In addition to this region, the SNP in the APOE region that is responsible for the apoE2 allele (rs7412) was also associated with Lp(a) concentrations. In a further step, we performed a look-up in the data from the CARDIoGRAMplusC4D consortium. Of all 49 genetic variants that were shown to be independently associated with Lp(a) concentrations, 40 were present in summary-level data retrieved from the CARDIoGRAMplusC4D consortium. Seven SNPs were even significantly associated with CAD on a genome-wide scale. This means that a SNP score with more SNPs than rs10455872 and rs3798220 might be better suited for risk prediction [40].
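One common way to build a SNP score is a weighted sum of allele dosages. The following sketch shows that general construction only; it is not the score used in reference [40]. The SNP identifiers are placeholders and the weights are arbitrary numbers chosen for illustration.

```python
from typing import Dict


def weighted_snp_score(dosages: Dict[str, int],
                       weights: Dict[str, float]) -> float:
    """Weighted allele score: sum over SNPs of weight * effect-allele dosage.

    dosages: number of effect alleles (0, 1 or 2) per SNP for one person.
    weights: per-allele effect assigned to each SNP (hypothetical here).
    """
    score = 0.0
    for snp, weight in weights.items():
        if snp not in dosages:
            raise KeyError(f"missing genotype for {snp}")
        dosage = dosages[snp]
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2")
        score += weight * dosage
    return score


if __name__ == "__main__":
    # Placeholder SNP names and arbitrary weights, for illustration only.
    weights = {"snp_1": 1.5, "snp_2": -0.5, "snp_3": 0.25}
    person = {"snp_1": 1, "snp_2": 2, "snp_3": 0}
    print(weighted_snp_score(person, weights))
```

Because Lp(a)-lowering as well as Lp(a)-raising variants exist in the region, the weights in such a score can be negative as well as positive.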

SNPs in the kringle-IV type 2 region: a journey to a white spot on the genetic map
The K-IV type 2 region is difficult to resolve by conventional DNA analysis methods. Variants within the K-IV type 2 region cannot be detected in common sequencing projects, leaving up to 70% of the LPA coding region currently unaddressed. We recently developed an ultra-deep sequencing protocol and an easy-to-use variant analysis pipeline to create a first map of genetic variation in the K-IV type 2 region. We found dozens of loss-of-function and splice site mutations, as well as more than 100 missense variants, some of which are even common. This provides novel candidates to explain the large ethnic and individual differences in Lp(a) concentrations [41]. One of these variants is a splice site variant (G4925A) that is preferentially associated with the smaller apo(a) isoforms [42]. It has an exceptionally high carrier frequency of 22.1% in the general population and explains 20.6% of the Lp(a) variance in carriers of small apo(a) isoforms. It reduces Lp(a) concentrations dramatically, by more than 30 mg/dL. Accordingly, the odds ratio for CVD was reduced from 1.39 for wildtype carriers of small isoforms to 1.19 in carriers of small isoforms who were additionally positive for G4925A [42]. Functional studies pointed towards a reduction of splicing efficiency, highlighting modulation of splicing efficiency by antisense oligos or trans-splicing [43] as a potential novel Lp(a)-lowering approach [42].

Which approach is now the most suitable for clinical risk assessment?
The measurement of Lp(a) concentrations by a well-performing and standardized assay is usually first choice and sufficient to get an estimate for the expected risk for clinical outcomes. The Lp(a) concentration might reflect best the biologically active lipoprotein. Furthermore, since a protein and not a genetic polymorphism is measured, no genetic counseling is required in most countries.
There is at least some evidence (although we would wish to have more) that, of two persons with the same high Lp(a) concentration, the one with a small isoform has a higher risk than the one with a large isoform, which would mean that the smaller isoform (with fewer K-IV repeats) might be more atherogenic. Therefore, the measurement of the isoform by western blot might provide a further argument for an increased CVD risk. Of all methods that investigate apo(a) genetic polymorphisms, it provides the most comprehensive information and is, second only to the Lp(a) concentration, very suitable for risk prediction. The advantage can probably be explained by the fact that we see only the isoforms which are indeed expressed in serum or plasma and not those which are present at the DNA level but suppressed for whatever reason.
The measurement of the apo(a) isoforms also adds information in acquired diseases which are associated with a secondary increase in Lp(a) concentrations. A typical example is chronic kidney disease: we observed, especially in patients treated by hemodialysis, an increase in Lp(a) concentrations, and this disease-related relative increase is seen mainly in patients with large apo(a) isoforms when compared to controls with the same isoform categories [44]. Therefore, the risk for CVD events might be highest in those patients with small isoforms (which we and others indeed observed [34,35,45]) since their exposure to high Lp(a) concentrations has already lasted their entire life. In contrast, many patients with large isoforms might have experienced the increase in Lp(a) only recently with the development of chronic kidney disease, and the risk might no longer be sufficiently predictable simply by the Lp(a) concentrations [36].
The use of SNPs for risk estimation will have to find its place in the future. Using the famous two SNPs (rs10455872 and rs3798220) described 10 years ago might be less efficient since too many false negative results might be found as illustrated in Fig. 2 (individuals with small apo(a) isoforms and high Lp(a) concentrations despite the non-mutated variants of these SNPs) [38]. A SNP score with a large and growing package of SNPs might be more informative [40]. Using SNPs or SNP scores for individuals should always be done in combination with the Lp(a) concentrations. The same holds true for apo(a) isoforms for which the Lp(a) concentration has to be known anyway for the laboratory process which requires that roughly the same amount of Lp(a) is put on the SDS agarose gel.

Apo(a) isoform-sensitivity of assays
The use of an apo(a) isoform-sensitive assay might result in an underestimation of the association between Lp(a) concentrations and clinical outcomes. As mentioned above, such assays underestimate serum concentrations of small isoforms, which are usually associated with an increased CVD risk, and overestimate Lp(a) concentrations of large isoforms, which are usually associated with a lower CVD risk. Since small isoforms are overrepresented in CVD patients compared to a control group [27,28], this phenomenon can dilute the Lp(a) concentration differences between CVD patients and controls. Therefore, studies reporting no association between Lp(a) concentrations and outcomes should rule out that isoform-sensitivity of the assay has contributed to these results.
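The dilution effect described above follows from simple arithmetic on group means. The sketch below assumes a two-category mixture model in which each group consists of small- and large-isoform carriers with fixed mean concentrations. The isoform fractions and mean concentrations are hypothetical; the bias magnitudes (about -10% for small and +30% for large isoforms) are in the range quoted earlier in the text for one assay comparison.

```python
def group_mean(fraction_small: float, mean_small: float,
               mean_large: float) -> float:
    """Mean concentration of a group made up of small- and large-isoform carriers."""
    return fraction_small * mean_small + (1.0 - fraction_small) * mean_large


def case_control_difference(frac_small_cases: float,
                            frac_small_controls: float,
                            mean_small: float, mean_large: float,
                            bias_small: float = 0.0,
                            bias_large: float = 0.0) -> float:
    """Difference in mean measured Lp(a) between cases and controls.

    bias_small / bias_large are relative assay biases for the two isoform
    categories (negative = underestimation, positive = overestimation).
    """
    measured_small = mean_small * (1.0 + bias_small)
    measured_large = mean_large * (1.0 + bias_large)
    cases = group_mean(frac_small_cases, measured_small, measured_large)
    controls = group_mean(frac_small_controls, measured_small, measured_large)
    return cases - controls


if __name__ == "__main__":
    # Hypothetical setting: 40% small-isoform carriers among cases, 25% among
    # controls; true means of 50 mg/dL (small) and 10 mg/dL (large).
    unbiased = case_control_difference(0.40, 0.25, 50.0, 10.0)
    biased = case_control_difference(0.40, 0.25, 50.0, 10.0,
                                     bias_small=-0.10, bias_large=0.30)
    print(unbiased, biased)
```

In this toy setting the measured case-control difference shrinks although nothing about the true concentrations has changed, which is how an isoform-sensitive assay can push a real association towards the null.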

Storage effects on Lp(a) concentrations
Some assays are strongly influenced by storage conditions. This is especially the case for long-term stored samples from epidemiological studies. Extreme examples from two different studies are a nearly linear decrease in Lp(a) immunoreactivity of 46% during 6 months of storage [46] and 75% lower Lp(a) values after 600 days of storage compared to fresh samples [47]. With our assay, we observed in 310 samples on average only a small decrease in Lp(a) of 4.83% during a 25-month storage period, but this decrease was larger for individuals with small isoforms than for those with large isoforms [48].
An effect of sample storage could also be a reason for negative findings in epidemiological studies which used long-term stored samples and an assay that is more strongly influenced by sample storage conditions than our assay [46,47]. Some of the earlier studies which used assays that do not perform appropriately in stored samples were also included in later meta-analyses and might have contributed to an underestimation of the association between Lp(a) concentrations and cardiovascular disease [13] compared to another large single-center study using one well-validated assay [11].

Sample size in clinical and epidemiological studies
It is a widely observed phenomenon in epidemiology that the first group to report a significant test result (the winner) will also report an effect size much larger than is likely to be seen in subsequent replication studies [49]. This "winner's curse" is often seen in Lp(a) research, where it arises from underpowered studies of a parameter with a pronounced right-skewed distribution, in which a large proportion of the population has low concentrations. Since Lp(a) concentrations are strongly determined by genetic variants with major effect sizes and less by other conditions, large sample sizes are usually required to demonstrate a statistically significant difference that is not a chance finding. As discussed earlier and demonstrated by simulations, a sufficiently small sample can produce almost any difference between a patient and a control group [44] or almost any correlation between Lp(a) concentrations and another parameter [3]. To avoid such false findings, the sample size has to be large enough and, if possible, the results should be controlled for genetic variants such as the apo(a) size polymorphism. Replication of the findings in a different population (as is usually expected for genetic epidemiological studies) is a further advantage.
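The effect of sample size on chance findings can be illustrated with a small simulation. This is a sketch under stated assumptions and not the simulation from references [3] or [44]: cases and controls are drawn from the same right-skewed distribution (a lognormal with arbitrary parameters standing in for Lp(a)), so every observed difference between the groups is due to chance alone.

```python
import random
import statistics
from typing import List


def simulate_mean_differences(n_per_group: int, n_replicates: int,
                              seed: int = 1) -> List[float]:
    """Case-control mean differences when there is NO true difference.

    Both groups are drawn from the same lognormal distribution, a
    hypothetical stand-in for right-skewed Lp(a) concentrations.
    """
    rng = random.Random(seed)
    mu, sigma = 2.5, 1.2  # arbitrary lognormal parameters (assumption)
    diffs = []
    for _ in range(n_replicates):
        cases = [rng.lognormvariate(mu, sigma) for _ in range(n_per_group)]
        controls = [rng.lognormvariate(mu, sigma) for _ in range(n_per_group)]
        diffs.append(statistics.mean(cases) - statistics.mean(controls))
    return diffs


def spread(diffs: List[float]) -> float:
    """Standard deviation of the simulated chance differences."""
    return statistics.pstdev(diffs)


if __name__ == "__main__":
    small_studies = simulate_mean_differences(n_per_group=20, n_replicates=200)
    large_studies = simulate_mean_differences(n_per_group=1000, n_replicates=200)
    print("spread with 20 per group:  ", spread(small_studies))
    print("spread with 1000 per group:", spread(large_studies))
```

With few subjects per group, a handful of individuals with very high values can dominate a group mean, so apparent differences arise by chance much more easily than in large studies. A study that is published only because it crossed a significance threshold will then tend to report an inflated effect.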

Conclusions
High Lp(a) concentrations are an important risk factor for cardiovascular disease. There is still an urgent need for better standardization of Lp(a) assays. However, if an assay is performing well, it is sufficient to use that assay for CVD risk assessment. The measurement of the number of K-IV repeats can add some information under certain circumstances and has an important role in epidemiological studies, especially Mendelian randomization studies. Other methods to measure the K-IV repeat number are of academic interest. SNPs will have to find their role, probably as part of a SNP score, and currently provide no major added value when used beside or instead of the measurement of Lp(a) concentrations.

Compliance with ethical guidelines
Conflict of interest F. Kronenberg declares that he has received speaker honoraria from Kaneka, Miltenyi Biotec and Amgen. He is a member of advisory boards of Kaneka and Amgen.
Ethical standards This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.