(Circulation. 2007;116:1714-1724.)
© 2007 American Heart Association, Inc.
Contemporary Reviews in Cardiovascular Medicine
From INSERM UMR S 525 and Université Pierre et Marie Curie, Paris, France.
Correspondence to Dr François Cambien, INSERM U525, Faculté de Médecine Pitié-Salpêtrière, 91 blvd de l'Hôpital, 75634 Paris cedex 13, France. E-mail cambien{at}chups.jussieu.fr
Key Words: cardiovascular diseases; epidemiology; genetics; lipoproteins; risk factors
| Introduction |
|---|
The objective of the present review is not to provide an exhaustive account of the numerous studies conducted on the genetics of cardiovascular disease (CVD) (eg, Arnett et al3), but to introduce a few basic notions required to understand the language of genetics and genomics (see Appendix) and to illustrate with a limited number of examples the important insights provided by genetic research into the causes and mechanisms of CVD. We will also discuss the new genome-wide association (GWA) strategy and why this approach is likely to have a considerable impact on biomedicine and the understanding of human disease. Finally, we will try to explain why the search for genetic markers of risk has been unsuccessful and why phenotypic biomarkers are likely to be clinically more useful.
| The Basis of Genetic Variation |
|---|
The most common type of human sequence variation consists of differences in individual base pairs termed single nucleotide polymorphisms (SNPs). Other sequence variations comprise variable numbers of short or long repetitions of the same motif in tandem such as mini- and microsatellites,6 insertions or deletions of various lengths, and structural variants that affect large chromosomal regions.7 The vast majority of these sequence variations are located in nonfunctional regions of the genome and have no phenotypic impact; these are said to be neutral and are usually termed markers. However, when variations occur within coding sequences or regulatory regions, they may affect the protein sequence or the level of gene expression and translate into observable phenotypic effects.
| Mendelian Versus Complex Inheritance |
|---|
From an epidemiological perspective, rare deleterious mutations (eg, those that cause familial hypercholesterolemia [FH]) confer an important risk of coronary heart disease (CHD) in mutation carriers, but their impact at the population level is low. Conversely, polymorphisms such as the apolipoprotein E (APOE) polymorphism, because they are frequent, may have a population impact that is far from negligible despite a weak effect at the individual level. This duality, which relates to the epidemiological notions of absolute, relative, and attributable risks, has important medical and public health implications but is less crucial when the interest lies in the identification of pathophysiological pathways.
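The contrast between individual and population impact can be made concrete with Levin's formula for the population attributable fraction, PAF = p(RR − 1) / [1 + p(RR − 1)], where p is the prevalence of carriers and RR their relative risk. The sketch below is illustrative only. The E4 figures (carriers about 20% of the population, 40% higher risk) are those quoted later in this review, whereas the FH carrier prevalence and relative risk are hypothetical placeholders chosen for the comparison, not values taken from the source.

```python
def population_attributable_fraction(carrier_prevalence: float,
                                     relative_risk: float) -> float:
    """Levin's formula: fraction of cases in the population attributable
    to an exposure (here, carrying a genetic variant)."""
    excess = carrier_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Common variant with a weak individual effect (figures quoted in the review).
paf_apoe_e4 = population_attributable_fraction(0.20, 1.4)

# Rare mutation with a strong individual effect.
# ASSUMPTION: prevalence 1/500 and RR = 10 are hypothetical illustration values.
paf_rare_mutation = population_attributable_fraction(0.002, 10.0)

print(f"Common, weak variant : PAF = {paf_apoe_e4:.3f}")
print(f"Rare, strong mutation: PAF = {paf_rare_mutation:.3f}")
```

Under these assumptions the frequent variant accounts for a larger share of cases than the rare mutation, even though the risk borne by each carrier is far smaller.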
Mutations Responsible for Mendelian Diseases
Mutations are usually identified by linkage analysis conducted in families with several affected members over different generations. Regions that potentially harbor a disease-causing gene are identified by testing of the cosegregation of the disease with genetic markers that tag specific regions of the genome. This strategy uses genetic markers (ie, panels of microsatellites or large sets of SNPs regularly spaced throughout the genome) and tests whether particular alleles are cotransmitted with the disease at a higher frequency than expected by chance. The success of linkage studies depends on the availability of phenotypically well-characterized families that include a sufficiently large number of informative affected individuals. When a disease-linked region of the genome has been successfully mapped by linkage analysis, finding the responsible gene and sequence variation is not trivial because the region may sometimes encompass tens or hundreds of genes. However, thanks to the improved annotation of the human genome sequence and the possible design of dense SNP arrays that target the regions of interest, the discovery of the responsible sequence mutation may be accelerated by linkage disequilibrium (LD) mapping.8 Although exceptions exist (eg, within isolated populations derived from a small number of founders), mutations that are associated with Mendelian diseases are rare (much <1%) and their origin is recent. This explains why their presence may be restricted to some groups of individuals only (population isolates, families). In that case, they are said to be "private mutations".
Polymorphisms Involved in Complex Diseases
At the other end of the frequency spectrum of genetic variants, common polymorphisms (minor allele frequency >1%) are the focus of most contemporary genetic studies that target complex diseases. Common SNPs are estimated to number >10 million in the human genome.9 Because polymorphisms have common alleles, numerous combinations of susceptibility alleles at several loci in a particular individual are possible, and some of them may affect the risk of CVD in a way that cannot be predicted from the separate effect of each variant. This is the major obstacle to the characterization of the genetics of complex traits and the rationale for the proposal to explore systems of genes rather than single genes.10,11 An important feature of polymorphisms, compared with rare deleterious mutations, is that they have an ancient origin. This explains why they are usually found in most human populations albeit often with different allele frequencies.
Because complex diseases do not follow a clear pattern of Mendelian inheritance, the strategy used to identify their genes of predisposition is usually not based on family studies but on a radically different approach called "genetic association" analysis. This approach relies on the existence of LD among physically close polymorphic sites in the genome, which implies that even if a polymorphism causally involved in the disease process is not directly observed, its association may be captured by a measured proxy polymorphism in LD with it. This is the basis of association studies that test the statistical association between genetic markers (the term "marker" denotes that no a priori causal role is assumed) and the disease in the population. The principle of genetic association studies is described in Figure 1. Initially, association studies focused on markers of candidate genes. Thanks to various initiatives, in particular the "HapMap" Project,13 increasingly dense genome-wide panels of common SNPs are now available that provide a powerful resource of markers (or tag SNPs) (Figure 1) for association studies. Contemporary association studies often encompass sets of genes that encode components of biological systems, chromosome regions, or even the whole genome.
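How well a measured marker stands in for an unobserved causal polymorphism is usually summarized by the LD coefficients D, D′, and r². The following minimal sketch computes them for 2 biallelic loci from one haplotype frequency and the 2 allele frequencies. The function name and the numerical values are hypothetical and serve only to illustrate the definitions.

```python
def ld_statistics(p_ab: float, p_a: float, p_b: float) -> dict:
    """Pairwise LD between two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b  # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "D_prime": d_prime, "r2": r2}


# Hypothetical example: an unobserved causal allele (frequency 0.30) and a
# genotyped marker allele (frequency 0.30) found together on 27% of chromosomes.
proxy = ld_statistics(p_ab=0.27, p_a=0.30, p_b=0.30)
print(proxy)
```

An r² close to 1 means that the marker carries almost the same information as the causal variant. As r² decreases, a larger sample is needed to detect the association through the proxy.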
The HapMap (Haplotype Map) Project
The primary goal of the International HapMap Project13 (http://www.hapmap.org/) was to create a public resource of common SNPs to capture most of the common human genome sequence variability. A second objective was to characterize the LD structure of the genome on the basis of the analysis of these SNPs. Because of the strong LD displayed by most regions of the genome, the combinations of alleles at neighboring SNPs, called haplotypes, generate much less diversity than would be expected if the SNPs were uncorrelated. Recent studies have shown that the human genome is organized into a succession of distinct haplotype blocks that are ancestrally conserved.14–17 By genotyping 270 individuals from populations with African, Asian, and European ancestry, the HapMap Project has identified a set of SNPs that tag most of the common haplotypes in the human genome.18,19 This resource is used to search for polymorphisms associated with susceptibility to common diseases. For this purpose, genotyping arrays built with tag SNPs that encompass the whole genome or specific regions of interest are used; Figure 1 explains the principle.
Variants of "Intermediate" to Low Frequency Associated With Non-Mendelian Traits
Between the rare mutations responsible for Mendelian diseases and identified by family studies and the common polymorphisms targeted in current association studies, genetic variants that have a low frequency (<1%) but a sizeable individual effect (eg, relative risk >3) probably exist in significant numbers. These variants are presently difficult to characterize because they do not generate evident familial patterns of disease that would make them identifiable by linkage studies, and they are missed in the current candidate gene or genome-wide sequencing strategies, which use a limited number of individuals for polymorphism screening. Rare functional variants are difficult to tag with common markers such as SNPs. Their systematic characterization is therefore beyond the scope of studies that rely on LD, such as GWA studies, and will depend on the availability of new high-throughput sequencing technologies and large DNA banks of patients and controls. Rare variants associated with non-Mendelian traits may prove to be clinically important because they may confer a significant increase in risk and therefore constitute potential diagnostic and prognostic tools. Interest in these variants has recently grown after the discovery of a number of them in the PCSK9 and ABCA1 genes.
| Some Examples Related to Lipoproteins That Illustrate the Strength of Genetics to Unravel Mechanisms of Disease |
|---|
The APOE Gene
The heritability of plasma low-density lipoprotein (LDL)–cholesterol (LDLc) has been estimated to be >50%.20 Epidemiological data show a striking parallel between plasma LDLc levels and the risk of CHD that is observed over a wide range of LDLc levels. This is why common polymorphisms that affect plasma LDLc may contribute to the risk of CHD. Such associations have been reported for several genes involved in lipid metabolism,21 the best example being APOE22; apoE plays an important role in the transport of lipids to tissue and cells. It is present in several lipoproteins and binds with high affinity to the LDL receptor. The APOE gene is polymorphic with 2 common nonsynonymous (amino acid changing) polymorphisms that generate 3 alleles (haplotypes) termed ε2, ε3, and ε4. These 3 alleles have variable frequencies across populations; ε3 is the most common and ε2 is the least common.23 The 3 corresponding encoded isoforms of the protein, E2, E3, and E4, have different functional properties; the E2 isoform is associated with lower, and the E4 isoform with higher, LDLc levels than E3. In a recent meta-analysis, E4 carriers, who represent >20% of the population, were shown to have a 40% higher risk of CHD compared with E3E3 homozygotes, whereas the relationship between E2 and risk was less obvious.24 This is an example of genetic variation that has an important effect at the population level but has little relevance in the assessment of individual risk, at least when considered alone.
LDL Receptor Gene
Despite the relatively low frequency of FH compared with the common forms of hyperlipidemia, its study has provided important insights into the mechanisms of cholesterol metabolism and opened new perspectives for the prevention of CHD.25 Mutations in the coding sequence of the LDL receptor gene (LDLR) may considerably reduce or abolish the function of the LDL receptor and lead to an important rise of circulating LDLc, which in turn is associated with a commensurate increase in CHD risk. More than 700 different mutations of LDLR responsible for FH have been reported, some of them clustered in particular populations.26 Mutations affect the function of the receptor in various ways according to their type and their position in the protein sequence, and an important heterogeneity is present in clinical manifestations, even in individuals who carry the same mutation, as a consequence of differences in genetic and environmental backgrounds. Currently, the clinical diagnosis of FH is based on personal and family history, physical examination, and laboratory findings. It has been suggested that the diagnosis of FH should instead be based on the identification of the genetic defect because statin therapy needs to be initiated in young carriers of an LDLR mutation even if their plasma LDLc is normal.27 However, no general agreement exists on this approach because the risk of CHD is about the same in phenotypically defined FH patients with or without mutation in the LDLR gene.28 The clinical benefit of the genetic diagnosis over the careful monitoring of LDLc levels, which is required anyway, is therefore questionable.
Familial Defective ApoB100
Familial defective apoB100 is another form of FH in which LDL binds defectively to the LDL receptor, which results in increased circulating LDLc levels and premature atherosclerosis.29 In contrast with the myriad of LDLR mutations that cause FH, the molecular defect responsible for familial defective apoB100 is a single mutation (R3500Q) in the gene encoding apoB, the main apolipoprotein in LDL that binds to the LDL receptor.30 Although the molecular diagnosis of familial defective apoB100 is theoretically easier than diagnosis of LDLR mutations that cause FH because a single variant is responsible for the trait, it is still the direct measurement of LDLc that appears the most appropriate to evaluate the risk of CHD and monitor the drug response in familial defective apoB100 patients.
Proprotein Convertase Subtilisin/Kexin 9
Recently, the careful study of families with several members affected by dominant forms of hypercholesterolemia despite absence of mutation in the LDLR gene and lack of the APOB3500 variant led to the mapping of a locus on chromosome 1p32 and the subsequent identification of missense mutations in the proprotein convertase subtilisin/kexin 9 (PCSK9) gene.31 PCSK9 was subsequently found to play a major role in the LDL/LDLR pathway, even if the exact mechanism of its influence remains incompletely understood. Mice in which the PCSK9 gene has been inactivated exhibit an increased hepatic LDLR level, accelerated LDL clearance, and an important reduction of plasma LDLc.32 The PCSK9 mutations associated with FH are gain-of-function mutations (variants that confer an increased or extra functionality) that possibly affect the autocatalytic property of the pro-PCSK9 protein and promote the degradation of LDL receptors in hepatocytes. In addition to these extremely rare mutations, several more frequent nonsynonymous variants of the PCSK9 gene are associated with an impaired function of the protein that results in a reduction of plasma LDLc caused by accelerated LDL clearance. These variants of "intermediate" frequency have a substantial impact on plasma LDLc and CHD risk. For example, it has been estimated from the Atherosclerosis Risk in Communities (ARIC) study that 3% of African Americans were carriers of PCSK9 nonsynonymous variants, which were associated with a mean reduction of 30% of LDLc and a parallel significant reduction of CHD risk.33 This effect is comparable to the lowering effect of statins on LDLc. The PCSK9 gene also carries common noncoding polymorphisms that affect plasma LDLc; their effect at the individual level is much weaker than that of the coding variants of "intermediate" frequency just discussed,34 but their impact at the population level may be nonnegligible.
ATP-Binding Cassette Transporter 1
Another striking example of a Mendelian disorder that has contributed to the discovery of new processes involved in lipid metabolism and atherosclerosis is Tangier disease, a very rare recessive deficit of high-density lipoprotein–cholesterol (HDLc) metabolism caused by mutations in the ATP-binding cassette transporter 1 (ABCA1) gene.35–37 ABCA1 encodes a protein that regulates the cellular efflux of cholesterol and phospholipids to an apolipoprotein transporter. Several mutations responsible for Tangier disease have been identified, all of which result in a complete or partial loss of function that leads to an accumulation of cellular cholesterol, low plasma HDLc levels, and increased risk of CHD. Apart from these very rare mutations, numerous coding variants of "intermediate" to low frequency in the ABCA1 gene may contribute to a significant fraction of the low HDLc levels in the population. In the Dallas Heart Study, 20 of 128 individuals in the bottom 5% of the HDLc distribution were carriers of nonsynonymous variants in the ABCA1 gene (unknown before as common SNPs) versus only 2 of 128 individuals in the top 5% of the HDL distribution.38 This finding was replicated in an independent study, and biochemical studies indicated that most of the variants associated with low HDLc were functionally important.38 The results that pertain to variants of "intermediate" or low frequency and the similar results for PCSK9 raise the interesting possibility that the contribution of rare variants to common traits may be more important than initially thought. Common polymorphisms in the ABCA1 gene, including several nonsynonymous changes, have also been identified by systematically resequencing the gene in a limited number of individuals, and some of these polymorphisms have been shown to be associated with plasma HDLc or apoA1 in the population at large.39
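The strength of the Dallas Heart Study contrast quoted above (20 of 128 carriers in the bottom 5% of the HDLc distribution versus 2 of 128 in the top 5%) can be checked with a one-sided Fisher exact test. The sketch below uses only the Python standard library. It is a reader's check on the published counts, not a reproduction of the analysis performed in the original study.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]:
    probability of observing at least `a` in the top-left cell when the
    margins are fixed and rows and columns are independent."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Rows: low-HDLc group, high-HDLc group; columns: carriers, noncarriers.
low_carriers, low_noncarriers = 20, 128 - 20
high_carriers, high_noncarriers = 2, 128 - 2

odds_ratio = (low_carriers * high_noncarriers) / (low_noncarriers * high_carriers)
p_value = fisher_one_sided(low_carriers, low_noncarriers,
                           high_carriers, high_noncarriers)

print(f"Odds ratio: {odds_ratio:.1f}")
print(f"One-sided Fisher exact P: {p_value:.2e}")
```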
| Gene–Environment Interaction |
|---|
From a research perspective, the presence of interaction complicates the detection of relevant associations that may be masked if they are not investigated in appropriate conditions. Except in the domain of pharmacogenetics, very little progress has been made in our understanding of gene–environment interaction. This is partly related to the difficulty of accurate measurement of most environmental factors (drug intake is a clear exception) as compared with genetic factors, and to the generally low power of studies to analyze combinations of factors in the presence of interaction. Prospective studies might be more appropriate than case-control studies to investigate gene–environment interactions because they are less prone to biases that result from modifications in environmental exposure induced by disease onset.42 Lack of appropriate accounting for gene–environment interactions may explain some of the failures to replicate genetic associations. Whether the recently initiated projects of huge biobanks such as the UK Biobank (http://www.ukbiobank.ac.uk/) will help resolve the pending issues of gene–gene and gene–environment interaction remains to be shown. In fact, the pattern of interactions among factors that affect disease risk may be so complex that completely different approaches such as system genetics may be more helpful.10,11
| Pharmacogenetics |
|---|
CYP2D6 as an Example
For many drug-metabolizing enzymes, phenotyping tests were available before their genetic variability could be assessed directly at the molecular level. CYP2D6, with the extensive metabolizer and poor metabolizer inherited phenotypes, is an example. The poor metabolizer phenotype is associated with a considerable increase in the maximum concentration and area under the curve for a large number of drugs. These include the β-blockers metoprolol, timolol, and propranolol, for which the same dose leads to a greater lowering of heart rate and blood pressure in subjects with the poor metabolizer phenotype. The activity of CYP2D6 is under the influence of a large number of genetic variants, some of them common, that may be simultaneously present in an individual and whose distribution may vary considerably across ethnic groups.43 Many metabolizing enzymes exhibit a similar pattern of genetic variability.44
Because CYP2D6 polymorphisms affect the metabolism of so many drugs, a tendency currently exists in the pharmaceutical industry to stop the development of therapeutic agents that are metabolized by CYP2D6. One concern is that the generalization of this attitude to other drug-metabolizing enzymes might lead to the rejection of a large number of drugs that would be efficient and safe in subgroups of patients identified by genetic testing. A striking example is provided by the pharmacogenetics of warfarin, a drug that has been in use since the 1950s but would probably have been abandoned in early development nowadays as a consequence of its pharmacogenetic features.
The Pharmacogenetics of Warfarin
Warfarin and, more generally, vitamin K antagonists are widely used oral anticoagulants whose prescription is complicated by their narrow and highly variable therapeutic range. The dose requirement and risk of bleeding are influenced by intake of vitamin K, illness, age, gender, concurrent medication, body surface, and genetics. Besides the possible or demonstrated influence of a large number of genes,45 warfarin's effect is influenced by 2 major genes, one involved in its biotransformation (CYP2C9) and the other involved in its mechanism of action (VKORC1). The gene that encodes CYP2C9, the main metabolizing enzyme of warfarin, is highly polymorphic with many alleles that exhibit different functional properties and different frequencies across populations. In individuals of European descent, CYP2C9*1 is the most common allele, whereas CYP2C9*2 and CYP2C9*3 have a frequency of 12% and 8%, respectively, and a reduced activity relative to CYP2C9*1, which implies that carriers of the CYP2C9*2 or CYP2C9*3 form (≈40% of Europeans) treated with warfarin would normally require a lower dose of the drug.46 In individuals of African and Asian origins, the CYP2C9*2 and CYP2C9*3 alleles are less frequent than in Europeans, but other functional alleles that also affect the drug response are found predominantly in these 2 ethnic groups. VKORC1, the other major gene that influences warfarin response, encodes the vitamin K cycle enzyme that controls regeneration of reduced vitamin K. Warfarin exerts its pharmacological effect by inhibition of VKORC1. The VKORC1 gene carries several common polymorphisms in its regulatory regions, such as the –1639G/A polymorphism (or similarly –1173T/C, which is in strong LD with it), which strongly correlate with warfarin response. A regression model that incorporates polymorphisms of the 2 genes as well as age, height, and gender has been proposed that accounts for >50% of the variability of warfarin response in Europeans and may be used as a dosing algorithm in this population.47
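The proportion of carriers quoted above follows from the allele frequencies. As a minimal sketch, assuming Hardy-Weinberg proportions and treating CYP2C9*2 and CYP2C9*3 as mutually exclusive alleles of the same gene (both assumptions are made here for illustration and are not stated in the source), the fraction of individuals who carry at least 1 reduced-activity allele is 1 minus the frequency of homozygotes for the remaining alleles.

```python
def carrier_frequency(variant_allele_freqs: list[float]) -> float:
    """Fraction of individuals carrying at least one of the listed alleles,
    assuming Hardy-Weinberg proportions and alleles of a single gene."""
    other_alleles = 1.0 - sum(variant_allele_freqs)
    return 1.0 - other_alleles ** 2


# Allele frequencies in Europeans quoted in the text: *2 = 12%, *3 = 8%.
cyp2c9_carriers = carrier_frequency([0.12, 0.08])
print(f"Carriers of CYP2C9*2 or *3: {cyp2c9_carriers:.0%}")  # about 36%
```

The result, about 36%, is of the same order as the figure of roughly 40% given in the text.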
Large-Scale Genotyping of Drug-Metabolizing Enzymes
It is now possible to design genotyping devices that allow the simultaneous testing of a large number of variants that affect drug metabolism.48 Such tools may be very useful in the early development of drugs, and no major technological obstacle exists to improving them to a point where they will allow testing of most SNPs that affect drug metabolism. However, some limitations may reduce the clinical applicability of such tools. First, SNPs represent only a part of the genetic variation that affects drug metabolism. Variable-number tandem repeats and structural polymorphisms may not be easily tagged by SNPs. Second, a major gene effect (where a single variant dominates all other effects) cannot always be assumed, and it may be difficult to translate a complex pattern of variation that involves many different SNPs into an accurate prediction of drug response. Third, this difficulty is compounded by the possible influence of nongenetic cofactors.
Evolutionary Aspects of Drug-Metabolizing Enzymes
The conjunction of a strong effect and a high frequency that distinguishes the variants that affect drug metabolism from most of those that affect disease phenotypes is likely to have an evolutionary explanation. Many drug-metabolizing enzymes are highly genetically polymorphic within and across species. A good example is offered by the CYP2D gene family. In mice, 9 CYP2D genes exist. In humans, only 1 CYP2D gene (CYP2D6) is present, and it is highly polymorphic. Because CYP2D enzymes have a high affinity for plant toxins, it has been proposed that they are essential for the survival of mice in their specific dietary environment. During hominization on the other hand, as a consequence of changes in food selection, the detoxifying potential of CYP2D enzymes became less essential for survival, and, with no selection pressure applied on CYP2D gene products, accumulation of mutations resulted in a high degree of polymorphism and ultimately in the degradation and loss of function of most CYP2D genes.43
| Going Further With GWA Studies |
|---|
The application of GWA studies to quantitative traits provides a powerful way to explore the genetics of various risk factors measured in epidemiological studies. For example, in a GWA study of 1464 type 2 diabetes patients and 1467 controls, associations of LDLc, HDLc, triglycerides, and apoA1 and apoB plasma levels with loci already known to influence these traits were found again, but a novel association of triglycerides with the gene encoding the glucokinase regulatory protein (GCKR) was also identified.57 In large population studies, quantitative traits can also be dichotomized (eg, by comparison of extremes of the distribution of the trait). An interesting illustration of this approach was recently provided with the discovery of a gene that influences the QT interval (QTi) measured on the ECG. A QTi prolongation is an indicator of delayed ventricular repolarization that may become clinically manifest with the occurrence of syncope and ventricular arrhythmias such as torsades de pointes, which may lead to sudden death. The QTi length has a heritability of 30%, and short as well as long QTis are associated with an increased risk of cardiovascular morbidity and mortality (see Dekker et al66). Recently, a GWA study was performed to identify genes whose variability may contribute to the QTi variability in the population at large.67 A multistage design was used, which started with the comparison of the 2 extremes of the QTi distribution in females, followed by a step that refined the best regions of interest, and finally replicated the most interesting result in the whole initial population and 2 independent studies. This strategy resulted in the identification of frequent polymorphisms in the nitric oxide synthase 1 adaptor protein (NOS1AP) gene that were consistently associated with QTi in both men and women. NOS1AP is a regulator of the neuronal nitric oxide synthase that had not been previously suspected to be involved in cardiac repolarization. Its genetic variability accounts for ≈1.5% of the variance of the QTi in the population at large. The finding, which has now been replicated in other studies,68,69 demonstrates the potential of the GWA approach applied to quantitative traits and opens a new area of research for the prevention of sudden death.
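To give a sense of what a contribution of this size means, the fraction of trait variance explained by a single biallelic polymorphism with an additive effect is 2p(1 − p)β²/σ² under Hardy-Weinberg proportions, where p is the allele frequency, β the mean change in the trait per allele copy, and σ the standard deviation of the trait. The sketch below applies this formula. The allele frequency, per-allele effect, and standard deviation used are hypothetical values chosen to land near the order of magnitude quoted above and are not estimates from the NOS1AP study.

```python
def variance_explained(allele_freq: float,
                       effect_per_allele: float,
                       trait_sd: float) -> float:
    """Fraction of trait variance explained by one additive biallelic variant
    under Hardy-Weinberg proportions: 2p(1-p) * beta^2 / sigma^2."""
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return genotype_variance * effect_per_allele ** 2 / trait_sd ** 2


# ASSUMPTION: hypothetical values, not taken from the cited study.
fraction = variance_explained(allele_freq=0.36,       # minor allele frequency
                              effect_per_allele=3.0,  # ms of QTi per allele copy
                              trait_sd=17.0)          # ms, population SD of QTi
print(f"Variance explained: {fraction:.1%}")
```

Even a frequent allele that shifts the trait by a fraction of its standard deviation explains only a small percentage of the variance, which is why such variants are informative about mechanisms but weak predictors for the individual.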
A major limitation of GWA studies is that they are very costly and time-consuming when applied to studies of large sample size. One proposed solution to circumvent this limitation is to conduct GWA studies on pooled DNA samples.70
DNA Pooling and GWA Studies
The general principle of SNP microarray technology is to produce a quantitative signal proportional to the number of copies of a given allele in the DNA sample analyzed. When the DNA sample is that of an individual, the signal is used to assign a genotype according to the number of copies (0, 1, or 2) of a given SNP allele. When samples correspond to pooled DNAs, the signal is proportional to the number of copies of a given allele in the pool. By judicious composition of the pools (eg, grouping cases and controls in distinct pools), it is possible to compare allele frequencies between pools. This economical approach to the estimation of allele frequencies in GWA studies has been shown to be efficient.71–76 All studies of sufficient sample size in which appropriate phenotypes and DNA are available could benefit from this approach, the major restriction being the ability to construct pools of high quality. Because of the considerably reduced cost as a result of DNA pooling, several complementary genotyping arrays may be used in the same study, and the new genotyping arrays that will become available in the future with more and better markers will be usable in the same study with no major cost restriction. DNA pooling results in some loss of sensitivity that may be reduced but not completely eliminated by multiplying the number of pools and hybridizing each pool to several arrays. However, the possibility to combine the results of numerous studies should compensate for this loss of precision and facilitate replication analysis. The most important limitation of the pooling strategy is that it restricts the analyses to the hypotheses that were prespecified through the design of the pools.
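The comparison of allele frequencies between a pool of cases and a pool of controls can be sketched as a 2-proportion z test to which a term for the array measurement error is added. This is a simplified model offered for illustration. The function name, the error model (independent error with the same standard deviation on each pool estimate), and all numerical values are assumptions and are not taken from the pooling studies cited above.

```python
from math import sqrt


def pooled_allele_z(freq_cases: float, freq_controls: float,
                    n_cases: int, n_controls: int,
                    measurement_sd: float = 0.0) -> float:
    """z statistic for the allele-frequency difference between two DNA pools.

    Sampling variance is binomial on 2N chromosomes per pool; measurement_sd
    is the assumed SD of the array-based frequency estimate for one pool.
    """
    chrom_cases, chrom_controls = 2 * n_cases, 2 * n_controls
    pooled = ((freq_cases * chrom_cases + freq_controls * chrom_controls)
              / (chrom_cases + chrom_controls))
    sampling_var = pooled * (1 - pooled) * (1 / chrom_cases + 1 / chrom_controls)
    measurement_var = 2 * measurement_sd ** 2  # one error term per pool
    return (freq_cases - freq_controls) / sqrt(sampling_var + measurement_var)


# Hypothetical pools of 1000 cases and 1000 controls.
z_individual = pooled_allele_z(0.30, 0.25, 1000, 1000)                   # no pooling error
z_pooled = pooled_allele_z(0.30, 0.25, 1000, 1000, measurement_sd=0.01)  # with pooling error
print(f"z without measurement error: {z_individual:.2f}")
print(f"z with measurement error   : {z_pooled:.2f}")
```

In this model the loss of sensitivity caused by pooling appears as the extra variance term. Averaging over several pools or replicate arrays reduces that term but cannot remove it entirely.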
Overall, with the accumulation of results from GWA studies, we are witnessing a true revolution that will not only impact our understanding of the genetics of common CVD but will, through the discovery of the implication of completely unsuspected genetic sequences, undoubtedly affect our understanding of the causes and pathophysiology of these diseases and open new directions for their prevention and treatment.
| Genotypic Versus Phenotypic Biomarkers |
|---|
Phenotypic biomarkers such as LDLc, blood glucose, blood pressure, brain natriuretic peptide, C-reactive protein, or several pharmacogenetic tests integrate a large number of genetic and nongenetic influences. This is why they are very informative and convenient to use in a medical context. Conversely, the weak increase in individual risk conveyed by single genetic polymorphisms explains why they are not useful risk indicators or biomarkers for common CVD. Information from several polymorphisms would have to be integrated to become clinically useful, but such integration is not trivial in the presence of weak nonadditive effects and multiplicity of possible combinations of genotypes that increase the risk. Because most of these combinations may be rarely observed, even in the largest studies, assessment of their relationship with the disease is quite challenging.
On the other hand, genetics can help elucidate whether a molecule, such as a protein measured in the blood, is involved in the origin of disease. For example, circulating levels of inflammatory biomarkers such as C-reactive protein or various cytokines are known to be increased in the course of atherosclerosis and to predict future cardiovascular events. If the level of a molecule in the circulation reflects the evolution or the extension of atherosclerosis, it can be a very useful biomarker but its elevation may not reflect a causal mechanism. Genetic studies may help elucidate whether the elevation is primary or secondary to the disease process. Indeed, an association of genetic variants with both circulating levels of the biomarker and disease risk is an argument for a causal relationship. An investigation based on this reasoning failed to support a causal association between plasma C-reactive protein and the metabolic syndrome.77 Conversely, a causal role of interleukin-18 in atherosclerosis and its complications was recently suggested by this approach.78
| Conclusion |
|---|
Glossary of Terms Used in Genetics
| Acknowledgments |
|---|
None.
| References |
|---|