Genome-Wide Array Analysis of Normal and Malformed Human Hearts
Background— We present the first genome-wide cDNA array analysis of human congenitally malformed hearts and attempted to partially elucidate these complex phenotypes. Most congential heart defects, which account for the largest number of birth defects in humans, represent complex genetic disorders. As a consequence of the malformation, abnormal hemodynamic features occur and cause an adaptation process of the heart.
Methods and Results— The statistical analysis of our data suggests distinct gene expression profiles associated with tetralogy of Fallot, ventricular septal defect, and right ventricular hypertrophy. Applying correspondence analysis, we could associate specific gene functions to specific phenotypes. Furthermore, our study design allows the suggestion that alterations associated with primary genetic abnormalities can be distinguished from those associated with the adaptive response of the heart to the malformation (right ventricular pressure overload hypertrophy). We provide evidence for the molecular transition of the hypertrophic right ventricle to normal left ventricular characteristics. Furthermore, we present data on chamber-specific gene expression.
Conclusions— Our findings propose that array analysis of malformed human hearts opens a new window to understand the complex genetic network of cardiac development and adaptation. For detailed access, see the online-only Data Supplement.
Received December 9, 2002; revision received February 26, 2003; accepted February 27, 2003.
Congenital heart defects (CHDs) account for the largest number of birth defects in humans, with an incidence of ≈8 per 1000 live births. Heart formation requires complex interactions among cells originating from different cell lineages that arise from 3 distinct origins: cardiogenic mesoderm, neural crest, and proepicardium. Only a minority of CHDs are caused by single-gene defects, whereas most are thought to be multigenetic disorders. The heterogeneity of CHDs associated with single-gene defects, as demonstrated for NKX2.5 or TBX5 mutations,1,2 makes mechanistic understanding of gene function challenging and points to a complex genetic network with modifier genes, genetic polymorphism, and the influence of environmental factors.
Normal cardiac development in humans leads to a 4-chambered heart, defining the basis of normal cardiac physiology. The right heart belongs to the low-pressure pulmonary system and the left heart to the high-pressure body circulation. As a consequence of heart malformations, abnormal hemodynamic features can occur because of volume or pressure overload and lead to an adaptation process of the heart. So far, knowledge about the molecular pathways involved in this process has been mainly gained by animal studies.3–5
Taking the above into account, we attempt a genome-wide gene expression study of congenitally malformed hearts in humans, with the purpose to ultimately identify genes associated with dysdevelopment as well as genes involved in adaptation processes of the heart. Our present data suggest that patients with congenitally malformed hearts can serve as a model to study these transcriptional fingerprints.
We examined and compared genes dysregulated in defined congenitally malformed hearts with the molecular response to pressure overload leading to hypertrophy and the chamber-specific cardiac molecular portrait. Finally, we provide a cardiac-specific clone selection representing genes persistently expressed in normal and diseased myocardium.
Taken together, our data provide the first genome-wide transcriptional fingerprint of functionally known and unknown genes expressed in the human heart at different cardiac conditions. For detailed access, we provide a Data Supplement, in which the obtained gene expression profiles can be searched by gene names, accession numbers, and clone IDs.5a
All cardiac samples were obtained from the German Heart Center at cardiac surgery after short-term cardioplegia, after ethical approval by the institutional review committee and informed consent of the patient or parents. Tissue from normal human hearts was obtained from unmatched organ donors without cardiac diseases, where the hearts could not be transplanted because of organizational difficulties. All samples were directly snap frozen in liquid nitrogen after excision and stored at −80°C. HEK 293 cells were harvested and used as a normalization sample.
In total, 61 Human Unigene Set-RZPD 2 cDNA arrays from the German Genome Resource Center6 were used, each represented by 3 Hybond N+ membranes containing polymerase chain reaction (PCR) products of 74 695 different IMAGE clones spotted in duplicate. At least one IMAGE clone of 33 947 Unigene clusters has been resequenced (see the Data Supplement).
Sample Preparation and Array Hybridization
Total RNA of all cardiac tissues and HEK 293 cells was extracted using TRIzol reagent (Gibco BRL) according to the manufacturer’s protocol. Labeling was performed by reverse transcription of 8 μg total RNA with avian myeoblastosis virus reverse transcriptase (AMV-RT; Promega) in the presence of pd(T)12-18 (Amersham Pharmacia Biotech) and α33P-dCTP (Amersham Pharmacia Biotech). Unincorporated nucleotides were removed using ProbeQuant G-50 micro columns (Amersham Pharmacia Biotech), and cDNA was added to the hybridization solution together with salmon sperm DNA (Gibco BRL), placenta DNA, and pd(A)40 (MWG Biotech). Hybridizations were performed at 65°C for 16 hours. After washing, arrays were exposed for 24 hours and scanned using a Fuji Film Bas-1800 reader (Fuji photo film). Image analysis was carried out using the X-Digitize image processing software.7
Array Data Preprocessing
Data normalization was principally performed as described previously.8 After local background subtraction, all intensity values below 200 were reset to this level and the intensities of each hybridization were scaled to a constant sum. Visual inspection showed that the variance of the resulting log-transformed intensities was approximately constant across the intensity range. Because the hybridizations were performed on arrays from 2 different production batches, which significantly affected the measurements, we corrected the influence of the production batch as follows. We considered 2 virtual reference hybridizations, defined by the spot-wise medians of 10 hybridizations from the respective array batch. These 2 sets of hybridizations were performed with patient samples as well as normalization samples sharing equal phenotype profiles. The logarithms of ratios between the intensities of each hybridization of interest and those of the respective batch-specific virtual reference hybridization were shifted such that the median over the 40% of spots with highest average log-intensity became 0. Finally, the obtained log-ratio (base 10) values were averaged over duplicate spots per clone, resulting in values that are referred to as normalized expression levels in the text. To limit the analysis to meaningful data that did not show consistently low intensity or low variance across the samples, all additional statistical analysis was performed on the normalized expression levels of 8069 selected clones, whose intensities were among the 15% highest in at least 4 hybridizations and whose natural log ratios had a standard deviation across the samples of at least 0.5.
Differential Gene Expression Analysis
To account for the influence of major confounding factors, we used linear models incorporating the factors age, gender, disease, and tissue type as a statistical framework for identifying genes that are affected in particular phenotypes.9 We subjected the normalized expression levels of each clone to a linear model of the following form: yhijk=μ+Ah+Si+Tj+Dk+ε, where Ah is the effect of age h (young or old), Si the effect of the patient’s gender, Tj the effect of tissue j (right ventricle [RV] or left ventricle [LV], right or left atrium), and Dk the effect of disease status k (atrial septal defect [ASD] II, TOF, ventricular septal defect [VSD], right ventricular hypertrophy [RVdis], and normal). The models were fitted with the least-squares method using the function lm() in the statistical software R,10 with the constraints that the coefficients for the different levels of each factor sum up to 0. For all coefficients of interest, the null hypothesis of their being equal to 0 was tested under the assumption of independent errors following a normal distribution. Thus we obtained a probability value quantifying the statistical significance of differential expression for each gene and each phenotype comparison of interest. The reliability of the obtained results was additionally assessed by estimating the rates of falsely significant genes with a permutation approach. For this purpose, the linear models were fitted for 500 random permutations of the sample labels. For each coefficient of interest, the expected proportion of false positives among the significant genes (false discovery rate) was estimated from the randomized data.11
For comparison between normal RV and LV tissue, paired samples from 6 individuals were analyzed using t tests for paired observations.
To analyze which phenotypical differences among the tissue samples are most pronounced in terms of gene expression patterns, we used the class discovery method ISIS.12 ISIS searches for binary class distinctions among the set of tissue samples that are characterized by clear differential expression of a corresponding subset of genes.
For the standard visual display of gene expression matrices, genes were grouped by average linkage hierarchical clustering using the Euclidean distance as dissimilarity measure.
Real-time PCR assays were carried out using SYBR Green PCR Master Mix (Applied Biosystems) on ABI PRISM 7900HT Sequence Detection System (Applied Biosystems) according to the manufacturer’s protocol. Intron spanning primers (MWG Biotech) were designed using the Primer Express software (Applied Biosystems). Primer sequences are available in the Data Supplement. RT reactions were carried out via AMV-RT (Promega) with random hexamers (Amersham Pharmacia Biotech) with 1 μg total RNA. We analyzed the significance of the normalized ΔΔCT values with t tests and considered genes with P<0.05 as confirmed to be significantly differentially expressed. Data normalization was performed using HPRT as a housekeeping gene.
To allow the selection of a balanced patient population enabling the separation of disease- or tissue-specific expression patterns, we collected 150 samples by a standardized procedure during cardiac surgery over a time period of 3 years at the German Heart Center Berlin. Considering known confounding factors like age and gender, we selected 40 individuals for our additional analysis. All patients were clinically studied, and their hemodynamic state was worked up by cardiac catheterization and echocardiography before surgery (Figure 1).
In total, 6 normalization samples and 55 patient samples have been studied. The latter belonged to 4 categories (Figure 1), as follows: (1) right atrial (RA, n=2) and ventricular (n=9) samples of patients with classical tetralogy of Fallot (TOF), exposed to pressure overload resulting in right ventricular hypertrophy; (2) right ventricular samples of a patient population (RVdis, n=7) with right ventricular hypertrophy in response to pressure overload and a variety of cardiac malformations, which we analyzed together with right ventricular samples of TOF as one sample population (RVH) to elucidate the common biomechanical adaptation processes; (3) RA samples of hearts with VSD (n=4), which have not been exposed to biomechanical stress; and (4) normal RA (n=4), left atrial (LA, n=3), RV (n=6), LV (n=6), and interventricular septum (IVS, n=6) samples as well as RA samples of patients with ASD (n=8); which we profiled to identify chamber-specific genes and to provide a framework of interpretation. The normal RV, LV, and IVS samples were obtained from 6 different individuals.
Pursuing a genome-wide approach, we used the Human Unigene Set-RZPD 2 cDNA arrays. The set contains 74 695 different IMAGE clones belonging to 49 255 different Unigene clusters with 12 657 known genes. In total, ≈9 million measurements of gene expression were made. All gene expression data are available for download in the Data Supplement.
Gene Expression Pattern and Cardiac Phenotype
Characteristic expression patterns were obtained from a linear model analysis for the following phenotype comparisons: (A versus V), the comparison of atria and ventricle, where we based our analysis on 40 samples from different individuals (20 atria and 20 ventricle); (TOF in RV), where we analyzed all 22 RV tissue samples for genes associated with TOF (9 samples); (RVH in RV), where the same 22 samples were studied for effects of right ventricular hypertrophy (16 samples); and (VSD in RA), where we analyzed 18 RA samples regarding the effects of VSD (4 samples) (Table 1). The comparison between normal right and left ventricular tissue (RV versus LV), using paired t tests, was based on matched samples from 6 individuals.
To verify our observed gene expression results, we investigated expression levels of 21 randomly chosen genes with P<0.01 by real-time PCR and confirmed 18 of them to be differentially expressed (Figure 2). Comparison of the obtained fold-changes by array analysis versus real-time PCR confirmed our approach as conservative, revealing higher expression differences in the latter (Table 2).
To provide an overview of the molecular signatures, we hierarchically clustered all known genes represented by resequenced clones that appeared significantly differential (P<0.01) in at least one of the above comparisons [Figure 2, −log10(P) color-coded]. This overview allows a comparison between the transcriptional fingerprint of each of these genes in the different phenotypes. It appears that ≈25% of the CHD-associated genes (green or red in one of the first 3 columns) are not chamber-specific in the normal heart (yellow in the last 2 columns). Additional inspection shows a partially similar expression dynamic of the TOF portrait with that of RVH but an opposite expression dynamic compared with VSD. In addition, a large amount of genes characteristic for the molecular signature of VSD are not differentially expressed in any other analysis. Comparison of the expression profile of genes characteristic for TOF and RVH provides the possibility of subtracting both from each other and identifies genes specific for either TOF or RVH.
Association of Functional Gene Categories to Specific Phenotypes
To provide a global view of the association of functional gene categories with particular phenotypes, we used the statistical method of correspondence analysis13 (Figure 3). Clones were assigned to gene ontology categories according to the annotations in the LocusLink database.14 We applied correspondence analysis to the contingency table of numbers of differentially expressed genes (P<0.05) per gene ontology category and phenotype comparison. Gene categories with similar patterns of regulation across phenotypes are mapped close to each other. Furthermore, the biplot shows the associations between gene categories and phenotypes. A gene category lies far in the direction of a certain phenotype, if disproportionally many genes are differentially expressed in this phenotype. For instance, many genes contributing to the structural integrity of a muscle fiber (structural constituent of muscle) are differentially expressed between atria and ventricle.
Class Discovery Using ISIS
We used the class discovery method ISIS to see which class distinctions among the tissue samples are most pronounced in terms of gene expression profiles. The most outstanding binary class distinction identified, irrespective of sample annotations, was exactly the distinction between atria and ventricle samples.
Because the human heart consists of 2 general compartments, the atria and ventricle, we analyzed the molecular repertoire that each of these expresses (Data Supplement). In addition to well-known chamber-specific genes, like atrial and ventricular myosin light chains, this signature includes diverse previously unknown chamber-specific genes for muscle contraction, extracellular components, cell growth and differentiation, and energy metabolism.
The less force-developing atria were characterized by higher expression of genes encoding proteins associated with extracellular matrix or actin modulation, like CST3 and PCOLCE. The translation factor EEF1A and the DNA helicase REQL4 were highly significantly upregulated in atria. EEF1A-2/S1 protein is activated on myogenic differentiation and delays myotube death after apoptotic stress induction.15 KCNIP2, a potential target for ventricular tachycardia,16,17 is even more highly expressed in the human atria than in the ventricle (P=0.01), pointing to a potential role also in atrial arrhythmia. Genes with expression levels higher in ventricle than in atria belong mainly to 3 major functional classes: cytoskeleton-contraction, metabolism–energy turnover, and cell cycle–growth. Several of these genes are involved in ventricular myocardial disorders: TMP118 is mutated in human cardiomyopathies, FHL1 is downregulated in failing human hearts,19 and ANKRD2 is involved in the process of cardiac hypertrophy.20
The combination of computational analysis of protein similarity together with our genome-wide transcription footprint of the heart points to so far functionally unknown genes. For example, we found Unigene cluster Hs.355815, which has a 59% protein sequence similarity to the myotonic dystrophy-associated protein kinase β in rat, specifically expressed in ventricle.
In addition to atrial and ventricular specifications, we analyzed molecular differences between the normal high-pressure LV and low-pressure RV (Table 1, Figure 2). In general, we discovered genes encoding proteins involved in cell cycle, cell differentiation, and energy metabolism to be downregulated in the RV compared with the LV. An example of new information derived from our study is the LV-specific expression of Unigene cluster Hs.323099 with a sequence similarity of 93% to ALK3, which is essential for cardiac development.21 Hs.323099 maps to chromosome 10q22, the genomic region that was previously linked to autosomal cardiomyopathy.22 Furthermore, we did not observe any significant difference between LV and IVS.
Tetralogy of Fallot and RVH
We observed distinct molecular portraits of TOF and RVH with genes of various functional classes. Even though the right ventricular hypertrophy is part of TOF, we could clearly distinguish between these 2 gene expression profiles by regarding in the statistical analysis once TOF and once TOF combined with the RVdis samples as one phenotype (RVH). Therefore, TOF reveals the molecular signature of the malformation in addition to the adaptation portrait.
Beside genes involved in cell cycle, a characteristic feature of the TOF signature is the upregulation of ribosomal proteins S6, L37a, S3A, S14, and L13A (Figure 4A). A specific role of ribosomal proteins during cardiac development has been only described for the chick ribosomal protein L10, which is downregulated in the cardiac outflow tract of chick embryos lacking neural crest cells.23
Our expression data reveal a TOF-specific dysregulation of potential targets that could be involved in pathways leading to cardiac dysdevelopment (Figure 4A); for example, SNIP, A2BP1, and KIAA1437 are upregulated. SNIP interacts with Smad4, a mediator of TGF-β, activin, and BMP signaling, which are essential for normal cardiac development.25 A2BP1 belongs to a novel gene family sharing RNA-binding motifs expressed at the developing heart during mouse embryogenesis.26 KIAA1437 binds k-ras, where k-ras–deficient mice develop a thin ventricular wall and die until term.27 Genes markedly downregulated in TOF include STK33, BRDG1, and TEKT2.
In RVH we observed a hypertrophy-specific gene expression pattern of genes mainly involved in stress response, cell proliferation, and metabolism. Intriguing is the upregulation of ADD2, whose relative ADD1 was recently shown to be associated with hypertension in human.28 Because the expression of several genes of the RVH signature was similar to their expression levels in LV (Figure 2), we analyzed whether the molecular adaptation to pressure overload could lead to a molecular transition from right to left ventricular characteristics. For the RVH-associated genes (P<0.01), we compared the difference between the mean intensity of each gene in RVH and that in normal RV samples to the mean intensity of the gene in normal LV samples. We found a significantly positive correlation coefficient of 0.27 (P<0.0001, permutation test), indicating that the genes dysregulated in RVH have a tendency to behave similarly in the disease state as in normal LV tissue.
To separate in TOF the malformation from the adaptation-specific molecular signature, we subtracted the RVH-specific genes from the TOF molecular portrait. All genes with P<0.1 in RVH were subtracted from all genes with P<0.01 in TOF, resulting in 88 clones highly specific for TOF with regard to the primary changes underlying the malformation process (Data Supplement).
Ventricular Septal Defect
To obtain a molecular portrait that is not influenced by biomechanical adaptation processes, we studied RA samples of patients with VSD, intact tricuspid valve, and normal RA pressure. We observed a VSD-specific molecular signature dominated by downregulated genes with respect to the other RA samples (Figure 4B). As seen in TOF, several ribosomal proteins (S11, L18A, L36, LP0, L31, and MRPS7) are differentially expressed, but here they are downregulated. Other VSD-specific genes encode ion transporters or function during vertebrate development. The differential expression of ion channels was restricted to solute and potassium channels (SLC26A8, SLC16A5, SLC4A7, KCNS2, and KCNN3). A thorough literature study of downregulated genes in VSD revealed that a major part is involved in cell proliferation and differentiation during embryogenesis as well as apoptosis. Examples are AMD1,29 RIPK3,30 EGLN1,31 and SIAHBP1.32 It is noteworthy to mention the significant downregulation of ARVCF deleted in velo-cardio-facial syndrome.33
Genome-Wide Compendium of Persistent Human Heart Transcripts
In addition, we selected a set of 6075 clones (4340 different Unigene clusters) that appeared to be persistently transcribed in our samples (Data Supplement).
Most microarray studies in human published to date, including the present one, have been observational studies, and it should be born in mind that effects related to bias and confounding have to be considered in advance.34 Facing these statistical problems, we started this study by the collection of a large amount of samples, enabling a final selection of a patient population that is as balanced as possible with respect to known confounding factors. After visual inspection of a commonly used t test analysis, we observed age and tissue to be major confounding factors and therefore chose a linear model incorporating these factors as our statistical framework. Nevertheless, the findings of our study should be regarded as preliminary, and additional studies will be necessary to elucidate the role of the identified genes in normal and diseased human hearts.
We are able to show disease-specific molecular portraits for TOF and VSD as well as genes involved in the biomechanical adaptation process due to pressure overload. The combination of statistical analysis and our study design enabled the separation of secondary adaptation-specific expression patterns from effects attributable to primary cardiac dysdevelopment. In addition, we provide a chamber-specific genome-wide expression portrait of the normal human heart.
For a global overview of the disease and tissue types, we applied a variety of computational tools, such as the class discovery method ISIS and the statistical method of correspondence analysis, which could associate functional gene categories with particular phenotypes. We were able to confirm known genes and pathways involved in the developmental process and the molecular difference between the atria and ventricle. Furthermore, we obtained a global molecular portrait of the normal and diseased heart using a genome-wide approach including functionally unknown genes.
Unexpectedly, ribosomal proteins were found as a characteristic part of the TOF and VSD portrait. Taking into account that most human ribosomal genes were expressed at similar levels throughout our sample population, it is likely that these specific differentially expressed genes play an extraribosomal role during cardiac development.33
The global analysis of the VSD molecular signature showed a general transcriptional downregulation compared with TOF as well as with all other sample categories. The question of whether the observed downregulated transcription mirrors a reduction of essential proteins leading to incomplete fusion, an underdeveloped atrium, or an unknown physiological process will have to be elucidated in the future.
Our findings suggest that the analysis of malformed human hearts using powerful techniques like microarrays combined with statistical methods opens a new window to understanding cardiac adaptation and development.
This project was supported by a grant from The Max Planck Society for the Advancement of Science. We acknowledge the support of the German Genome Resource Center Berlin. We thank M. Steinfath for image analysis; I. Dunkel for technical assistance, and J. Kreuder, D. Rawer, W. Huber, and L. Pauliks for helpful discussions.
Futher information on obtained gene expression profiles, gene names, accession numbers, and clone IDs is available in an online-only Data Supplement at http://www.circulationaha.org.
Baumgarten G, Knuefermann P, Kalra D, et al. Load-dependent and -independent regulation of proinflammatory cytokine and cytokine receptor gene expression in the adult mammalian heart. Circulation. 2002; 105: 2192–2197.
Kaynak B, von Heydebreck A, Mebus S, et al. Genome-wide array analysis of normal and malformed human hearts. Available at: http://www.molgen.mpg.de/∼chd/arraywebsup. Accessed April 29, 2003.
Resource Center and Primary Database (RZPD), GmbH. Available at: http://www.rzpd.de. Accessed April 4, 2003.
Steinfath M, Wruck W, Seidel H, et al. Automated image analysis for array hybridisation experiments. Bioinformatics. 2001; 17: 634–641.
Beißbarth T, Fellenberg K, Brors B, et al. Processing and quality control of DNA array hybridization data. Bioinformatics. 2000; 16: 1014–1022.
Storey JD, Tibshirani R. SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays. In: Parmigiani G, Garrett ES, Irizarry RA, et al, eds. The Analysis of Gene Expression Data: Methods and Software. New York: Springer. In press.
von Heydebreck A, Huber W, Poustka A, et al. Identifying splits with clear separation: a new class discovery method for gene expression data. Bioinformatics. 2001; 17 (suppl 1): 107–114.
Fellenberg K, Hauser NC, Brors B, et al. Correspondence analysis applied to microarray data. Proc Natl Acad Sci U S A. 2001; 98: 10781–10786.
LocusLink database. Available at: http://www.godatabase.org. Accessed April 4, 2003.
Chambers DM, Peters J, Abbott CM. The lethal mutation of the mouse wasted (wst) is a deletion that abolishes expression of a tissue-specific isoform of translation elongation factor 1alpha, encoded by the Eef1a2 gene. Proc Natl Acad Sci U S A. 1998; 95: 4463–4468.
Guo W, Li H, Aimond F, et al. Role of heteromultimers in the generation of myocardial transient outward K+ currents. Circ Res. 2002; 90: 586–593.
Karibe A, Tobacman LS, Strand J, et al. Hypertrophic cardiomyopathy caused by a novel α-tropomyosin mutation (V95A) is associated with mild cardiac phenotype, abnormal calcium binding to troponin, abnormal myosin cycling, and poor prognosis. Circulation,. 2001; 103: 65–71.
Yang J, Moravec CS, Sussman MA, et al. Decreased SLIM1 expression and increased gelsolin expression in failing human hearts measured by high-density oligonucleotide arrays. Circulation. 2000; 102: 3046–3052.
Gaussin V, Van de Putte T, Mishina Y, et al. Endocardial cushion and myocardial defects after cardiac myocyte-specific conditional deletion of the bone morphogenetic protein receptor ALK3. Proc Natl Acad Sci U S A. 2002; 99: 2878–2883.
Deleted in proof.
Morrison AC, Bray MS, Folsom AR, et al. ADD1 460W allele associated with cardiovascular disease in hypertensive individuals. Hypertension. 2002; 39: 1053–1057.
Sun X, Lee J, Navas T, et al. RIP3, a novel apoptosis-inducing kinase. J Biol Chem. 1999; 274: 16871–16875.
Wool IG. Extraribosomal functions of ribosomal proteins. In: Green R, Schroeder R, eds. Ribosomal RNA and Group I Introns. Austin: RG Landes; 1996: 153–178.