Abstract P224: Whole Genome Sequencing of 8 Indian Asians
Background: The genetic architecture and variation of Indian Asians, who represent one quarter of the world's population, have not been described. This is an important obstacle to identifying the genetic factors that contribute to disease in Indian Asians.
Aim: To identify and describe the patterns of genetic variation in Indian Asians.
Methods: We carried out high-depth whole genome sequencing of 8 Indian Asian men, using paired-end and mate-pair libraries and Illumina GAIIx instruments. We used Stampy, with BWA as a pre-mapper, to align reads to Genome Reference Consortium build 37 of the human genome (GRCh37). We used GATK and SAMtools to call SNPs and indels, accepting genetic variants called by both algorithms as confirmed.
Results: Mean coverage was 28.4x (range 13.9x to 32.5x); 99.8% of the mappable genome was covered by at least one read in each sample. We found 6,602,840 autosomal SNPs (mean 3,318,386 per person), of which 436,823 (6.6%) are novel (not in dbSNP132 or 1000G June 2011). The majority of novel SNPs were singletons (88% vs 20% for known SNPs). There were 50,585 novel SNPs present at least twice (i.e. MAF >10%), and 2,174 novel SNPs predicted to affect protein coding. Amongst the novel coding SNPs (cSNPs) identified as pathogenic by SIFT or PolyPhen2, 145 are in genes linked by OMIM to human disease, including obesity (FTO, UCP1), diabetes mellitus (CDKAL1, GCGR, HNF1B), lipid metabolism (APOB), renal disease (NPHP4, PKD1), hypertension (NOS2), iron and B vitamin metabolism (CUBN, TCN2, TF), and susceptibility to malaria and leprosy (CR1, FCGR2A, NOS2, TLR1). There were 65,613 novel autosomal indels, of which 35,097 are present at least twice, and 2,301 novel deletions >100 bp. Amongst the novel SNPs and indels discovered, >50% are not in high LD (r² ≥ 0.8) with tagSNPs on available high-density microarrays.
Conclusions: Our results reveal 502,436 new genetic variants (436,823 SNPs and 65,613 indels) amongst Indian Asians, including coding SNPs and indels in genes involved in atherosclerosis, carbohydrate and lipid metabolism, immunity and inflammation. The majority of novel variants are in low LD with tagSNPs on standard commercial microarrays, indicating that these genome-wide arrays do not capture Indian Asian-specific genetic variation. Our findings will inform the design of future studies to identify the genetic factors contributing to cardiovascular disease and other disorders that are more common amongst Indian Asians.
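The consensus-calling rule in the Methods and two of the reported figures can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Variant` record, the function names, and the toy call sets are hypothetical. It also assumes that "present at least twice" means observed on at least two of the 16 chromosomes carried by 8 diploid individuals.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def confirmed_variants(calls_a: Iterable[Variant],
                       calls_b: Iterable[Variant]) -> Set[Variant]:
    """Keep only variants reported by both callers (set intersection)."""
    return set(calls_a) & set(calls_b)


def min_freq_if_seen_twice(n_diploid_samples: int) -> float:
    """Lowest allele frequency for an allele seen on at least two chromosomes.

    Assumption: each sample contributes two autosomal chromosomes.
    """
    return 2 / (2 * n_diploid_samples)


# Toy call sets standing in for the output of two callers (made-up data).
caller_one = [Variant("1", 1000, "A", "G"), Variant("1", 2000, "C", "T")]
caller_two = [Variant("1", 1000, "A", "G"), Variant("2", 500, "G", "A")]
confirmed = confirmed_variants(caller_one, caller_two)

# Figures taken from the Results above.
novel_snps = 436_823
all_snps = 6_602_840
novel_indels = 65_613

novel_fraction = novel_snps / all_snps
total_novel = novel_snps + novel_indels

print(confirmed)                  # only the variant shared by both toy call sets
print(min_freq_if_seen_twice(8))  # 0.125, consistent with "MAF >10%"
print(f"{novel_fraction:.1%}")    # 6.6%
print(total_novel)                # 502436
```

Under these assumptions, the 12.5% floor explains why any novel variant seen twice in 8 genomes is reported as MAF >10%, and the total of 502,436 new variants is the sum of the novel SNPs and novel indels.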
© 2012 by American Heart Association, Inc.