Genome Architecture and Sequence Variation in Health and Disease
Availability of information on DNA sequence in human genomes and advances in technologies to amplify and sequence DNA have led to significant progress in delineating sequence differences that lead to disease. These techniques have also led to the discovery of sequence variants that occur in healthy individuals.
Studies of variation in the human genome are greatly facilitated by the availability of microarrays designed to detect single nucleotide polymorphisms (SNPs) that occur with frequencies greater than 1% to 5% in the population. Gene loci that are close to each other are often coinherited. SNP analyses can determine a series of alleles of loci in a specific region (a haplotype). Microarray technologies enable analysis of as many as one million SNPs on each array. These microarrays can also determine structural variation and copy number changes, defined as deletions or duplications greater than 1 kilobase (kb). Specific probes for regions known to frequently harbor copy number changes are also present on SNP microarrays such as the Affymetrix 6.0 array. Advances in DNA sequencing technologies include massively parallel sequencing, often referred to as next-generation sequencing.
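The two thresholds mentioned above (an allele frequency of 1% to 5% for SNPs, and a size greater than 1 kb for copy number changes) can be made concrete with a minimal sketch. The function names, the A/B genotype strings, and the half-open coordinate convention are assumptions chosen for illustration; they do not correspond to any particular array software.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the frequency of the less common allele at a biallelic SNP.

    `genotypes` is a list of two-character strings such as "AA", "AB", "BB",
    one per individual (hypothetical A/B allele coding).
    """
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    # Only one allele observed: the site is monomorphic in this sample.
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


def is_common_snp(genotypes, threshold=0.01):
    """True if the minor allele frequency exceeds the chosen threshold.

    The text gives a range of 1% to 5%; the default here is the lower bound.
    """
    return minor_allele_frequency(genotypes) > threshold


def is_copy_number_change(start, end, min_size=1000):
    """True if a deleted or duplicated segment is larger than 1 kb.

    Coordinates are assumed to be half-open (start inclusive, end exclusive).
    """
    return (end - start) > min_size


# Hypothetical sample of 100 individuals: 90 AA, 8 AB, 2 BB.
sample = ["AA"] * 90 + ["AB"] * 8 + ["BB"] * 2
maf = minor_allele_frequency(sample)
```

In this hypothetical sample the B allele accounts for 12 of 200 chromosomes, so the variant would pass either the 1% or the 5% cutoff.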
This chapter explores aspects of structural genomic variation and sequence variation in different populations and the role of sequence differences in the etiology of common disorders such as diabetes mellitus, obesity, and coronary heart disease. It also covers next-generation sequencing and examples of its application to the discovery of gene defects that lead to disease.
Through the use of polymerase chain reaction techniques, samples with low concentrations of DNA can be used to derive material for DNA sequencing. This chapter discusses applications of these techniques to discover how the sequence in modern humans differs from that of Neanderthals and early modern humans. Also presented are reports of studies of DNA extracted from two teeth from a man who died in 1783. DNA analysis enabled researchers to diagnose the disease that afflicted him and analyze the specific mutation and surrounding polymorphisms that connected him to present-day patients with the same disease.
Structural variation
In the human genome, segmental duplications with highly identical sequence are usually interspersed and separated by more than 1 megabase. She et al. (2006) identified more than 400 duplication blocks within the human genome. Segmental duplications are frequently clustered in pericentric and subtelomeric regions (Marques-Bonet et al., 2009). Evidence indicates that pericentric and subtelomeric duplications evolved independently from intrachromosomal duplications. Core duplicons of 5–30 kb occur in intrachromosomal duplications. One example of a core duplicon is LCR16a, which is rich in Alu repeats.
Unequal crossover between directly oriented duplicated segments may lead to dosage changes or altered structure and function of a gene. Marques-Bonet et al. (2009) noted that most copy number polymorphisms result from this mechanism.
Regions between segmental duplications may be deleted, duplicated, or inverted as a result of unequal crossover. The existence of highly similar duplicated segments on two different chromosomes may lead to translocation events. Polymorphisms also exist within the segmental repeats, and in different individuals, these regions may be larger or smaller. Segmental duplications are particularly abundant in certain chromosome regions, such as 15q11-q13, and these regions are frequent sites of deletions and duplications.
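The relationship between a pair of duplicated segments and the rearrangement that unequal crossover between them may produce can be sketched as follows. This is a deliberately simplified illustration: the `Duplication` record, the strand encoding, and the coordinates are hypothetical, and the rule that inverted repeats favor inversion while direct repeats favor deletion or duplication is the commonly described pattern rather than a statement taken from the studies cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplication:
    """One copy of a segmental duplication (hypothetical record layout)."""
    chrom: str
    start: int   # half-open coordinates: start inclusive, end exclusive
    end: int
    strand: str  # "+" or "-", the orientation of this copy


def predicted_rearrangement(a, b):
    """Simplified prediction of the outcome of unequal crossover between two copies."""
    if a.chrom != b.chrom:
        return "translocation"
    if a.strand == b.strand:
        # Directly oriented copies: the region between them changes dosage.
        return "deletion or duplication"
    # Copies in opposite orientation: the region between them may flip.
    return "inversion"


def flanked_interval(a, b):
    """Return (chrom, start, end) of the region lying between two copies, or None."""
    if a.chrom != b.chrom:
        return None
    first, second = sorted((a, b), key=lambda d: d.start)
    if first.end >= second.start:
        return None  # copies overlap or abut, so nothing lies between them
    return (first.chrom, first.end, second.start)


# Hypothetical pair of directly oriented duplications on one chromosome.
proximal = Duplication("chr15", 100, 200, "+")
distal = Duplication("chr15", 500, 600, "+")
outcome = predicted_rearrangement(proximal, distal)
between = flanked_interval(proximal, distal)
```

Here the interval between the two copies is the unique sequence whose copy number would change, which is the situation described for regions flanked by segmental duplication blocks.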
A key question is whether a specific structural variant, such as a deletion or a duplication (copy number variant) that includes unique sequence DNA, is a direct cause of phenotypic abnormality. Genomic syndromes often occur as a result of deletion or duplication of genomic regions that are flanked by segmental duplication blocks. In these syndromes, specific phenotypes result from the deletion of specific regions; for example, Williams syndrome results from a deletion within chromosome 7q11.23. Characteristic phenotypic features of this syndrome include cognitive and behavioral impairments, distinct facial features, and cardiac malformations.
Girirajan and Eichler (2010) reviewed findings in a subset of genomic structural changes in which a particular genomic change results in a range of phenotypes, with specific clinical features differing among individuals. Individuals with the same defect also differ in the degree to which they are affected; that is, penetrance varies. The clinical consequences of a particular dosage change in a specific region may be influenced by dosage changes or mutations elsewhere in the genome.
One region in which deletions are associated with a variety of phenotypes is 16p11.2. In some individuals with a deletion in this region, severe obesity occurs; others with the same deletion are diagnosed with autism, and still others have congenital malformations and developmental delay. Diverse phenotypes have also been described in cases with deletion of 17q12: some present with hereditary neuropathy with liability to pressure palsies (HNPP), and in others schizophrenia occurs. Other diagnoses encountered in patients with 17q12 deletion include renal cystic disease or maturity-onset diabetes of the young. Deletion in 1q21.1 may be associated with a learning disability in some cases and with congenital heart disease or schizophrenia in others.
The copy number variants associated with diverse phenotypes are sometimes found at low frequency in control populations. This raises the question of whether the different phenotypic consequences result from slight differences in the position of the deletion breakpoints or from sequence differences in the same region on the homologous chromosome.
Another genetic factor that may play a role in some cases is that the deletion of a specific locus on one chromosome unmasks a recessive mutant allele at that locus on the homologous chromosome. A further possible explanation for the phenotypic variation is that genetic modifiers elsewhere in the genome alter the phenotype.
Girirajan and Eichler (2010) proposed that a two-hit genomic model most likely explains the variable phenotypes in individuals with copy number variants in 16p12.1 or 22q11.2.
Copy number variants, and deletions in particular, are most often considered to be of clinical relevance if they arise de novo, that is, if they are present in a child but absent in both parents. However, growing evidence indicates that parents who carry specific copy number variants may have subclinical manifestations attributable to the genomic change. Copy number variants (CNVs) detected by SNP microarray are illustrated in Figure 1.1.
Figure 1.1 Results of analyzing SNP alleles and copy number variants using an Affymetrix 6.0 array and genotyping console on twin females with autism. Rows 1 and 2 at the top of the figure show the distribution of A and B alleles of specific SNPs. Note the identical patterns of alleles in the twins. Rows 3, 4, 5, and 6 show a chromosomal region with a copy number variant. Each twin has three copies of the CNV that encompasses three genes, shown at the bottom of the figure. A known population variant region is indicated, but the variant region is shorter and does not encompass a gene.
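The de novo criterion described above (a variant present in a child but absent in the parents) can be expressed as a comparison of CNV calls across a family trio. The sketch below is one possible way to do this, not a description of how any particular genotyping software works: the `CNV` record, the function names, and the 50% reciprocal overlap cutoff used to decide that two calls represent the same variant are all assumptions, and the coordinates are hypothetical.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    """A copy number variant call (hypothetical record layout)."""
    chrom: str
    start: int  # half-open coordinates: start inclusive, end exclusive
    end: int
    kind: str   # "deletion" or "duplication"


def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each call covered by the shared segment."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def classify_inheritance(child_cnv, mother_cnvs, father_cnvs, min_overlap=0.5):
    """Label a child's CNV by whether a matching call exists in either parent.

    `min_overlap` is an assumed cutoff; calls overlapping less than this are
    treated as different variants.
    """
    in_mother = any(reciprocal_overlap(child_cnv, c) >= min_overlap for c in mother_cnvs)
    in_father = any(reciprocal_overlap(child_cnv, c) >= min_overlap for c in father_cnvs)
    if in_mother and in_father:
        return "present in both parents"
    if in_mother:
        return "present in mother"
    if in_father:
        return "present in father"
    return "de novo"


# Hypothetical trio: the child's deletion closely matches a call in the mother.
child = CNV("chr16", 1000, 5000, "deletion")
mother = [CNV("chr16", 1200, 5200, "deletion")]
father = []
label = classify_inheritance(child, mother, father)
```

A label of "de novo" from such a comparison only means that no matching call was found in the parental data; array resolution and the choice of overlap cutoff both affect the result.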