Articles of the week: Genetic forecasting – cell biology experimentally validates functional mutations from genome sequencing
Imagine you live in Boston or New York. It is Monday, January 26, 2015. You are watching headlines about an impending blizzard and trying to figure out what the weather will really be like the next day. You find that the National Weather Service has a cool online tool, the experimental probabilistic snow forecast. As described in Slate magazine, this tool predicted a 67 percent chance of at least 18 inches of snow in New York City.
Unfortunately, most people interpreted these data to mean that there would be 18 inches of snow, not that there could be (with a certain probability) 18 inches of snow.
It was not until Mother Nature performed her experiment that we saw the outcome: not much snow in the Big Apple, and more than 2 feet of snow in Boston.
The analogy with human genetics is this: it is possible to forecast the functional consequences of deleterious mutations, but those consequences are not actually known until the experimental snow falls, that is, until molecular or cellular experiments reveal what the mutations do. And without knowing the functional consequences of mutations, it is difficult to determine whether these mutations are associated with human disease.
This week’s paper, published in PLoS Genetics by Merck’s head of genetics in the Department of Genetics & Pharmacogenomics (GpGx), Dr. Heiko Runz, illustrates this concept.
Background: Identification of novel disease genes through sequencing is limited by two key factors: the large number of “neutral” alleles in the human genome, which far outnumber harmful disease-causing variants, and the difficulty of distinguishing harmful from neutral alleles. Because neutral variants dilute the signal from harmful ones when both are included in genetic association studies, complex-trait disease cohorts for gene discovery have to be very large. This was highlighted recently for the low-density lipoprotein receptor (LDLR), where sequencing of >9,700 individuals was needed to establish that rare variants increase the risk of myocardial infarction (MI) in the general population (Do et al., Nature 2014). The authors of that study hypothesized that sample sizes for gene-based association tests (also called “burden tests”) could be substantially reduced if only harmful variants were considered.
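To make the idea of a burden test concrete, here is a minimal Python sketch. It is not the analysis used by Do et al. or by the PLoS Genetics authors. It assumes the simplest form of the test: each person is collapsed to carrier or non-carrier of any qualifying rare variant in the gene, and cases and controls are then compared with an odds ratio and a one-sided Fisher's exact test. The helper names (burden_table, odds_ratio, fisher_one_sided, people) and the toy cohort, with a hypothetical harmful variant "H1" and a hypothetical neutral variant "N1", are invented for illustration.

```python
from math import comb


def burden_table(genotypes, is_case, qualifying):
    """Collapse each person's rare variants into carrier status and return
    the 2x2 table (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    a = b = c = d = 0
    for variants, case in zip(genotypes, is_case):
        carrier = bool(variants & qualifying)  # carries any qualifying variant?
        if case:
            if carrier:
                a += 1
            else:
                b += 1
        else:
            if carrier:
                c += 1
            else:
                d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Carrier odds ratio, cases vs. controls (no continuity correction)."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test: probability of seeing at least `a`
    carriers among cases if carriers were distributed at random."""
    n_case, n_ctrl, n_carrier = a + b, c + d, a + c
    total = comb(n_case + n_ctrl, n_carrier)
    upper = min(n_case, n_carrier)
    return sum(comb(n_case, k) * comb(n_ctrl, n_carrier - k)
               for k in range(a, upper + 1)) / total


def people(n, variants):
    """Hypothetical helper: n individuals who each carry `variants`."""
    return [set(variants) for _ in range(n)]


# Hypothetical cohort of 100 cases and 100 controls (invented numbers).
# "H1" stands for a variant shown to be harmful in cells, "N1" for a neutral one.
cases = people(8, {"H1"}) + people(10, {"N1"}) + people(82, set())
controls = people(1, {"H1"}) + people(10, {"N1"}) + people(89, set())
genotypes = cases + controls
is_case = [True] * len(cases) + [False] * len(controls)

for label, qualifying in [("all variants", {"H1", "N1"}),
                          ("harmful only", {"H1"})]:
    table = burden_table(genotypes, is_case, qualifying)
    print(label, table, round(odds_ratio(*table), 2),
          round(fisher_one_sided(*table), 4))
```

In this toy cohort, pooling the neutral carriers in gives an odds ratio of about 1.8 and a non-significant p-value, whereas restricting the test to the harmful variant gives an odds ratio of about 8.6 with p < 0.05. Neutral carriers are added equally to cases and controls, which dilutes the contrast; this is the noise problem described above.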
Summary of the study: The current PLoS Genetics study, one of the first to combine sequencing with systematic phenotypic experiments in cells, shows that this is indeed the case. The authors established a scalable, microscope-based functional profiling platform that allowed them to characterize the biological functions of LDLR missense alleles in an unbiased and quantitative manner through systematic overexpression and complementation experiments in cells. Drawing on the exomes of 1,716 cases with early-onset MI and 1,519 MI-free controls, as well as exome chip data from close to 40,000 individuals, they generated mutation constructs for all LDLR missense variants found in these data. These experimental data were used to classify each variant as harmful or neutral based on its function as measured in vitro. When they incorporated this functional information into genetic burden tests, they could refine the estimated risk for carriers of rare LDLR alleles from 4.5-fold to 25.3-fold for high LDL-C, and from 2.1-fold to 20-fold for MI. Based on these results, they estimate that their strategy would have reduced the sample size needed to establish LDLR as an MI gene by more than half.
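A rough way to see why filtering out neutral variants shrinks the required cohort is the standard normal-approximation sample-size formula for comparing two proportions (here, carrier frequencies in cases and controls). The sketch below assumes that formula, a two-sided α of 0.05, 80% power, and invented carrier frequencies: 18% in cases versus 11% in controls when neutral variants are pooled in, and 8% versus 1% when only harmful variants count (so the neutral variants add 10% carriers to each group). These numbers are hypothetical, not the study's estimates, and `n_per_group` is an illustrative helper.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_case, p_ctrl, alpha=0.05, power=0.80):
    """Approximate number of cases (and equally many controls) needed to
    detect a difference in carrier frequency with a two-sided
    two-proportion z-test (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    p_bar = (p_case + p_ctrl) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_case * (1 - p_case) + p_ctrl * (1 - p_ctrl)))
    return ceil(num ** 2 / (p_case - p_ctrl) ** 2)


# Hypothetical carrier frequencies (cases, controls).
n_all = n_per_group(0.18, 0.11)   # harmful + neutral variants pooled
n_harm = n_per_group(0.08, 0.01)  # experimentally harmful variants only
print(n_all, n_harm)
```

Under these assumptions, the pooled analysis needs roughly 400 cases and 400 controls, while the harmful-only analysis needs roughly 140 of each. The case-control difference in carrier frequency is 7 percentage points in both scenarios. What changes is the variance: neutral carriers add noise without adding signal.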
Why this is important: The paper establishes a novel approach to extracting functional information from sequence data. Further, it demonstrates the improved power of genetic studies that incorporate experimentally validated functional variants into tests of association. Future studies are likely to use functional data to overcome obstacles in interpreting rare protein-coding genetic variation, which will spur complex disease gene discovery. As more and more large-scale sequencing studies are performed (we are waiting to hear more, President Obama!), interpreting the functional consequences of genetic data, and making this part of genetic association tests, will become increasingly important.
For drug discovery, it is important not only to identify disease-associated genes but also to know the functional status of disease-associated variants. As described below for polycythemia vera (PCV), a crucial step in drug discovery is understanding whether to activate or inhibit a target of interest. Functional studies in pre-clinical models (e.g., cellular models, animal models) are therefore an important component of any large-scale sequencing study. In fact, functional studies are likely to become the bottleneck in interpreting large-scale sequencing studies.
As with snowfall, these experimental methods help to overcome the challenges of predicting functional consequences of inherited variation – they determine, via direct experimentation, whether mutations are harmful or neutral.
Other studies of interest
Ruxolitinib versus Standard Therapy for the Treatment of Polycythemia Vera, Vannucchi et al. (NEJM, January 2015). Somatic mutations in JAK2 cause a rare blood disease, polycythemia vera (PCV), in which the bone marrow makes too many red blood cells. The disease-causing mutations are gain-of-function, so inhibiting JAK2 is a rational therapeutic strategy. In a study by Novartis, 110 PCV patients received the JAK1/JAK2 inhibitor ruxolitinib and 112 PCV patients received standard therapy. The primary end point was the combination of hematocrit control through week 32 and at least a 35% reduction in spleen volume at week 32, as assessed by imaging. The results were convincing: the primary end point was achieved in 21% of patients in the ruxolitinib group versus 1% of those in the standard-therapy group (P<0.001). Hematocrit control was achieved in 60% of patients receiving ruxolitinib and 20% of those receiving standard therapy; 38% and 1% of patients in the two groups, respectively, had at least a 35% reduction in spleen volume.
Variation in the Human Immune System Is Largely Driven by Non-Heritable Influences, Brodin et al. (Cell, January 2015). This study offers many insights into the role of causal biology in human disease, both in terms of genetics and in terms of perturbagens that affect the development of the immune system. First, despite the title, the authors show that many immune traits are highly heritable. Many of the examples listed in Supplementary Table 4 involve proteins that have been successfully targeted by drugs in the treatment of immune disorders, including IL12, IL17, IFNA, and IL6. Each of these traits has a heritability >0.6, indicating a strong genetic influence. Other quantitative traits, such as CD4+ central memory cells and NK-T cells, are also highly heritable. Second, as noted by Jean-Laurent Casanova and Laurent Abel (Cell 160, January 15, 2015) and shown in Figure 1, heritability estimates decline with age, reflecting the effects of environmental perturbagens such as infectious agents. Brodin et al. present data showing the impact of cytomegalovirus (CMV) infection on heritability estimates of immune traits: CD8+ and γδ T-cell populations are greatly affected, whereas other traits are not. These results indicate the importance of stratifying traits by age, because the appropriate interventions will vary by perturbagen, reflecting the causal biology relevant to each. [Thanks to Alan Herbert]
The Evolution and Functional Impact of Human Deletion Variants Shared with Archaic Hominin Genomes, Lin et al. (Molecular Biology and Evolution, January 2015). Alleles that are adaptive in one environment are frequently maladaptive in another. In other words, alleles that were protective in the environments of our ancestors may be associated with disease in today’s environment. A prime example is the starvation-resistant phenotype of the Pima Indians of southwestern North America, which was adaptive until the modern age brought calorie-rich food to this population, driving the highest rates of diabetes in the world. As another example of this dichotomy, researchers in Gokcumen’s lab used classic tools of evolutionary comparative biology to identify alleles that were driven by adaptive evolution, yet those very same alleles are associated with disease today. To identify these genes, they compared the human, archaic hominin, and chimpanzee genomes and identified exonic deletions. The authors then mapped these deletions onto evolutionary trees that describe the relationships among these primates. From there, they identified the molecular evolutionary events that were driven by positive natural selection. The surprising result is that some of the genes identified have been associated with psoriasis and Crohn’s disease. This observation raises a question for drug development: how often is drug development directed at reversing the effects of the modern environment on ancient alleles that are no longer adaptive? [Thanks to David Nickle]