In science, a pendulum swings as new discoveries are made and old hypotheses are proven false. Unfortunately, the arc of this swing is often tied less to the facts than to prevailing views of what is and what should be. With incomplete information, the pendulum may swing too far in one direction – for example, towards the view that genome-wide association studies (GWAS) would identify the vast majority of genetic risk for complex traits in relatively small cohorts (now humbly defined as tens of thousands of case-control samples). After an initial wave of discoveries – or lack thereof – the pendulum swung too far in the other direction: because disease-associated variants from GWAS cannot explain most of the estimated heritability of complex traits, rare variants of large effect must be the root genetic cause of complex traits.
Too often, science creates an artificial mirror image when interpreting data: if one hypothesis is not true, then the opposite must be true. If it is not common variants, then it must be rare variants; if it is not genetics, then it must be epigenetics; if it is not the host, then it must be the microbiome; and so forth. Incomplete data in support of one model frequently trigger a knee-jerk move towards an orthogonal model, even when there is little evidence for that alternative. Lack of evidence to support one model should not be mistaken for evidence in support of another.
I fear that the pendulum is again swinging blindly in our understanding of the genetic basis of complex traits.
Ironically, several recent studies have taken the trendy view that rare protein-coding variants do not explain much of the variance in complex traits, and therefore that rare variants are not important. A potentially damaging message emerging from these studies is that large-scale sequencing studies are not worthwhile. In a study published by Hunt et al. in Nature (see here), this conclusion was drawn from an analysis of only 25 genes (<1% of the genes in the human genome) in a sample that is large by today’s standards for sequencing experiments but still underpowered to detect anything other than rare variants of large effect. The authors reached a remarkably broad conclusion: “Our data provide little stimulus in support of large-scale whole-exome sequencing projects in common autoimmune diseases.”
Let’s briefly review the facts pertaining to the genetic architecture of complex traits (see previous post here). First, family-based heritability estimates indicate that inherited genetic factors explain between 25% and 75% of the variation in most complex traits. Second, the disease-associated variants that have reached a stringent level of genome-wide significance collectively explain far less than these family-based estimates. Third, polygenic modeling of GWAS data indicates that a large portion of additional genetic signal remains buried in the proverbial haystack, and that much of this signal is due to common alleles of small (but not infinitesimal) effect (see our polygenic modeling Nature Genetics paper here). And fourth, candidate-gene and whole-exome sequencing studies with sample sizes powered to detect rare variants of large effect have identified reproducible associations, but not on a scale that would explain the heritability remaining from family-based estimates.
Regarding the fourth point: why might this be? The potential sources of “missing heritability” remain unchanged: common alleles of small effect; rare alleles of small to modest effect; overestimates of heritability from family-based studies; other forms of genetic variation not captured by contemporary GWAS arrays or exome-sequencing technology; and so on. The candidate gene sequencing studies that have failed to identify rare alleles of large effect may simply have “guessed” at the wrong genes; geneticists have a poor history of picking good candidate genes, after all. What is more likely, in my opinion, is that disease-associated rare alleles do exist for complex traits, but that their effect sizes are modest. Based on population genetics theory, as well as anecdotal evidence from other diseases, it seems likely that many genes will harbor a series of rare variants that contribute to complex disease risk. The uncomfortable fact is that extremely large patient collections will likely be required to reach a convincing level of statistical significance.
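To make the sample-size argument concrete, here is a rough power calculation. It is a minimal sketch, not the method used in any of the studies discussed here, and it rests on several simplifying assumptions: all qualifying rare alleles in a gene are collapsed into a single aggregate allele (a simple burden-style test), allele frequencies are compared between equal numbers of cases and controls using a normal approximation, and significance is judged against a hypothetical gene-level Bonferroni threshold of 0.05 / 20,000. The aggregate frequency (0.5%) and the odds ratios (5 versus 1.5) are illustrative choices, not estimates from real data, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Aggregate allele frequency in cases implied by control frequency p0 and an allelic odds ratio."""
    return odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))


def power(n_cases: int, p0: float, odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided allelic test with n_cases cases and n_cases controls.

    Each person contributes two alleles. Uses the normal approximation to the
    difference in allele frequencies and ignores the negligible opposite tail.
    """
    p1 = case_allele_freq(p0, odds_ratio)
    se = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / (2 * n_cases))
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(p1 - p0) / se - z_alpha)


def required_cases(p0: float, odds_ratio: float, alpha: float,
                   target_power: float = 0.8) -> int:
    """Smallest number of cases (with as many controls) giving roughly target_power."""
    p1 = case_allele_freq(p0, odds_ratio)
    z_sum = _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(target_power)
    variance_sum = p1 * (1 - p1) + p0 * (1 - p0)
    # Solve |p1 - p0| / sqrt(variance_sum / (2n)) = z_alpha + z_power for n.
    n = z_sum ** 2 * variance_sum / (2 * (p1 - p0) ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    ALPHA = 0.05 / 20_000  # hypothetical Bonferroni threshold over ~20,000 genes
    for odds_ratio in (5.0, 1.5):
        n = required_cases(0.005, odds_ratio, ALPHA)
        print(f"OR={odds_ratio}: ~{n:,} cases (plus as many controls) for 80% power")
```

Under these assumptions, shrinking the odds ratio from 5 to 1.5 raises the number of cases needed for 80% power by more than an order of magnitude, from on the order of a thousand to on the order of thirty thousand, each with a matched number of controls. Real rare-variant tests, heterogeneous effects across variants, and study-specific thresholds would change the exact numbers, but not the direction of the argument.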
To keep things in perspective, there are a few important points to make. First, if the goal of human genetics is to uncover the biological pathways that cause complex diseases, then the focus on “missing heritability” is misguided. The focus should be on biology, as insights into biology will lead to new drugs and biomarkers. In another issue of Nature (see here), a large-scale sequencing study in patients with low bone mineral density (osteoporosis) uncovered a rare nonsense mutation within the LGR4 gene. Its authors drew a strikingly different conclusion: “Our results highlight the value of human genome sequence information in the context of rich phenotypic information from which the effects of rare deleterious mutations can be directly assessed in humans, creating a human model of physiological disturbance or disease.”
Second, broad conclusions that focus on “missing heritability” alone carry an unintended consequence: the message that large-scale sequencing studies are not worthwhile. Such conclusions may serve the authors well in publishing their paper, since they make for a dramatic statement, but they also take hold in the minds of some (e.g., those who control funding) who may not be able to discern the nuanced distinction between biology and heritability. Let’s not forget that large-scale sequencing studies are only now getting under way, and that the methods to design them and interpret sequence data are under continuous development (see review here).
And third, it is critical to do the right experiment to arrive at sound scientific conclusions, not hyperbole. The right experiment is to perform genome-scale sequencing in extremely large patient collections and to make those data available to the public for analysis. Yes, I understand that in a world of limited resources we cannot do every possible experiment, and that the experiment I just described is extremely expensive. But we need to be very careful not to make strategic decisions, based on incomplete information, that favor the opposite viewpoint (i.e., the mirror image; the other arc of the pendulum’s swing; the orthogonal concept).
Over time, the model suggested by Hunt et al., in which few rare variants contribute to disease risk, may prove correct. This would be inconsistent with population genetics theory, but most of these theoretical models admittedly rest on assumptions that may not prove correct. Rather than draw broad conclusions from incomplete information, let’s do the right experiment to arrive at the truth. The human race deserves this chance.
It is remarkable how far the field of human genetics has come over the last decade. It would be a shame to fall short of our goal of complete information.
Let’s not let the pendulum swing blindly.