The value of genetics for clinical prediction depends on the underlying genetic architecture of complex traits (including disease risk and drug efficacy/toxicity). It is increasingly clear that common variants contribute to common phenotypes, but that extremely large sample sizes are required to separate true signal from noise at a stringent level of statistical significance. Occasionally, common variants have a large effect on common phenotypes (e.g., MHC alleles and risk of autoimmunity; VKORC1 and warfarin dose requirements), but this seems to be the exception rather than the rule.
A recent paper published in Nature Genetics explores this concept in more detail. As stated in the manuscript by Chatterjee and colleagues: “The gap between estimates of heritability based on known loci and those estimated owing to the comprehensive set of common susceptibility variants raises the possibility of substantially improving prediction performance of risk models by using a polygenic approach, one that includes many SNPs that do not reach the stringent threshold for genome-wide significance.” They project how much risk models based on current as well as future, larger GWAS could improve the prediction of individual traits.
The results, which are intriguing, depend not only on the underlying genetic architecture (which is often unknown, especially for pharmacogenomic [PGx] traits), but also on disease prevalence and familial aggregation: “We observed that for less common, highly familial conditions, such as T1D and Crohn’s disease, risk models that include family history and optimal polygenic scores based on current GWAS can identify a large majority of cases by targeting a small group of high-risk individuals (for example, subjects who fall in the highest quintile of risk). In contrast, for more common conditions with modest familial components, such as T2D, CAD and prostate cancer, risk models based on GWAS with current sample sizes (N) or foreseeable sample sizes in the near future (for example, 3N) can miss a large proportion (>50%) of cases by targeting a small group of high-risk individuals. For these common diseases, polygenic models using current GWAS data can identify a small minority of the population with elevated risk.”
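To build some intuition for why prevalence and familial aggregation matter so much, here is a minimal Python sketch under a simple liability-threshold model. It is not the method used by Chatterjee et al. It assumes (hypothetically) that liability is normally distributed, that a polygenic score explains a fixed fraction of liability variance (`score_variance`, used here as a rough stand-in for how familial a condition is), and that anyone above a prevalence-determined threshold becomes a case. The function name and all parameter values are illustrative choices, not estimates from the paper.

```python
import random
from statistics import NormalDist


def fraction_cases_in_top_quantile(prevalence, score_variance, top=0.20, n=200_000, seed=1):
    """Share of simulated cases whose polygenic score lies in the top `top` fraction.

    Illustrative liability-threshold model (an assumption, not the paper's method):
    liability = score + residual, both normal, total variance 1, with the score
    explaining `score_variance` of it; liability above a threshold set by
    `prevalence` makes someone a case.
    """
    rng = random.Random(seed)
    sd_score = score_variance ** 0.5
    sd_resid = (1.0 - score_variance) ** 0.5
    threshold = NormalDist().inv_cdf(1.0 - prevalence)           # liability cutoff for disease
    score_cutoff = NormalDist(0.0, sd_score).inv_cdf(1.0 - top)  # cutoff for the top-quantile score
    cases = captured = 0
    for _ in range(n):
        score = rng.gauss(0.0, sd_score)
        liability = score + rng.gauss(0.0, sd_resid)
        if liability > threshold:
            cases += 1
            if score > score_cutoff:
                captured += 1
    return captured / cases


# Hypothetical settings chosen only to contrast the two regimes described in the quote.
rare_familial = fraction_cases_in_top_quantile(prevalence=0.005, score_variance=0.30)
common_modest = fraction_cases_in_top_quantile(prevalence=0.10, score_variance=0.05)
print(f"rare, highly familial: {rare_familial:.2f} of cases in top quintile")
print(f"common, modestly familial: {common_modest:.2f} of cases in top quintile")
```

With these assumed settings, the rare, highly heritable scenario places most cases in the top quintile, while the common, modestly heritable scenario captures well under half of them, which mirrors the qualitative pattern in the quote.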
For all traits, the area under the receiver operating characteristic curve (AUC) of the genetic model alone exceeded that of the family history model alone (see Table 2). Since family history is already used in clinical decision making, this suggests that genetic data, if easily available and readily interpretable, would also be used for clinical decision making. As genetic data become incorporated into routine care, it is reasonable to expect that this will indeed become reality. (For example, see the recent contract between the VA and Personalis.)
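The AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. Below is a minimal sketch of the empirical (Mann-Whitney) estimate. The toy scores are made up purely for illustration and are not data from Table 2. Ties count as one half, which matters for a yes/no predictor such as family history.

```python
from itertools import product


def auc(case_scores, control_scores):
    """Empirical AUC: share of (case, control) pairs in which the case scores higher,
    counting ties as one half (equivalent to the Mann-Whitney U statistic / (n1 * n0))."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up toy data: three cases and three controls.
genetic_cases, genetic_controls = [0.9, 0.8, 0.4], [0.7, 0.3, 0.2]  # continuous polygenic score
family_cases, family_controls = [1, 1, 0], [1, 0, 0]                # family history: yes (1) / no (0)

auc_genetic = auc(genetic_cases, genetic_controls)
auc_family = auc(family_cases, family_controls)
print(f"genetic: {auc_genetic:.3f}, family history: {auc_family:.3f}")  # genetic: 0.889, family history: 0.667
```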
One challenge is generating polygenic models in the first place. The method developed and implemented by Chatterjee et al. relies on estimates of the underlying genetic architecture (see Table 1). This method, like those that we and others have developed (see here), works well for traits for which extremely large GWAS with genome-wide significant results are available. For most other traits, and especially PGx traits, the underlying genetic architecture is unknown.
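For intuition about what a polygenic approach means in practice, the simplest version is a p-value-thresholded score: keep every SNP whose association p-value falls below a chosen cutoff, then sum each person's allele dosages weighted by the GWAS effect sizes. The sketch below uses hypothetical SNP identifiers and summary statistics, and it omits the linkage-disequilibrium pruning (clumping) a real pipeline would need. It is not the Chatterjee et al. method, which relies on estimates of the underlying genetic architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpWeight:
    snp_id: str      # hypothetical SNP identifier
    effect: float    # per-allele effect size (e.g., log odds ratio) from GWAS summary statistics
    p_value: float   # association p-value from the same GWAS


def select_snps(summary, p_threshold):
    """Keep SNPs whose p-value is below the threshold (no LD clumping here)."""
    return [s for s in summary if s.p_value < p_threshold]


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2); SNPs without a genotype are skipped."""
    return sum(w.effect * dosages[w.snp_id] for w in weights if w.snp_id in dosages)


# Hypothetical summary statistics: one large-effect SNP and several small-effect SNPs.
summary = [
    SnpWeight("snpA", 0.40, 1e-12),
    SnpWeight("snpB", 0.05, 1e-4),
    SnpWeight("snpC", 0.03, 0.02),
    SnpWeight("snpD", -0.02, 0.30),
]
person = {"snpA": 1, "snpB": 2, "snpC": 2, "snpD": 1}

strict = polygenic_score(person, select_snps(summary, 5e-8))   # genome-wide significant SNPs only
lenient = polygenic_score(person, select_snps(summary, 0.05))  # also includes sub-threshold SNPs
print(f"strict: {strict:.2f}, lenient: {lenient:.2f}")  # strict: 0.40, lenient: 0.56
```

Lowering the threshold from genome-wide significance (5e-8) to a lenient cutoff is what lets the many small-effect, sub-threshold SNPs contribute to the score.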
Here are a few of my predictions about the underlying genetic architecture of complex traits, which put into perspective the predictive clinical utility of various polygenic methods (with attention to PGx traits):
1. Most common phenotypes (e.g., disease risk, drug efficacy) will have a genetic architecture similar to the 10 traits described in the Chatterjee et al. publication. That is, most will be highly polygenic, with a substantial contribution from common variants of small effect size. Uncovering these variants will require large GWAS, together with methods that integrate other genomic data to help separate the signal (“causal alleles”) from the noise (everything else). As the variants are uncovered, realistic estimates of their value for clinical prediction can be made using methods such as those described by Chatterjee et al., and an informed decision can be made as to whether such information should affect clinical care. For some phenotypes, including PGx traits such as drug efficacy, even modest predictive models may help in deciding whether to select one drug over another.
2. Occasionally, common phenotypes will have a variant (whether common or rare) with a large effect size, as is the case for MHC alleles and autoimmunity, and for VKORC1 and warfarin dose requirements. A simple GWAS in moderately sized sample collections should always be done to search for such variants. However, discovering such variants does not imply that the remaining genetic architecture of that common trait will follow the same pattern. This has been borne out in autoimmunity: despite the large effect sizes of MHC alleles, most of the remaining disease-associated variants follow a pattern more typical of other complex traits (small effect sizes).
3. Rare phenotypes are different, especially those that show high familial aggregation. Whole-genome sequencing in families has proven very effective at identifying variants for rare traits, and many of these variants are rare in the general population.
4. For rare adverse drug events (ADEs), familial aggregation data are often difficult to obtain, because the phenotype requires drug exposure. Still, it is reasonable to assume that rare PGx traits will mirror the architecture of rare diseases. It is also reasonable to assume that, occasionally, rare ADEs will be the result of common variants of large effect size: human evolution has not “seen” these drugs, so there has been little negative selection to drive down the frequency of such alleles. This has been observed for some phenotypes, such as lumiracoxib and carbamazepine toxicity (see here and here).
5. Both common and rare phenotypes likely have rare and low-frequency variants that contribute to overall heritability. However, the available data, which are still limited, suggest that the relative contribution of rare variants to common phenotypes will be modest at best. Sequencing studies will ultimately be required to test empirically the contribution of rare variants to both common and rare phenotypes.
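Points 1 and 2 largely come down to statistical power. Here is a rough sketch, assuming a quantitative trait (warfarin dose would be one example) and the standard normal approximation in which a variant explaining a fraction q of trait variance yields a 1-df test statistic centred at `sqrt(N * q / (1 - q))`. The variance-explained values are hypothetical illustrations, not estimates for any specific trait or variant. With these values, at the genome-wide threshold of 5e-8, the large-effect variant is almost certainly detected with 5,000 samples, whereas the small-effect variant is almost never detected at that size and needs roughly 100,000 samples.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gwas_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df association test for one variant.

    Assumes a quantitative trait and a test statistic that is approximately
    normal with mean sqrt(n * q / (1 - q)) and unit variance, where q = variance_explained.
    """
    q = variance_explained
    shift = (n * q / (1.0 - q)) ** 0.5
    crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # about 5.45 for alpha = 5e-8
    return STD_NORMAL.cdf(shift - crit) + STD_NORMAL.cdf(-shift - crit)


# Hypothetical effect sizes: a large-effect variant (2% of trait variance)
# and a typical small-effect variant (0.05% of trait variance).
power_large_5k = gwas_power(5_000, 0.02)
power_small_5k = gwas_power(5_000, 0.0005)
power_small_100k = gwas_power(100_000, 0.0005)
for label, p in [("large effect, N=5,000", power_large_5k),
                 ("small effect, N=5,000", power_small_5k),
                 ("small effect, N=100,000", power_small_100k)]:
    print(f"{label}: power = {p:.3f}")
```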
Finally, genetics is much more than just risk prediction! Even if prediction is limited, a tremendous amount of information about biological pathways can be gained through a variety of bioinformatic and experimental analyses. In fact, I would argue that genetics is MORE about biological pathways than risk prediction. But this is a topic for another time…