A new genetics initiative was announced today: the creation of FinnGen (press release here). FinnGen’s goal is to generate sequence and GWAS data on up to 500,000 individuals with linked clinical data and consented for recall. There are many applications for such a resource, including drug discovery and development. In this blog, I want to first describe the application of PheWAS for drug discovery and development, and then introduce FinnGen as a new PheWAS resource (see FinnGen slide deck here).
[Disclaimer: I am an employee of Celgene. The views expressed here are my own.]
PheWAS turns GWAS on its head. While GWAS tests millions of genetic variants for association to a single trait, PheWAS does the opposite: tests hundreds (if not thousands) of traits for association with a single genetic variant. This approach is primarily relevant for those genetic variants with an unambiguous functional consequence – for example, a variant associated with disease risk or a variant that completely abrogates gene function. There are useful online resources (see here), as well as several nice recent reviews by Josh Denny and colleagues, which provide additional background on PheWAS (see here, here).
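To make the "GWAS on its head" idea concrete, here is a minimal sketch of a PheWAS loop in Python. It is an illustration, not the method used in any study cited in this post. It assumes binary traits, a simple carrier / non-carrier coding of the variant, a Wald test on the log odds ratio with a 0.5 continuity correction, and a Bonferroni threshold across traits. The function name `phewas`, the trait names, and the toy counts are all made up. Real analyses typically use regression models that adjust for age, sex, and ancestry.

```python
import math


def phewas(carrier, phenotypes, alpha=0.05):
    """Test ONE variant against MANY binary traits (hypothetical helper).

    carrier:    list of bools, True if the person carries the variant
    phenotypes: dict of trait name -> list of bools, True if the person is a case
    Returns one result dict per trait, sorted by p-value.
    """
    threshold = alpha / len(phenotypes)  # Bonferroni correction across traits
    results = []
    for trait, is_case in phenotypes.items():
        pairs = list(zip(carrier, is_case))
        # 2x2 table; add 0.5 to every cell so empty cells do not break the log
        a = sum(1 for c, y in pairs if c and y) + 0.5          # carrier, case
        b = sum(1 for c, y in pairs if c and not y) + 0.5      # carrier, control
        c0 = sum(1 for c, y in pairs if not c and y) + 0.5     # non-carrier, case
        d = sum(1 for c, y in pairs if not c and not y) + 0.5  # non-carrier, control
        log_or = math.log((a * d) / (b * c0))
        se = math.sqrt(1 / a + 1 / b + 1 / c0 + 1 / d)
        # Two-sided p-value from a normal approximation (Wald test)
        p_value = math.erfc(abs(log_or / se) / math.sqrt(2))
        results.append({
            "trait": trait,
            "odds_ratio": math.exp(log_or),
            "p_value": p_value,
            "significant": p_value < threshold,
        })
    return sorted(results, key=lambda r: r["p_value"])


# Toy data (made up): 100 carriers followed by 900 non-carriers
carrier = [True] * 100 + [False] * 900
phenotypes = {
    # 10% of carriers vs 30% of non-carriers are cases
    "trait_A": [True] * 10 + [False] * 90 + [True] * 270 + [False] * 630,
    # 20% of carriers vs 20% of non-carriers are cases
    "trait_B": [True] * 20 + [False] * 80 + [True] * 180 + [False] * 720,
}
for row in phewas(carrier, phenotypes):
    print(row)
```

In this toy example the variant looks protective for the first trait and unrelated to the second, which is the pattern one would hope to see for a drug target: benefit in the intended indication without a signal in traits that would flag an adverse event.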
Work that originated from my academic lab represents the first example of PheWAS for drug discovery – in particular, how to use PheWAS to predict on-target adverse drug events (ADEs) and to select indications for clinical trials (see 2015 PLoS One publication here). In this study, we showed that loss-of-function variants in a gene, TYK2, influence risk of multiple autoimmune diseases (e.g., rheumatoid arthritis, lupus, inflammatory bowel disease) without increasing risk of infection. A subsequent study expanded on this work and added elegant functional data to support the concept that PheWAS is useful for therapies that modulate TYK2 in a manner that mimics human alleles (see Science Translational Medicine study here). To date, there is no approved TYK2 inhibitor, but there are ongoing early clinical trials, with other programs in pre-clinical development.
Other studies have demonstrated the application of PheWAS for drug discovery. One of the first examples of PheWAS for drug repurposing was published in 2015 (here). This study identified nearly 14,800 drug-disease pairs and more than 38,000 novel candidates for repurposing. Another recent example includes a PheWAS of genetic variants implicated in Th17 pathogenesis (here).
Recently, we posted a PheWAS study on bioRxiv in nearly 700,000 individuals (here). We conducted a PheWAS using four large real-world data cohorts (23andMe, UK Biobank, FINRISK, CHOP) and harmonized clinical data from 57 published GWAS. We identified new associations that may predict adverse drug events (e.g., acne, high cholesterol, gout and gallstones for therapies that inhibit PNPLA3, or asthma for therapies that inhibit MDA5 / IFIH1).
It is worth noting that PheWAS is still in its infancy. The first publication was in Bioinformatics from Josh Denny, Marylyn Ritchie, Dana Crawford, and colleagues in 2010 (here). A search reveals only 63 additional publications on PubMed since 2010, 10 of which are reviews (here). Interestingly, there are 33 preprints on bioRxiv (here), which suggests that many new studies will soon appear in traditional journals following peer review. Many of these preprints describe analytical methods from prominent statistical geneticists such as Marylyn Ritchie and Sarah Pendergrass (here), Goncalo Abecasis (here, here), and George Davey Smith (here). Google Trends shows a slight uptick since the first mention in 2011, but the pattern is modest (here).
Finland has been a hotbed for genetic research for decades. The unique population history has led to certain rare diseases being significantly more common in people whose ancestors are ethnic Finns (aka “Finnish heritage diseases”). As a result, several early successes from genome-wide linkage studies in rare diseases were from Finland (e.g., Marfan’s syndrome).
Finland’s population history also enriches for certain low-frequency loss-of-function (LoF) variants that may have been purged via negative selection from other populations. These LoF variants achieve a higher frequency in Finland as compared to other populations (see here, here), which increases power to detect genetic associations.
Another advantage of Finland is that it has an integrated health care system, which provides rich and harmonized phenotype data. Approximately 90% of the population receives their care from the public system, with a single national health record, Kanta, covering nearly 80% of the population. Each individual has their own 11-digit personal number, which allows all data to be integrated across the health care system.
An example of the power of genetics in Finland comes from a 2014 PLoS Genetics study by Aarno Palotie, Mark Daly, Daniel MacArthur, and colleagues (see here). This study examined 83 LoF variants across 60 phenotypes in 36,262 Finns. The most compelling finding was that LPA splice variants confer protection from coronary heart disease (CHD): each copy of the LPA-lowering variant reduces CHD risk by ~20%.
How will FinnGen build on these past successes?
The new initiative will greatly expand the total number of people with both genotype and sequence data. Whole genome sequencing on 10,000 individuals will discover over 90% of all low-frequency LoF variants. These variants, as well as other low-frequency variants, can then be imputed into the broader population using genotyping arrays. Clinical data from health registries and longitudinal electronic health records will serve as the basis for PheWAS. At the end of the first year alone, FinnGen estimates that there will be genotype and phenotype data on at least 100,000 individuals. This number will grow by 100,000 or more each year until all 500,000 individuals have genotype and phenotype data.
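A back-of-the-envelope calculation helps show why sequencing on the order of 10,000 people captures most low-frequency variants. The sketch below is not FinnGen's own power calculation. It assumes each of the 2N sequenced chromosomes is an independent draw and that any variant present in the sample is called correctly. The function name `detection_probability` and the example allele frequencies are illustrative.

```python
def detection_probability(allele_freq, n_individuals):
    """Chance that a variant is seen at least once among 2N chromosomes.

    Assumes independent chromosomes and perfect variant calling
    (both are simplifying assumptions for illustration).
    """
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# Illustrative allele frequencies (not taken from the FinnGen materials)
for freq in (0.001, 0.0001, 0.00005):
    prob = detection_probability(freq, 10_000)
    print(f"allele frequency {freq}: P(seen at least once) = {prob:.3f}")
```

Under these assumptions, detection is nearly certain for variants at a frequency of 1 in 1,000 and falls off only for much rarer alleles. Once a variant has been seen in the sequenced subset, it becomes a candidate for imputation into the array-genotyped cohort.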
But FinnGen is more than a tool for drug discovery. The study should have direct benefits to patients receiving health care in Finland. “All breakthroughs that arise from the project will eventually benefit health care systems and patients both locally and globally,” says Research Director Aarno Palotie, from the Institute for Molecular Medicine Finland (FIMM) at the University of Helsinki. He adds: “With FinnGen, we can build a foundation for health innovations and personalized treatments.”
While FinnGen is unique in terms of population and genetic history, there are other large-scale biobanks aiming to provide similar genotype-phenotype tools. While a comprehensive list is beyond the scope of this blog, here I highlight a few recent announcements:
– Geisinger, which has long been a leader in this area, recently launched the Geisinger National Precision Health Initiative under the direction of Dr. Hunt Willard (see here).
– The Million Veteran Program, or MVP, is on track to build one of the largest medical databases within the United States (see here).
The UK Biobank deserves special attention, as it represents an excellent public resource for PheWAS. Indeed, it seems as if a new study emerges from the UK Biobank on a daily basis! One of my favorite studies is that from Manny Rivas and colleagues (see bioRxiv preprint here). They studied 18,228 protein-truncating variants across 135 UK Biobank phenotypes, leading to 27 novel associations with medical phenotypes. Another recent study, published in Lancet Diabetes and Endocrinology, used a genetic risk score to survey the frequency and phenotype of type 1 diabetes in adults (see here, here).
Genetic studies represent just the beginning for drug discovery. The field needs to move beyond simple phenotypic associations to quantitative models for predicting the impact of genetic perturbations.
For example, we recently posted a bioRxiv preprint testing 10.6 million genetic variants against levels of 2,994 proteins in 3,301 individuals (see here). This represents the largest protein quantitative trait locus (pQTL) study to date. We identified 1,927 genetic associations with 1,478 proteins, including both cis- and trans-acting pQTLs, and describe several different applications to drug discovery and development. These pQTLs represent powerful genetic instruments for Mendelian randomization, including the ability to precisely quantitate the impact of a genetic variant on protein function.
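As an illustration of how a pQTL can serve as an instrument, the simplest Mendelian randomization estimator is the Wald ratio: the variant's effect on the outcome divided by its effect on the protein. The sketch below is not the analysis from the preprint. It assumes a single valid instrument, uses a first-order standard error that ignores uncertainty in the pQTL effect, and takes made-up summary statistics. The function name `wald_ratio` is hypothetical.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-instrument Mendelian randomization via the Wald ratio.

    beta_exposure: effect of the variant on protein level (the pQTL effect)
    beta_outcome:  effect of the same variant on the disease trait
    se_outcome:    standard error of beta_outcome
    Returns (estimate, standard error, two-sided p-value).
    """
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats beta_exposure as known without error
    se = se_outcome / abs(beta_exposure)
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2))
    return estimate, se, p_value


# Made-up summary statistics, for illustration only
estimate, se, p_value = wald_ratio(beta_exposure=0.5, beta_outcome=-0.1, se_outcome=0.02)
print(f"effect per unit of protein: {estimate:.2f} (SE {se:.2f}), p = {p_value:.1e}")
```

The estimate is read as the change in the outcome per unit change in protein level, which is the kind of quantitative link between a genetic perturbation and a clinical effect that simple association results do not provide.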
As the amount of large-scale sequencing grows, so will the demand for high-throughput functional characterization. Most rare variants discovered by genome sequencing will remain “variants of uncertain significance”, or VUS, until deeper functional characterization is performed. High-throughput CRISPR remains one tool (see Nature Reviews Genetics here), but there are many other approaches as well.
The field also needs a more diverse therapeutic armamentarium to mimic the effects of human genetic mutations. Fortunately, progress continues to be made in fields such as gene & cellular therapy (e.g., see New England Journal of Medicine articles on gene therapy for hemophilia A here, commentary here; CAR-T editorial here; LifeSciVC “living medicines” blog here) and novel approaches to small molecule perturbations (e.g., an entire issue of the Journal of Medicinal Chemistry was devoted to protein degradation, see here).