Meta-analysis of Genome Wide Association Studies Identifies Genetic Markers of Late Toxicity Following Radiotherapy for Prostate Cancer

Nearly 50% of cancer patients undergo radiotherapy. Late radiotherapy toxicity affects quality-of-life in long-term cancer survivors and risk of side-effects in a minority limits doses prescribed to the majority of patients. Development of a test predicting risk of toxicity could benefit many cancer patients. We aimed to meta-analyze individual level data from four genome-wide association studies from prostate cancer radiotherapy cohorts including 1564 men to identify genetic markers of toxicity. Prospectively assessed two-year toxicity endpoints (urinary frequency, decreased urine stream, rectal bleeding, overall toxicity) and single nucleotide polymorphism (SNP) associations were tested using multivariable regression, adjusting for clinical and patient-related risk factors. A fixed-effects meta-analysis identified two SNPs: rs17599026 on 5q31.2 with urinary frequency (odds ratio [OR] 3.12, 95% confidence interval [CI] 2.08–4.69, p-value 4.16 × 10− 8) and rs7720298 on 5p15.2 with decreased urine stream (OR 2.71, 95% CI 1.90–3.86, p-value = 3.21 × 10− 8). These SNPs lie within genes that are expressed in tissues adversely affected by pelvic radiotherapy including bladder, kidney, rectum and small intestine. The results show that heterogeneous radiotherapy cohorts can be combined to identify new moderate-penetrance genetic variants associated with radiotherapy toxicity. The work provides a basis for larger collaborative efforts to identify enough variants for a future test involving polygenic risk profiling.


Introduction
Radiotherapy is used in the treatment of up to 50% of cancer patients and around 40% of long-term cancer survivors underwent radiotherapy at some point in their treatment. For example, approximately half of the 1.1 million men diagnosed with prostate cancer worldwide each year receive radiotherapy, and the 5-year relative survival rates approach 100% for non-metastatic disease (Howlader et al., 2013). Although modern treatments minimize radiation doses to surrounding normal tissues, some men develop long-term toxicity (Bentzen et al., 2010). The risk of severe toxicity limits doses, which aim to keep the prevalence below 5%. Mild and moderate effects are common (10-50% of those treated) (Alemozaffar et al., 2011;Dearnaley et al., 2012;Heemsbergen et al., 2006;Kneebone et al., 2004;Resnick et al., 2013;Syndikus et al., 2010), impact negatively on quality-of-life, and are an important factor when men consider treatment options (Davison et al., 2002).
There is a need for a test that reflects a cancer patient's radiosensitivity and predicts the likelihood of toxicity. Many assays have been explored but none proved sufficiently reliable for clinical application. Over the past 15 years interest increased in identifying the genetic variants associated with risk of toxicity. The rationale behind the work is that a future test based on a germline polygenic risk score will not suffer from the poor reproducibility associated with other assays measuring radiosensitivity (Barnett et al. 2015).
Mutations associated with well-characterized radiosensitivity syndromes such as ataxia telangiectasia (Taylor et al., 1975) are rare and do not explain the general inter-individual variation in toxicity following radiotherapy (Safwat et al., 2002). Rather, it is hypothesized that common genetic variants, such as single nucleotide polymorphisms (SNPs) account for most of the heritability of radiosensitivity (West and Barnett, 2011). Studies have begun to identify common variants associated with radiotherapy toxicity. Candidate gene studies showed rs2868371 in HSPB1 (MIM 602195) (Lopez Guerra et al., 2011;Pang et al., 2013) and rs1800469 in TGFB1 (MIM 190180) (Guerra et al., 2012) are associated with late effects of lung radiotherapy; and rs1800629 in TNF (MIM 191160) (Talbot et al., 2012) and rs1139793 in TXNRD2 (MIM 606448) (Edvardsen et al., 2013) are risk SNPs for late toxicity following breast radiotherapy. Genome-wide association studies (GWAS) identified a locus on chr11q14.3 associated with rectal bleeding (Kerns et al., 2013b) and a locus on chr2q24.1 within TANC1 (MIM 611397) associated with overall toxicity  following radiotherapy for prostate cancer. Another study showed more associations at the p-value b 5 × 10 −7 level than expected by chance, providing the strongest evidence to date that many common genetic variants are associated with risk of toxicity .
Recently published GWAS have limitations that we aimed to overcome by using a meta-analysis approach. The published studies used a multi-stage approach, where a small first-stage cohort was analyzed for a genome-wide panel of SNPs and only the most significant SNPs were genotyped in validation datasets. Thus, true positive SNPs were likely missed because they were not tested in the full set of individuals. Here, the Radiogenomics Consortium (West and Rosenstein, 2010) undertook a meta-analysis of four GWAS in order to maximize statistical power (Cohn and Becker, 2003) to discover additional risk variants. It is known that risk factors for late toxicity include not only genetics but also dosimetric parameters, co-morbidities, and patient demographics (Barnett et al. 2009). The latter factors can vary between cohorts as can the treatment (e.g. in prostate cancer: external beam or brachytherapy; type of fractionationlarge or small doses per fraction; variable use of hormone therapy; variable use of surgery) and scales used to assess toxicity. There were concerns, therefore, whether the heterogeneity across cohorts might limit our ability to identify variants.
This study is important because our ability to identify enough SNPs for a risk profile for clinical implementation is dependent on combining multiple heterogeneous cohorts. The aim was to show that multi-center radiotherapy cohorts could be harmonized and analyzed to identify risk SNPs by increasing the number of individuals analyzed in a single stage (Skol et al., 2006). STROGAR guidelines  for reporting radiogenomic studies, which build on the STREGA and STROBE guidelines (Little et al., 2009;von Elm et al., 2007), were followed throughout.

Participants
The four cohorts (RAPPER, RADIOGEN, Gene-PARE, and CCI) comprised individuals with adenocarcinoma of the prostate treated with radiotherapy with curative intent. Table 1 shows the number of individuals in each cohort the number with genome-wide SNP data available, and the final number included in the GWAS meta-analysis after excluding samples for quality control or due to missing data. Informed consent was obtained from all study participants and all studies conform to standards indicated by the Declaration of Helsinki. RAPPER (UKCRN1471; n = 727) (Burnet et al., 2006) was approved by the Cambridge South Research Ethics Committee (05/Q0108/365). Individuals received neoadjuvant androgen suppression and external beam radiotherapy, (EBRT): 233 from MRC RT01 (ISRCTN47772397) (Sydes et al., 2004) and 494 from CHHiP (ISRCTN97182923) (Dearnaley et al., 2012). RADIOGEN (n = 741) was approved by the Galician Ethical Committee. Individuals received conformal radical or post-prostatectomy EBRT at the Clinical University Hospital of Santiago de Compostela, Spain (Fachal et al., 2012), and 511 individuals had hormone therapy.
Gene-PARE (n = 895) (Ho et al., 2006) was approved by the Mount Sinai Medical Center Institutional Review Board. Individuals had  . Of the 597 participants in RADIOGEN, one lacked data on rectal toxicity and seven data on rectal volume and were excluded from the analysis of rectal bleeding. 120 participants had no data on baseline urinary frequency and 119 were missing data on decreased urine stream and were excluded from the respective analyses. b Of the 290 participants in Gene-PARE, 55 were lacking data for rectal volume and were excluded from analysis of rectal bleeding. 35 participants did not complete the urinary questionnaire and were excluded from analysis of urinary frequency and decreased urine stream. c Of the 150 participants in CCI, 15 were lacking data for rectal volume or diabetes and were excluded from analysis of rectal bleeding. brachytherapy with/without EBRT at the Mount Sinai Hospital, New York, and 472 received hormone therapy. 125 I (160Gy; TG-43) was used in those undergoing brachytherapy alone and 103 Pd (124Gy) in those also receiving EBRT (Stock et al., 1995;Stone et al., 2003). The CCI cohort (n = 155) (Kerns et al., 2013b) was approved by the Health Research Ethics Board of Alberta (Cancer). Individuals were recruited from the Cross Cancer Institute in Edmonton and the Tom Baker Cancer Centre in Calgary, Alberta, Canada. All 155 individuals underwent EBRT and 82 received hormonal therapy.
Total biologic effective dose (BED)  was calculated for individuals in all four studies to compare radiation exposure across studies (α/β = 3).

Assessment of Late Radiotherapy Toxicity
Participants were assessed prospectively for toxicity (see Table S1). Toxicity was measured 1.5-2.5 years following treatment with the latest value used if more than one assessment was recorded during this follow-up window. For rectal bleeding in Gene-PARE, a 1-5 year window was allowed, because the scoring system assigns grades based on whether rectal bleeding occurs as a single incident or intermittent symptoms over time. Urinary endpoints were re-graded to harmonize across the studies (Table S2). As urinary daytime frequency and nocturia were recorded separately (RAPPER, Gene-PARE) or as a single endpoint (RADIOGEN, CCI), they were collapsed into a single endpoint in RAPPER and Gene-PARE by taking the maximum score between the two.
Changes in scores from baseline were calculated for each endpoint. Improvement from baseline (i.e. a negative change in score) was coded as 0. Table S3 lists the two-year prevalence of the toxicity endpoints in the four cohorts, including samples that were not genotyped. A Standardized Total Average Toxicity (STAT) score was calculated as described previously (Barnett et al., 2012b) to provide an overall measure of toxicity. All samples with 2-year toxicity data available and the endpoints described in Table S1 were used to calculate STAT.
Data simulation determined the most statistically powerful way to analyze toxicity: ordinal logistic regression of exact change in toxicity grade from baseline; binary logistic regression with no change in toxicity grade compared with a ≥1 point increase in toxicity grade; or binary logistic regression with change in toxicity grade ≤ 1 point compared with a ≥2 point change. Genotype and toxicity phenotype data for 600 individuals were simulated under two models: (1) null (no association), and (2) alternative model of linear association characterized by a loglinear per-allele increase in odds of toxicity. Under both types of model, genotypes were randomly assigned to each subject according to one of five minor allele frequencies (MAF): 0.05, 0.15, 0.25, 0.35 or 0.45. To create the null data set, a toxicity grade was also randomly assigned to each subject. Datasets representing a log-linear association between genotype and toxicity were created by assigning each subject a toxicity risk score that was calculated using their genotype and a pre-specified effect size (log(odds ratio) = log(1.5)): for subjects i = 1 … n; β the pre-specified effect size; (2× MAF × β) the population average. A random risk was also added to each score, representing non-genetic factors that may influence toxicity, to create a logistic distribution of toxicity risk. This logistic distribution was then used to assign a toxicity grade to each subject such that subjects lying at the high end of the risk distribution had higher toxicity grades than those at the low end of the distribution. Under both types of model, two phenotype distributions were simulated to reflect the distributions of urinary frequency and rectal bleeding observed in RAPPER.
Each simulation was repeated 100,000 times. Three regression models were fitted to each set of simulated data, with toxicity grade as the outcome variable and genotype as the independent variable. Each model produced a p-value for comparison.
The power of each type of model applied to the log-linear association data was assessed by tabulating the number of p-values (out of 100,000) achieving statistical significance. Applying a simple Bonferroni correction to the observed p-values to account for 100,000 test repeats, a test was considered statistically significant if p-value b 5 × 10 −6 . The model that detected the most significant associations was deemed most powerful. p-Values obtained from analyses of the null data sets were assessed for type I error.
Results from 100,000 simulations showed that both ordinal logistic and binary (no versus ≥1 point change) logistic regression were equally powerful statistical approaches (Table S4), and binary logistic regression (no change in toxicity grade compared with a ≥1 point increase in toxicity grade) was used for the primary analysis. No model displayed higher-than-expected type I errors. Data simulation was performed using R (R Core Development Team, 2014) with the Ordinal package (Christensen, 2013).

Genotyping, Quality Control and Imputation
Germline DNA was genotyped using commercial SNP arrays as part of previously completed GWAS Fachal et al., 2014;Kerns et al., 2013a;Kerns et al., 2013b;Kerns et al., 2013c). Quality control filtering removed SNPs that were missing in N2.5% of samples, had MAF b1% or displayed frequencies deviating from those expected under Hardy-Weinberg Equilibrium (p-value b 10 −6 ). Samples with N5% of SNPs missing, showing cryptic relatedness, showing excess heterozygosity or being principle component analysis (PCA) outliers were removed (Table 1). After filtering, the four datasets had genotyping rates N99%. To minimize potential confounding by population structure, individuals with non-European ancestry (Table 1) were excluded using PCA performed with samples of known ancestry from the International HapMap Project (International HapMap, 2003). Imputation using IMPUTE2 software (Howie et al., 2009) with the 1000 Genomes phase I, version 3 (release date 3/16/2012) reference panel (1092 individuals from all 14 populations) (Genomes Project et al., 2012) yielded~38 million SNPs, small insertions and deletions, and structural variants in each study. These datasets were filtered as above in addition to removing SNPs with information score, a measure of imputation certainty, b 0.3. The final datasets included in the GWAS meta-analysis were: 6,672,177 SNPs in 527 RAPPER participants; 6,767,156 SNPs in 597 RADIOGEN participants; 6,627,946 SNPs in 290 Gene-PARE participants; and 6,504,337 SNPs in 150 CCI participants.
SNP rs17599026 was directly genotyped in the RADIOGEN and RAP-PER samples using a TaqMan assay (forward primer sequence TGCTCATGATGAAGGTATGCTTTCT, reverse primer sequence ACAAAACTGTATTCCCAAGACAAAGC and probe sequences CACCATCCTAAAGCAGTG plus ACCATCCTAAAACAGTG; Applied Biosystems, Thermo Fisher Scientific) according to the manufacturer's standard protocol (PCR annealing and extension performed at 60°C for 1 min × 40 cycles). It was not possible to design a TaqMan assay with high specificity for rs7720298 as it is immediately adjacent to another SNP (rs7720176).

Statistical Analysis
Individual GWAS were analyzed using binary logistic (individual toxicity endpoints) or linear (STAT) regression, adjusting for non-genetic risk factors identified by QUANTEC (Michalski et al., 2010;Viswanathan et al., 2010). These were: age at treatment, diabetes, rectal volume and BED in analysis of rectal bleeding; age at treatment, transurethral resection of the prostate (TURP) prior to radiotherapy, baseline symptom grade and BED in analysis of urinary endpoints; and all factors in analysis of STAT. Hormone therapy was also included in the Gene-PARE and CCI analyses because baseline symptoms were assessed prior to completion of hormone therapy. Multivariable regression was performed using SNPTEST software (Marchini et al., 2007) with the frequentist test and expected method. The expected method uses the genotype dosages produced from imputation, which account for the uncertainty in using imputed genotypes rather than experimentally determined genotypes. Thus, inaccuracy related to using imputed data was accounted for in the statistical analysis. An additive genetic inheritance model was assumed in all analyses.
A fixed-effects meta-analysis with inverse variance weighting was performed using the regression beta coefficients and standard errors produced by SNPTEST. Studies have shown fixed-effects to be more statistically powerful compared with random-effects when the primary purpose is SNP discovery rather than refinement of the effects size estimate (Evangelou and Ioannidis, 2013). A chi-squared test of heterogeneity was performed for each SNP. SNPs considered significant had a meta-pvalue ≤ 5 × 10 −8 and agreement in effect direction across all cohorts. Significant SNPs were re-analyzed using an ordinal logistic regression model.
Regions of linkage disequilibrium were defined based on r 2 ≥ 0.5 in SNPs from the 1000 Genomes European population with the meta-analysis top hits using Haploview software (Barrett et al., 2005). The most recent release of the 1000 Genomes data (version 5, release date 5/2/ 2013) was used to estimate linkage disequilibrium.

Power Calculations
Power calculations were performed using the web-based Genetic Power Calculator (Purcell et al., 2003) assuming a sample size of 1564 individuals, a phenotype prevalence of 20%, and type I error of 5 × 10 −8 . Table 2 summarizes the cohort characteristics and Table 3 the toxicity prevalences of the 1564 men treated for prostate cancer who were included in the analysis. Two years following radiotherapy 17.8% (277 of 1557) of individuals experienced rectal bleeding, 15.0% (212 of 1410) an increase in urinary frequency, and 8.1% (101 of 1245) a decrease in urine stream. The meta-analysis had ≥ 99% power to detect SNPs with MAF ≥ 10% and per-allele effect size ≥ 2 assuming a genome-wide significance threshold of 5 × 10 −8 (Table 4).

Results
Meta-analysis Q-Q plots ( Fig. S1) show no genomic inflation, suggesting that population structure was adequately controlled using principal components analysis to exclude outliers with non-European ancestry. The most statistically significant SNPs associated with late toxicity are listed in Tables 5-8, which show several SNPs had meta-pvalues reaching or approaching significance (p-value ≤ 5 × 10 −6 ) with concordant effect direction across the individual studies. Three SNPs reached genome-wide significance: rs17599026 on 5q31.2 with urinary frequency (3.12, 95% CI 2.08-4.69, p-value 4.16 × 10 −8 ); rs7720298 on 5p15.2 with decreased urine stream (OR 2.71, 95% CI 1.90-3.86, pvalue = 3.21 × 10 −8 ); and rs11230328 on 11q12.2 with STAT score (Beta 0.31, 95% CI 0.21-0.41, p-value = 6.27 × 10 −10 ). rs17599026 was directly genotyped in the GenePARE and CCI studies and imputed in the RADIOGEN (information score = 0.90) and RAPPER (information score = 0.87) studies. rs7720298 was imputed in all four studies (RADIOGEN information score = 0.91, RAPPER information score = 0.93, GenePARE information score = 0.94, and CCI information score = 0.95) and rs11230328 was imputed in all four studies (RADIOGEN information score = 0.68, RAPPER information score = 0.61, GenePARE information score = 0.88, and CCI information score = 0.97). To provide a more interpretable effect size for rs11230328, STAT score was dichotomized at the mean and analyzed using logistic regression, which resulted in an OR of 1.59 (95% CI 1.25-2.02). There were no genome-wide significant SNPs associated with rectal bleeding, and though several approached significance, these require investigation in future larger studies. Fig. 1 shows forest plots for the SNPs that reached the genome-wide significance threshold.   In order to confirm the quality of the imputation, rs17599026 was genotyped directly in the RADIOGEN and RAPPER samples. Genotyping was successful in 645 of 654 RADIOGEN samples (correlation between the imputed and genotyped data of 0.88) and 595 of 600 RAPPER samples (correlation between the imputed and genotyped data of 0.85). These results confirmed that the genotypes are in strong agreement with the imputed data. When the direct genotype data was used in analysis, rs17599026 showed an association signal consistent with that seen using the imputed data, though the p-value fell short of the strict threshold for genomewide significance (OR 2.48, 95% CI 1.68-3.63, p-value = 3.66 × 10 −6 ).
Associations between the top SNPs (meta-analysis p-value ≤ 5 × 10 −6 ) and toxicity severity were modeled using ordinal logistic regression (Table  9). For all SNPs, the p-value from ordinal logistic regression was similar to that from binary logistic regression. For several SNPs the association signal improved slightly with the ordinal logistic model: rs12497518, rs141044160, rs6999859, and rs4804134 associated with rectal bleeding; rs17599026, rs7366282, rs4534636, rs10101158, rs10209697, rs6003982, and rs7356945 associated with urinary frequency; rs7720298, rs673783, rs62091368, and rs144596911 associated with decreased urine stream.
Regions of linkage disequilibrium (LD) were defined for the three loci tagged by SNPs reaching genome-wide significance. rs17599026, associated with urinary frequency, tags a 106 kb region of LD (base position 137,657,783-137,763,798; Fig. 2A) containing part of KDM3B (including the promoter region through exon 20), the upstream FAM53C, and part of the upstream CDC25C (the promoter region through exon 6). rs7720298, associated with decreased urine stream, tags a 39 kb region of LD (base position 13,858,328-13,897,362; Fig. 2B) that contains exons 16 through 30 of DNAH5. rs11230328 was not in strong linkage disequilibrium with any other common SNPs found in the more recent release of the 1000 Genomes population data, and this locus may represent a spurious association. rs11230328 lies within a LINE element, and may be difficult to map in the genome due to its location within a region of high homology. Imputation coverage (i.e. the number of SNPs successfully imputed in the study datasets out of common SNPs (MAF ≥ 0.05) within the 1000 Genomes European population) was high within the regions tagged (correlation r2 ≥ 0.5) by rs17599026 (173/183, 94.5%) and rs7720298 (130/136, 95.6%).
SNPs rs17599026 and rs7720298 are located in non-coding regions, as is common in GWAS, and the LD blocks tagged by these SNPs cover coding and non-coding regions. rs17599026, associated with urinary frequency, lies in an intronic region 23 bp downstream of exon 20 of KDM3B (MIM 609373; NM_016604.3:c4753+23CN T) encoding the lysine-specific demethylase 3B protein. KDM3B is highly expressed in bladder tissue from the Human Protein Atlas project (Uhlen et al., 2015), suggesting that this gene could be involved in normal bladder function and potentially dysfunction following damage from radiation exposure. rs17599026 itself does not lie in a site of known transcription factor binding or chromatin modification from the Encyclopedia of DNA Elements (ENCODE) catalog (ENCODE Project Consortium, 2012) and there are no significant expression quantitative trail loci (eQTLs) for this SNP in the Genotype-Tissue Expression (GTEx) project (GTEx Consortium, 2013). Given that rs17599026 is very close to exon 20, it could have an effect on splicing. However, no significant splicing motif alteration was detected using Human Splicing Finder (Desmet et al., 2009). ENCODE data show that the large LD block tagged by this SNP contains multiple transcription factor binding sites, DNase hypersensitive sites, histone methylation sites (methylation of lysine 4 at histone 3, H3K4Me3 and methylation of lysine 4 at histone 1, H3K4Me1), and histone acetylation sites (acetylation of lysine 27 at histone 3, H3K27Ac) that may affect regulation of the nearby genes. rs7720298, associated with decreased urine stream, lies in an intronic region just downstream of exon 30 of DNAH5 (MIM 603335; NM_001369.2:c.4950 +1233GN C) encoding the dynein, axonemal, heavy chain 5 protein that is part of a microtubule-associated motor protein complex. Rare mutations in DNAH5 can result in development of abnormal cilia and flagella in cells that lead to primary ciliary dyskinesia, Table 5 Rectal bleeding multivariable analysis results (p b 5 × 10 −6  which is a disorder characterized in part by chronic respiratory tract infections (Escudier et al., 2009). In addition to playing an important role in the lung, DNAH5 is expressed in both kidney and bladder tissue, suggesting a biologic role in normal function of the urinary tract (Uhlen et al., 2015). This SNP does not lie in a site of known transcription factor binding or chromatin modification from the ENCODE catalog nor is it an eQTL based on data from GTEx, but the region of LD tagged by rs7720298 contains several transcription factor binding sites and sites of DNase hypersensitivity measured in ENCODE cell lines, suggesting that it may tag a site of transcriptional regulation.

Discussion & Conclusions
This meta-analysis aimed to identify SNPs associated with late radiotherapy toxicity in a single tumor site. By having a total sample size in a single stage analysis of N1500 men with prostate cancer (versus~600 in published studies), the study identified two risk loci. This study had ≥99% power to identify common SNPs (MAFs ≥10%) that confer a relatively large increased risk for developing late toxicity (OR ≥2.0). Identification of multiple loci in this single-stage meta-analysis versus single loci in published GWAS that used a staged approach is consistent with the increased power. The meta-analysis showed that heterogeneity in radiotherapy datasets is not a barrier for future multi-cohort radiogenomic studies. The absence of significant SNPs associated with rectal bleeding is consistent with the relatively limited statistical power. Most common variants identified via GWAS have more modest effects (ORs 1.15 to 1.5) than this study was powered to detect. Our group previously reported an excess of associations at the p b 5 × 10 −7 level for rectal bleeding showing many SNPs should be identified as sample sizes increase . In addition, our current approach of dichotomizing toxicity at grade 0 versus grade 1 or worse and considering two years of follow-up may not be optimal for rectal bleeding, which could be explored as our radiogenomic cohorts increase.
The significant SNPs lie in non-coding regions, as is common in GWAS where~64% of identified SNPs lie in enhancer regions (Hnisz et al., 2013). Gene enhancer elements are noncoding segments of DNA involved in regulating transcriptional programs (Corradin and Scacheri, 2014). Their location is predicted by DNase I hypersensitive regions of open chromatin flanked by sties of histone methylation. These enhancers can be active (associated with the acylation of lysine 27 of histone 3 [H3K27Ac]) or repressed (correlated with histone marks H3K27Me3 and H3K9Me3). Although premature to speculate on the functional consequences of these SNPs, they lie in intronic regions of genes that are expressed in tissues exposed to radiation during treatment of prostate cancer and are involved in the symptoms experienced by men having radiotherapy for this malignancy. Fine mapping studies and functional characterization of causal variants contained within these loci should expand our understanding of the biology underlying the pathogenesis of radiotherapy toxicity.
rs17599026 was genotyped to check the quality of the imputed data from RADIOGEN and RAPPER. Direct genotyping confirmed the high quality of the imputation, but we were unable to genotype directly the other two SNPs for which imputed data were used for all four cohorts. rs7720298 had high imputation information scores (N0.95) in all three studies of urinary decreased stream, lending confidence to this association. However, rs11230328 had a lower imputation quality (b0.70) in two of the four studies of overall toxicity, and it is not present in the latest 1000 Genomes Project release (v5a; 05/02/2013). Thus, rs11230328 may represent a spurious association.
None of the previously studied candidate gene SNPs for radiotherapy toxicity were among the top loci identified in this meta-analysis, consistent with a recent validation study showing no evidence for association between 92 candidate SNPs and radiotherapy toxicity among~1600 individuals treated with radiotherapy for breast or prostate cancer (Barnett et al., 2012a). No SNP discovered in previous GWAS was Table 7 Decreased stream multivariable analysis results (p b 5 × 10 −6   identified in this de novo analysis, likely due to methodological differences. The published locus on chr2q24.1, identified in RADIOGEN and replicated in RAPPER and Gene-PARE , was excluded because the MAFs of the tagging SNPs were b5%, the threshold set in this meta-analysis to maximize statistical power. Subsequent to the meta-analysis, we tested SNPs in the chr2q24.1 locus for replication in CCI (the only cohort not included in the initial publication) using the methods of the originally published paper. All seven previously reported SNPs showed a consistent direction of association with overall toxicity in CCI, and rs264651 showed a statistically significant association (linear regression beta = 0.66; 95% CI 0.06, 1.27; p-value = 0.03). The locus was also tested for association with the three individual toxicity endpoints investigated in this paper, and appeared to be predominantly associated with urinary toxicity (Table S5). SNPs rs7582141, rs6432512, rs264588, and rs264631 were significantly associated with an increased risk for decreased urinary stream and increased urinary frequency (meta-analysis p-values b 0.05, Table S5). The directions of the associations were consistent across the other three SNPs, with rs264663 and rs264651 reaching statistical significance for urinary frequency. The direction of the association of six of the seven SNPs was consistent for the rectal bleeding endpoint, but none of the SNPs approached statistical significance, suggesting that the urinary endpoints are predominantly driving the association of this locus with overall toxicity. This finding thus supports the initial report that chr2q24.1 is a risk locus for late radiotherapy toxicity in prostate cancer. rs7120482 on chr11q14.3 was previously identified in Gene-PARE and replicated in a combined dataset that included RADIOGEN and CCI (Kerns et al., 2013b), but the association with rectal bleeding was restricted to a recessive inheritance model. In this meta-analysis, dominant and recessive models were not considered in order to avoid an increased burden of multiple comparisons correction. Subsequent to this meta-analysis, we tested rs7120482 for replication in RAPPER (the only cohort not included in that initial publication) using the methods of the originally published paper. The initial association with rectal bleeding was not replicated in RAPPER (odds ratio = 0.74; 95% CI 0.08, 7.33; p-value = 0.79). This lack of replication could be due to differences in the radiotherapy regimen or scoring of toxicity -the incidence of rectal bleeding was low in RAPPER (3%) compared with the other studies (10% to 18%). However, the initial association fell just short of genome-wide significance and might be a false positive result. At this early stage of radiogenomic GWAS we expect some initial associations to be lost as sample sizes increase and false-positive findings are reduced, as seen in other GWAS meta-analyses (Evangelou and Ioannidis, 2013). Finally, Gene-PARE previously identified risk SNPs for erectile dysfunction (Kerns et al.,  (rs7720298); and (c) overall toxicity (rs11230328). p-Values are from meta-analysis of regression coefficients and standard errors from regression analysis performed in each study. Odds ratios (OR; urinary frequency and decreased urine stream) or regression beta coefficient (STAT score) are shown for each individual study as well as the meta-analysis (Summary). The size of the box marking each odds ratio or beta is proportional to the precision of estimate for the given study. Lines on the boxes denote 95% confidence intervals. 'Toxicity' and 'No toxicity' (rs17599026 and rs7720298) was defined as ≥1 point increase in toxicity grade versus no change in toxicity grade from pre-radiotherapy scores. Overall toxicity was analyzed as a continuous variable (STAT). 2010; Kerns et al., 2013a), but this endpoint was not assessed in the other studies in the meta-analysis. Thus, previously identified SNPs remain important, and the meta-analysis reports additional loci. This study has several strengths. First, toxicity was assessed prospectively in all four studies, which minimizes recall bias. Second, baseline information ensures any SNP-toxicity associations identified occur because of the radiotherapy rather than any pre-existing (possibly tumor-related) symptoms. Third, a multivariable approach accounted for relevant treatment and clinical variables. Radiogenomics involves phenotypes that occur in response to an environmental exposure (i.e. therapeutic radiation) that is measurable and can be adjusted for. Fourth, reliable and current imputation methods were used to obtain comparable and dense SNP maps across the four studies, allowing meta-analysis of studies involving different SNP genotyping platforms.
Without imputation, only 15,144 SNPs were common to all four datasets, reducing the likelihood of identifying SNPs associated with late radiotherapy toxicity.
This study has limitations. First, the four studies were designed independently and involved different methods for recording toxicity. Endpoints needed harmonizing and the approach might be sub-optimal, but serves as a useful guide for future radiogenomics studies involving multiple cohorts. Although endpoints were dichotomized to minimize differences in toxicity grades across the cohorts, the simulation experiment showed minimal loss of statistical power. Second, there was the heterogeneity in radiotherapy protocols. For example, Gene-PARE included individuals who received brachytherapy with EBRT, which increases urinary toxicity (Sanda et al., 2008). Similarly, the proportion of individuals who received androgen deprivation differed across the studies. This heterogeneity was adjusted for using multivariable analysis and by a meta-analysis approach. Importantly, none of the top SNPs showed evidence of heterogeneity in effect size across studies (Tables 5-8). The first SNPs identified are those that rise above such noise in the data, and the significance of our work is that we can find variants despite imperfect datasets. We attempted to control for differences in radiation dose by adjusting for total biologically effective dose in each GWAS. However, this is a surrogate for the doses received by specific normal tissues. Future cohorts with detailed dosimetry data to the different organs at risk will improve on our ability to identify SNPs. The heterogeneity in cohorts should be embraced, as any predictive test must use SNPs that are associated with toxicity across treatment centers and protocols. Last, in order to minimize confounding by population structure, the present analysis was restricted to individuals of European ancestry. It will be important for future studies to focus on other ancestral groups, both for replication of the loci identified here and for discovery of additional loci.
While this study was successful in identifying radiosensitivity SNPs via a meta-analysis of GWAS, it is modestly sized compared with GWAS of other diseases and traits, and it should be viewed as the starting point for expanded radiogenomics studies. Given that our study was not powered to detect SNPs with odds ratio b 2, many true positive SNPs will have been missed. Also, though the SNPs identified here reach statistical significance and show consistency across multiple independent studies, there is still a possibility that they will fail to replicate in other cohorts. This is a challenge in GWAS (McCarthy et al., 2008), but meta-analysis and replication studies have proved successful in validating hundreds of risk SNPs for a wide variety of diseases and phenotypes. Future large GWAS meta-analyses will identify additional SNPs, and this work is underway. The Radiogenomics Consortium is a participant in the OncoArray Network, which is a large collaborative effort to gain new insight into the genetic architecture and mechanisms underlying several cancers as well as outcomes related to the their treatment. Approximately 5000 additional DNA samples from individuals treated with radiotherapy for prostate cancer are being genotyped using the customized OncoArray, and metaanalysis of this greatly expanded set of GWAS data using the methods developed in this paper will uncover additional risk SNPs, provide a platform for further validation of the SNPs identified here, and serve as the basis for future post-GWAS analyses on these confirmed loci.
The last decade saw a rapid expansion of knowledge of the genetics of disease susceptibility in the general population. Numerous large collaborative GWAS showed that polygenic risk profiles can be built based on multiple SNPs each conferring small effects but together a significant proportion of susceptibility to common diseases. With the rapid decline in costs for genetic testing, there is growing acceptance that risk prediction models incorporating genetic and environmental factors will be important in future healthcare provision (Chatterjee et al., 2016). In 2012 there were an estimated 32.6 million people alive five years after being diagnosed with cancer (http://www.cdc.gov/cancer/international/statistics.htm), and many will be living with the consequences of treatment. Research developing models predicting susceptibility to long-term radiation effects is important and the findings reported here shows that heterogeneous radiotherapy cohorts can be combined to identify common genetic variants associated with toxicity. The work provides a basis for larger collaborative efforts to identify enough variants for a future test involving polygenic risk profiling.

Funding Sources
This work was supported by Cancer Research UK (C1094/A11728 to CMLW and NGB for the RAPPER study, C26900/A8740 to GCB, and C8197/A10865 to AMD), the Royal College of Radiologists (C26900/