Skip to main content
American Journal of Human Genetics logoLink to American Journal of Human Genetics
. 2015 Jun 4;96(6):869–882. doi: 10.1016/j.ajhg.2015.04.015

Comprehensively Evaluating cis-Regulatory Variation in the Human Prostate Transcriptome by Using Gene-Level Allele-Specific Expression

Nicholas B Larson 1,, Shannon McDonnell 1, Amy J French 2, Zach Fogarty 1, John Cheville 2, Sumit Middha 1, Shaun Riska 1, Saurabh Baheti 1, Asha A Nair 1, Liang Wang 3, Daniel J Schaid 1, Stephen N Thibodeau 2
PMCID: PMC4457953  PMID: 25983244

Abstract

The identification of cis-acting regulatory variation in primary tissues has the potential to elucidate the genetic basis of complex traits and further our understanding of transcriptomic diversity across cell types. Expression quantitative trait locus (eQTL) association analysis using RNA sequencing (RNA-seq) data can improve upon the detection of cis-acting regulatory variation by leveraging allele-specific expression (ASE) patterns in association analysis. Here, we present a comprehensive evaluation of cis-acting eQTLs by analyzing RNA-seq gene-expression data and genome-wide high-density genotypes from 471 samples of normal primary prostate tissue. Using statistical models that integrate ASE information, we identified extensive cis-eQTLs across the prostate transcriptome and found that approximately 70% of expressed genes corresponded to a significant eQTL at a gene-level false-discovery rate of 0.05. Overall, cis-eQTLs were heavily concentrated near the transcription start and stop sites of affected genes, and effects were negatively correlated with distance. We identified multiple instances of cis-acting co-regulation by using phased genotype data and discovered 233 SNPs as the most strongly associated eQTLs for more than one gene. We also noted significant enrichment (25/50, p = 2E−5) of previously reported prostate cancer risk SNPs in prostate eQTLs. Our results illustrate the benefit of assessing ASE data in cis-eQTL analyses by showing better reproducibility of prior eQTL findings than of eQTL mapping based on total expression alone. Altogether, our analysis provides extensive functional context of thousands of SNPs in prostate tissue, and these results will be of critical value in guiding studies examining disease of the human prostate.

Introduction

The human transcriptome is highly dynamic, both at a population level across individuals1,2 and within individuals by cell type.3,4 Gene transcription itself is regulated by multiple transcription factors that bind to cis-regulatory elements (CREs) up- and/or downstream of the corresponding gene. SNPs that occur within these CREs can disrupt or enhance the binding affinity of the interacting proteins, or altogether new CREs can be created. These regulatory variants are known as cis-acting expression quantitative trait loci (cis-eQTLs), and the fact that they directly alter the expression of the gene in an allele-specific fashion results in transcriptional allelic imbalance (Figures 1A and 1B).

Figure 1.

Figure 1

Using ASE to Detect cis-eQTLs

(A) Example illustration of a cis-acting SNP that regulates gene expression. The width of the gray arrows indicates the magnitude of ASE.

(B) Distributions of total expression for an example gene are based on the underlying cis-eQTL SNP genotype in the population.

(C) Illustration of bioinformatics procedures utilized for quantifying TE and ASE measures with the use of RNA-seq and phased genotype data. Quantification of relative ASE was determined by the subset of total RNA-seq reads that overlapped eSNPs.

Dysregulation of gene expression has been attributed to multiple disease traits—the Human Genome Mutation Database5 (accessed September 2014) currently lists 3,024 regulatory variants responsible for inherited diseases. Although considerable effort has been dedicated to understanding the impact of protein-coding variation (e.g., missense and nonsense polymorphisms), genome-wide association studies (GWASs) and large-scale next-generation sequencing (NGS) projects often uncover intergenic trait-associated variants with no direct functional interpretation. Moreover, GWAS results have been shown to be highly enriched in regulatory regions,6 and many studies have concluded that trait-associated SNPs are likely to be eQTLs.7,8 Consequently, the identification of genetic variation that affects gene expression is fundamental to not only resolving mechanisms governing transcriptomic diversity but also functionally interpreting genetic variation relevant to disease.

Recent advancements in NGS technologies have enabled high-throughput RNA sequencing (RNA-seq) for expression studies.9 RNA-seq experiments present manifold benefits, including reduced technical variation and greater dynamic range, over traditional microarray studies for quantifying gene expression.10,11 RNA-seq also provides base-pair characterization of RNA sequences, facilitating detection of novel transcripts and multiple isoforms. However, RNA-seq processing is not without its challenges, given that mapping bias,12–14 coverage issues, outlier samples, batch effects, and unknown covariates15,16 can all limit data integrity and eQTL reproducibility.17

Direct sequencing of RNA is particularly useful for quantifying allele-specific expression (ASE), the relative proportion of transcribed RNA attributable to each homologous copy of a gene. By identifying RNA-seq reads that overlap heterozygous genetic loci, one can estimate ASE from the respective read counts supporting the reference and alternate alleles. One advantage of ASE-level data is the ability to distinguish between cis- and trans-acting eQTL effects, as indicated by ASE imbalance. Moreover, joint utilization of ASE and total expression (TE) data can improve statistical power for detecting cis-eQTLs over TE alone by the direct observation of allele-specific effects.18

Although a number of recent studies have evaluated cis-eQTL associations by using RNA-seq data, these have been generally limited to blood-based11 or lymphoblastoid cell line analyses.2 Given that the architecture of CREs is also known to be specific to cell type, the consequences of regulatory variation can vary greatly from tissue to tissue. Thus, it is of great value to further characterize cis-acting regulatory variation for individual tissue types, given that overlap between blood and primary-tissue eQTLs could be relatively poor19 and tissue-specific effects could go undetected. For the prostate, a tissue-specific eQTL analysis might be pertinent to a number of conditions and/or traits, such as prostate cancer (PRCA [MIM: 176807]) and prostate-specific antigen (PSA) levels. Androgen hormone activity is of particular relevance to not only prostate development but also prostate tumorigenesis, because androgen deprivation is a common therapeutic strategy for metastatic disease.20 The occurrence of eQTLs tagging androgen receptor (AR) DNA binding elements might be of interest for further study.

By combining dense, chromosomally phased genotyping data with RNA-seq expression profiling under rigorous quality control, we present a comprehensive evaluation of cis-acting autosomal SNP associations with differential gene expression in 471 samples of normal human prostate tissue. We examined the properties of detected cis-eQTLs within the context of genes preferentially expressed in prostate tissue, as well as their relation to prostate-specific regulatory elements. Our analysis also specifically highlights the utility of ASE data in discovering cis-eQTLs by demonstrating greater reproducibility of previous eQTL findings than of results from TE-only analyses. Finally, we examined the congruence of trait-associated SNPs with cis-eQTLs identified in our study.

Material and Methods

Sample Collection and Processing

We obtained over 4,000 normal prostate tissue samples from an archive of fresh-frozen materials collected when individuals underwent either a radical prostatectomy (RP) for low-Gleason-grade (<7) prostate cancer or a cystoprostatectomy (CP); samples were continuously stored at −80°C until use. We included CP samples in the selection criteria to mitigate concerns of field effect21 on the RP-sample gene-expression profiles from proximal tumor tissue. Some of the samples were obtained with written informed consent from participants as part of a repository established by the Mayo Clinic Specialized Program of Research Excellence for prostate cancer. The remaining samples were from surgical waste material whose subject identifiers were permanently removed prior to analyses. By this manner, the samples were anonymized. The use of all samples and the study protocol were approved by the Mayo Clinic Institutional Review Board.

From 916 samples meeting inclusion criteria, a new H&E slide was prepared from each normal tissue sample and re-examined for confirmation of the following: (1) prostate cancer was not present, (2) high-grade prostatic intraepithelial neoplasia was not present, (3) normal prostatic epithelial glands represented >40% of all cells, (4) lymphocytic population represented <2% of all cells, and (5) the normal epithelium present was from the posterior region of the prostate. A total of 565 affected samples met these inclusion criteria and were eligible for DNA and RNA processing. Sections for RNA or DNA extraction were obtained from processed frozen tissue samples, such that an H&E section was obtained at the beginning, between the RNA and DNA designated sections, and at the end. All H&E sections were re-evaluated by a pathologist according to the same inclusion criteria described above. Forty-seven samples no longer met those criteria and were removed, resulting in 518 total samples eligible for DNA and RNA extraction. DNA was extracted with the Puregene tissue-extraction protocol per the manufacturer’s recommendations. Assessment of DNA quality consisted of examining the 260/230 ratio and DNA yield. RNA was extracted with the QIAGEN miRNAeasy Mini Kit and the QIAcube instrument in accordance with the manufacturer’s instructions. RNA quality was determined by the RNA integrity number (>7) (Agilent Bioanalyzer) and the 260/280 ratio. After exclusions for quality control, 494 samples remained eligible for DNA genotyping and RNA sequencing. Sample randomization was utilized at each processing step to minimize potential batch effects.

Genome-wide Genotyping

Samples were genotyped with Illumina Infinium 2.5M bead arrays (Illumina) according to the manufacturer’s protocol. The DNA samples were randomly plated into 96-well plates, and one parent-child Centre d’Etude du Polymorphisme Humain trio from the HapMap Project was included as a duplicate for quality control. Individual SNPs were assessed for quality according to marker genotyping call rates (>95%), duplicate sample concordance, and Hardy-Weinberg equilibrium (p > 1E−5). Sample quality was assessed according to sample genotyping call rates (>95%), gender inconsistency, heterozygosity rates, and concordance with RNA-seq mRNA variant calls (>98%). In addition, PLINK22 and STRUCTURE23,24 were used for checking cryptic relatedness and population stratification, respectively. Samples with evidence of excess non-European ancestry were excluded. The presence of residual population substructure was evaluated by principal-component analysis25 (PCA) on the SNP correlation matrix, and significant eigenvalues were identified on the basis of Tracy-Widom p values < 0.05.

RNA Sequencing

RNA libraries were prepared with the TruSeq RNA Sample Prep Kit v.2 (Illumina) according to the manufacturer’s instructions. Libraries were loaded onto paired-end flow cells at concentrations of 8–10 pM for the generation of cluster densities of 700,000/mm2 with the Illumina cBot and cBot Paired-End Cluster Kit v.3 according to Illumina’s standard protocol. The flow cells were sequenced as 51 × 2 paired-end reads on an Illumina HiSeq 2000 with TruSeq SBS Sequencing Kit v.3 and HCS v.2.0.12 data-collection software. Base calling was performed with Illumina’s RTA v.1.17.21.3. A minimum of 50 million total reads per sample was considered sufficient for analysis, and 234 samples with insufficient reads were re-sequenced; if no quality issues were identified with the individual runs, the BAM files were merged for these samples. RNA-seq analysis was performed with the MAPR-Seq pipeline,26 an integrated suite of open-source bioinformatics tools, along with in-house-developed methods. Paired-end reads were aligned by TopHat 2.0.627 against the hg19 genome build (UCSC Genome Browser) with the bowtie1 aligner option. RSeQC28 software was applied for assessing RNA sample quality.

Gene-Expression Quantification

TE gene counts were generated for 23,398 genes on the basis of RefSeq annotation with HTseq29 software and iGenomes gene annotation files obtained from Illumina. Genes were declared to have detectable expression if the median read count across samples was ≥7 (i.e., ≥14 paired-end fragments). PCA and graphical methods were applied for evaluating potential gene-expression batch effects and/or clustering by surgery type. Eleven genes previously determined to be susceptible to allelic mapping bias13 were a priori excluded from analysis.

ASE Counts

RNA-seq reads are informative for ASE if they overlap one or more expressed SNPs (eSNPs), polymorphisms that are transcribed and present in the RNA. Allele-specific read counts can in turn be conceived as a vector of two separate counts attributable to each allele, the sum of which is ≤TE. To generate gene-level ASE read counts, we first chromosomally phased the genotype data by using SHAPEIT2.30 We additionally imputed genotypes with EZImputer, an in-house imputation workflow based on IMPUTE2,31 by using the 1000 Genomes phase I integrated variant set (March 2012) as a reference to improve the quantity of eSNPs (and thus ASE-informative reads). We restricted imputation to the genic regions of the genome to optimize allele-specific read mapping while minimizing the impact on multiple-testing burden. High-confidence imputed genotypes were retained on the basis of allele dosage r2 > 0.9, minor allele frequency (MAF) ≥ 1%, and genotype posterior probability > 0.5. BAM files of the aligned RNA-seq reads uniquely mappable to one of the two phased haploid genomes were produced with the asSeq18,32 package in R, resulting in two separate BAM files per sample. Following author default recommendations, we conducted additional quality-control post-processing steps in the preparation of the BAM files prior to haplotype-specific mapping, such that we required reads to be uniquely mappable and filtered all duplicate reads. We then generated gene-level ASE read counts from these BAM files by using HTSeq under settings similar to those used for the TE gene counts. An illustration of this bioinformatics procedure is presented in Figure 1C.

cis-eQTL Mapping

Association testing for cis-eQTLs was conducted with the trecase function in the asSeq R package,18,32 which tests cis-eQTL associations by using RNA-seq read-count data. The trecase function simultaneously performs three association tests for each candidate SNP-gene pair on the basis of (1) TE-only counts under a negative binomial regression model, (2) available ASE-only counts under a beta-binomial regression model, and (3) a unified joint likelihood approach that models both data types (i.e., TE and ASE). In brief, let ti denote TE for individual i for a given gene, and it is assumed to have a negative binomial distribution such that

f(ti;μi,ϕ)=Γ(tij+1ϕ)ti!Γ(1ϕ)(11+ϕμi)1ϕ(ϕμi1+ϕμi).

Mean parameter μi can accommodate adjustment covariates and offsets through a log-link function, and the significance of a candidate eQTL SNP is determined with a likelihood-ratio test. ASE analysis first requires estimates of local haplotypes hi,a and hi,b for each subject i. The ASE of local haplotype hi,a is ni,ha, the number of reads mapped to hi,a, where the total number of mappable reads for the gene region for subject i is ni = ni,ha + ni,hb. Let j be the index position of the candidate eQTL SNP, and consider individual i to be heterozygous at j (i.e., in genotype AB, B is the alternate allele). Let nij,B be the number of reads mapped to the haplotype containing SNP allele B. Then, nij,B = ni,ha if allele B is on hi,a, or nij,B = ni,ha if allele B is on hi,b. Under the null hypothesis that alleles A and B are expressed equally, πB (the proportion at which allele B is expressed) is 0.5; the alternative hypothesis is that πB ≠ 0.5. For subject i, nij,B is modeled under a beta-binomial distribution with parameters πB and dispersion parameter θ as follows:

f(nij,B|ni,πB,θ)=(ninij,B)k=0nij,B1(πB+kθ)k=0ninij,B1(1πB+kθ)k=1ni1(1+kθ).

The likelihood is then defined as the product of f(nij,B|ni, πB, θ) across all subjects i for which ni > 0. The TE and ASE models are combined in a model of the log-odds of observing reads from the same haplotype of allele A or B of the candidate eQTL SNP. An underlying assumption of the joint method is that the eQTL effect is cis-acting in order to jointly parameterize the effect of the candidate eQTL SNP across both data types. We refer the reader to the relevant articles for greater detail.18,32

As per the default recommendations of the software, we required a minimum of five individuals to both be heterozygous at the putative cis-eQTL SNP and have a sufficient quantity ( 5) of allele-specific reads to consider a gene-SNP pair for association analyses using ASE information, whereas we excluded genes occurring on the sex chromosomes from our analysis. All common (empirical MAF ≥ 1%) genotyped and imputed SNPs within 500 kb of the transcription start site (TSS) and transcription end site (TES) of a given gene, inclusive of the genic region, were considered for association with expression of that gene. We selected this threshold to accommodate degradation in phasing accuracy over large physical distances because we anticipated intervals of ∼1 Mb between phase switch errors on the basis of performance evaluation of SHAPEIT2.30

We modeled mean expression log-linearly while adjusting for cell-type composition (i.e., proportions of cells corresponding to the epithelium and lymphocytes) and used the upper quartile of a given sample’s gene expression vector as an offset to adjust for differences in total library size. It is common practice in eQTL studies to also account for latent sources of non-genetic variation via PCA.17,33 We identified a total of 14 principal components (PCs) for covariate adjustment by using a PC selection criterion requiring that the proportion of explained variation for a given PC was >1% (Figure S1; Table S1). Two PCs identified from analysis of the correlation matrix of the SNP data indicated no significant correlation with transcript abundance and were not considered for covariate adjustment. Unless otherwise noted, all effect-size estimates are reported as mean expression log-ratios, whereby the TE-only estimates correspond to the log-ratios of homozygous alternate-genotype subjects relative to homozygous reference-genotype subjects under the assumption of an additive genetic model. For the ASE-only and joint models, the effect estimates correspond to the log-ratio of the mean ASE for an alternate allele carrying a haplotype relative to the reference allele’s haplotype. Note that under a truly cis-acting regulatory mechanism, these effect definitions are equivalent (see Sun et al.32 for details).

Statistical Significance

We determined statistical significance at the gene level by considering the most significant cis-eQTL SNP association on a gene-by-gene basis. We defined an eQTL as “gene-wise” significant at a false-discovery rate (FDR) of 0.05 by using the R package fdrtool,34 an empirical null approach, after gene-specific Bonferroni adjustment for the number of individual SNPs evaluated for that gene. This approach addresses the inflated type I error rate due to multiple testing for genes in regions of high SNP density and has been shown to behave comparably to permutation-based FDR methods for eQTL studies.35 Results for each of the three association methods were evaluated separately from each other because of differences in multiple-testing burden and statistical power. Results that were determined not to be cis-acting gene-level effects (e.g., trans-acting or imprinting) as a result of discrepant TE-level and ASE-level effect estimates, on the basis of the trans-test asSeq p values below a Bonferroni-adjusted alpha level of 0.05 (p < 4.55E−9), were excluded from the joint-testing results prior to FDR evaluation because they did not meet the underlying assumptions of the joint model. The same gene-level Bonferroni correction was applied across all testing methods on the basis of the quantity of SNPs that met testing criteria a priori.

It is possible that multiple SNPs might independently regulate the expression of a given gene, and this would go undetected if only the single strongest association were reported. Moreover, multiple SNPs in perfect LD at an eQTL might unduly reduce statistical power of enrichment and replication analysis results when only the top associated SNP is reported as significant. For the purposes of downstream analyses of significant findings, we additionally declared, separately for each testing method, any SNP-gene pair to be “experiment-wise” significant under a global FDR of 0.001 for the set of all analyzed SNP-gene pair eQTL results.

Genomic Annotation

All physical map features of genes were defined by the previously mentioned Illumina gene-annotation files. DNaseI hypersensitivity (HS) sites from the analysis of prostate epithelial cell line (PrEC) data were downloaded from ENCODE (GEO: GSE29692). We additionally downloaded the ChromHMM36 analysis output of PrEC epigenetic data processed by Taberlay et al.37 (GEO: GSE57498) to characterize the epigenomic context of associated SNPs. Experimentally detected prostate epithelial cell enhancers (total and cell-type specific) were downloaded from the FANTOM5 Transcribed Enhancer Atlas (TEA).38 In-silico-predicted AR DNA binding sites were downloaded from JASPAR39 (motif MA0007.2). We defined a SNP in our genotype data as tagging an enhancer region or AR site if it was in high linkage disequilibrium (LD) with any dbSNP138 variant within these sites. All LD calculations were conducted with PLINK on the basis of 1000 Genomes Phase 1 European (EUR) subjects, and high LD was defined as R2 > 0.8. A list of prostate-specific genes, defined as preferentially (though not necessarily exclusively) expressed in prostate tissue, was obtained from the database Tissue-Specific Gene Expression and Regulation (TiGER).40 Statistically significant (p < 1E−8) trait-associated GWAS results were downloaded from the NHGRI GWAS Catalog41 (accessed May 2014).

Enrichment Analyses

Gene and SNP enrichment analyses for specific subsets (e.g., reported GWAS SNPs) were conducted with Fisher’s exact test.42 Genomic annotation enrichment in the gene-level cis-eQTLs was conducted on the basis of overlap with the nine distinct chromatin states defined in the PrEC ChromHMM output BED file. For each analysis method, the set of significant SNPs was contrasted with a null SNP set on the basis of matched MAF and distance to the TES and TSS of a tested gene. Each SNP tested for an eQTL association was assigned a MAF bin in increments of 0.05, ranging from 0 to 0.5, as well as bins related to distance from the TSS and TES of a corresponding gene for all SNP-gene pairs including that SNP. Distances were given a positive sign if they were upstream of the TSS or downstream of the TES and a negative sign otherwise. Bins were based on intervals of size 2 kb (from 0 to 10 kb), 10 kb (from 10 to 100 kb), and 100 kb (from 100 to 500 kb) in both directions. Use of both distances also implicitly accounts for gene size and within-gene versus outside-gene cis-eQTL associations. Potential null SNPs were all SNPs that did not meet experiment-wise or gene-wise significance for any gene or testing method, totaling 1,129,641 SNPs. If on occasion a significant eQTL SNP did not have a matching null SNP, it was discarded from enrichment analysis. For the set of cis-eQTLs within the transcribed region of the associated gene, annotation enrichment was additionally evaluated on the basis of whether the SNP was within the 5′ UTR, the 3′ UTR, an exon, or an intron.

The set of genes with evidence of strong cis-acting eQTLs might be over-represented in active genetic pathways that are robust to dysregulation of component genes and/or irrelevant to normal prostate function, which could be of further biological interest. Similarly, genes that exhibit no evidence of any eQTL association might indicate highly conserved expression networks. We conducted pathway-based analyses for these two gene sets (highly dysregulated genes and null genes) by using WebGestalt,43,44 a web-based gene-set analysis toolkit. We defined the dysregulated gene subset as the union of gene-wise significant eQTL genes from each of the three association methods with a corresponding eQTL-effect parameter estimate equal to |β| > log(2), corresponding to a 2-fold expression change. The null gene set was defined as all expressed genes without a gene-wise significant eQTL result across all three testing methods. All enrichment analyses were conducted on the basis of Kyoto Encyclopedia of Genes and Genomes (KEGG)45 pathways and Gene Ontology (GO)46 terms at a Benjamini-Hochberg FDR threshold of 0.05, and p values adjusted for multiple testing were reported with the complete set of expressed genes as a background set.

Results

Of the 494 normal prostate tissue samples eligible for analysis, 471 (453 RP samples and 18 CP samples) passed sample exclusion criteria for DNA and RNA quality-control assessment. Visual examination of PCA results gave no indication of global differences in gene-expression profiles by surgery type (Figure S2). A total of 16,570 autosomal genes were determined to be expressed and suitable for our cis-eQTL analysis, and 13,998 of these also met the criteria for ASE analysis. Overall, 11,024,659 unique SNP-gene pairs were considered for association testing (Table 1; Figure 2A). At a gene-wise FDR of 0.05, 63.2% (10,471/16,570), 46.3% (6,482/13,994), and 72.1% (10,097/13,998) of evaluated genes were declared to be significantly dysregulated for the TE, ASE, and joint association testing methods, respectively (Figure 2B; Tables S2–S4). Manhattan-type plots of the most significant cis-eQTL associations per gene under the TE and joint testing methods are presented in Figure 3. When all approximately 11 million SNP-gene pairs were considered, 365,900 cis-eQTLs were identified to be significant at an FDR of 0.001 (maximum nominal p value = 4.18E−5) under the TE testing framework. Of note, the proportion of SNP-gene pairs declared significant was sizably larger in the joint method of testing (5.1%) than in the TE method (3.3%) at the same FDR level. TE data analyses using conditional quantile normalization47 and the Matrix eQTL6 R package (which applies a linear regression approach) yielded findings comparable to those from the TE trecase analysis (data not shown).

Table 1.

Summary of Performed cis-eQTL Association Tests and Significant Results by Association Testing

Method eQTL Association Tests
Number of SNPs
SNP-Gene Pairs Unique Genes Unique SNPs Gene-wise Experiment-wise
TE 11,024,659 16,570 1,455,343 10,471 (63.2%) 365,900 (3.3%)
ASE 8,541,159 13,994 1,403,840 6,482 (46.3%) 268,153 (3.1%)
Joint 8,026,374 13,998 1,389,388 10,097 (72.1%) 409,322 (5.1%)

Figure 2.

Figure 2

Venn Diagrams of Tested cis-eQTL Associations and Gene-Level Significant Findings

(A) Venn diagram depicting the overlap of unique SNP-gene pairs evaluated by each of the three cis-eQTL association methods (TE only, ASE only, and joint).

(B) Venn diagram of genes with gene-wise significant eQTL associations per method. An overlapping result does not particularly indicate that the exact same SNP is the most significant eQTL for that gene across methods.

Figure 3.

Figure 3

Manhattan Plots of cis-eQTL Association p Values

Manhattan plots depicting Bonferroni-adjusted p values of the strongest associated eQTLs for all (A) 16,570 genes under the TE method and (B) 13,998 genes under the joint testing method. Autosomes are ordered and alternately colored with dark and light gray. Genes in general are plotted as open circles, and solid green circles indicate genes identified as preferentially expressed in prostate tissue by TiGER. For display purposes, association results from prostate-specific genes with an adjusted p value < 1E−25 are labeled with a gene symbol.

As indicated by Figure 2A, the ASE and joint tests did not perfectly overlap, given that some ASE-only associations, including all SNPs for four genes, were not testable. Because of a number of factors, including the number of samples with ASE data and the magnitude of ASE measurements, 114,886 ASE beta-binomial tests failed. However, of these, 78,075 tests were still successfully conducted by the joint method and were retained in our results. In addition to the 41,854 joint tests that were filtered out because of low trans-test p values, 642,181 joint tests completely failed (i.e., had no testing results) as a result of an inability to estimate joint-model parameters. We attributed these model-fitting failures to similar conditions that resulted in rejected trans-test results (i.e., non-cis-acting regulation or chance null results that present as such).

For the purposes of illustration, the gene-specific results for UPK3A (uroplakin-3a precursor [MIM: 611559]), a TiGER-designated prostate-specific gene with one of the most highly significant cis-eQTL associations, are displayed in Figure 4. UPK3A encodes the protein uroplakin 3A, a uroplakin protein family member that is present in both the urothelium and the prostate epithelium.48 The most strongly associated eQTL SNP, rs2742629, is 481 bp upstream of the TSS for UPK3A, and the alternate allele carries a haplotype associated with a 2.94-fold change for ASE (joint p = 9.24E−208). The SNP rs2742629 is located within a large LD block overlapping multiple PReC DNaseI HS peaks near the 5′ end of UPK3A, indicating possible promoter modification. Direct evidence of the ASE imbalance associated with this SNP is presented in Figures 4C and 4D, which respectively demonstrate the relative allelic expression at the gene and eSNP levels on the basis of the corresponding phased rs2742629 genotype.

Figure 4.

Figure 4

Example cis-eQTL Results for UPK3A

Example cis-eQTL results for gene UPK3A, one of the top differentially expressed genes designated as preferentially expressed in prostate tissue.

(A) –log10 p values of eQTL association results by base-pair position. Each SNP position corresponds to up to three test results, and the top associated SNP, rs2742629, is labeled and indicated by a filled symbol.

(B) Boxplots of TE (indicated by raw total mapped reads) by the rs2742629 genotype.

(C) This scatterplot of ASE coded by the phased rs2742629 genotype is based upon reads uniquely mappable to each phased haplotype.

(D) An allelic-ratio plot of eSNPs (x axis) compares allelic expression ratios (y axis) for subjects homozygous for the rs2742629 genotype (cyan squares) and subjects heterozygous for the rs2742629 genotype (magenta circles). Each point represents a heterozygous eSNP (x axis) for a given sample, and the relative size of the plotting mark indicates the quantity of supporting ASE reads. Allelic bias is indicated by a deviation from 0.50 (horizontal dashed line).

Characteristics of Significant cis-eQTL Findings

In general, we observed high densities of cis-eQTLs in proximity to the TSSs and TESs of the dysregulated genes and that 63.8% of gene-wise significant joint-model results occurred within 50 kb of at least one of these positions (Figure 5A). Whereas TE-only and joint results indicated bias toward the TSS of the regulated gene, ASE-based results tended toward the 3′ end. The proportion of gene-level significant SNPs that were within the transcribed region of the associated gene was also higher in the ASE and joint testing results (TE = 0.40, ASE = 0.51, joint = 0.43). Of the 8,762 genes that corresponded to gene-wise significant cis-eQTLs across both TE and joint methods, 2,363 (26.6%) were for different SNPs, and the median absolute physical distance between these SNP pairs was 28 kb. We additionally investigated properties of the 1,335 genes identified to have significant gene-level eQTLs that were not detected by the TE analysis. Overall, these genes tended to have a longer coding length and a higher rate of expression but a smaller magnitude of estimated effects (Figures S3A–S3C).

Figure 5.

Figure 5

Properties of Significant cis-eQTL Associations

(A) A density plot of the physical position of gene-wise significant results in relation to the TES and TSS of each gene is given for each association testing method. For display purposes, the distance between the TES and TSS is normalized to be 100 kb.

(B) Scatterplot of absolute values of eQTL effect-size estimates by relative physical position for the gene-wise significant joint eQTL association results.

(C) Annotation enrichment results for the PrEC ChromHMM chromatin states for all gene-level SNPs, as well as gene-region enrichment for the subset of these SNPs that occur within the transcribed region of the associated gene.

Estimated eQTL effect sizes were strongly associated with position relative to the associated gene (Figure 5B). Spearman correlation testing resulted in a significant negative correlation between absolute estimated effect size and relative position (ρ = −0.094, p < 2.2E−16), defined as the minimum absolute physical distance from the TSS or TES. We discovered a similar inverse association with increasing MAF (ρ = −0.097, p < 2.2E−16), indicating purifying-selection pressure on eQTLs of large effect. Chromatin-state enrichment results for the gene-level significant eQTL SNPs were relatively consistent across testing methods (Figure 5C), demonstrating cis-eQTL enrichment in predicted promoter CREs. Examination of the cis-eQTLs within the associated gene, however, indicated specific enrichment of 3′ UTR SNPs for the methods using ASE data, commensurate with the positional bias observed in Figure 5A. Further analysis into the coding effect of an eQTL variant (synonymous versus nonsynonymous) did not indicate any substantial difference in enrichment (data not shown).

Replication of Previous Findings in Whole Blood

To examine the overlap between our results and those of previously conducted eQTL analyses by using similar data types in a different primary tissue, we obtained the significant cis-eQTL results from Battle et al.,35 who analyzed whole-blood samples from 922 subjects. Of the 10,914 SNP-gene pairs reported to be significant blood cis-eQTLs, 8,696 overlapped our analysis data and were evaluated by at least one testing approach. Because of differences in power across the studies and within our study across association testing methods, we measured reproducibility by using the estimated proportion of true alternative hypotheses, or true-discovery rate (TDR),49 on the basis of the p values from our association analyses. Using fdrtool, we estimated the TDR to be 0.682, 0.579, and 0.710 for the TE, ASE, and joint association results, respectively. These rates indicate not only sizeable overlap of regulatory variation for genes expressed across multiple tissues but also improved reproducibility of previous findings when ASE data are taken into consideration.

To additionally compare the top cis-eQTLs per gene across tissue types, we evaluated the LD between the blood and prostate eQTL SNPs for the sets of genes that were both gene-wise significant in our analyses and reported in Battle et al.35 (TE = 6,075, ASE = 3,909, joint = 5,299). An LD calculation was excluded if the top blood eQTL SNP was not evaluated in our prostate eQTL analysis for the associated gene. Interestingly, whereas a small proportion of genes (5%–8%) shared the exact eQTL SNP, the vast majority of the top prostate eQTL SNPs (50%–60%) were in low LD (R2 < 0.2) with the reported blood SNPs (Table 2), despite the high reproducibility of the eQTLs in general. However, these discrepancies might in part be attributable to differences in study power and design protocol.

Table 2.

LD Comparison of Gene-Level Significant cis-eQTLs by Gene between Prostate and Blood

Method Number of Genes Number of SNPs
Same SNPa LDb= 1.0–0.8 LDb= 0.8–0.6 LDb= 0.6–0.4 LDb= 0.4–0.2 LDb= <0.2
TE 6,075 492 (8.1%) 1054 (17.4%) 388 (6.4%) 432 (7.1%) 616 (10.1%) 3,093 (51.0%)
ASE 3,909 213 (5.4%) 462 (11.8%) 199 (5.1%) 230 (5.9%) 382 (9.8%) 2,423 (62.0%)
Joint 5,299 441 (8.3%) 766 (14.5%) 294 (5.5%) 319 (6.0%) 543 (10.2%) 2,936 (55.4%)
a

The set of SNPs that are the same gene-level eQTL across both tissue types.

b

LD between top eQTL SNPs.

Prostate-Specific Genes

Of the 128 genes listed by TiGER to be prostate specific, 114 overlapped our expressed gene set, and 106 had sufficient ASE data. Of these, 98 (85.6%) corresponded to gene-wise significant eQTLs by at least one testing method (p = 0.00095), and 34 of these corresponded to highly dysregulating cis-eQTLs. Notable findings included PSCA (prostate stem cell antigen [MIM: 602470]; joint p = 3.09E−214), which carries a known truncating variant, and multiple kallikreins (KLK2 [MIM: 147960], joint p = 4.57E−32; KLK3 [MIM: 176820], joint p = 2.53E−11; and KLK4 [MIM: 603767], joint p = 1.02E−26), which are hypothesized to regulate semen liquefaction.50,51 KLK2 encodes a serine protease responsible for cleaving the PSA precursor (KLK3) into its active form.

cis-eQTLs Tagging CREs

A total of 1,767 unique enhancers were listed for prostate epithelial cells in the TEA, and 330 of these were designated to be specific to this cell type. In our genotype dataset, we identified 996 SNPs tagging variants within 559 of these enhancer regions; 206 of these SNPs tagged 103 prostate-specific enhancers (Table S5). Overall, we identified 29.8% (167/559) of total prostate enhancers to be associated with an experiment-wise significant finding by the joint testing method, and 26 of these corresponded to the most significant eQTL association for 45 genes. In contrast, 140 (25.0%) enhancers were tagged by experiment-wise significant eQTLs in the TE analysis. Similar assessment of 189 JASPAR-predicted AR binding sites (Table S6) identified 45 unique AR elements with experiment-wise associations for 56 genes, including four gene-level significant results (BIRC3 [MIM: 601721], CERS3 [MIM: 615276], HMGXB3, and MARCKS [MIM: 177061]). MARCKS has specifically been identified as a target gene in PRCA therapy, given that is relevant to cell motility and mitogenesis, and its expression was noted to be downregulated by overexpression of miRNA mir-21 in PRCA cells.52 The associated cis-eQTL SNP (rs9481384, MAF = 0.079), also associated with MARCKS downregulation (β = −0.21, joint p = 5E−13), is directly located within an experimentally validated AR binding site in PRCA cell line VCaP.53 Of note, although a MARCKS eQTL was also reported by Battle et al.,35 the reported SNP (rs2049923) was not significant in our analysis (joint p = 0.094).

Co-regulatory Variation

A benefit of incorporating ASE information into cis-eQTL analysis is the ability to identify SNPs that co-regulate multiple genes through cis-acting mechanisms rather than true cis-eQTLs that dysregulate one gene and lead to downstream trans effects on the expression of other local genes. Of the 262,909 unique SNPs that were declared to be experiment-wise significant cis-eQTLs by the joint testing approach, 83,008 (31.6%) were associated with allelic expression imbalance in more than one gene (Table 3). To further evaluate characteristics of these co-regulatory cis-eQTLs, we examined the subset of 261 SNPs corresponding to gene-wise significant joint cis-eQTL findings for two or more genes. The Spearman correlation coefficient of the cis-eQTL effects between unique gene pairs was ρ= 0.43 (p = 4.14E−13), indicating a similar direction of effect on co-regulated genes (Figure S2).

Table 3.

Distribution of the Number of Co-regulated Genes by Experiment-wise and Gene-wise Significant cis-eQTL SNPs from the Joint cis-eQTL Analysis

Number of Genes Number of SNPs
Gene-wise Experiment-wise
2 221 (2.2%) 49,173 (18.9%)
3 11 (0.1%) 17,533 (6.7%)
4 0 8,364 (3.2%)
5 0 3,676 (1.4%)
6 0 1,888 (0.7%)
≥7 0 1,834 (0.7%)

Results are reported as the total number of significant SNPs and the percentage of total unique SNPs corresponding to significant eQTLs.

Gene-Set Enrichment Analysis

We identified a total of 1,345 genes corresponding to eQTLs that met our definition of large regulatory effect, as well as 4,089 genes with no significant gene-wise cis-eQTL associations. KEGG pathway-enrichment analysis of the dysregulated gene set identified 13 significant pathways, whereas analysis of the null genes resulted in one enriched pathway (Table S7). Of note, all of the pathway findings for the highly dysregulated gene set were driven in part by genes related to the major histocompatibility complex (MHC). Analysis of GO terms additionally resulted in enrichment of MHC class I (adjusted p = 2.00E−4) and class II (adjusted p = 4.00E−4) genes, along with multiple terms related to these genes (Figure S5). Three significant KEGG results for the null gene set included the cell-cycle pathway, spliceosome, and oxidative phosphorylation. Significant GO terms for null genes were similarly related to critical cell function, including regulation of transcription and metabolic processes (Figure S6).

Enrichment in GWAS Findings

Of the 6,027 SNP rsIDs that we retrieved from the NHGRI Catalog and that were reported to be associated with a trait, 3,655 were included in our genotyping data from eQTL analysis. For enrichment testing, we considered a SNP in our analysis to be a prostate eQTL if it was experiment-wise significant by at least one association testing method, resulting in a total of 325,702 designated eQTL SNPs. Enrichment analyses resulted in significant over-representation (1,850/3,655, p < 2.2E−16) of the prostate eQTLs within the GWAS SNPs, such that 141 SNPs corresponded to at least one gene-wise significant result. Similar enrichment was observed for SNPs associated with PRCA risk. Of the 50 total GWAS Catalog PRCA risk SNPs evaluated by our eQTL analysis, 25 corresponded to experiment-wise prostate eQTLs (p = 2E−5) for 54 genes, and four risk SNPs were the most significant eQTL association for at least one gene. A total of 12 SNPs were reported to be associated with PSA levels, and four of these were also listed as PRCA risk SNPs. Of these, only one SNP (rs1354774) was an experiment-wise significant cis-eQTL that was not also a PRCA risk SNP. The alternate allele of rs1354774 was associated with significant upregulation (joint p = 3.53E−7) of KLKP1 (kallikrein pseudogene 1), an androgen-regulated long non-coding RNA (lncRNA).54

Discussion

The analysis of cis-acting regulatory variation in primary tissues has the potential to elucidate the genetic basis of complex traits and further our understanding of transcriptomic diversity across cell types. Using phased and imputed genome-wide genotype data and high-throughput RNA sequencing, we identified extensive cis-eQTLs affecting a large majority of expressed genes in the human prostate transcriptome. This constitutes a comprehensive eQTL analysis of cis-regulatory SNPs in normal prostate tissues. Our study thus provides not only findings of expression patterns specific to the human prostate but also the genetic contributions to variability of these gene expressions in prostate tissues.

Our findings support many general conclusions regarding cis-eQTLs from previous such studies, including a concentration of large differential expression effects near the TSS and TES and an inverse relationship between effect size and MAF.35 Although RNA-seq data have been previously shown to have improved power over microarray expression data in eQTL studies, on the basis of the broader dynamic range of RNA-seq, we additionally demonstrated improved ability to replicate previously identified cis-eQTL associations by leveraging ASE information. Surprisingly, the different testing methods showed a stark discrepancy in the physical-distance distribution of gene-wise significant eQTLs in relation to the TSS and TES of the corresponding genes, and the methods that utilized ASE data showed enrichment of 3′ UTR SNPs. Although this could be attributable to additional read filtering applied prior to haplotype mapping, it could also be due to each data type’s differing sensitivities to detecting underlying eQTL mechanisms, such as nonsense-mediated decay or RNA silencing, and thus warrants further investigation.

Several cis-eQTLs were characterized by association with multiple genes, and many had similar direction and magnitude of regulatory effect across the dysregulated genes. Of these, 233 were determined to be the most significant eQTL for those genes, indicating co-regulatory modules of substantial effect in prostate tissue. Co-regulation of multiple genes was also observed in previously reported PRCA risk SNPs, indicating that these SNPs might have complex regulatory effects that warrant further investigation. Our use of chromosomally phased genotype data in our analyses provides additional evidence that these co-regulated genes are located on the same allele and might further our understanding of gene co-regulation. The common presence of cis-acting co-regulatory modules was specifically noted in Battle et al.,35 and our results indicate this to be the case in prostate tissue as well.

Gene-set enrichment analysis of highly dysregulated genes in human prostate tissue indicated several pathways involved in autoimmune response, notably MHC class II genes. Similar enrichment results were noted in the blood eQTL analysis of breast cancer survivors,55 whose immune system processes had significantly more eQTLs than those of healthy women. The MHC genes are responsible for regulating immune response, and variants within these genes have been implicated in multiple diseases.56 Dysregulation of MHC genes has been attributed to cancer risk in general, and MHC class II has been recognized as important in antitumor response.57 Downregulation of MHC class I genes has also been noted in multiple cancers as a mechanism for avoiding recognition by the immune system. Our association results for MHC genes present interesting opportunities for further study, given that many eQTL SNPs have extreme effects on multiple genes. In contrast, the null eQTL gene set was highly enriched with regulation of transcriptional and metabolic pathways. These results unsurprisingly highlight systems critical for normal cell function.

In addition to replicating previous analyses showing enrichment of trait-associated SNPs from GWASs in our eQTL findings, we found similar enrichment specifically for PRCA risk SNPs in prostate tissue. These results complement previous work indicating that the heritable risk of PRCA might be driven by dysregulation of gene expression.58–60 Our findings related to these SNPs not only highlight potential genes regulated by these variants but also provide potential functional context for new risk loci in future studies. Because our analyses were conducted on a large set of normal tissue samples, the identified eQTLs might reflect underlying functional PRCA risk effects more accurately than those from tumor-tissue studies. Interestingly, seven of the 20 PRCA risk SNPs were experiment-wise significant for more than one gene, and four of them (rs10187424, rs6465657, rs684232, and rs3850699) were associated with expression variation of three or more separate genes. Figure 6 visualizes the co-regulatory associations between PRCA risk SNP rs68423261 and genes FAM57A (MIM: 611627), GEMIN4 (MIM: 606969), and VPS53 (MIM: 615850). Further investigation into the CREs local to these PRCA SNPs could improve our understanding of PRCA in general and the underlying genetic pathways relevant to PRCA tumorigenesis in particular. Additionally, the evaluation of GWAS SNPs associated with PSA levels indicated a significant experiment-wise cis-eQTL association between rs1354774 and lncRNA KLKP1, a kallikrein pseudogene. Investigating the role of KLKP1 in regulating PSA might help improve PSA screening methods for PRCA through genetic correction.62 The ubiquity of significant eQTLs in our prostate-tissue analysis, however, might warrant caution in immediately ascribing the functional mechanisms of these SNPs to dysregulation of gene expression. Follow-up fine-mapping and functional analyses will be necessary for elucidating the effects of these eQTL SNPs.

Figure 6.

Figure 6

Evidence of cis-Acting Co-regulation of Multiple Genes

Illustration of cis-acting co-regulation of three separate genes by PRCA risk SNP rs684232.

(A) Multi-gene regional association plot of eQTL association −log10 p values for SNPs within 200 kb of rs684232 for FAM57A (red square), GEMIN4 (green circle), and VPS53 (blue triangle); color intensity depicts LD with rs684232. Below is the UCSC Genome Browser display of genomic annotation for the accompanying region, including PrEC chromatin-state segmentation and DNase-seq peaks (the color scheme is based on UCSC ChromHMM display conventions), TEA enhancers, and AR binding sites.

(B) Expression correlation matrices for the (left) TE (library-size adjusted) and (right) ASE allelic ratios for the three associated genes. Subjects were included for the latter if they had at least 20 ASE reads for each gene. The non-zero correlations for the allelic expression ratios indicate correlated gene expression specific to a given haplotype.

(C) Boxplots of the allelic expression ratios for rs684232 (left) homozygotes and (right) heterozygotes. The ratio for heterozygotes is calculated as the rs684232 reference allele carrying haplotype expression divided by the sum of ASE reads per gene. The null expectation (0.50) is indicated by a dashed line. The heterozygote boxplots demonstrate the concurrent downregulatory effect attributed to the alternate allele across all three genes.

Detecting cis-acting regulatory variation via the joint model additionally leverages the allele-specific read-count data, potentially leading to sizable increases in power. Additionally, cis- and trans-acting effects are distinguishable after the genotype data are phased into haplotypes, as directly illustrated in Figure 5A. One limitation of this approach, however, is an increased information burden, given that many subjects might not be heterozygous for eSNPs in a given gene and might thus be uninformative for ASE. To address this issue, we imputed the genic regions to capture additional expressed SNPs not directly interrogated by the genotype assay, after which approximately 2% of all aligned reads were ASE informative. However, even with imputation of the genic regions, many genes simply lacked highly polymorphic eSNPs, given that compared to the TE-only analysis, this analysis was only able to analyze approximately 72.8% of the unique SNP-gene pairs within the physical-distance requirements. Moreover, possible long-range (>500-kb) cis-eQTLs will have been missed by our analysis as a result of our phasing-based distance thresholds. A reasonable resolution is to defer to the joint testing result when possible for a given SNP-gene pair. Finally, additional work investigating allele-specific isoform expression is also warranted.

In summary, by integrating high-density phased genotypes and RNA-seq data, our analyses have uncovered thousands of cis-eQTLs throughout the prostate transcriptome. Of interest, the vast majority of expressed transcripts (approximately 70%) demonstrate expression differences due to the presence of regulatory SNPs. Moreover, the distribution of LD between top prostate and blood cis-eQTLs indicates that the relative regulatory effect sizes of multiple eQTLs for a given gene might vary greatly by tissue type. These results continue to highlight the complexity of gene regulation and variation in human cells and serve as a substantial resource for further investigation into the regulatory landscape of prostate tissue.

Acknowledgments

This work was supported by grants from the Department of Defense (W81XWH-11-1-0261) and the NIH (CA151254 to S.N.T.). SNP genotyping and RNA sequencing were partially supported by the NIH (R01CA157881 to L.W.).

Supplemental Data

Document S1. Figures S1–S6 and Tables S1 and S5–S7
mmc1.pdf (1.1MB, pdf)
Spreadsheet S1. Tables S2–S4
mmc2.xlsx (6.7MB, xlsx)
Document S2. Article plus Supplemental Data
mmc3.pdf (3.3MB, pdf)

Web Resources

The URLs for data presented herein are as follows:

References

  • 1.Stranger B.E., Montgomery S.B., Dimas A.S., Parts L., Stegle O., Ingle C.E., Sekowska M., Smith G.D., Evans D., Gutierrez-Arcelus M. Patterns of cis regulatory variation in diverse human populations. PLoS Genet. 2012;8:e1002639. doi: 10.1371/journal.pgen.1002639. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Lappalainen T., Sammeth M., Friedländer M.R., ’t Hoen P.A., Monlong J., Rivas M.A., Gonzàlez-Porta M., Kurbatova N., Griebel T., Ferreira P.G., Geuvadis Consortium Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Brown C.D., Mangravite L.M., Engelhardt B.E. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLoS Genet. 2013;9:e1003649. doi: 10.1371/journal.pgen.1003649. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Dimas A.S., Deutsch S., Stranger B.E., Montgomery S.B., Borel C., Attar-Cohen H., Ingle C., Beazley C., Gutierrez Arcelus M., Sekowska M. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science. 2009;325:1246–1250. doi: 10.1126/science.1174148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Stenson P.D., Ball E.V., Mort M., Phillips A.D., Shiel J.A., Thomas N.S., Abeysinghe S., Krawczak M., Cooper D.N. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 2003;21:577–581. doi: 10.1002/humu.10212. [DOI] [PubMed] [Google Scholar]
  • 6.Maurano M.T., Humbert R., Rynes E., Thurman R.E., Haugen E., Wang H., Reynolds A.P., Sandstrom R., Qu H., Brody J. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–1195. doi: 10.1126/science.1222794. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Nicolae D.L., Gamazon E., Zhang W., Duan S., Dolan M.E., Cox N.J. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010;6:e1000888. doi: 10.1371/journal.pgen.1000888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Hindorff L.A., Sethupathy P., Junkins H.A., Ramos E.M., Mehta J.P., Collins F.S., Manolio T.A. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl. Acad. Sci. USA. 2009;106:9362–9367. doi: 10.1073/pnas.0903103106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Bainbridge M.N., Warren R.L., Hirst M., Romanuik T., Zeng T., Go A., Delaney A., Griffith M., Hickenbotham M., Magrini V. Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach. BMC Genomics. 2006;7:246. doi: 10.1186/1471-2164-7-246. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Marioni J.C., Mason C.E., Mane S.M., Stephens M., Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18:1509–1517. doi: 10.1101/gr.079558.108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Wang Z., Gerstein M., Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009;10:57–63. doi: 10.1038/nrg2484. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Vijay N., Poelstra J.W., Künstner A., Wolf J.B. Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Mol. Ecol. 2013;22:620–634. doi: 10.1111/mec.12014. [DOI] [PubMed] [Google Scholar]
  • 13.Panousis N.I., Gutierrez-Arcelus M., Dermitzakis E.T., Lappalainen T. Allelic mapping bias in RNA-sequencing is not a major confounder in eQTL studies. Genome Biol. 2014;15:467. doi: 10.1186/s13059-014-0467-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Degner J.F., Marioni J.C., Pai A.A., Pickrell J.K., Nkadori E., Gilad Y., Pritchard J.K. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics. 2009;25:3207–3212. doi: 10.1093/bioinformatics/btp579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Leek J.T., Storey J.D. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3:1724–1735. doi: 10.1371/journal.pgen.0030161. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Chakraborty S., Datta S., Datta S. Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies. Bioinformatics. 2012;28:799–806. doi: 10.1093/bioinformatics/bts022. [DOI] [PubMed] [Google Scholar]
  • 17.Ellis S.E., Gupta S., Ashar F.N., Bader J.S., West A.B., Arking D.E. RNA-Seq optimization with eQTL gold standards. BMC Genomics. 2013;14:892. doi: 10.1186/1471-2164-14-892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Sun W. A statistical framework for eQTL mapping using RNA-seq data. Biometrics. 2012;68:1–11. doi: 10.1111/j.1541-0420.2011.01654.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.McKenzie M., Henders A.K., Caracella A., Wray N.R., Powell J.E. Overlap of expression quantitative trait loci (eQTL) in human brain and blood. BMC Med. Genomics. 2014;7:31. doi: 10.1186/1755-8794-7-31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Vaarala M.H., Hirvikoski P., Kauppila S., Paavonen T.K. Identification of androgen-regulated genes in human prostate. Mol. Med. Rep. 2012;6:466–472. doi: 10.3892/mmr.2012.956. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Chai H., Brown R.E. Field effect in cancer-an update. Ann. Clin. Lab. Sci. 2009;39:331–337. [PubMed] [Google Scholar]
  • 22.Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M.A., Bender D., Maller J., Sklar P., de Bakker P.I., Daly M.J., Sham P.C. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Price A.L., Patterson N.J., Plenge R.M., Weinblatt M.E., Shadick N.A., Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 2006;38:904–909. doi: 10.1038/ng1847. [DOI] [PubMed] [Google Scholar]
  • 24.Patterson N., Price A.L., Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190. doi: 10.1371/journal.pgen.0020190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Venables W.N., Ripley B.D. Springer; New York: 2002. Modern applied statistics with S. [Google Scholar]
  • 26.Kalari K.R., Nair A.A., Bhavsar J.D., O’Brien D.R., Davila J.I., Bockol M.A., Nie J., Tang X., Baheti S., Doughty J.B. MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing. BMC Bioinformatics. 2014;15:224. doi: 10.1186/1471-2105-15-224. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Trapnell C., Pachter L., Salzberg S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25:1105–1111. doi: 10.1093/bioinformatics/btp120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Wang L., Wang S., Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28:2184–2185. doi: 10.1093/bioinformatics/bts356. [DOI] [PubMed] [Google Scholar]
  • 29.Anders S., Pyl P.T., Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Delaneau O., Zagury J.F., Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods. 2013;10:5–6. doi: 10.1038/nmeth.2307. [DOI] [PubMed] [Google Scholar]
  • 31.Marchini J., Howie B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 2010;11:499–511. doi: 10.1038/nrg2796. [DOI] [PubMed] [Google Scholar]
  • 32.Sun W., Hu Y. eQTL Mapping Using RNA-seq Data. Stat. Biosci. 2013;5:198–219. doi: 10.1007/s12561-012-9068-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Pickrell J.K., Marioni J.C., Pai A.A., Degner J.F., Engelhardt B.E., Nkadori E., Veyrieras J.B., Stephens M., Gilad Y., Pritchard J.K. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464:768–772. doi: 10.1038/nature08872. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Strimmer K. fdrtool: a versatile R package for estimating local and tail area-based false discovery rates. Bioinformatics. 2008;24:1461–1462. doi: 10.1093/bioinformatics/btn209. [DOI] [PubMed] [Google Scholar]
  • 35.Battle A., Mostafavi S., Zhu X., Potash J.B., Weissman M.M., McCormick C., Haudenschild C.D., Beckman K.B., Shi J., Mei R. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 2014;24:14–24. doi: 10.1101/gr.155192.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Ernst J., Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods. 2012;9:215–216. doi: 10.1038/nmeth.1906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Taberlay P.C., Statham A.L., Kelly T.K., Clark S.J., Jones P.A. Reconfiguration of nucleosome-depleted regions at distal regulatory elements accompanies DNA methylation of enhancers and insulators in cancer. Genome Res. 2014;24:1421–1432. doi: 10.1101/gr.163485.113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Andersson R., Gebhard C., Miguel-Escalada I., Hoof I., Bornholdt J., Boyd M., Chen Y., Zhao X., Schmidl C., Suzuki T., FANTOM Consortium An atlas of active enhancers across human cell types and tissues. Nature. 2014;507:455–461. doi: 10.1038/nature12787. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Mathelier A., Zhao X., Zhang A.W., Parcy F., Worsley-Hunt R., Arenillas D.J., Buchman S., Chen C.Y., Chou A., Ienasescu H. JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res. 2014;42:D142–D147. doi: 10.1093/nar/gkt997. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Liu X., Yu X., Zack D.J., Zhu H., Qian J. TiGER: a database for tissue-specific gene expression and regulation. BMC Bioinformatics. 2008;9:271. doi: 10.1186/1471-2105-9-271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Welter D., MacArthur J., Morales J., Burdett T., Hall P., Junkins H., Klemm A., Flicek P., Manolio T., Hindorff L., Parkinson H. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–D1006. doi: 10.1093/nar/gkt1229. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Fisher R.A. Hafner Pub. Co.; Darien, Conn.: 1970. Statistical methods for research workers. [Google Scholar]
  • 43.Zhang B., Kirov S., Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 2005;33:W741–W748. doi: 10.1093/nar/gki475. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Wang J., Duncan D., Shi Z., Zhang B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res. 2013;41:W77–W83. doi: 10.1093/nar/gkt439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Kanehisa M. The KEGG database. Novartis Found. Symp. 2002;247:91–101. discussion 101–103, 119–128, 244–252. [PubMed] [Google Scholar]
  • 46.Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., The Gene Ontology Consortium Gene ontology: tool for the unification of biology. Nat. Genet. 2000;25:25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Shabalin A.A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. doi: 10.1093/bioinformatics/bts163. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Olsburgh J., Harnden P., Weeks R., Smith B., Joyce A., Hall G., Poulsom R., Selby P., Southgate J. Uroplakin gene expression in normal human tissues and locally advanced bladder cancer. J. Pathol. 2003;199:41–49. doi: 10.1002/path.1252. [DOI] [PubMed] [Google Scholar]
  • 49.Storey J.D., Tibshirani R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Emami N., Deperthes D., Malm J., Diamandis E.P. Major role of human KLK14 in seminal clot liquefaction. J. Biol. Chem. 2008;283:19561–19569. doi: 10.1074/jbc.M801194200. [DOI] [PubMed] [Google Scholar]
  • 51.Michael I.P., Pampalakis G., Mikolajczyk S.D., Malm J., Sotiropoulou G., Diamandis E.P. Human tissue kallikrein 5 is a member of a proteolytic cascade pathway involved in seminal clot liquefaction and potentially in prostate cancer progression. J. Biol. Chem. 2006;281:12743–12750. doi: 10.1074/jbc.M600326200. [DOI] [PubMed] [Google Scholar]
  • 52.Li T., Li D., Sha J., Sun P., Huang Y. MicroRNA-21 directly targets MARCKS and promotes apoptosis resistance and invasion in prostate cancer cells. Biochem. Biophys. Res. Commun. 2009;383:280–285. doi: 10.1016/j.bbrc.2009.03.077. [DOI] [PubMed] [Google Scholar]
  • 53.Wei G.H., Badis G., Berger M.F., Kivioja T., Palin K., Enge M., Bonke M., Jolma A., Varjosalo M., Gehrke A.R. Genome-wide analysis of ETS-family DNA-binding in vitro and in vivo. EMBO J. 2010;29:2147–2160. doi: 10.1038/emboj.2010.106. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Kaushal A., Myers S.A., Dong Y., Lai J., Tan O.L., Bui L.T., Hunt M.L., Digby M.R., Samaratunga H., Gardiner R.A. A novel transcript from the KLKP1 gene is androgen regulated, down-regulated during prostate cancer progression and encodes the first non-serine protease identified from the human kallikrein gene locus. Prostate. 2008;68:381–399. doi: 10.1002/pros.20685. [DOI] [PubMed] [Google Scholar]
  • 55.Landmark-Høyvik H., Dumeaux V., Nebdal D., Lund E., Tost J., Kamatani Y., Renault V., Børresen-Dale A.L., Kristensen V., Edvardsen H. Genome-wide association study in breast cancer survivors reveals SNPs associated with gene expression of genes belonging to MHC class I and II. Genomics. 2013;102:278–287. doi: 10.1016/j.ygeno.2013.07.006. [DOI] [PubMed] [Google Scholar]
  • 56.Trowsdale J. The MHC, disease and selection. Immunol. Lett. 2011;137:1–8. doi: 10.1016/j.imlet.2011.01.002. [DOI] [PubMed] [Google Scholar]
  • 57.Lu Q.L., Abel P., Mitchell S., Foster C., Lalani E.N. Decreased HLA-A expression in prostate cancer is associated with normal allele dosage in the majority of cases. J. Pathol. 2000;190:169–176. doi: 10.1002/(SICI)1096-9896(200002)190:2<169::AID-PATH517>3.0.CO;2-#. [DOI] [PubMed] [Google Scholar]
  • 58.Hazelett D.J., Rhie S.K., Gaddis M., Yan C., Lakeland D.L., Coetzee S.G., Henderson B.E., Noushmehr H., Cozen W., Kote-Jarai Z., Ellipse/GAME-ON consortium. Practical consortium Comprehensive functional annotation of 77 prostate cancer risk loci. PLoS Genet. 2014;10:e1004102. doi: 10.1371/journal.pgen.1004102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Jiang J., Jia P., Shen B., Zhao Z. Top associated SNPs in prostate cancer are significantly enriched in cis-expression quantitative trait loci and at transcription factor binding sites. Oncotarget. 2014;5:6168–6177. doi: 10.18632/oncotarget.2179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Xu X., Hussain W.M., Vijai J., Offit K., Rubin M.A., Demichelis F., Klein R.J. Variants at IRX4 as prostate cancer expression quantitative trait loci. Eur. J. Hum. Genet. 2014;22:558–563. doi: 10.1038/ejhg.2013.195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Eeles R.A., Olama A.A., Benlloch S., Saunders E.J., Leongamornlert D.A., Tymrakiewicz M., Ghoussaini M., Luccarini C., Dennis J., Jugurnauth-Little S., COGS–Cancer Research UK GWAS–ELLIPSE (part of GAME-ON) Initiative. Australian Prostate Cancer Bioresource. UK Genetic Prostate Cancer Study Collaborators/British Association of Urological Surgeons’ Section of Oncology. UK ProtecT (Prostate testing for cancer and Treatment) Study Collaborators. PRACTICAL (Prostate Cancer Association Group to Investigate Cancer-Associated Alterations in the Genome) Consortium Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat. Genet. 2013;45:385–391, e1–e2. doi: 10.1038/ng.2560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Helfand B.T., Loeb S., Hofer M.D., Kim D.Y., Hu Q.Y., Cooper P.R., McGuire B.B., Catalona W.J. Personalized Psa Testing Using Genetic Variants Can Possibly Decrease the Number of Prostate Biopsies. J. Urol. 2012;187 doi: 10.1016/j.juro.2012.12.023. E489–E489. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Figures S1–S6 and Tables S1 and S5–S7
mmc1.pdf (1.1MB, pdf)
Spreadsheet S1. Tables S2–S4
mmc2.xlsx (6.7MB, xlsx)
Document S2. Article plus Supplemental Data
mmc3.pdf (3.3MB, pdf)

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics

RESOURCES