First of all, we used FastQC (fastqc_v0.11.5) to perform basic statistics on the quality of the raw reads. Then, those read sequences produced by the Illumina pipeline in FASTQ format were pre-processed through Trimmomatic (Trimmomatic-0.36) software using the parameter (SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 ILLUMINACLIP:adapter.fa:2:30:10 MINLEN:36). The remaining reads that passed all the filtering steps were counted as clean reads and all subsequent analyses were performed on those clean reads. At last, we used FastQC to perform basic statistics on the quality of the clean reads.
Bismark software (version 0.16.3) was used to align bisulfite-treated reads to a human bisulfite-converted reference genome (-X 700 --dovetail). The human reference genome was firstly transformed into bisulfite-converted version (C-to-T and G-to-A converted) and then indexed using bowtie2. Sequence reads were also transformed into fully bisulfite-converted versions (C-to-T and G-to-A converted) before they are aligned to similarly converted versions of the genome in a directional manner. Sequence reads that produce a unique best alignment from the two alignment processes (original top and bottom strand) were then compared to the normal genomic sequence and the methylation state of each cytosine position in the read was inferred. The reads aligned to the same genomic region were regarded as duplicated ones. The sequencing depth and coverage were summarized using deduplicated reads.
The results of methylation extractor (bismark_methylation_extractor, --no_overlap) were transformed into bigWig format for visualization using IGV browser. The bisulfite conversion rate was calculated as the percentage of thymine (T) sequenced at cytosine (C) reference positions in the lambda genome.
java -jar trimmomatic-0.36.jar \ SLIDINGWINDOW:4:15 \ LEADING:3 \ TRAILING:3 \ ILLUMINACLIP:adapter.fa:2:30:10 \ MINLEN:36
bismark <genome_dir> \ -1 <read1_fq> -2 <read2_fq> \ -X 700 --dovetail
bismark_methylation_extractor --no_overlap
PMDs in each sample were identified by using a hidden Markov model (HMM)-based tool, MethPipe v.3.4.3.
pmd -o <pmd_out_file> -i 300 -b 1000 -v <cpg_input_file>
DMRs were identified by using DSS v.2.14.0.
library(DSS) require(bsseq) dat1 <- read.table(<normal.dss.txt.gz>, header=TRUE) dat2 <- read.table(<tumor.dss.txt.gz>, header=TRUE) names <- c(<normal_ID>, <tumor_ID>) base <- paste0(names[1], "_", names[2]) BSobj <- makeBSseqData( list(dat1, dat2), names) # Perform statistical test for DML dmlTest <- DMLtest(BSobj, group1=names[1], group2=names[2], smoothing=TRUE, smoothing.span=500) wfile = paste0(dir, "/DMLtest_", base, ".txt.gz") write.table(dmlTest, file = gzfile(wfile, compression = 9), quote = FALSE, sep = "\t", row.names = FALSE) # DMR detection dmrs <- callDMR(dmlTest, delta=0.2, p.threshold=10^-16, minlen=200, minCG=5, dis.merge=50, pct.sig=0.5) wfile = paste0("/DMRs_", base, ".txt.gz") write.table(dmrs, file = gzfile(wfile, compression = 9), quote = FALSE, sep = "\t", row.names = FALSE)
To define CpG island methylator phenotype in PCa, recurrently methylated promoter CGIs were first identified. Mean methylation levels of all 18,584 CGIs located in gene promoters were calculated per each pair of tumor and matched normal samples using smoothed CpG methylation levels. The CGIs defining CIMP (CIMP-CGIs) were selected based on the following criteria: 1) average methylation levels of CGIs across normal samples were less than 0.4. 2) More than half (>93) of the tumors have higher methylation levels than their matched normal samples by 0.3.
zcat <methylation matrix of CGI by sample with header and row names> | awk 'NR==1 { print $0"\tMeanN\tNumT" } NR>1 { n=0; s=0; d=0; for(i=2;i<189;i++) { if($i!="NA") { n++; s+=$i; j=i+187; if($j!="NA" && $j-$i>=0.3) d++ } } if(n>0) {s/=n} if(n==0) {s="NA"}; if((s<0.4 && d>93)) print $0"\t"s"\t"d; } ' | gzip -nc > <cimp_out_file>
Epimutation burden of a tumor was calculated as the number of CpGs with methylation differences ≥0.2 between tumor and its matched normal sample divided by the total number of CpGs in the genome. Epimutation burden ranged from 0.013 to 0.45. Correlation between the epimutation burden and other mutational or clinicopathological features were tested by using Spearman’s rank order correlation.