DNA-methylation  Workflow

Quality Control

First of all, we used FastQC (fastqc_v0.11.5) to perform basic statistics on the quality of the raw reads. Then, those read sequences produced by the Illumina pipeline in FASTQ format were pre-processed through Trimmomatic (Trimmomatic-0.36) software using the parameter (SLIDINGWINDOW:4:15 LEADING:3 TRAILING:3 ILLUMINACLIP:adapter.fa:2:30:10 MINLEN:36). The remaining reads that passed all the filtering steps were counted as clean reads and all subsequent analyses were performed on those clean reads. At last, we used FastQC to perform basic statistics on the quality of the clean reads.


Bismark software (version 0.16.3) was used to align bisulfite-treated reads to a human bisulfite-converted reference genome (-X 700 --dovetail). The human reference genome was firstly transformed into bisulfite-converted version (C-to-T and G-to-A converted) and then indexed using bowtie2. Sequence reads were also transformed into fully bisulfite-converted versions (C-to-T and G-to-A converted) before they are aligned to similarly converted versions of the genome in a directional manner. Sequence reads that produce a unique best alignment from the two alignment processes (original top and bottom strand) were then compared to the normal genomic sequence and the methylation state of each cytosine position in the read was inferred. The reads aligned to the same genomic region were regarded as duplicated ones. The sequencing depth and coverage were summarized using deduplicated reads.

The results of methylation extractor (bismark_methylation_extractor, --no_overlap) were transformed into bigWig format for visualization using IGV browser. The bisulfite conversion rate was calculated as the percentage of thymine (T) sequenced at cytosine (C) reference positions in the lambda genome.

Step1: Adaptor trimming

java -jar  trimmomatic-0.36.jar \
    LEADING:3 \
    TRAILING:3 \
    ILLUMINACLIP:adapter.fa:2:30:10 \

Step2: Alignment

bismark <genome_dir> \
    -1 <read1_fq> -2 <read2_fq> \
    -X 700 --dovetail

Step3: Duplicated reads removal

Step4: Methylation extraction

bismark_methylation_extractor --no_overlap

PMD detection

PMDs in each sample were identified by using a hidden Markov model (HMM)-based tool, MethPipe v.3.4.3.

pmd -o <pmd_out_file> -i 300 -b 1000 -v <cpg_input_file>

DMR detection

DMRs were identified by using DSS v.2.14.0.


dat1 <- read.table(<normal.dss.txt.gz>, header=TRUE)
dat2 <- read.table(<tumor.dss.txt.gz>, header=TRUE)
names <- c(<normal_ID>, <tumor_ID>)
base  <- paste0(names[1], "_", names[2])

BSobj <- makeBSseqData( list(dat1, dat2), names)

# Perform statistical test for DML
dmlTest <- DMLtest(BSobj, group1=names[1], group2=names[2], smoothing=TRUE, smoothing.span=500)
wfile = paste0(dir, "/DMLtest_", base, ".txt.gz")
write.table(dmlTest, file = gzfile(wfile, compression = 9), quote = FALSE, sep = "\t", row.names = FALSE)

# DMR detection
dmrs <- callDMR(dmlTest, delta=0.2, p.threshold=10^-16, minlen=200, minCG=5, dis.merge=50, pct.sig=0.5)
wfile = paste0("/DMRs_", base, ".txt.gz")
write.table(dmrs, file = gzfile(wfile, compression = 9), quote = FALSE, sep = "\t", row.names = FALSE)

CIMP detection

To define CpG island methylator phenotype in PCa, recurrently methylated promoter CGIs were first identified. Mean methylation levels of all 18,584 CGIs located in gene promoters were calculated per each pair of tumor and matched normal samples using smoothed CpG methylation levels. The CGIs defining CIMP (CIMP-CGIs) were selected based on the following criteria: 1) average methylation levels of CGIs across normal samples were less than 0.4. 2) More than half (>93) of the tumors have higher methylation levels than their matched normal samples by 0.3.

zcat <methylation matrix of CGI by sample with header and row names> | 
    awk 'NR==1 { print $0"\tMeanN\tNumT" }
        NR>1 { n=0; s=0; d=0; 
            for(i=2;i<189;i++) {
                if($i!="NA") {
                    n++; s+=$i; j=i+187;
                    if($j!="NA" && $j-$i>=0.3) d++ } }
            if(n>0) {s/=n}
            if(n==0) {s="NA"};
            if((s<0.4 && d>93)) print $0"\t"s"\t"d; } ' |
    gzip -nc > <cimp_out_file>

Epimutation burden

Epimutation burden of a tumor was calculated as the number of CpGs with methylation differences ≥0.2 between tumor and its matched normal sample divided by the total number of CpGs in the genome. Epimutation burden ranged from 0.013 to 0.45. Correlation between the epimutation burden and other mutational or clinicopathological features were tested by using Spearman’s rank order correlation.

Copyright © Department of Urology, Changhai Hospital & Center for Translational Medicine. All rights reserved.