1) Mapping WTS data to the reference genome: The reference genome and gene model annotation files were downloaded directly from the Ensembl genome browser FTP site (ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/ and ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/). An index of the reference genome was built using Bowtie v2.0.6, and paired-end clean reads were aligned to the reference genome using TopHat v2.0.9.
2) Quantification of gene expression level: Cuffdiff was used to calculate FPKMs of both lncRNAs and coding genes in each sample. Gene FPKMs were computed by summing the FPKMs of the transcripts in each gene group. FPKM (fragments per kilobase of exon per million fragments mapped) is calculated from the number of fragments mapped to a transcript, normalized by the transcript length in kilobases and by the total number of mapped fragments in millions.
3) Differential expression analysis: Cuffdiff provides statistical routines for determining differential expression in digital transcript or gene expression data using a model based on the negative binomial distribution. Transcripts with an adjusted P-value < 0.05 were considered differentially expressed.
4) AR output score: The AR output score was calculated as previously described. Briefly, Z-scores of 20 androgen-induced genes were computed by subtracting the pooled mean from the RNA-seq expression values and dividing by the pooled standard deviation. The sum of the Z-scores across the AR signaling gene signature represents the AR output score for each sample.
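The AR output score in step 4 can be illustrated with a minimal sketch. It assumes that "pooled" means pooled across all samples for a given gene and that the sample standard deviation is used; neither detail is stated above. The function name, the input layout, and the toy gene and sample names are hypothetical and are not taken from the original analysis.

```python
from statistics import mean, stdev


def ar_output_scores(expr, signature_genes):
    """Sum per-gene Z-scores over a gene signature for each sample.

    expr: {gene: {sample: expression value}} (hypothetical layout).
    Returns {sample: score}.
    """
    samples = sorted(next(iter(expr.values())))
    scores = {s: 0.0 for s in samples}
    for gene in signature_genes:
        values = [expr[gene][s] for s in samples]
        mu = mean(values)   # assumption: mean pooled across all samples
        sd = stdev(values)  # assumption: sample SD pooled across all samples
        if sd == 0:
            continue        # a constant gene contributes nothing to the score
        for s in samples:
            scores[s] += (expr[gene][s] - mu) / sd
    return scores


# Toy data with placeholder names (not real measurements).
toy_expr = {
    "GENE_A": {"S1": 1.0, "S2": 3.0, "S3": 5.0},
    "GENE_B": {"S1": 2.0, "S2": 2.0, "S3": 8.0},
}
toy_scores = ar_output_scores(toy_expr, ["GENE_A", "GENE_B"])
```

Because each gene is centered on its pooled mean, the scores sum to zero across samples, so a score is only meaningful relative to the other samples in the same cohort.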
A fusion gene is a chimeric gene formed when partial or complete sequences of two separate genes are joined, usually as a result of chromosomal translocation. We used SOAPfuse (v1.27) to detect and analyze fusion genes. Fusion candidates were compared against a public fusion database (FusionHub) and annotated in the Supplementary Table.
perl SOAPfuse-RUN.pl \
    -c <config.txt> \
    -fd <sample_dir> \
    -l <sample.config> \
    -o <outdir> -fs 1 -es 9
config.txt
DB_db_dir = <SOAPfuse/db/GRCh38>
DB_wg_soap_ref = $(db_dir)/WG_index_soap/genome.fa.index
DB_cytoBand = $(db_dir)/cytoBand.txt
DB_trans_soap_ref = $(db_dir)/transcript_index_soap/transcript.fa.index
DB_trans_bwa_ref = $(db_dir)/transcript_index_bwa/transcript.fa
DB_trans_psl = $(db_dir)/transcript.psl
DB_trans_gtf = $(db_dir)/Gene_annotation.gtf.gz
DB_gene_psl = $(db_dir)/gene.psl
DB_gene_fa = $(db_dir)/gene.fa
DB_genefamily = $(db_dir)/gene_family/gene_family.brief.txt
DB_blast_homo_list = $(db_dir)/blast_homo_gene.m8.gz
PG_pg_dir = <SOAPfuse-v1.27/source/bin>
PG_soap = $(pg_dir)/aln_bin/soap2.21
PG_bwa = $(pg_dir)/aln_bin/bwa
PG_blat = $(pg_dir)/aln_bin/blat
PG_bwt = $(pg_dir)/aln_bin/2bwt-builder2.20
PG_DE_stat = $(pg_dir)/DE_statistic
PG_convert = $(pg_dir)/convert
PS_ps_dir = <SOAPfuse-v1.27/source>
PS_s01 = $(ps_dir)/SOAPfuse-01-alignWG.pl
PS_s02 = $(ps_dir)/SOAPfuse-02-align_unmap_transcript.pl
PS_s03 = $(ps_dir)/SOAPfuse-03-align_trim_unmap_transcript.pl
PS_s04 = $(ps_dir)/SOAPfuse-04-change_SE.pl
PS_s05 = $(ps_dir)/SOAPfuse-05-candidate.pl
PS_s06 = $(ps_dir)/SOAPfuse-06-divide_soap_denovo_unmap.pl
PS_s07 = $(ps_dir)/SOAPfuse-07-junction_seq_deal.pl
PS_s08 = $(ps_dir)/SOAPfuse-08-final_fusionGene.pl
PS_s09 = $(ps_dir)/SOAPfuse-09-deeper_analysis.pl
PD_alignWG = $(all_out)/alignWG
PD_align_unmap_Tran = $(all_out)/align_unmap_Tran
PD_align_trim_unmap_Tran = $(all_out)/align_trim_unmap_Tran
PD_change_SE = $(all_out)/change_SE
PD_candidate = $(all_out)/candidate
PD_denovo_unmap = $(all_out)/denovo_unmap
PD_junction_seq = $(all_out)/junction_seq
PD_final_fusion_genes = $(all_out)/final_fusion_genes
PA_all_somatic_mode = no
PA_all_postfix_of_tissue = 'N:-N;N:-Normal;N:-B;N:-Blood;T:-CA;T:-C;T:-T;T:-Tumor;T:-Cancer'
PA_all_fq_postfix = fq.gz
PA_all_process_of_align_software = 12
PA_all_shortest_length_trim_unmap_to = 40
PA_all_maximum_genome_loc_trimmed_read_mapped = 2
PA_all_maximum_genome_loc_intact_read_mapped = 1
PA_all_intron_len_extend_from_exon_edge = 100
PA_s02_realign = yes
PA_s05_save_genes_name_with_dot = no
PA_s05_save_genes_from_same_family = yes
PA_s05_amass_control_of_span_reads = yes
PA_s05_maximum_fusion_partner_of_one_gene = 10
PA_s05_the_minimum_span_reads_for_candidate = 5
PA_s06_save_reads_have_mismatch_around_fusepos = yes
PA_s06_number_of_flank_bases_near_read_end_for_filter_mismatch = 5
PA_s06_the_maximum_mismatch_in_flank_region = 0
PA_s07_the_minimum_span_reads_for_junction_construction = 5
PA_s07_extended_bases_near_pe_read_end = 0
PA_s07_the_min_cons_for_credible_fuse_region = 0.5
PA_s07_maximum_mismatch_for_align_junction_reads = 3
PA_s07_flank_bases_around_fuse_point_for_check_mismatch = 5
PA_s07_maximum_mismatch_in_flank_region = 0
PA_s07_junc_read_map_both_sides_at_least = 7
PA_s08_number_of_extend_bases = 0
PA_s08_insert_control_sup = no
PA_s08_min_sum_reads = 5
PA_s08_min_support_reads_for_both_edge = 1,1
PA_s08_min_support_reads_for_one_edge_one_internal = 2,2
PA_s08_min_support_reads_for_both_internal = 2,2
PA_s08_min_intrachr_distance = 1000
PA_s08_min_bases_covered_both_sides_around_fuse_point = 10
PA_s08_only_remain_edge_case = no
PA_s09_draw_fusion_expression_svg = yes
SOAPfuse filtering was applied with the default parameters: junction reads = 1 and split reads = 1. In addition, the junction location and whether the fusion was in-frame or out-of-frame were considered. To filter out fusions that also occur in normal tissue, all fusion genes detected by SOAPfuse in normal samples were merged into a panel of normals. If a breakpoint of a fusion from a tumor sample was detected in the panel of normals, the corresponding fusion was removed.
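The panel-of-normal filtering step can be sketched as follows. This is an illustration under stated assumptions, not the script used in the study: breakpoints are compared by exact (chromosome, position) match, a tumor fusion is dropped if either of its two breakpoints appears in the panel, and the record fields (`up_breakpoint`, `down_breakpoint`, `genes`) are hypothetical names rather than SOAPfuse output columns.

```python
def build_panel_of_normals(normal_samples):
    """Merge every breakpoint seen in any normal sample into one set.

    normal_samples: list of per-sample fusion lists (hypothetical layout);
    each fusion is a dict with "up_breakpoint" and "down_breakpoint"
    stored as (chromosome, position) tuples.
    """
    panel = set()
    for fusions in normal_samples:
        for fusion in fusions:
            panel.add(fusion["up_breakpoint"])
            panel.add(fusion["down_breakpoint"])
    return panel


def remove_normal_fusions(tumor_fusions, panel):
    """Keep only tumor fusions with no breakpoint in the panel.

    Assumption: exact-position matching; a tolerance window around each
    breakpoint would be a reasonable alternative.
    """
    return [
        f for f in tumor_fusions
        if f["up_breakpoint"] not in panel
        and f["down_breakpoint"] not in panel
    ]


# Toy records with placeholder gene names and coordinates.
normals = [
    [{"genes": "GENE_A--GENE_B",
      "up_breakpoint": ("chr1", 1000),
      "down_breakpoint": ("chr2", 2000)}],
    [{"genes": "GENE_C--GENE_D",
      "up_breakpoint": ("chr3", 3000),
      "down_breakpoint": ("chr4", 4000)}],
]
tumor = [
    {"genes": "GENE_A--GENE_E",   # shares chr1:1000 with a normal sample
     "up_breakpoint": ("chr1", 1000),
     "down_breakpoint": ("chr5", 5000)},
    {"genes": "GENE_F--GENE_G",   # no breakpoint in the panel
     "up_breakpoint": ("chr6", 6000),
     "down_breakpoint": ("chr7", 7000)},
]
panel = build_panel_of_normals(normals)
kept = remove_normal_fusions(tumor, panel)
```

Storing the panel as a set of breakpoints, rather than gene pairs, mirrors the breakpoint-level criterion described above: a fusion is removed even when only one of its two breakpoints recurs in a normal sample.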