ASTool

Detection of alternative splicing events from RNA-Seq data

Step 1: Download and installation

Dependencies:
Perl, threads (Perl module), SRA Toolkit, Trimmomatic (Java), STAR, Linux/Unix

Download:
wget http://zzdlab.com/ASTool/ASTool_v1/ASTool_v1.1.zip

The genome FASTA and gene annotation (GTF) files can be downloaded from Ensembl Plants: https://plants.ensembl.org/index.html

For example:

wget http://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz
wget http://ftp.ensemblgenomes.org/pub/plants/release-52/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.52.gtf.gz
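To fetch files for another release or species, the URL pattern of the two examples can be wrapped in a small shell function. This is a sketch, not part of ASTool: `ensembl_plants_urls` is a hypothetical name, and it assumes other species and releases follow the same directory layout as the TAIR10 release-52 URLs above. STAR (Step 2) needs the decompressed files, hence the gunzip in the usage comment.

```shell
# Hypothetical helper (not part of ASTool): print the Ensembl Plants genome
# FASTA and GTF URLs for one species/assembly/release. The URL layout is an
# assumption copied from the TAIR10 release-52 examples above.
ensembl_plants_urls() {
  local release="$1" species="$2" assembly="$3"   # e.g. 52 Arabidopsis_thaliana TAIR10
  local dir base
  # Directory names on the FTP site are the lower-case species name.
  dir=$(printf '%s' "$species" | tr '[:upper:]' '[:lower:]')
  base="http://ftp.ensemblgenomes.org/pub/plants/release-${release}"
  echo "${base}/fasta/${dir}/dna/${species}.${assembly}.dna.toplevel.fa.gz"
  echo "${base}/gtf/${dir}/${species}.${assembly}.${release}.gtf.gz"
}

# Usage: download both files, then decompress them for STAR (Step 2).
#   ensembl_plants_urls 52 Arabidopsis_thaliana TAIR10 | while read -r url; do
#     wget "$url" && gunzip -f "$(basename "$url")"
#   done
```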

Installation:
unzip ASTool_v1.1.zip
cd ASTool_v1.1

Step 2: Preparing the SAM file for ASTool

1. Raw RNA-Seq data are downloaded from NCBI SRA, and the .sra file is converted to a .fastq file with fastq-dump from the SRA Toolkit.

Because RNA-Seq data can contain unprocessed transcripts and overlapping antisense transcripts, we recommend poly(A)-enriched, strand-specific RNA-Seq data.
e.g.
fastq-dump SRR4048211.sra
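Before trimming, a quick structural check of the fastq-dump output can catch truncated files. The sketch below is not part of ASTool; `fastq_read_count` is a hypothetical helper that assumes an uncompressed FASTQ with exactly four lines per record (no line-wrapped sequences).

```shell
# Hypothetical sanity check (not part of ASTool): count reads in an
# uncompressed FASTQ and verify the 4-line record layout.
fastq_read_count() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }   # header line must start with @
    NR % 4 == 2 { seqlen = length($0) }                   # remember sequence length
    NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }    # separator line must start with +
    NR % 4 == 0 && length($0) != seqlen { bad = 1 }       # quality length must match sequence
    END {
      if (bad || NR % 4 != 0) { print "malformed FASTQ" > "/dev/stderr"; exit 1 }
      print NR / 4                                         # number of reads
    }' "$1"
}
```

Usage: `fastq_read_count SRR4048211.fastq` prints the read count, or exits with status 1 if a record is malformed.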
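2. Adapter removal and quality filtering are performed with Trimmomatic. The example below applies quality trimming only (LEADING, TRAILING, SLIDINGWINDOW, MINLEN); add an ILLUMINACLIP step with your adapter file if adapters need to be removed.
java -jar trimmomatic-0.33.jar SE -phred33 -threads 10 SRR4048211.fastq SRR4048211_trim.fastq LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &> SRR4048211_trim_log.txt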
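Trimmomatic writes a read summary to the redirected log. Assuming the SE summary line has the form `Input Reads: N Surviving: M (P%) Dropped: K (Q%)` (check this against your own log), the hypothetical helper below extracts the percentage of surviving reads. It is a sketch, not part of ASTool.

```shell
# Hypothetical helper: print the surviving-read percentage from a Trimmomatic
# SE log. Assumes a summary line such as
#   Input Reads: 1000 Surviving: 950 (95.00%) Dropped: 50 (5.00%)
trim_survival() {
  awk '/^Input Reads:/ {
         # the percentage is two fields after "Surviving:"
         for (i = 1; i < NF; i++) if ($i == "Surviving:") { pct = $(i + 2) }
       }
       END { gsub(/[()%]/, "", pct); print pct }' "$1"
}
```

Usage: `trim_survival SRR4048211_trim_log.txt`; an empty line means no summary line was found.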
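3. Reads are mapped to the reference genome with STAR.
1. Build index
STAR needs the uncompressed genome FASTA and GTF, so decompress the files downloaded in Step 1 first. --sjdbOverhang should be the read length minus 1 (99 for 100 bp reads).

gunzip Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz Arabidopsis_thaliana.TAIR10.52.gtf.gz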

mkdir genome_index_length100
STAR --runThreadN 20 --runMode genomeGenerate --genomeDir genome_index_length100 --genomeFastaFiles Arabidopsis_thaliana.TAIR10.dna.toplevel.fa --sjdbGTFfile Arabidopsis_thaliana.TAIR10.52.gtf --sjdbOverhang 99
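The STAR manual recommends setting --sjdbOverhang to the maximum read length minus 1; the value 99 above assumes 100 bp reads. The hypothetical `sjdb_overhang` helper below derives that value from a FASTQ file, assuming four lines per record. It is a sketch, not part of ASTool.

```shell
# Hypothetical helper: suggest --sjdbOverhang as (maximum read length - 1)
# by scanning the sequence lines (every 4th line, offset 2) of a FASTQ file.
sjdb_overhang() {
  awk 'NR % 4 == 2 && length($0) > max { max = length($0) }
       END { print max - 1 }' "$1"
}
```

Usage: `--sjdbOverhang "$(sjdb_overhang SRR4048211_trim.fastq)"`. Since the index is built once per read length (hence the directory name genome_index_length100), samples with a different read length need their own index.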

2. Alignment

STAR --runThreadN 20 --genomeDir genome_index_length100/ --outFileNamePrefix SRR4048211. --readFilesIn SRR4048211_trim.fastq --outSJfilterReads Unique --outFilterMismatchNmax 4 --quantMode GeneCounts
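With these options STAR writes SRR4048211.Aligned.out.sam, the SAM file used in Step 3. Junction counts can only come from alignments that span an intron, i.e. whose CIGAR string contains an `N` operation. The hypothetical helper below counts such alignment lines as a quick sanity check; multi-mapped reads are counted once per reported alignment.

```shell
# Hypothetical check: count SAM alignment lines whose CIGAR (column 6)
# contains an N (skipped-region) operation. Header lines (@...) are ignored.
spliced_read_count() {
  awk -F '\t' '!/^@/ && $6 ~ /N/ { n++ } END { print n + 0 }' "$1"
}
```

Usage: `spliced_read_count SRR4048211.Aligned.out.sam`.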

Step 3: Running ASTool
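ASTool calculates the PSI (percent spliced in) of four types of event: intron retention (IR), exon skipping (ES), alternative 5' splice site (A5SS) and alternative 3' splice site (A3SS).
1. Calculate junction counts (with the reference gene annotation)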
perl junction_count.pl --gtf [gene annotation file] --sam [SAM file] --thread [thread] --readlength [readlength] --m [m value] --outdir [outdir]
e.g.
perl junction_count.pl --gtf Arabidopsis_thaliana.TAIR10.52.gtf --sam SRR4048211.Aligned.out.sam --thread 10 --readlength 100 --m 8 --outdir SRR4048211_junction_ref
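--readlength should match the sequencing read length (100 in the example). As a sketch, assuming the most frequent read (SEQ) length in the SAM file is a reasonable proxy after trimming, the hypothetical `sam_read_length` helper below reports it. It is not part of ASTool.

```shell
# Hypothetical helper: print the most frequent SEQ length (column 10) in a
# SAM file, as a candidate value for junction_count.pl --readlength.
sam_read_length() {
  awk -F '\t' '!/^@/ && $10 != "*" { c[length($10)]++ }    # tally lengths, skip headers
    END { for (l in c) if (c[l] > best) { best = c[l]; len = l }; print len }' "$1"
}
```

Usage: `sam_read_length SRR4048211.Aligned.out.sam`.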

2. Calculate the PSI
2.1 IR events:
perl IR_PSI.pl --junction [the file of junction count] --gtf [gene annotation file] --m [m value] --outdir [outdir]
(m value: the minimum overlap length between a read and the exon or intron; the same applies below)
e.g.
perl IR_PSI.pl --junction ./SRR4048211_junction_ref/junction_count.txt --gtf Arabidopsis_thaliana.TAIR10.52.gtf --m 8 --outdir SRR4048211_IR_PSI.txt

2.2 ES events:
perl ES_PSI.pl --junction [the file of junction count] --gtf [gene annotation file] --m [m value] --outdir [outdir]
e.g.
perl ES_PSI.pl --junction SRR4048211_junction_ref/junction_count.txt --gtf Arabidopsis_thaliana.TAIR10.52.gtf --m 8 --outdir SRR4048211_ES_PSI.txt

2.3 A5SS/A3SS events:
perl A5SS_A3SS_PSI.pl --junction [the file of junction count] --gtf [gene annotation file] --m [m value] --A5SS_outdir [A5SS events outdir] --A3SS_outdir [A3SS events outdir]
e.g.
perl A5SS_A3SS_PSI.pl --junction SRR4048211_junction_ref/junction_count.txt --gtf Arabidopsis_thaliana.TAIR10.52.gtf --m 8 --A5SS_outdir SRR4048211_A5SS_PSI.txt --A3SS_outdir SRR4048211_A3SS_PSI.txt
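When several samples are processed, the Step 3 commands can be generated from the sample accession. The hypothetical `astool_step3_commands` function below only prints the commands (a dry run), following the usage lines and file-naming scheme of the examples above; the defaults m = 8 and 10 threads are taken from those examples. Review the output, then pipe it to `sh` to run it.

```shell
# Hypothetical dry-run helper: print the Step 3 ASTool commands for one sample.
# Arguments: sample accession, GTF file, read length, [m value], [threads].
astool_step3_commands() {
  local sample="$1" gtf="$2" readlen="$3" m="${4:-8}" threads="${5:-10}"
  local junc="${sample}_junction_ref/junction_count.txt"
  echo "perl junction_count.pl --gtf $gtf --sam ${sample}.Aligned.out.sam --thread $threads --readlength $readlen --m $m --outdir ${sample}_junction_ref"
  echo "perl IR_PSI.pl --junction $junc --gtf $gtf --m $m --outdir ${sample}_IR_PSI.txt"
  echo "perl ES_PSI.pl --junction $junc --gtf $gtf --m $m --outdir ${sample}_ES_PSI.txt"
  echo "perl A5SS_A3SS_PSI.pl --junction $junc --gtf $gtf --m $m --A5SS_outdir ${sample}_A5SS_PSI.txt --A3SS_outdir ${sample}_A3SS_PSI.txt"
}
```

Usage: `astool_step3_commands SRR4048211 Arabidopsis_thaliana.TAIR10.52.gtf 100 | sh`.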

Main output files
1. junction_count.txt
Column1: chromosome

Column2: position of junction

Column3: effective length of junction

Column4: reads mapping to junction
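To list the most strongly supported junctions, the file can be sorted on Column4. This sketch assumes junction_count.txt is tab-separated; `top_junctions` is a hypothetical helper, not part of ASTool.

```shell
# Hypothetical helper: print the N junctions (default 10) with the most
# reads, sorting numerically on Column4 in descending order.
top_junctions() {
  sort -t "$(printf '\t')" -k4,4nr "$1" | head -n "${2:-10}"
}
```

Usage: `top_junctions SRR4048211_junction_ref/junction_count.txt 20`.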
2. IR_PSI.txt
Column1: Gene

Column2: chromosome

Column3: position of intron (I)

Column4: position of flanking exon 1 (E1)

Column5: position of flanking exon 2 (E2)

Column6: intron type ("Known" or "Unknown")

Column7: intron type ("Clean" or "Not Clean")

Column8: reads mapping to junction E1_I

Column9: reads mapping to junction I_E2

Column10: reads mapping to junction E1_E2

Column11: warning 1 ("Low Count" or "-")

Column12: warning 2 ("Imblance", i.e. imbalance, or "-")

Column13: PSI
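A simple first filter keeps introns that are "Clean" and carry neither warning. The hypothetical helper below applies that filter, assuming IR_PSI.txt is tab-separated with the 13 columns listed above; a header line, if present, is dropped because it does not match the filter.

```shell
# Hypothetical filter: keep IR events with Column7 == "Clean" and no
# warnings in Column11 ("Low Count") or Column12 ("Imblance").
ir_confident() {
  awk -F '\t' '$7 == "Clean" && $11 == "-" && $12 == "-"' "$1"
}
```

Usage: `ir_confident SRR4048211_IR_PSI.txt > SRR4048211_IR_PSI.filtered.txt`.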
3. ES_PSI.txt
Column1: Gene

Column2: chromosome

Column3: position of exon (E)

Column4: position of flanking intron 1 (intron between exons E1 and E)

Column5: position of flanking intron 2 (intron between exons E and E2)

Column6: intron type ("Known" or "Unknown")

Column7: exon type ("Clean" or "Not Clean")

Column8: reads mapping to junction E1_E

Column9: reads mapping to junction E_E2

Column10: reads mapping to junction E1_E2

Column11: warning 1 ("Low Count" or "-")

Column12: warning 2 ("Imblance", i.e. imbalance, or "-")

Column13: PSI
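To pick out exons with intermediate inclusion, rows can be selected by a PSI range. This sketch assumes a tab-separated file with PSI in Column13; the PSI scale (fraction or percentage) is not stated here, so pass thresholds on the scale your file uses. Non-numeric PSI values are skipped. `psi_between` is a hypothetical helper.

```shell
# Hypothetical filter: print rows whose PSI (Column13) is numeric and lies
# within [lo, hi] inclusive. Arguments: file, lo, hi.
psi_between() {
  awk -F '\t' -v lo="$2" -v hi="$3" '
    $13 ~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ && $13 + 0 >= lo + 0 && $13 + 0 <= hi + 0' "$1"
}
```

Usage: `psi_between SRR4048211_ES_PSI.txt 0.2 0.8` (assuming PSI is reported as a fraction).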
4. A5SS_PSI.txt and A3SS_PSI.txt
Column1: Gene

Column2: chromosome

Column3: strand

Column4: shorter intron

Column5: longer intron

Column6: -

Column7: -

Column8: reads mapping to the junction spanning the shorter intron

Column9: reads mapping to the junction spanning the longer intron

Column10: -

Column11: warning 1 ("Low Count" or "-")

Column12: -

Column13: PSI
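A quick count of how many events carry the "Low Count" warning can guide the choice of read-depth cutoffs. This sketch assumes a tab-separated file and tallies the values of Column11; `warning_summary` is hypothetical, and a header line, if present, is counted under its own label.

```shell
# Hypothetical summary: count events per Column11 value ("Low Count" or "-"),
# printing "value<TAB>count" lines in a stable (C-locale) order.
warning_summary() {
  awk -F '\t' 'NF >= 13 { n[$11]++ } END { for (w in n) print w "\t" n[w] }' "$1" | LC_ALL=C sort
}
```

Usage: `warning_summary SRR4048211_A5SS_PSI.txt`.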
Visualization
Visualize introns of interest
4 Intron visualization
4.1.1 Calculate junction counts with reference annotation: see Step 3, item 1 for details.

4.1.2 Calculate junction counts without reference annotation
perl junction_count_no_ref.pl --sam [SAM file] --thread [thread] --m [m value] --outdir [outdir]
e.g.
perl junction_count_no_ref.pl --sam SRR4048211.Aligned.out.sam --thread 10 --m 8 --outdir SRR4048211_junction_no_ref
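Comparing the two junction files highlights junctions detected only without the reference annotation, i.e. candidate unannotated junctions. This sketch assumes both files are tab-separated and share the junction_count.txt layout (chromosome in Column1, junction position in Column2), which is not stated for the no-reference output; `novel_junctions` is a hypothetical helper.

```shell
# Hypothetical comparison: print junctions from the no-reference file whose
# (chromosome, position) key does not occur in the reference-based file.
# Arguments: reference-based file, no-reference file.
novel_junctions() {
  awk -F '\t' 'FILENAME == ARGV[1] { known[$1 FS $2]; next }
               !(($1 FS $2) in known)' "$1" "$2"
}
```

Usage: `novel_junctions SRR4048211_junction_ref/junction_count.txt SRR4048211_junction_no_ref/junction_count.txt`.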

4.2 Visualization
perl ASTools_IR_view2.pl --gtf [gene annotation file] --sam [SAM file] --intron [file of introns of interest (.txt)] --outdir [outdir] --junction [junction count file with reference annotation] --psi [IR PSI file] --no_junction [junction count file without reference annotation]
Format of the intron file (intron.txt), e.g.: AT1G01520_1_160303_160417 (gene ID_chromosome_intron start_intron end)
e.g.
perl ASTools_IR_view2.pl --gtf Arabidopsis_thaliana.TAIR10.52.gtf --sam SRR4048211.Aligned.out.sam --intron intron.txt --outdir ./SRR4048211_plot --junction ./SRR4048211_junction_ref/junction_count.txt --psi SRR4048211_IR_PSI.txt --no_junction SRR4048211_junction_no_ref/junction_count.txt
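intron.txt can be built from IR_PSI.txt. The hypothetical helper below joins Gene (Column1), chromosome (Column2) and the first two numbers of the intron position (Column3) with underscores. It assumes a tab-separated file and that Column3 holds the intron start and end (for example in a form like 160303-160417); check this against your own output. With gene IDs as arguments it keeps only those genes; without arguments it lists every intron.

```shell
# Hypothetical helper: write gene_chromosome_start_end lines for ASTools_IR_view2.pl.
# Arguments: IR PSI file, then optional gene IDs to keep.
make_intron_list() {
  local psi="$1"; shift
  awk -F '\t' -v genes="$*" '
    BEGIN { n = split(genes, g, " "); for (i = 1; i <= n; i++) want[g[i]] = 1 }
    (n == 0 || ($1 in want)) {
      pos = $3; gsub(/[^0-9]+/, " ", pos)          # keep only the numbers
      if (split(pos, p, " ") >= 2) print $1 "_" $2 "_" p[1] "_" p[2]
    }' "$psi"
}
```

Usage: `make_intron_list SRR4048211_IR_PSI.txt AT1G01520 > intron.txt`.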

Copyright © 2021 Ziding Zhang's Lab - China Agricultural University. All Rights Reserved. Maintained by Huan Qi