Saturday, February 15, 2014

STAR RNASeq Aligner

#download app
$wget https://rna-star.googlecode.com/files/STAR_2.3.0e.Linux_x86_64.tgz

$tar -zxvf STAR_2.3.0e.Linux_x86_64.tgz

$cd STAR_2.3.0e.Linux_x86_64/


#splice junction data
$wget ftp://ftp2.cshl.edu/gingeraslab/tracks/STARrelease/STARgenomes/SpliceJunctionDatabases/gencode.v14.annotation.gtf.sjdb

#create the directory for the genome index
$mkdir hg19

#generate the genome index with splice junction annotations
$./STAR --runMode genomeGenerate --genomeDir hg19 --genomeFastaFiles /projects/confidential_sequence/home/sun/data/hg19/hg19.fa --runThreadN 4 --sjdbFileChrStartEnd gencode.v14.annotation.gtf.sjdb --sjdbOverhang 99
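
The --sjdbOverhang value of 99 is the read length minus 1 for 100-base reads, which is the value the STAR manual recommends. Below is a minimal sketch for deriving it from your own data. The function names and the file name reads.fastq.gz are hypothetical, and it assumes every read has the same length as the first one.

```shell
# Hypothetical helper: length of the first read in a gzipped FASTQ file.
# Line 2 of each FASTQ record holds the sequence.
first_read_length() {
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Hypothetical helper: sjdbOverhang = read length - 1.
sjdb_overhang() {
    echo $(( $1 - 1 ))
}
```

For example, sjdb_overhang "$(first_read_length reads.fastq.gz)" prints the number to pass to --sjdbOverhang.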

$mv hg19 ~/data/star_hg19

$cd ~

$mkdir -p test/STAR; cd ~/test/STAR


#Full paths to the input files do NOT work. Instead, create soft links in the local folder.
$ln -s   .
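
If there are many input files, the linking step can be wrapped in a small loop. This is only a sketch: link_fastq is a hypothetical helper, and the source directory you pass to it is whatever folder holds your reads.

```shell
# Hypothetical helper: soft-link every *.fastq.gz file from a source
# directory into a destination directory (default: the current directory).
link_fastq() {
    # Resolve the source to an absolute path so the links never dangle.
    src=$(cd "$1" && pwd) || return 1
    dest=${2:-.}
    for f in "$src"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip when the glob matches nothing
        ln -s "$f" "$dest"/
    done
}
```

Calling link_fastq with the reads directory as its only argument fills the current folder with links, so the *.fastq.gz pattern in the run command picks them up.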

#run
$time STAR_2.3.0e.Linux_x86_64/STAR --genomeDir ~/data/star_hg19 --readFilesIn *.fastq.gz --outFileNamePrefix  --runThreadN 10 --readFilesCommand zcat 1>std.txt 2>err.txt

The output is a SAM file named "OUTPUT.Aligned.out.sam", where "OUTPUT." stands for the value passed to --outFileNamePrefix.
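
A quick sanity check on the SAM file is to count the alignment records. The sketch below uses a hypothetical helper, sam_summary, and assumes STAR wrote the NH attribute (number of reported alignments for the read), which it should do with default settings.

```shell
# Hypothetical helper: summarize a SAM file.
# Header lines start with "@"; every other line is one alignment record.
# Optional attributes start at column 12; NH:i:1 marks a read that was
# reported with a single alignment.
sam_summary() {
    awk -F '\t' '
        /^@/ { next }
        {
            total++
            for (i = 12; i <= NF; i++) {
                if ($i == "NH:i:1") { unique++; break }
            }
        }
        END { printf "alignments=%d unique=%d\n", total, unique }
    ' "$1"
}
```

Run it as sam_summary OUTPUT.Aligned.out.sam. A read that maps to several places contributes one record per location to the first number and none to the second.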

STAR searches an uncompressed suffix array of the genome (not a suffix tree), so it trades a lot of memory (>30GB) for speed.
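
Given the >30GB requirement, it is worth checking the machine's RAM before starting a run. A rough sketch for Linux follows. Both function names are hypothetical, and the 30 in the usage line is just the figure quoted above, not an exact requirement.

```shell
# Hypothetical helper: total physical memory in kB, as reported by Linux.
total_ram_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Hypothetical helper: succeed if a RAM total (in kB) is at least the
# given number of GB (1 GB taken as 1024 * 1024 kB).
has_enough_ram() {
    total_kb=$1
    needed_gb=$2
    [ "$total_kb" -ge $(( needed_gb * 1024 * 1024 )) ]
}
```

Usage: has_enough_ram "$(total_ram_kb)" 30 || echo "not enough memory for the hg19 index"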



