Saturday, February 15, 2014

BWA vs. SNAP

Goal

Compare BWA to SNAP

Approach





Preparation

#app


#data, simulated from hg19-chr20, paired, 30x average coverage
$wgsim -N 10000000 -r 0.01 -1 100 -2 100 -S11 -e0 chr20.fa A1.fq A2.fq > mut.txt
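As a rough check of the "30x" figure: average coverage is the number of read pairs times the bases per pair, divided by the reference length. A minimal sketch, assuming the hg19 chr20 length of 63,025,520 bp (this number is not stated above; verify it against your own chr20.fa):

```shell
# Pair count and read lengths are taken from the wgsim command above.
# ref_len is an assumption (hg19 chr20); check it against your FASTA index.
pairs=10000000
len1=100
len2=100
ref_len=63025520

# coverage = pairs * (len1 + len2) / reference length
coverage=$(awk -v n="$pairs" -v a="$len1" -v b="$len2" -v g="$ref_len" \
  'BEGIN { printf "%.1f", n * (a + b) / g }')
echo "average coverage: ${coverage}x"
```

Under that assumption the simulated data comes out slightly above 30x.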



Running Time




BWA       SNAP
30m20s    20m01s
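The ratio of the two timings can be checked with a few lines of shell. This is only a sketch; `to_seconds` is a hypothetical helper written for this example and it assumes timings in the "NNmNNs" form used here.

```shell
# Hypothetical helper: convert a "30m20s"-style timing to seconds.
# Splitting on "m" and "s" gives minutes in $1 and seconds in $2.
to_seconds() {
  echo "$1" | awk -F'[ms]' '{ print $1 * 60 + $2 }'
}

bwa=$(to_seconds 30m20s)
snap=$(to_seconds 20m01s)

# Speed-up = BWA time / SNAP time
speedup=$(awk -v a="$bwa" -v b="$snap" 'BEGIN { printf "%.2f", a / b }')
echo "BWA ${bwa}s, SNAP ${snap}s, speed-up ${speedup}x"
```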



Mapping metrics





Variants Concordance





Impression (so far)


1) SNAP is about 1.5x faster than BWA (memory consumption was not evaluated)
2) SNAP produced more TP alignments and fewer FP alignments than BWA
3) SNAP called more TP variants (8512 vs. 740) but also more FP variants (2715 vs. 596) than BWA
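Since SNAP has both more TP and more FP variant calls, one way to compare the two is precision, TP / (TP + FP). A minimal sketch using the counts above; `precision` is a hypothetical helper, and recall is not computed because the number of true variants is not listed here.

```shell
# Hypothetical helper: precision = TP / (TP + FP), printed to 3 decimals.
precision() {
  awk -v tp="$1" -v fp="$2" 'BEGIN { printf "%.3f", tp / (tp + fp) }'
}

echo "SNAP variant precision: $(precision 8512 2715)"
echo "BWA  variant precision: $(precision 740 596)"
```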






STAR RNASeq Aligner

STAR


#download app
$wget https://rna-star.googlecode.com/files/STAR_2.3.0e.Linux_x86_64.tgz

$tar -zxvf STAR_2.3.0e.Linux_x86_64.tgz

$cd STAR_2.3.0e.Linux_x86_64/


#splice junction data
$wget ftp://ftp2.cshl.edu/gingeraslab/tracks/STARrelease/STARgenomes/SpliceJunctionDatabases/gencode.v14.annotation.gtf.sjdb

#create the dir for index genome
$mkdir hg19

#generate index genome with splice junction annotations
$./STAR --runMode genomeGenerate --genomeDir hg19 --genomeFastaFiles /projects/confidential_sequence/home/sun/data/hg19/hg19.fa --runThreadN 4 --sjdbFileChrStartEnd gencode.v14.annotation.gtf.sjdb --sjdbOverhang 99
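The STAR manual suggests setting --sjdbOverhang to the read (mate) length minus 1, which is where 99 comes from for 100 bp reads. A small sketch for deriving that value from a FASTQ file; `sjdb_overhang` is a hypothetical helper and it assumes plain four-line FASTQ records.

```shell
# Hypothetical helper: read FASTQ on stdin and print (longest read length - 1).
# Sequence lines are every 4th line starting at line 2.
sjdb_overhang() {
  awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
       END { print max - 1 }'
}

# Typical use (file name is a placeholder), sampling the first 100,000 reads:
#   zcat sample_1.fastq.gz | head -n 400000 | sjdb_overhang

# Tiny self-contained demo with one 10 bp read:
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n' | sjdb_overhang
```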

$mv hg19 ~/data/star_hg19

$cd ~

$mkdir -p test/STAR; cd ~/test/STAR


#full paths to the input files do NOT work. instead, create softlinks in the local folder.
$ln -s   .

#run
$time STAR_2.3.0e.Linux_x86_64/STAR --genomeDir  data/star_hg19 --readFilesIn *.fastq.gz --outFileNamePrefix  --runThreadN 10 --readFilesCommand zcat 1>std.txt 2>err.txt

The output is a SAM file named "OUTPUT.Aligned.out.sam"

STAR builds its index on uncompressed suffix arrays (not a suffix tree), trading a lot of memory (>30GB) for speed.
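Given that memory footprint, it can be worth checking the machine before starting a run. A minimal Linux-only sketch; `has_enough_ram` is a hypothetical helper, and the 30 GB threshold is simply the figure quoted above, not an exact requirement.

```shell
# Hypothetical helper: succeed if total RAM in kB ($1) is at least $2 GB.
has_enough_ram() {
  awk -v kb="$1" -v gb="$2" 'BEGIN { exit !((kb + 0) >= gb * 1024 * 1024) }'
}

# MemTotal in /proc/meminfo is reported in kB (Linux only); fall back to 0.
total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null || echo 0)

if has_enough_ram "$total_kb" 30; then
  echo "RAM looks sufficient for STAR"
else
  echo "warning: less than 30 GB RAM, STAR may fail to load the genome"
fi
```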




Monday, February 10, 2014

Node.js is fun

I have been playing with Node.js for a few days and really love it. I am trying to build a prototype website with workflows.

Use case:



  • log into the system
  • upload a dataset in a supported format (fastq, sam/bam, vcf, bed, etc.)
  • describe the dataset
  • choose components/nodes; each node is an independent operation
  • connect components with edges as DAG (directed acyclic graph)
  • fine tune each component if necessary (add/remove/change parameter settings)
  • execute the flow
  • monitor the progress (check, terminate, pause)
  • check the output of each component and the final result
  • save the flow for future use, share, publish.
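Executing a flow that is a DAG means running each component only after everything upstream of it has finished, i.e. in topological order. A minimal sketch with the standard `tsort` utility; the four step names are made up for illustration and are not part of any real flow.

```shell
# Hypothetical four-step flow; each line is one edge: "upstream downstream".
edges='upload align
align sort
sort call_variants'

# tsort prints the nodes in an order that respects every edge.
order=$(printf '%s\n' "$edges" | tsort)
echo "$order"
```

If the edges contain a cycle, `tsort` complains about a loop, which is a cheap way to validate that a saved flow really is acyclic.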


#download the NoFlo.js flow demo
$git clone https://github.com/noflo/dataflow-noflo.git flow

$cd flow

$npm install

$grunt build

#start a simple http server to serve the contents (Python 2; with Python 3 use: python3 -m http.server)
$python -m SimpleHTTPServer

#You may see a log message like this:
#Serving HTTP on 0.0.0.0 port 8000

Now open a browser such as Chrome and go to

"192.168.1.2:8000/demo/" 

#192.168.1.2 is my IP address. YMMV.

You will see a very nice dataflow graph like this:





You can add/delete/drag/move nodes and edges. Cool!