Saturday, February 15, 2014



Compare BWA to SNAP




#data, simulated from hg19-chr20, paired, 30x average coverage
$wgsim -N 10000000 -r 0.01 -1 100 -2 100 -S11 -e0 chr20.fa A1.fq A2.fq > mut.txt
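
As a quick sanity check on the "30x" figure, here is a minimal sketch of the expected mean coverage for the wgsim settings above (-N read pairs, -1/-2 read lengths). The chr20 length of 63,025,520 bp is my assumption for hg19 and does not come from the command itself; the function name is just for illustration.

```python
# Rough expected coverage for a paired-end simulation.
# Assumption: hg19 chr20 is 63,025,520 bp long (not stated in the post).
CHR20_HG19_LEN = 63_025_520


def mean_coverage(n_pairs, read_len_1, read_len_2, ref_len):
    """Total simulated bases divided by the reference length."""
    return n_pairs * (read_len_1 + read_len_2) / ref_len


# Values taken from the wgsim call: -N 10000000 -1 100 -2 100
cov = mean_coverage(10_000_000, 100, 100, CHR20_HG19_LEN)
print(f"{cov:.1f}x")  # 31.7x
```

So the run is a little above 30x on average, before any unmapped or filtered reads are taken into account.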

Running Time

BWA: 30m20s
SNAP: 20m01s
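
The speedup can be worked out from the two wall-clock times. This is a small sketch that assumes the first time belongs to BWA and the second to SNAP, which is the only reading consistent with SNAP being the faster tool; `to_seconds` is a helper made up for this example.

```python
import re


def to_seconds(duration):
    """Convert a string such as '30m20s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


# Assumption: first time is BWA, second is SNAP.
bwa_s = to_seconds("30m20s")   # 1820
snap_s = to_seconds("20m01s")  # 1201
speedup = bwa_s / snap_s
print(f"{speedup:.2f}x")  # 1.52x
```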

Mapping metrics

Variants Concordance

Impression (so far)

1) SNAP is about 1.5x faster than BWA (memory consumption was not evaluated)
2) SNAP produced more TP alignments and fewer FP alignments than BWA
3) SNAP yielded more TP variants (8512 vs. 740) but also more FP variants (2715 vs. 596) than BWA
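
To put the variant counts in point 3 on a common scale, precision (TP / (TP + FP)) can be computed for each aligner. This is only a sketch using the four numbers quoted above; recall is left out because it needs the total number of simulated variants, which is not listed here.

```python
def precision(tp, fp):
    """Fraction of called variants that are true positives."""
    return tp / (tp + fp)


# Counts from point 3: SNAP 8512 TP / 2715 FP, BWA 740 TP / 596 FP
snap_precision = precision(8512, 2715)
bwa_precision = precision(740, 596)

print(f"SNAP precision: {snap_precision:.3f}")  # 0.758
print(f"BWA precision:  {bwa_precision:.3f}")   # 0.554
```

On these counts SNAP calls more false positives in absolute terms, but a larger share of its calls are correct.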

STAR RNASeq Aligner


#download app

$tar -zxvf STAR_2.3.0e.Linux_x86_64.tgz

$cd STAR_2.3.0e.Linux_x86_64/

#splice junction data

#create the dir for index genome
$mkdir hg19

#generate index genome with splice junction annotations
$./STAR --runMode genomeGenerate --genomeDir hg19 --genomeFastaFiles /projects/confidential_sequence/home/sun/data/hg19/hg19.fa --runThreadN 4 --sjdbFileChrStartEnd gencode.v14.annotation.gtf.sjdb --sjdbOverhang 99

$mv hg19 ~/data/star_hg19

$cd ~

$mkdir -p test/STAR; cd ~/test/STAR

#full paths to the input files do NOT work; instead, create soft links in the local folder
$ln -s   .

$time STAR_2.3.0e.Linux_x86_64/STAR --genomeDir  data/star_hg19 --readFilesIn *.fastq.gz --outFileNamePrefix  --runThreadN 10 --readFilesCommand zcat 1>std.txt 2>err.txt

The output is a SAM file named "OUTPUT.Aligned.out.sam".

STAR indexes the genome with an uncompressed suffix array, so it trades a lot of memory (>30GB) for speed.
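
To give a feel for where the memory goes: STAR's index is built around an uncompressed suffix array of the genome, which needs roughly one index entry per genome position. Below is a toy sketch of the idea, with a naive build and a binary-search lookup. It is not STAR's implementation, the function names are mine, and the build is only practical for short strings.

```python
def build_suffix_array(text):
    """Start positions of all suffixes, sorted lexicographically (naive build)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern, found by binary search on sa."""
    k = len(pattern)

    # Lower bound: first suffix whose k-length prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose k-length prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


genome = "ACGTACGTGA"  # toy reference
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACG"))  # [0, 4]
```

Each lookup is a pair of binary searches, which is fast, but the array holds one entry for every position in the reference, and for a human genome that adds up quickly.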

Monday, February 10, 2014

Node.js is fun

I have been playing with Node.js for a few days and really love it. I am trying to build a prototype website for workflows.

Use case:

  • log into the system
  • upload a dataset in a supported format (fastq, sam/bam, vcf, bed, etc.)
  • describe the dataset
  • choose components/nodes; each node is an independent operation
  • connect components with edges as DAG (directed acyclic graph)
  • fine tune each component if necessary (add/remove/change parameter settings)
  • execute the flow
  • monitor the progress (check, terminate, pause)
  • check the output of each component and the final result
  • save the flow for future use, share, publish.
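
Executing a flow that is wired up as a DAG comes down to running each node only after everything upstream of it has finished. Here is a minimal sketch of that ordering step using Kahn's topological sort. The node names are hypothetical, and this is not how NoFlo schedules components; it only illustrates the "connect as a DAG, then execute" steps in the list above.

```python
from collections import deque


def execution_order(nodes, edges):
    """Return nodes in an order where every edge goes from earlier to later."""
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1

    # Nodes with no unfinished upstream dependency are ready to run.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order


# Hypothetical workflow: the names are made up for illustration.
nodes = ["upload_fastq", "align", "sort_bam", "call_variants"]
edges = [("upload_fastq", "align"), ("align", "sort_bam"), ("sort_bam", "call_variants")]
order = execution_order(nodes, edges)
print(order)
```

The cycle check doubles as validation when a user draws the graph: a flow that is not acyclic gets rejected before anything runs.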

#download the NoFlo.js flow demo
$git clone flow

$cd flow

$npm install

$grunt build

#start a simple http server to serve the contents (Python 2; with Python 3 use: python3 -m http.server)
$python -m SimpleHTTPServer

#You may see a log message like this:
#Serving HTTP on port 8000

Now start a browser like Chrome, type in 


# is my ip address. YMMV.

You will see a very nice dataflow graph like this:

You can add/delete/drag/move nodes and edges. Cool!