Friday, March 28, 2014

Gene Fusion Detection - deFuse

##########################################
# Input Data, from the 1000G NA12878 RNASeq
##########################################
FASTQ1=ERR356372_1.fastq.gz 
FASTQ2=ERR356372_2.fastq.gz

#######################################
# deFuse
#######################################
#you have to install dependencies include "samtools", "bowtie", "blat", "faToTwoBits", "R" 
#(check out if library "ada" was installed) and "gmap"

#download app
#http://sourceforge.net/apps/mediawiki/defuse/index.php?title=DeFuse_Version_0.6.1

#path
DEFUSE=/home/ysun/app/defuse-0.6.1/scripts

#download lib data
cd /home/ysun/data/
$DEFUSE/create_reference_dataset.pl -c defuse_cbbl.config

#the default create_reference_dataset.pl has one bug on creating the gmap indicies.
#you need to remove "$gmap_index_directory" in "make" command from "sub #create_gmap_indices".
#In my system , the total size of all lib files is 48GB.

#unzip the FASTQ. deFuse will not work on zipped FASTQ?
gunzip *.fastq.gz

#let us start with a small dataset with 100K reads for testing

head -n 400000 ERR356372_1.fastq > A_1.fastq
head -n 400000 ERR356372_2.fastq > A_2.fastq

#now start the testing job
$DEFUSE/defuse.pl -c defuse_cbbl.config -1 A_1.fastq -2 A_2.fastq -o output_defuse

#if everything goes well  you will see outputs like this:
Importing fastq files
Splitting fastq files
Discordant alignments
Read Stats
        Fragment mean 173.540675389827 stddev 59.0075428839599
        Read length min 49 max 49
Generating discordant alignment clusters
Remove mitochondrial-genomic clusters
Generating maximum parsimony solution
Selecting fusion clusters
Preparing sequences for local realignment
Performing local realignment
Filtering concordant clusters
Generating spanning alignment regions file
Initializing split read alignments
Calculating split read alignments
Evaluating split reads
Calculating spanning stats
Calculating spanning p-values
Calculating split read pvalues
Creating fastas
Splitting fastas
Breakpoint alignments
Annotating fusions
Coallating fusions
Running adaboost classifier
Filtering fusions
Success

#now let us go full scale

$time $DEFUSE/defuse.pl -c defuse_cbbl.config -1 ERR356372_1.fastq -2 ERR356372_2.fastq -o output_defuse 

real    1703m53.046s
user    1642m55.346s
sys     83m2.979s

#Kind of slow

The NA12878 RNASeq dataset is very small with 18 million reads and read length is 49. A rough calculation indicates the average coverage is ~60x. Even so, it took more than one day to get it done!


#What's next
Have a glimpse on the result on deFuse. 
Repeat the analysis with Tophat-Fusion and FusionMap.



Learning CoffeeScript

Supposedly I want to declare a closure nested method in JavaScript like this:



###Sample1
//declaration
 function addInternal(x)
 { 
    //return anonymous nested function
     return function(y) { console.log(x+y);};
 }

//use
var addByTen  = addInternal(10);
addByTen(15);



.

or wrap it up as a variable

###Sample2, assign the parameter to outer function addInternal in declaration
var addByTen  = (function addInternal(x)
{
    //return anonymous nested function
    return function(y){console.log(x+y);};
})(10);

//use
addByTen(15);


The corresponding CoffeeScript version is very tight:

###sample1
addInternal = (x) ->
 (y) -> console.log x+y

addByTen = addInternal 10
addByTen 15

###sample2
addByTen = ((x) ->
 (y) -> console.log x+y ) (10)

addByTen 15