Wednesday, March 14, 2012

Aligner and Genotyper

Recently I ran into a problem:

By viewing the alignment BAM file in IGV, we can observe a significant SNP at a position. All reads that mapped to that regions shows this single mutation. 

However, with GATK's UnifiedGenotyper, this SNP has a very low QUAL value (QUAL=7).

Not sure what caused this confusion, so at now I am trying to use two mapping applications novoalign and BWA  as aligner, and two genotyper application UnifiedGenotyper and samtool/mpileup to call variances.





Ignore an error

I have a long problem when scripting the PBS script to cluster.

To stop a pipeline whenever any errors happened, we can trap it

#!/bin/bash
failclean()
{    
echo 'catched!'
exit 1
}
trap 'failclean' ERR TERM
rm -fr * 
touch b
echo 'hello,world' > 1.txt
Idonotexist
echo 'OK!'
We will get 'catched!' as output because the command 'Idonotexist' will be fail.

In the real case, the first step should be cleaning all existing file and folder to release all local disk space on each cluster node.

"rm -fr *"

However, this command may throw an error because the file permission 

"rm: cannot remove abc.txt: Permission denied"

In fact, if we can not delete other files, it is fine. We just want to use as many disk space as possible.
However, with "trap" the above command will be caught by "trap" which in turn fail your whole script.

So, our questions is: how to ignore and supress any errors when using "rm -fr *".

The solution is:

"rm -fr * 2>/dev/null || true"

Now the final demo script looks like this:

#!/bin/bash
failclean()
{    
echo 'catched!'
exit 1
}
trap 'failclean' ERR TERM
rm -fr *  2>dev/null || true
touch b
echo 'hello,world' > 1.txt
Idonotexist
echo 'OK!'





Tuesday, March 13, 2012

Failed: Install and use VAT

tar -zxvf libbios-1.0.0.tar.gz
cd libbios-1.0.0
./configure --prefix=/home/app/VAT/out/libbios-1.0.0;make;make install

tar -zxvf gsl-1.15.tar.gz
cd gsl-1.15
./configure --prefix=/home/app/VAT/out/gsl-1.15;;make;make install


#cd gd-libgd
#./configure --prefix=/home/app/VAT/out/libgd

#blatSuite34
unzip blatSuite.34.zip -d /home/app/VAT/out/blatSuite-34

tar -zxvf libs3-2.0.tar.gz
cd libs3-2.0
make DESTDIR=/home/app/VAT/out/libs3 install

export CPPFLAGS="-I/home/app/VAT/out/gsl-1.15/include -I/home/app/VAT/out/libbios-1.0.0/include -I/home/app/VAT/out/libs3/include -ggdb"
export LDFLAGS="-L/home/app/VAT/out/gsl-1.15/lib -L/home/app/VAT/out/libbios-1.0.0/lib -L/home/app/VAT/out/libs3/lib"

tar -zxvf vat-2.0.0.tar.gz
cd vat-2.0.0

#vim default.vatrc
#TABIX_DIR /home/app/tabix
#VAT_EXEC_DIR /home/app/VAT/out/vat/bin

./configure --prefix=/home/app/VAT/out/vat;make;make install
export VAT_CONFIG_FILE=/home/app/.vatrc



cd /home/app/test
gunzip gencode.v11.annotation.gtf.gz
awk '/\t(HAVANA|ENSEMBL)\tCDS\t/ {print}' gencode.v11.annotation.gtf | grep 'gene_type "protein_coding"' | grep 'transcript_type "protein_coding"' > gencode.v11.annotation.cds.gtpc.ttpc.gtf
/home/app/VAT/out/vat/bin/gencode2interval < gencode.v11.annotation.cds.gtpc.ttpc.gtf > gencode.v11.annotation.cds.gtpc.ttpc.interval


./faToTwoBit -noMask /tomato/data/hg19.fasta hg19.2bit

export LD_LIBRARY_PATH=/home/app/VAT/out/libbios-1.0.0/lib:/home/app/VAT/out/libs3/lib:/home/app/VAT/out/gsl-1.15/lib



/home/app/VAT/out/vat/bin/interval2sequences hg19.2bit gencode.v11.annotation.cds.gtpc.ttpc.interval exonic > gencode.v11.annotation.cds.gtpc.ttpc.exonic.fasta
#twoBitReadSeqFrag in chr6 start (53371710) >= end (53371710)
#Segmentation fault

/home/app/VAT/out/vat/bin/interval2sequences hg19.2bit gencode.v11.annotation.cds.gtpc.ttpc.interval genomic> gencode.v11.annotation.cds.gtpc.ttpc.genomic.fasta
#pass

/home/app/VAT/out/vat/bin/snpMapper gencode.v11.annotation.cds.gtpc.ttpc.interval gencode.v11.annotation.cds.gtpc.ttpc.genomic.fasta 1
#Segmentation fault

/home/app/VAT/out/vat/bin/indelMapper gencode.v11.annotation.cds.gtpc.ttpc.interval gencode.v11.annotation.cds.gtpc.ttpc.genomic.fasta 2

Thursday, March 1, 2012

R, DESeq and qvalue installation

wget http://cran.cnr.berkeley.edu/src/base/R-2/R-2.14.2.tar.gz
tar -zxvf R-2.14.2.tar.gz
cd R-2.14.2;./configure --prefix=/home/R-2.14.2;make;make install
cd /home/R-2.14.2/bin

wget http://genomics.princeton.edu/storeylab/qvalue/qvalue_1.1.tar.gz
./R CMD INSTALL qvalue_1.1.tar.gz
rm -f qvalue_1.1.tar.gz

source("http://www.bioconductor.org/biocLite.R")
biocLite("DESeq")

phastConsElements46way and avsift

wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/phastConsElements46way.txt.gz
gunzip phastConsElements46way.txt.gz
mv phastConsElements46way.txt hg19_phastConsElements46way.txt
mv hg19_phastConsElements46way.txt /tomato/data
/tomato/app/annovar/convert2annovar.pl x1.pass.vcf -format vcf4 > x1.annovar
/tomato/app/annovar/annotate_variation.pl -regionanno --buildver hg19 -dbtype mce46way x1.annovar /tomato/data/


/tomato/app/annovar/annotate_variation.pl -downdb -buildver hg19 avsift .
mv hg19_avsift.txt* /tomato/data
/tomato/app/annovar/convert2annovar.pl x1.pass.vcf -format vcf4 > x1.annovar
/tomato/app/annovar/annotate_variation.pl -filter -buildver hg19 -dbtype avsift x1.annovar /tomato/data/