Build a pipeline that:
1. Uses novoalign to align some data to HG19.
2. Parses (transforms) the novoalign output to identify genomic variants:
 2.1 SNP calls (using MAQ or SOAP)
 2.2 Indels
 2.3 Inversions, duplications, etc.
3. Investigates the effects of genomic variants on genes (synonymous vs. non-synonymous mutations), using USeq, etc.
4. Visualizes these effects using IGB, UCSC, etc.
***************************************************************************
#download the human genome build and build an index for novoalign
>pwd #the path for genome build
/home/galaxy/tools/novoalign-2.06.09/genome/HG19
>wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
>tar -zxvf chromFa.tar.gz 
#now we have a list of .fa files for chromosomes.
>rm chromFa.tar.gz 
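#optional sanity check before indexing: count the FASTA records in the extracted files.
#This is only a sketch; count_fasta_records is a helper name made up here, not part of novoalign.

```shell
# Count FASTA header lines (lines starting with ">") across all files given as arguments.
count_fasta_records() {
    cat "$@" | grep -c '^>'
}
```

#usage (from the HG19 directory): count_fasta_records *.fa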
#use novoindex to build an index
>novoindex hg19.novoindex *.fa
#size is ~6.5G
>ll hg19.novoindex
-rwxr--r-- 1 root root 6505422914 2010-08-13 10:50 hg19.novoindex*
#run a test using 25 sequences
>time novoalign -d genome/HG19/hg19.novoindex -f read/test.txt >hello.txt
real    0m47.846s
user    0m0.180s
sys     0m5.220s
#check out the result
>less hello.txt 
# novoalign (2.06.09MT - Jun 16 2010 @ 12:36:05) - A short read aligner with qualities.
# (C) 2008 NovoCraft
# Licensed for evaluation, educational, and not-for-profit use only.
#  novoalign -d genome/HG19/hg19.novoindex -f read/test.txt 
# Interpreting input files as Illumina FASTQ, Cassava Pipeline 1.3.
# Index Build Version: 2.6
# Hash length: 14
# Step size: 3
....
#     Read Sequences:       25
#            Aligned:       22
#   Unique Alignment:       22
#   Gapped Alignment:        1
#     Quality Filter:        1
# Homopolymer Filter:        0
#       Elapsed Time: 0,035s
# Done.
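#the summary block above can be turned into a single aligned-rate figure.
#A minimal sketch, assuming the report lines keep the "#   Label:   N" layout shown above;
#novo_aligned_rate is a helper name made up here.

```shell
# Print "aligned/total reads aligned (percent)" from a novoalign report file.
novo_aligned_rate() {
    awk -F: '
        /^# +Read Sequences:/ { total = $2 + 0 }     # total reads in the input
        /^# +Aligned:/        { aligned = $2 + 0 }   # reads with an alignment
        END {
            if (total > 0)
                printf "%d/%d reads aligned (%.1f%%)\n", aligned, total, 100 * aligned / total
            else
                print "no summary found"
        }
    ' "$1"
}
```

#usage: novo_aligned_rate hello.txt
#for the test run above (22 of 25 aligned) this prints 22/25 reads aligned (88.0%)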
***************************************************************************
#Install MAQ
>pwd
/home/galaxy/tmp
#use -O so the file is saved under its real name instead of "download"
>wget -O maq-0.7.1.tar.bz2 http://sourceforge.net/projects/maq/files/maq/0.7.1/maq-0.7.1.tar.bz2/download
>tar -jxvf maq-0.7.1.tar.bz2
>cd maq-0.7.1
>./configure --prefix=/home/galaxy/tools/maq-0.7.1 && make && make install
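#to call maq from anywhere, put the install directory on $PATH.
#A sketch, assuming "make install" placed the binaries under bin/ of the --prefix given above
#(the usual autotools layout; not verified here).

```shell
# Install prefix taken from the ./configure call above.
MAQ_PREFIX=/home/galaxy/tools/maq-0.7.1
export PATH="$MAQ_PREFIX/bin:$PATH"

# Report whether the shell can now find maq.
command -v maq >/dev/null 2>&1 || echo "maq not found under $MAQ_PREFIX/bin"
```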
***************************************************************************
#Install SOAPsnp
#boost(http://sourceforge.net/projects/boost/) is required to build SOAPsnp
#use -O so the file is saved under its real name instead of "download"
>wget -O boost_1_43_0.tar.gz http://sourceforge.net/projects/boost/files/boost/1.43.0/boost_1_43_0.tar.gz/download
>tar -zxvf boost_1_43_0.tar.gz
>cd boost_1_43_0
#move the boost header directory to the system include path so that gcc can find it.
>mv boost /usr/local/include/
#download SOAPsnp
>wget http://soap.genomics.org.cn/down/SOAPsnp-v1.03.tar.gz
>tar -zxvf SOAPsnp-v1.03.tar.gz
>cd SOAPsnpZ
>make all #this will make a single executable file, soapsnp 
>ll soapsnp
-rwxr-xr-x 1 root root 123425 2010-08-13 12:04 soapsnp*
#move it to a directory on $PATH
>mv soapsnp /home/galaxy/tools
>soapsnp 
SoapSNP
Compulsory Parameters:
-i  Input SORTED Soap Result 
-d  Reference Sequence in fasta format 
-o  Output consensus file 
......
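#the three compulsory parameters above can be wrapped in a small helper.
#A sketch only: run_soapsnp and DRY_RUN are names made up here, and the file names in the
#usage line are placeholders. Note that -i expects a SORTED SOAP result, so novoalign output
#would have to be converted first (not shown).

```shell
# run_soapsnp SORTED_SOAP REF_FASTA OUT_CONSENSUS
# With DRY_RUN set, print the command instead of running it.
run_soapsnp() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_soapsnp SORTED_SOAP REF_FASTA OUT_CONSENSUS" >&2
        return 2
    fi
    if [ -n "$DRY_RUN" ]; then
        echo "soapsnp -i $1 -d $2 -o $3"
    else
        soapsnp -i "$1" -d "$2" -o "$3"
    fi
}
```

#usage: DRY_RUN=1 run_soapsnp sample.sorted.soap chr1.fa sample.cns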
***************************************************************************
#download the sequencing data
>lftp -u USERNAME,PASSWORD ftp://cdts.genomics.org.cn
#not sure which data should be downloaded, so go to GenomEx.