Tuesday, September 21, 2010

Simulate data for pipeline

Organism: C. elegans (ce6)

#download
wget -O calib-36.dat.gz http://sourceforge.net/projects/maq/files/maq-data/20080929/calib-36.dat.gz/download

#unzip
gzip -d calib-36.dat.gz

#simulate reads
maq simulate ce6_1.fq ce6_2.fq ce6.fa calib-36.dat

#make indexed genome
novoindex ce6.ndx ce6.fa

#Align reads to ce6 genome
time novoalign -d ce6.ndx -f ce6_1.fq ce6_2.fq | grep chr > A.ce6.nal

real 1m43.108s
user 6m39.372s
sys 0m4.652s

#Format to MAQ
novo2maq A.map - A.ce6.nal

#Convert the reference sequence to MAQ's binary fasta format
maq fasta2bfa ce6.fa ref.bfa

#Build the mapping assembly
maq assemble A.cns ref.bfa A.map 2>assemble.log

#Extract consensus sequences and qualities
maq cns2fq A.cns >A.cns.fq

#Extract list of SNPs
maq cns2snp A.cns >A.snp
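The raw SNP list from cns2snp is usually filtered before downstream use. A minimal sketch on made-up sample lines; the column layout (chromosome, position, reference base, consensus base, consensus quality, read depth) is assumed here, so check it against your maq version's output:

```shell
# Sample lines in the (assumed) cns2snp layout:
# chromosome, position, reference base, consensus base, consensus quality, read depth
cat > sample.snp <<'EOF'
chrI 1045 A G 48 12
chrI 20311 C T 7 3
chrII 5512 G A 33 9
EOF

# Keep calls with consensus quality >= 20 and at least 3 supporting reads
awk '$5 >= 20 && $6 >= 3' sample.snp > sample.flt.snp
cat sample.flt.snp
```

The second sample line is dropped: quality 7 is below the cutoff.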


Wednesday, September 15, 2010

update potato source code

#!/bin/sh
ssh hiseq@hci-bio2.hci.utah.edu "rsync -arue ssh ying@155.100.235.73:~/workspace/potato/src/*.py potato"
rsync -arue ssh hiseq@hci-bio2.hci.utah.edu:~/potato/*.py .

Tuesday, September 14, 2010

delete all jobs with name=hello

qstat -au u0592675 | grep 'hello' | awk '{print $1}' | awk -F '.' '{print $1}' | xargs qdel
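The one-liner above can be wrapped so the user and job name are arguments instead of being hard-coded, and the job-ID parsing can be checked against fake qstat lines without touching the scheduler (the qstat column layout is assumed):

```shell
# Same extraction as the one-liner, with user and job name as parameters
qdel_by_name () {
    qstat -au "$1" | grep "$2" | awk '{print $1}' | awk -F '.' '{print $1}' | xargs qdel
}

# The ID-parsing part, exercised on fake qstat output:
printf '12345.server u0592675 batch hello R\n12346.server u0592675 batch hello Q\n' |
    grep 'hello' | awk '{print $1}' | awk -F '.' '{print $1}'
# prints 12345 and 12346
```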

Friday, September 10, 2010

Supported Genome Builds

Human (hg19, Feb_2009, GRCh37)

single-end, HiSeq2000, Rong Mao Lab

Human (hg18, Mar_2006, NCBI Build 36.1)

DNA methylation, paired-end, HiSeq2000, Brad Cairns Lab

ChIP-seq, single-end, GAIIx, Brad Cairns Lab

C_elegans (CE6, May_2008, WS190)

Small RNA sequencing, single-end, GAIIx, Brenda Bass Lab

A_Thaliana (TAIR8, Mar_2008)

Single-end, HiSeq2000, Jason Stajich Lab

M_musculus (mm9, Jul_2007, NCBI Build 37)

ChIP-Seq, single-end, HiSeq2000, Anne Moon Lab

DNA methylation, paired-end, HiSeq2000, Brad Cairns Lab

D_rerio (Zv8, Jun_2008)

ChIP-Seq, single-end, HiSeq2000, Brad Cairns Lab

Friday, September 3, 2010

No available computational resources at all

1. I submitted a PBS job last Friday that asked for only one node, one CPU, and five seconds of walltime; seven days later it is still sitting in the waiting queue. I cannot run any tests on the cluster at all.

2. I tried to use Amazon's cloud servers to do the alignment. However, doing so requires purchasing the service, and we do not have a public account.

Torque and S3cmd

tmp> wget http://www.clusterresources.com/downloads/torque/torque-2.5.1.tar.gz
tmp> tar -zxvf torque-2.5.1.tar.gz
tmp> cd torque-2.5.1
torque-2.5.1> ./configure && make && make install
torque-2.5.1> ./torque-package-mom-linux-x86_64.sh --install
torque-2.5.1> ./torque-package-client-linux-x86_64.sh --install
torque-2.5.1> cp contrib/init.d/pbs_mom /etc/init.d/pbs_mom
torque-2.5.1> update-rc.d pbs_mom defaults

Modify "/etc/ld.so.conf"
###############################
include /etc/ld.so.conf.d/*.conf
/usr/local/lib
###############################

>ldconfig

torque-2.5.1> ./torque.setup root
initializing TORQUE (admin: root@ubuntu.localdomain)
Max open servers: 4
set server operators += root@ubuntu.localdomain
Max open servers: 4
set server managers += root@ubuntu.localdomain


torque-2.5.1> qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = ubuntu
set server managers = root@ubuntu.localdomain
set server operators = root@ubuntu.localdomain
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300

$TORQUEHOME is /var/spool/torque/
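With the server and queue configured, a quick smoke test is to submit a tiny job like the one described in the Sept 3 entry. A sketch (Torque 2.5 resource syntax; the script contents are made up):

```shell
# Write a minimal job script: one node, one CPU, five seconds of walltime
cat > hello.pbs <<'EOF'
#PBS -N hello
#PBS -l nodes=1:ppn=1,walltime=00:00:05
echo "hello from $(hostname)"
EOF

# Submit and watch it (requires a running pbs_server and pbs_mom):
# qsub hello.pbs
# qstat
```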


S3CMD


>wget -O s3cmd-0.9.9.91.tar.gz http://sourceforge.net/projects/s3tools/files/s3cmd/0.9.9.91/s3cmd-0.9.9.91.tar.gz/download
>tar -zxvf s3cmd-0.9.9.91.tar.gz
>cd s3cmd-0.9.9.91
>sudo python setup.py install

>s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3
Access Key: 1qaz2wsx
Secret Key: 1qaz2wsx

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3

Wednesday, September 1, 2010

Amazon Cloud

Amazon S3 is a reasonably priced data storage service, ideal for off-site backups, archiving, and other data storage needs. Check out the About Amazon S3 section to find out more.

S3cmd is a command-line tool for uploading, retrieving, and managing data in Amazon S3. It is best suited for power users who don't fear the command line, and is also ideal for scripts, automated backups triggered from cron, etc.
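A cron-driven backup along those lines can be sketched as below; the bucket name, paths, and schedule are invented, so substitute your own:

```shell
# Nightly sync of a results directory to S3 (bucket and paths are invented)
cat > s3-backup.sh <<'EOF'
#!/bin/sh
s3cmd sync --delete-removed /data/results/ s3://my-backup-bucket/results/
EOF
chmod +x s3-backup.sh

# Crontab entry to run it at 02:00 every night:
# 0 2 * * * /path/to/s3-backup.sh >> /var/log/s3-backup.log 2>&1
```

`s3cmd sync` only transfers files that changed, so repeated nightly runs stay cheap.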