Monday, June 28, 2010

Packages

Mako
http://www.makotemplates.org/downloads/Mako-0.3.4.tar.gz

webob
http://pypi.python.org/packages/source/W/WebOb/WebOb-0.9.8.tar.gz#md5=07d1a1a4b0bf0faa67cb6638c632ea61

Thursday, June 24, 2010

The first version of PyDAS2


from amara import bindery


class PyDAS2Element(object):
    """Generic DAS/2 element: keyword arguments become instance attributes."""

    def __init__(self, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __str__(self):
        # Drop the 6-character 'PyDAS2' prefix from the class name.
        return str(self.__class__.__name__)[6:]


def attribute_wrapper(attr):
    # Attribute keys are tuples; k[1] is used as the plain attribute name.
    ret = {}
    for k, v in attr.items():
        ret[str(k[1])] = str(v)
    return ret


def make_element(name, obj):
    #return eval('PyDAS2'+name)(**attribute_wrapper(obj.xml_attributes))
    return PyDAS2Element(**attribute_wrapper(obj.xml_attributes))


def recursive_parse(root, obj_list, parent_name=None):
    # Build the dotted path of class names leading down to this node.
    if parent_name:
        par_name = '.'.join((parent_name, root.__class__.__name__))
    else:
        par_name = root.__class__.__name__
    for child in root.xml_children:  # iterate over all child nodes
        if child.xml_type == 'element':
            ele_name = child.__class__.__name__
            #print par_name,ele_name,attribute_wrapper(child.xml_attributes)
            obj_list.append((par_name, ele_name, attribute_wrapper(child.xml_attributes)))
            recursive_parse(child, obj_list, par_name)


def parse(xml):
    # Return a flat list of (parent path, element name, attributes) tuples.
    objs = []
    root = bindery.parse(xml)
    recursive_parse(root, objs)
    return objs


def test_source(xml):
    objs = parse(xml)
    base = ''
    for par_name, ele_name, ele_attr in objs:
        print par_name, ele_name, ele_attr
        #if ele_name == 'SOURCES':
        #    base = ele_attr['base']
        #if ele_name == 'VERSION':
        #    print ''.join((base,ele_attr['uri']))

        #if ele_name == 'CAPABILITY':
        #    if ele_attr['type'] == 'features':
        #        print ''.join((base,ele_attr['query_uri']))


if __name__ == '__main__':
    # Example features query (not used yet); the filters are joined with ';'.
    feature_url = ('http://netaffxdas.affymetrix.com/das2/das2/genome/H_sapiens_Mar_2006/features?'
                   'segment=http://netaffxdas.affymetrix.com/das2/sequence/H_sapiens_Mar_2006/chr21;'
                   'overlaps=26010000:26060000;'
                   'type=http://netaffxdas.affymetrix.com/das2/sequence/H_sapiens_Mar_2006/knownGene')

    #test_source('http://netaffxdas.affymetrix.com/das2/sources')
    #test_source('http://netaffxdas.affymetrix.com/das2/genome/H_sapiens_Feb_2009/segments')
    test_source('http://netaffxdas.affymetrix.com/das2/genome/H_sapiens_Mar_2006/types')
    #test_source('http://netaffxdas.affymetrix.com/das2/genome/H_sapiens_Feb_2009/features')
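
For comparison, the same flattening idea can be sketched with the standard library's xml.etree.ElementTree instead of Amara. This is a swapped-in parser and Python 3, not the PyDAS2 code itself. The names local_name and flatten are made up for this illustration, and here the root element is only used as the starting parent; it does not get a row of its own.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix, if there is one."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, parent_path=None):
    """Return (parent path, element name, attributes) for every descendant."""
    name = local_name(element.tag)
    path = name if parent_path is None else parent_path + "." + name
    rows = []
    for child in element:
        attrs = {local_name(k): v for k, v in child.attrib.items()}
        rows.append((path, local_name(child.tag), attrs))
        # Recurse so that grandchildren get the extended dotted path.
        rows.extend(flatten(child, path))
    return rows
```

Calling flatten(ET.fromstring(text)) on a downloaded document gives a flat list that can be filtered by element name, much like the commented-out checks in test_source.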


Monday, June 21, 2010

4Suite and Amara

#see python dist
apt-cache showsrc python

#install Python development headers (needed to compile C extensions)
apt-get install python-dev

#install gcc
apt-get install gcc

#install 4Suite
wget -O 4Suite-XML-1.0.2.tar.gz http://sourceforge.net/projects/foursuite/files/4Suite/XML-1.0.2/4Suite-XML-1.0.2.tar.gz/download
tar -zxvf 4Suite-XML-1.0.2.tar.gz
cd 4Suite-XML-1.0.2
python setup.py install

#install Amara
wget http://pypi.python.org/packages/source/A/Amara/Amara-2.0a4.tar.gz#md5=3594ca632fb83796037dc75cebdff161
tar -zxvf Amara-2.0a4.tar.gz
cd Amara-2.0a4
python setup.py install

Kill the Galaxy process

kill -9 `ps aux | grep '[u]niverse_wsgi.ini' | awk '{print $2}'`

Friday, June 18, 2010

Test on DAS/2 Source

http://netaffxdas.affymetrix.com/das2/sources
will return an XML document that lists the DAS/2 sources available at Affymetrix.

How to present the returned XML to the end user? XSLT? There is no built-in XSLT package in the official Python release, so probably use some other XSLT engine, e.g. Pyana or 4Suite?


Add a new tool to Galaxy

To add a new tool named "novoalign" for NGS alignment:

1. Add "novoalign_wrapper.xml" to "\tools\sr_mapping"

<tool id="novoalign_wrapper" name="Map with Novoalign" version="1.0.6">
...
</tool>

2. Add "novoalign_wrapper.py" to "\tools\sr_mapping",
which parses the parameters and calls Novoalign

3. Append a line to "\tool_conf.xml" under the section
<section name="NGS: Mapping" id="solexa_tools">

<tool file="sr_mapping/novoalign_wrapper.xml" />

4. Restart Galaxy
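
A minimal sketch of what such a wrapper script could look like (Python 3). The option names --index, --input and --output are hypothetical, chosen only for this illustration. The Novoalign flags -d (index), -f (reads) and -o SAM are assumptions that should be checked against the installed Novoalign version.

```python
import argparse
import subprocess


def build_command(index, reads, executable="novoalign"):
    """Assemble the aligner command line (flags are assumed, see above)."""
    return [executable, "-d", index, "-f", reads, "-o", "SAM"]


def main(argv=None):
    """Parse the wrapper's own (hypothetical) options and run the aligner."""
    parser = argparse.ArgumentParser(description="Galaxy wrapper sketch")
    parser.add_argument("--index", required=True, help="reference index file")
    parser.add_argument("--input", required=True, help="FASTQ file with reads")
    parser.add_argument("--output", required=True, help="file for the alignments")
    args = parser.parse_args(argv)
    # The alignments are assumed to arrive on stdout, so redirect to a file.
    with open(args.output, "w") as out:
        return subprocess.call(build_command(args.index, args.input), stdout=out)
```

A real script would call main() under the usual __name__ guard, and the command line in the tool XML would pass the three options from the tool's parameters.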

Thursday, June 17, 2010

Eclipse works now

Plug-ins can finally be installed into Eclipse on Ubuntu.
The solution is very simple: run Eclipse as root. It took me 3 days to find.

Eclipse

Plug-in:
Mercurial http://cbes.javaforge.com/update
PyDev http://pydev.org/updates
Subversion http://subclipse.tigris.org/update_1.6.x

Project:
Galaxy hg clone http://bitbucket.org/galaxy/galaxy-central galaxy
Genoviz svn co https://genoviz.svn.sourceforge.net/svnroot/genoviz genoviz

Eclipse

An error occurred while installing the items
session context was:(profile=PlatformProfile, phase=org.eclipse.equinox.internal.provisional.p2.engine.phases.Install, operand=null --> [R]org.eclipse.cvs 1.0.400.v201002111343, action=org.eclipse.equinox.internal.p2.touchpoint.eclipse.actions.InstallBundleAction).
Cannot connect to keystore.
This trust engine is read only.
The artifact file for osgi.bundle,org.eclipse.cvs,1.0.400.v201002111343 was not found.

Wednesday, June 16, 2010

Reinstall Eclipse with 32bit

Plug-in:
Mercurial http://cbes.javaforge.com/update
PyDev http://pydev.org/updates
Subversion http://subclipse.tigris.org/update_1.6.x

Project:
Galaxy hg clone http://bitbucket.org/galaxy/galaxy-central galaxy
Genoviz svn co https://genoviz.svn.sourceforge.net/svnroot/genoviz genoviz

Monday, June 14, 2010

Eclipse on Ubuntu Server 10.04 LTS

Bug in Eclipse-3.5.2 running on Ubuntu 10.04 LTS


Try installing eclipse 3.5.1.org.tar.gz


>tar -zxvf eclipse 3.5.1.org.tar.gz
>cd eclipse-R3_5_1-fetched-src/
>./build.sh


Integrate bowtie to Galaxy

1. Download
>wget -O bowtie-0.12.5-linux-x86_64.zip http://sourceforge.net/projects/bowtie-bio/files/bowtie/0.12.5/bowtie-0.12.5-linux-x86_64.zip/download

2. unzip
>unzip bowtie-0.12.5-linux-x86_64.zip

3. clean
>rm -fr bowtie-0.12.5-linux-x86_64.zip

4. test
>cd bowtie-0.12.5/
>./bowtie --version

5. download prebuilt indexes (UCSC hg19)

>wget ftp://ftp.cbcb.umd.edu/pub/data/bowtie_indexes/hg19.ebwt.zip
>unzip hg19.ebwt.zip
>cd /home/ying/tools/bowtie-0.12.5
>mkdir -p data/UCSC
>mv *.ebwt /home/ying/tools/bowtie-0.12.5/data/UCSC
>ls /home/ying/tools/bowtie-0.12.5/data/UCSC
hg19.1.ebwt hg19.2.ebwt hg19.3.ebwt hg19.4.ebwt hg19.rev.1.ebwt hg19.rev.2.ebwt

6. Change the configuration file in galaxy
>vi /home/ying/galaxy/tool-data/bowtie_indices.loc
It should have this line (the two columns are separated by a tab):

#############################
hg19 /home/ying/tools/bowtie-0.12.5/data/UCSC/hg19
#############################

7. Restart Galaxy

8. Run an online test
8.1 Upload a FASTQ file.
8.2 Apply FASTQ Groomer
8.3 Run Bowtie
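
Before restarting, the .loc entry can be sanity-checked with a short script. This is only a sketch under assumptions: it treats the .loc file as tab-separated with the index base path in the last column, and it expects the six .ebwt suffixes from the directory listing in step 5. The function names read_loc and missing_index_files are made up for this illustration.

```python
import os


def read_loc(path):
    """Map build name -> index base path from a tab-separated .loc file."""
    entries = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and comment lines such as the ##### rulers.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) >= 2:
                entries[fields[0]] = fields[-1]
    return entries


def missing_index_files(base):
    """List the expected <base>.*.ebwt files that do not exist on disk."""
    suffixes = ["1", "2", "3", "4", "rev.1", "rev.2"]
    expected = [base + "." + suffix + ".ebwt" for suffix in suffixes]
    return [name for name in expected if not os.path.exists(name)]
```

For the setup above, read_loc("/home/ying/galaxy/tool-data/bowtie_indices.loc") should contain an "hg19" key, and missing_index_files on its value should return an empty list.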

Friday, June 11, 2010

Monday, June 7, 2010

Galaxy Installation

#install screen
wget http://ftp.gnu.org/gnu/screen/screen-4.0.3.tar.gz
tar -zxvf screen-4.0.3.tar.gz
rm -fr screen-4.0.3.tar.gz

#install Python2.6.5
1. wget http://python.org/ftp/python/2.6.5/Python-2.6.5.tgz
2. tar -zxvf Python-2.6.5.tgz
3. rm -fr Python-2.6.5.tgz
4. cd Python-2.6.5/
5. ./configure --prefix=/home/u0592675/py2.6.5
6. make && make install

#Install Mercurial
1. wget http://mercurial.selenic.com/release/mercurial-1.5.4.tar.gz
2. tar -zxvf mercurial-1.5.4.tar.gz
3. rm -fr mercurial-1.5.4.tar.gz
4. cd mercurial-1.5.4/
5. python setup.py install --home=/home/u0592675/tools/mercurial


#update ~/.bash_profile
PATH=$HOME/py2.6.5/bin:$HOME/tools/mercurial/bin:$PATH
export PATH
export DISPLAY=155.100.234.79:0.0

#Download Galaxy
hg clone http://www.bx.psu.edu/hg/galaxy galaxy

#Build and install Galaxy
./setup.sh
./run.sh

#test
host hci-bio1.hci.utah.edu #155.101.160.203
lynx http://155.101.160.203:8080


Problems:

1. Mercurial failed to start because of PYTHONPATH. I had to install Mercurial on Windows and fetch the Galaxy distribution there.
2. Galaxy cannot be accessed from a web browser outside the host machine, because the host blocks ports such as 8080.
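
For the second problem, a small check run from another machine shows whether the port is reachable at all. A minimal sketch using only the standard socket module; the function name port_open is made up for this illustration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

port_open("155.101.160.203", 8080) returning False from outside, while the lynx test on the host itself works, points to a firewall rather than to Galaxy.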



Saturday, June 5, 2010

Multi-thread and Multi-process

Requirement:

Given a huge (tens of GB) FASTQ file from Illumina NGS, align the short sequences in it to a reference genome. Because the file is so large, it should be split into small segments, which are then aligned in parallel on a multi-CPU SMP machine or a multi-core CPU.

My current solution:
1. Split the big file on the fly
2. Use "multiprocessing" and "subprocess" to do the work

I wrote some preliminary test code on my desktop, and it seems promising. The code should be finished in 3 days if everything goes well.
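
The idea can be sketched roughly as follows (Python 3.7+, not the actual test code). It assumes every FASTQ record is exactly 4 lines and that the aligner reads a chunk on stdin and writes its result to stdout. The function names are made up for this illustration, and the command to run is left to the caller.

```python
import itertools
import multiprocessing
import subprocess


def fastq_chunks(handle, records_per_chunk):
    """Yield lists of lines; each list holds whole 4-line FASTQ records."""
    lines_per_chunk = 4 * records_per_chunk
    while True:
        chunk = list(itertools.islice(handle, lines_per_chunk))
        if not chunk:
            return
        yield chunk


def run_on_chunk(job):
    """Pipe one chunk into a command's stdin and return the command's stdout."""
    command, chunk = job
    result = subprocess.run(command, input="".join(chunk),
                            capture_output=True, text=True, check=True)
    return result.stdout


def run_in_parallel(path, command, records_per_chunk=100000, workers=None):
    """Split the FASTQ file at `path` on the fly and run `command` per chunk."""
    with open(path) as handle, multiprocessing.Pool(workers) as pool:
        jobs = ((command, chunk)
                for chunk in fastq_chunks(handle, records_per_chunk))
        # imap returns results in chunk order, so outputs can be concatenated.
        return list(pool.imap(run_on_chunk, jobs))
```

run_in_parallel should be called under an `if __name__ == "__main__":` guard, since multiprocessing may re-import the script in the worker processes. For files of this size the per-chunk output would better go to temporary files than into one list in memory.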



Yesterday I was working on test code to parallelize the execution of

Friday, June 4, 2010

Python 2.6.5 failed to make on Linux

Failed to find the necessary bits to build these modules:
bsddb185 dl imageop
sunaudiodev
To find the necessary bits, look in setup.py in detect_modules() for the module's name.


Failed to build these modules:
binascii zlib

binascii is required by pickle, and thus by every other library that depends on pickle.
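
After a rebuild, a quick way to see whether the modules are usable is to try importing them with the new interpreter. A minimal sketch; the helper name importable is made up for this illustration, and it relies only on the built-in __import__.

```python
def importable(names):
    """Map each module name to True if it can be imported, else False."""
    status = {}
    for name in names:
        try:
            __import__(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status
```

Running importable(["binascii", "zlib", "pickle"]) with the freshly built Python shows directly which of the three are still broken.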

Thursday, June 3, 2010

Alignment

Alignment is the process of mapping sequence fragments (reads) to a reference sequence.

Many (at least 10) alignment programs for NGS are available now, including

etc...


Bioinformatics has a special collection of published papers/applications designed for various kinds of NGS data analysis.


First

I started my new job in the Bioinformatics Core Facility on June 1, 2010. Much of the future work will focus on next generation sequencing (NGS) data analysis. So far I know only a little about NGS, so I am reading papers and documents on it to get some basic knowledge.