Monday, June 28, 2010




Thursday, June 24, 2010

The first version of PyDAS2

from amara import bindery

class PyDAS2Element(object):
    """Generic holder for the attributes of one DAS/2 XML element."""
    def __init__(self, **kwargs):
        # Assumed loop body (missing from the post): keep each XML attribute on the instance.
        for k, v in kwargs.items():
            setattr(self, k, v)

    def __str__(self):
        # Strip the 6-character 'PyDAS2' prefix from the class name.
        return str(self.__class__.__name__)[6:]

def attribute_wrapper(attr):
    """Convert Amara attributes to a plain {name: value} dict; k[1] is taken as the attribute name."""
    ret = {}
    for k, v in attr.items():
        ret[str(k[1])] = str(v)
    return ret

def make_element(name, obj):
    #return eval('PyDAS2'+name)(**attribute_wrapper(obj.xml_attributes))
    return PyDAS2Element(**attribute_wrapper(obj.xml_attributes))

def recursive_parse(root, obj_list, parent_name=None):
    if parent_name:
        par_name = '.'.join((parent_name, root.__class__.__name__))
    else:
        par_name = root.__class__.__name__
    for child in root.xml_children:  # iterate over all child nodes
        if child.xml_type == 'element':
            ele_name = child.__class__.__name__
            # Assumed from how test_source() unpacks the result: record a tuple, then descend.
            obj_list.append((par_name, ele_name, attribute_wrapper(child.xml_attributes)))
            recursive_parse(child, obj_list, par_name)

def parse(xml):
    objs = []
    root = bindery.parse(xml)
    recursive_parse(root, objs)
    return objs

def test_source(xml):
    objs = parse(xml)
    base = ''
    for par_name, ele_name, ele_attr in objs:
        print par_name, ele_name, ele_attr
        #if ele_name == 'SOURCES':
        #    base = ele_attr['base']
        #if ele_name == 'VERSION':
        #    print ''.join((base,ele_attr['uri']))

        #if ele_name == 'CAPABILITY':
        #    if ele_attr['type'] == 'features':
        #        print ''.join((base,ele_attr['query_uri']))

if __name__=='__main__':
    feature_url = """


Monday, June 21, 2010

4Suite and Amara

#see python dist
apt-cache showsrc python

#install python with source code
apt-get install python-dev

#install gcc
apt-get install gcc

#install 4Suite
tar -zxvf 4Suite-XML-1.0.2.tar.gz
cd 4Suite-XML-1.0.2
python setup.py install

#install Amara
tar -zxvf Amara-2.0a4.tar.gz
cd Amara-2.0a4
python setup.py install

Kill the Galaxy process

kill -9 `ps aux | grep '[u]niverse_wsgi.ini' | awk '{print $2}'`

Friday, June 18, 2010

Test on DAS/2 Source
will return an XML document that lists the DAS/2 sources available at Affymetrix.

How to present the returned XML to the end user? XSLT? There are no built-in packages for it in the official release, so probably use some other XSLT engine, e.g. Pyana or 4Suite?
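
Short of XSLT, one simple option is to walk the sources document and print the feature query URLs as plain text. The sketch below uses the standard-library xml.etree.ElementTree instead of Pyana or 4Suite. The sample document, its example.org URLs, and the name list_feature_urls are made up for illustration. It also assumes the SOURCES, VERSION and CAPABILITY attributes (base, uri, type, query_uri) that appear in the PyDAS2 test code, without any XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real DAS/2 sources document is larger and uses a namespace.
SAMPLE = """<SOURCES base="http://example.org/das2/">
  <SOURCE uri="human" title="Human">
    <VERSION uri="human/hg19" title="hg19">
      <CAPABILITY type="segments" query_uri="human/hg19/segments"/>
      <CAPABILITY type="features" query_uri="human/hg19/features"/>
    </VERSION>
  </SOURCE>
</SOURCES>"""


def list_feature_urls(xml_text):
    """Return (version title, features URL) pairs, joining base and query_uri."""
    root = ET.fromstring(xml_text)
    base = root.get("base", "")
    result = []
    for version in root.iter("VERSION"):
        for cap in version.findall("CAPABILITY"):
            if cap.get("type") == "features":
                result.append((version.get("title"), base + cap.get("query_uri")))
    return result


if __name__ == "__main__":
    for title, url in list_feature_urls(SAMPLE):
        print("%s\t%s" % (title, url))
```

A tab-separated listing like this is easy to show to a user or to feed into another tool, and it needs no extra packages.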

Add a new tool to Galaxy

To add a new tool named "novoalign" for NGS alignment:

1. Add "novoalign_wrapper.xml" to "\tools\sr_mapping"

<tool id="novoalign_wrapper" name="Map with Novoalign" version="1.0.6">

2. Add "" to "\tools\sr_mapping"
which is used to parse parameters and call novoalign

3. Append a line to "\tool_conf.xml" under the section
<section name="NGS: Mapping" id="solexa_tools">

<tool file="sr_mapping/novoalign_wrapper.xml" />

4. Restart Galaxy
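
The wrapper script from step 2 could look roughly like the following. This is only a minimal sketch, not the actual wrapper: the calling convention (output file first, then the command and its arguments) and the names run_tool and main are invented here, and the aligner's own options are passed through untouched rather than guessed.

```python
import subprocess
import sys


def run_tool(output_path, command):
    """Run command, write its stdout to output_path, return its exit code."""
    with open(output_path, "wb") as out:
        proc = subprocess.Popen(command, stdout=out, stderr=subprocess.PIPE)
        _, err = proc.communicate()
    if proc.returncode != 0:
        # Only echo the tool's stderr when it actually failed.
        sys.stderr.write(err.decode("utf-8", "replace"))
    return proc.returncode


def main(argv):
    """argv = [OUTPUT, COMMAND, ARGS...] (hypothetical calling convention)."""
    if len(argv) < 2:
        sys.stderr.write("usage: wrapper.py OUTPUT COMMAND [ARGS...]\n")
        return 2
    return run_tool(argv[0], argv[1:])


# Only act when arguments were actually given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

The tool XML would then call the script with the output dataset path followed by the aligner command line.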

Thursday, June 17, 2010

Eclipse works now

Plug-ins can finally be installed into Eclipse on Ubuntu.
The solution is very simple: run Eclipse as root. It took me 3 days to find this.



Galaxy hg clone galaxy
Genoviz svn co genoviz


An error occurred while installing the items
session context was:(profile=PlatformProfile, phase=org.eclipse.equinox.internal.provisional.p2.engine.phases.Install, operand=null --> [R]org.eclipse.cvs 1.0.400.v201002111343, action=org.eclipse.equinox.internal.p2.touchpoint.eclipse.actions.InstallBundleAction).
Cannot connect to keystore.
This trust engine is read only.
The artifact file for osgi.bundle,org.eclipse.cvs,1.0.400.v201002111343 was not found.

Wednesday, June 16, 2010

Reinstall Eclipse with 32bit


Galaxy hg clone galaxy
Genoviz svn co genoviz

Monday, June 14, 2010

Eclipse on Ubuntu Server 10.04 LTS

Bug in Eclipse 3.5.2 running on Ubuntu 10.04 LTS

Try to install Eclipse

>tar -zxvf eclipse
>cd eclipse-R3_5_1-fetched-src/

Integrate bowtie to Galaxy

1. Download

2. unzip

3. clean
>rm -fr

4. test
>cd bowtie-0.12.5/
>./bowtie --version

5. download pre-built indexes (UCSC HG19)

>cd /home/ying/tools/bowtie-0.12.5
>mkdir -p data/UCSC
>mv *.ebwt /home/ying/tools/bowtie-0.12.5/data/UCSC
>ls /home/ying/tools/bowtie-0.12.5/data/UCSC
hg19.1.ebwt hg19.2.ebwt hg19.3.ebwt hg19.4.ebwt hg19.rev.1.ebwt hg19.rev.2.ebwt

6. Change the configuration file in galaxy
>vi /home/ying/galaxy/tool-data/bowtie_indices.loc
It should contain this line:

hg19 /home/ying/tools/bowtie-0.12.5/data/UCSC/hg19

7. Restart Galaxy

8. Run an online test
8.1 Upload a FASTQ file.
8.2 Apply FASTQ Groomer
8.3 Run Bowtie
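
The index setup from steps 5 and 6 can be sanity-checked before restarting Galaxy. This is a minimal sketch, assuming the .loc file is whitespace-separated with the build name first and the index base path last, as in the line shown in step 6. The names read_loc and missing_indexes are made up here and are not Galaxy functions.

```python
import glob
import os


def read_loc(path):
    """Return {build: index_base} from a whitespace-separated .loc file."""
    entries = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = line.split()
            if len(fields) >= 2:
                entries[fields[0]] = fields[-1]
    return entries


def missing_indexes(entries):
    """Return the builds whose index base has no matching *.ebwt files."""
    return sorted(build for build, base in entries.items()
                  if not glob.glob(base + ".*.ebwt"))


if __name__ == "__main__":
    loc = "/home/ying/galaxy/tool-data/bowtie_indices.loc"
    if os.path.exists(loc):
        print(missing_indexes(read_loc(loc)))
```

An empty list means every build listed in the file has at least one .ebwt file next to its base path. Paths containing spaces would not survive the simple split used here.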

Friday, June 11, 2010

Illumina CASAVA

Consensus Assessment of Sequence and Variation (CASAVA)

Virtual Machine, Host, Guest, NAT, port map

All these complicated problems!

Monday, June 7, 2010

Galaxy Installation

#install screen
tar -zxvf screen-4.0.3.tar.gz
rm -fr screen-4.0.3.tar.gz

#install Python2.6.5
1. wget
2. tar -zxvf Python-2.6.5.tgz
3. rm -fr Python-2.6.5.tgz
4. cd Python-2.6.5/
5. ./configure --prefix=/home/u0592675/py2.6.5
6. make && make install

#Install Mercurial
1. wget
2. tar -zxvf mercurial-1.5.4.tar.gz
3. rm -fr mercurial-1.5.4.tar.gz
4. cd mercurial-1.5.4/
5. python setup.py install --home=/home/u0592675/tools/mercurial

#update ~/.bash_profile
PATH=$HOME/py2.6.5/bin:$HOME/tools/mercurial/bin:$PATH
export PATH
export DISPLAY=

#Download Galaxy
hg clone galaxy

#Build and install Galaxy

host #


1. Mercurial failed to start because of the PYTHONPATH. I had to install Mercurial on Windows and get the Galaxy distribution from there.
2. Galaxy cannot be accessed from a web browser outside the host machine, which blocks ports such as 8080.

Saturday, June 5, 2010

Multi-thread and Multi-process


Given a huge (tens of GB) FASTQ file from Illumina NGS, align the short sequences in it to a reference genome. Because the file is so large, it should be split into small segments, which are then aligned in parallel on a multi-CPU SMP machine or a multi-core CPU.

My current solution:
1. Split the big file on the fly
2. Use "multiprocessing" and "subprocess" to do the work

I wrote some preliminary test code on my desktop, and it seems promising. The code should be finished in 3 days if everything goes well.
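
The two steps above can be sketched as follows. This is not my actual test code, only a minimal illustration under stated assumptions: every FASTQ record is exactly 4 lines, and the aligner is a hypothetical command that reads FASTQ from stdin. The function names and the default chunk size are made up.

```python
import itertools
import multiprocessing
import subprocess


def iter_fastq_chunks(handle, records_per_chunk):
    """Yield lists of lines, each holding up to records_per_chunk 4-line records."""
    lines_per_chunk = 4 * records_per_chunk
    while True:
        chunk = list(itertools.islice(handle, lines_per_chunk))
        if not chunk:
            return
        yield chunk


def align_chunk(job):
    """Feed one chunk to an external command on stdin; return its exit code."""
    command, chunk = job
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    proc.communicate("".join(chunk).encode("ascii"))
    return proc.returncode


def align_in_parallel(fastq_path, command, records_per_chunk=250000, workers=4):
    """Split fastq_path on the fly and run command on each chunk in a worker pool."""
    pool = multiprocessing.Pool(workers)
    try:
        with open(fastq_path) as handle:
            jobs = ((command, chunk)
                    for chunk in iter_fastq_chunks(handle, records_per_chunk))
            return list(pool.imap(align_chunk, jobs))
    finally:
        pool.close()
        pool.join()
```

Calling align_in_parallel("reads.fastq", ["some_aligner", "..."]) would return one exit code per chunk, in file order. Pool.imap may read chunks ahead of the workers, so the chunk size limits memory use only loosely.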

Yesterday I was working on test code to parallelize the execution of

Friday, June 4, 2010

Python 2.6.5 failed to make on Linux

Failed to find the necessary bits to build these modules:
bsddb185 dl imageop
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

Failed to build these modules:
binascii zlib

binascii is required by pickle, and therefore by every other library that depends on pickle.
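
A quick way to confirm whether a rebuilt interpreter has the two modules is to try importing them. This is a minimal sketch; missing_modules is a name chosen here, not part of Python.

```python
import importlib


def missing_modules(names):
    """Return the subset of names that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means both modules were built.
    print(missing_modules(["binascii", "zlib"]))
```

Run it with the newly built python binary rather than the system one, otherwise it checks the wrong installation.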

Thursday, June 3, 2010


Alignment is the process of mapping sequence fragments to a reference sequence.

Many (at least 10) alignment programs for NGS are available now, including


Bioinformatics has a special collection of published papers/applications designed for various kinds of NGS data analysis.


I started my new job at the Bioinformatics Core Facility on June 1, 2010. Much of the future work will focus on next generation sequencing (NGS) data analysis. So far I know only a little about what NGS is, so I am reading some papers and documents on NGS to get some basic knowledge.