Next Generation Sequencing and Data Analysis: February 2015

Goal:

Build a streamlined workflow that goes directly from the sequencer to the analysis results, a.k.a. a "one stop solution".

After a user starts an experiment by pushing a button on the sequencer's touch screen, all downstream analyses are done automatically.

The outputs from the MiSeq are transferred to another Linux machine, where the base call (BCL) files under Data/Intensities/BaseCalls are converted into FASTQ files. Then, depending on the type of experiment, an appropriate pipeline is launched on a compute cluster to process the data. Finally, a report is generated and uploaded to a web server for the user to view. Everything runs automatically, without any manual operations, and no bioinformaticians or computer scientists need to be involved.
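The first hop of this automation is deciding when a run folder is ready to convert. A minimal sketch, assuming a run is finished once the instrument has written `RTAComplete.txt` into the run folder. The function name `pending_runs` and the `.converted` marker file are hypothetical choices for illustration, not part of any Illumina tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list run folders that the sequencer has finished writing
# (RTAComplete.txt present, an assumption) and that have not been converted yet
# (no hypothetical .converted marker file).
# Usage: pending_runs /path/to/run_folders
pending_runs() {
  local root="$1" run
  for run in "$root"/*/; do
    [ -d "$run" ] || continue          # empty root: the glob stays literal, skip it
    run="${run%/}"                     # drop the trailing slash added by the glob
    if [ -f "$run/RTAComplete.txt" ] && [ ! -e "$run/.converted" ]; then
      echo "$run"
    fi
  done
}
```

A cron job could read this list, run the conversion for each folder, and `touch "$run/.converted"` only after the conversion succeeds, so a failed run is retried on the next pass.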

Here I list the steps for the installation of "bcl2fastq" on the Linux machine.

#system
Ubuntu 14.04 Server 64 bit

#package
bcl2fastq-1.8.4

#dependency
sudo apt-get install alien dpkg-dev debhelper build-essential xsltproc gnuplot -y

#make a tmp folder
mkdir -p ~/tmp ; cd ~/tmp

#download the bcl2fastq RPM package from Illumina
#(the source tarball failed to compile on my system)

wget ftp://webdata:webdata@ussd-ftp.illumina.com/Downloads/Software/bcl2fastq/bcl2fastq-1.8.4-Linux-x86_64.rpm

#http://seqanswers.com/forums/showthread.php?t=45649
sudo alien -i bcl2fastq-1.8.4-Linux-x86_64.rpm
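curl -kL http://install.perlbrew.pl | bash
echo "source ~/perl5/perlbrew/etc/bashrc" >> ~/.bash_profile
#load perlbrew into the current shell too, otherwise the perlbrew commands below are not found
source ~/perl5/perlbrew/etc/bashrc
perlbrew install perl-5.14.4
perlbrew switch perl-5.14.4
perlbrew install-cpanm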

#install expat-2.1.0
#quote the URL: an unquoted "&" would send wget to the background and cut the URL short
wget -O expat-2.1.0.tar.gz "http://downloads.sourceforge.net/project/expat/expat/2.1.0/expat-2.1.0.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fexpat%2Ffiles%2Fexpat%2F2.1.0%2F&ts=1424461084&use_mirror=softlayer-dal"
tar -zxvf expat-2.1.0.tar.gz
cd expat-2.1.0 && ./configure && make && sudo make install && sudo ldconfig

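#install XML-Parser-2.41 into the perlbrew perl
#(no sudo needed: the perlbrew perl lives in your home directory)
cd ~/tmp
wget http://pkgs.fedoraproject.org/repo/pkgs/perl-XML-Parser/XML-Parser-2.41.tar.gz/c320d2ffa459e6cdc6f9f59c1185855e/XML-Parser-2.41.tar.gz
tar -zxvf XML-Parser-2.41.tar.gz
cd XML-Parser-2.41 && perl Makefile.PL && make && make install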

#install the XML::Simple module
cpanm XML::Simple

#log out and log back in, so that new shells load perlbrew from ~/.bash_profile
exit

#the installation is done. Now make a test run.

#To run bcl2fastq 1.8.4 we have to switch to the older, less strict Perl installed with perlbrew
perlbrew switch perl-5.14.4

#assume "test/Data/Intensities/BaseCalls" is the output folder from your sequencing machine
/usr/local/bin/configureBclToFastq.pl --input-dir test/Data/Intensities/BaseCalls --output-dir output

#change to new output folder "output" and start the "from bcl to fastq" conversion
cd output && make -j $(grep -c ^processor /proc/cpuinfo)

#if you find "INFO: all completed successfully" in the last line of output then the test passed
#check out the output fastq files, here the "000000000-ABCDEF" is the FCID (Flow Cell ID)
ls -al Project_000000000-ABCDEF/Sample_lane1/
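For an unattended pipeline, you can script the "INFO: all completed successfully" check instead of reading it by eye. This sketch assumes the `make` output has been captured to a log file. The log name and the function name `conversion_succeeded` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the last non-empty line of a captured
# make log contains the success message reported by bcl2fastq 1.8.4.
# Capture the log with, e.g.: make -j "$(nproc)" 2>&1 | tee make.log
conversion_succeeded() {
  local log="$1"
  [ -f "$log" ] || return 1
  # drop blank lines, keep the last one, and look for the success message
  grep -v '^[[:space:]]*$' "$log" | tail -n 1 | grep -q 'INFO: all completed successfully'
}
```

Checking the log, rather than the exit status of the `make ... | tee` pipeline, matters here: without `set -o pipefail`, that pipeline reports the status of `tee`, not of `make`.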

##################################
UPDATE!!!

Interestingly, Illumina states:
"""
Use the bcl2fastq 2.15.0 conversion software to convert NextSeq 500 or HiSeq X output.
Version 2.15.0 is only for use with NextSeq and HiSeq X data.
Use bcl2fastq 1.8.4 for MiSeq and HiSeq data conversion.
The software is available for download in either an rpm or tarball (.tar.gz) format.
"""
http://support.illumina.com/downloads.html

However, I found that after the MiSeq's outputs were uploaded to BaseSpace, they were actually converted by bcl2fastq 2.15.0.
One key difference is that "SampleSheet.csv" has different formats for bcl2fastq 1.8.4 and bcl2fastq 2.15.0.
The "SampleSheet.csv" generated by the MiSeq matches the format for bcl2fastq 2.15.0.
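The difference looks like this. bcl2fastq 1.8.4 expects a flat CSV with the columns FCID, Lane, SampleID, SampleRef, Index, Description, Control, Recipe, Operator and SampleProject. The MiSeq sheet is split into [Header], [Reads], [Settings] and [Data] sections. If you ever need to feed a MiSeq sheet to 1.8.4, a rough conversion could look like the sketch below. It rests on several assumptions: a single lane, a single index in a [Data] column named `index`, and columns named `Sample_ID` and `Sample_Project`. The function name is hypothetical, and you have to supply the flow cell ID by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the [Data] section of a MiSeq (bcl2fastq 2.x style)
# SampleSheet.csv into the flat sample sheet layout used by bcl2fastq 1.8.4.
# Assumes one lane, one index column named "index", and columns "Sample_ID"
# and "Sample_Project"; other columns are ignored.
# Usage: miseq_to_v1_samplesheet SampleSheet.csv 000000000-ABCDEF > SampleSheet_v1.csv
miseq_to_v1_samplesheet() {
  local sheet="$1" fcid="$2"
  echo "FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject"
  awk -F',' -v fcid="$fcid" '
    { sub(/\r$/, "") }                          # tolerate Windows line endings
    /^\[/ { in_data = ($1 == "[Data]"); header_seen = 0; next }
    !in_data || $0 ~ /^,*$/ { next }            # outside [Data], or an empty row
    !header_seen {                              # first [Data] line holds the column names
      for (i = 1; i <= NF; i++) col[$i] = i
      header_seen = 1
      next
    }
    {
      id   = $(col["Sample_ID"])
      idx  = (("index" in col) ? $(col["index"]) : "")
      proj = (("Sample_Project" in col) ? $(col["Sample_Project"]) : "")
      printf "%s,1,%s,,%s,,N,,,%s\n", fcid, id, idx, proj
    }
  ' "$sheet"
}
```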

It is a breeze to install bcl2fastq2-v2.15.0 on Ubuntu from source code.
It is almost mission impossible to do so with bcl2fastq-1.8.4 from source code (I spent a whole day trying different methods, then gave up).

wget ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/Software/bcl2fastq/bcl2fastq2-v2.15.0.4.tar.gz
tar -zxvf bcl2fastq2-v2.15.0.4.tar.gz
cd bcl2fastq && mkdir build && cd build
../src/configure --prefix=/home/hadoop/tool/bcl2fastq && make && make install

#now make a test. Assuming "exp123" is our input folder
#first, all the ".bcl" files MUST be gzipped, otherwise the job will fail (room for improvement here, Illumina!)

find exp123 -name "*.bcl" -exec gzip {} +
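On a run with many tiles, gzipping the BCL files one after another can be slow. One possible alternative compresses them in parallel. It assumes GNU `find`, GNU `xargs` and `nproc` (all standard on Ubuntu). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gzip every uncompressed .bcl file under a run folder,
# running one gzip process per CPU core with up to 64 files per gzip call.
# Usage: gzip_bcl_parallel exp123
gzip_bcl_parallel() {
  local run_dir="$1"
  # -print0/-0 keep odd file names intact; -r skips gzip if nothing is found
  find "$run_dir" -type f -name '*.bcl' -print0 \
    | xargs -0 -r -P "$(nproc)" -n 64 gzip
}
```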

#pull the trigger
/home/hadoop/tool/bcl2fastq/bin/bcl2fastq -R exp123 -o exp123_fastq
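A quick sanity check of the result could count the compressed FASTQ files and test their integrity. This is a sketch only. It assumes the outputs end in `.fastq.gz`, since bcl2fastq 2.x names them like `<sample>_S1_L001_R1_001.fastq.gz`. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if the output folder has no .fastq.gz files or if any
# of them is not a valid gzip stream; otherwise print how many were checked.
# Usage: check_fastq_output exp123_fastq
check_fastq_output() {
  local out_dir="$1" n=0 f
  while IFS= read -r -d '' f; do
    gzip -t "$f" || { echo "corrupt: $f" >&2; return 1; }
    n=$((n + 1))
  done < <(find "$out_dir" -type f -name '*.fastq.gz' -print0)
  [ "$n" -gt 0 ] || { echo "no .fastq.gz files in $out_dir" >&2; return 1; }
  echo "FASTQ files OK: $n"
}
```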

Nice! Sorry, I take back the complaints I made this morning about Illumina's software engineering team, after all my frustration with installing and running bcl2fastq 1.8.4.

Good job, Illumina team, on bcl2fastq 2.15.0. You have my love again.


Friday, February 20, 2015

Install bcl2fastq-1.8.4 and bcl2fastq 2.15.0 to Ubuntu 14.04 Server 64 Bit
