Monday, March 14, 2011

GNUMAP and novoalign benchmark

Data:
7550X1_1_1.txt.gz and 7550X1_1_2.txt.gz, paired-end, 25000 read pairs (50000 reads in total), 101 bp per read.

chr20.fa, human genome chromosome 20, build hg19.


Command:
$gnumap-plain -g chr20.fa -o example.output -a .9 -v 1 --illumina 7550X1_1_1.txt.gz 7550X1_1_2.txt.gz



Output:

This is GNUMAP, Version 2.2.0, for public and private use.

Command Line Arguments: gnumap-plain -g chr20.fa -o example.output -a .9 -v 1 --illumina 7550X1_1_1.txt,7550X1_1_2.txt
Parameters:
Verbose: 1
Genome file(s): chr20.fa
Output file: example.output
Sequence file(s): 7550X1_1_1.txt,7550X1_1_2.txt
Align score: 0.9
Number of threads: 1
Mer size: 10
Using jump size of 5
Using Default Alignment Scores
Gap score: -4
Maximum Gaps: 3

Hashing the genome.
[0/1] gen_size=0, my_start=0, my_end=0
chr20.fa
Hashing Genome...
Reading: chr20
..............................................
Size of genome: 63025520
[0/-] Converting to Vector...
[0/-] Trying to create hash of size 1048576
[0/-] Finished create hash.
[0/-] Stats: Total hashes is 59499396, Longest hash is 69516, shortest is 1, and average is 56.743046
[0/-] Trying to create a new genome with a size of 63025520...Success!
[0/-] Trying to malloc 7878190 elements for positions array...Success!
[0/-] Finished Vector Conversion

Time to hash: 12 seconds
Matching 2 file(s):

[-/0] Matching 25000 sequences of: 7550X1_1_1.txt
Reads per processor: 128
[0/0] 33% reads complete
[0/0] 66% reads complete
[0/0] 98% reads complete

[-/0] Matching 25000 sequences of: 7550X1_1_2.txt
[0/0] 0% reads complete
[0/0] 33% reads complete
[0/0] 66% reads complete
[0/0] 98% reads complete
[0/0] 98% reads complete


[0/-] Time since start: 4890.28

[0/-] Printing output.
Finished printing to example.output.sgr

#Finished.
# Total Time: 4890.38 seconds.
# Found 49152 sequences.
# Sequences matched: 13921
# Sequences not matched: 35231
# Output written to example.output
Total wait time is 0.000000
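
A few of the numbers in the GNUMAP log can be cross-checked against each other. The sketch below is only a sanity check on values copied from the log above. The reading that the hash size equals the number of possible 10-mers over A/C/G/T is my assumption, not something GNUMAP states, and the script only notes that the positions array has one eighth as many elements as the genome has bases, without claiming why.

```python
# Values copied verbatim from the GNUMAP log above.
mer_size = 10
hash_size = 1_048_576
total_hashes = 59_499_396
genome_size = 63_025_520
positions_elements = 7_878_190

# Assumption: one hash slot per possible k-mer over the 4-letter DNA alphabet.
possible_kmers = 4 ** mer_size

# Average number of genome positions stored per hash slot.
average_bucket = total_hashes / hash_size

# Observation only: the positions array is genome_size / 8 elements long.
positions_ratio = genome_size / positions_elements

print("hash size equals 4^mer_size:", possible_kmers == hash_size)
print(f"average bucket length: {average_bucket:.6f}")
print("genome bases per positions element:", positions_ratio)
```

The computed average bucket length agrees with the "average is 56.743046" figure in the Stats line.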



The same data using novoalign (for fairness, the license file was removed so that novoalign runs without threading, and unzipped FASTQ files were used):

$novoalign -o SAM $'@RG\tID:YING\tPL:ILLUMINA\tLB:LB_TEST\tSM:7851\tCN:HCI' -F ILMFQ -d chr20.nix -f 7550X1_1_1.txt 7550X1_1_2.txt >7550x1.sam

# novoalign (2.07.05 - Nov 29 2010 @ 13:34:51) - A short read aligner with qualities.
# (C) 2008 NovoCraft
# Licensed for evaluation, educational, and not-for-profit use only.
# novoalign -o SAM @RG ID:YING PL:ILLUMINA LB:LB_TEST SM:7851 CN:HCI -F ILMFQ -d chr20.nix -f 7550X1_1_1.txt 7550X1_1_2.txt
# Interpreting input files as Illumina FASTQ, Cassava Pipeline 1.3.
# Index Build Version: 2.7
# Hash length: 14
# Step size: 1
# Paired Reads: 25000
# Pairs Aligned: 368
# Read Sequences: 50000
# Aligned: 767
# Unique Alignment: 749
# Gapped Alignment: 18
# Quality Filter: 490
# Homopolymer Filter: 32
# Elapsed Time: 59.363 (sec.)
# CPU Time: 0.9 (min.)
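
To put the two runs side by side, here is a minimal sketch that computes the runtime ratio and the fraction of reads each tool reported as aligned, using only the figures from the two logs above. The dictionary keys and the `summarize` helper are names I made up for illustration. Two caveats: the GNUMAP total includes the 12 seconds spent hashing the genome, while novoalign was given a prebuilt index (chr20.nix), and the two tools apply different alignment thresholds and filters, so the matched fractions are not strictly comparable.

```python
# Figures copied from the GNUMAP and novoalign logs above.
gnumap = {"seconds": 4890.38, "reads": 49152, "matched": 13921}
novoalign = {"seconds": 59.363, "reads": 50000, "matched": 767}


def summarize(run):
    """Return (fraction of reads matched, reads processed per second)."""
    matched_fraction = run["matched"] / run["reads"]
    reads_per_second = run["reads"] / run["seconds"]
    return matched_fraction, reads_per_second


gnumap_fraction, gnumap_rate = summarize(gnumap)
novo_fraction, novo_rate = summarize(novoalign)

# How many times longer GNUMAP ran than novoalign (wall-clock).
speed_ratio = gnumap["seconds"] / novoalign["seconds"]

print(f"GNUMAP:    {gnumap_fraction:.1%} matched, {gnumap_rate:.1f} reads/s")
print(f"novoalign: {novo_fraction:.1%} matched, {novo_rate:.1f} reads/s")
print(f"GNUMAP took {speed_ratio:.1f}x as long as novoalign")
```

On these numbers GNUMAP took roughly 82 times as long as novoalign on a single thread, while reporting a much larger share of the reads as matched.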
