Requirement:
Given a huge (~tens of GB) FASTQ file from Illumina NGS, align the short reads in the FASTQ file to a reference genome. Because the file is so large, it should be split into small segments, and the segments should then be aligned in parallel on a multi-CPU SMP machine or a multi-core CPU.
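A minimal sketch of the splitting step, assuming the standard four-line Illumina FASTQ layout (header, sequence, `+`, quality) with no wrapped sequence lines. The names `fastq_records` and `chunk_fastq` and the chunk size are illustrative, not taken from any library. Reading exactly four lines per record, rather than scanning for lines that start with `@`, matters because `@` is also a valid quality character. Because both functions are generators, the file is split on the fly instead of being loaded into memory.

```python
import itertools
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', quality).

    Assumes the 4-line layout; wrapped multi-line FASTQ is not handled.
    """
    while True:
        record = list(itertools.islice(handle, 4))
        if not record:
            return  # clean end of file
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield record


def chunk_fastq(handle: TextIO, reads_per_chunk: int) -> Iterator[List[List[str]]]:
    """Group consecutive records into chunks of at most reads_per_chunk reads."""
    records = fastq_records(handle)
    while True:
        chunk = list(itertools.islice(records, reads_per_chunk))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    import io

    sample = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
    for chunk in chunk_fastq(io.StringIO(sample), reads_per_chunk=2):
        print(len(chunk))  # one line per chunk: 2, then 1
```

With a real file, `open(path)` replaces the `StringIO`, and each chunk can be written to its own small FASTQ file for one alignment job.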
My current solution:
1. Split the big file into small segments on the fly
2. Use the Python "multiprocessing" and "subprocess" modules to run the alignment of each segment in parallel
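A minimal sketch of the dispatch step, assuming each segment has already been written to its own file and that the aligner is an external command writing its alignments to stdout (for example something like `bwa mem ref.fa chunk.fq`; the exact command line is an assumption, so check your aligner's documentation). The functions `run_alignment` and `align_chunks_in_parallel` are hypothetical names. Since the heavy work happens in the child processes started by `subprocess`, the Python side only waits, so this sketch uses `multiprocessing.pool.ThreadPool`. That choice avoids pickling and start-method issues, and `multiprocessing.Pool` would also work with a top-level worker function.

```python
import subprocess
from multiprocessing.pool import ThreadPool
from typing import List, Sequence, Tuple

# A job is (command line, output file path). The command is a placeholder
# for the real aligner invocation, e.g. ["bwa", "mem", "ref.fa", "chunk_0001.fq"].
Job = Tuple[Sequence[str], str]


def run_alignment(job: Job) -> int:
    """Run one external alignment command, sending its stdout to a file.

    Returns the command's exit code so failed segments can be detected.
    """
    cmd, out_path = job
    with open(out_path, "w") as out:
        completed = subprocess.run(list(cmd), stdout=out)
    return completed.returncode


def align_chunks_in_parallel(jobs: List[Job], n_workers: int) -> List[int]:
    """Run the jobs with at most n_workers aligner processes at a time.

    Exit codes are returned in the same order as the input jobs.
    """
    with ThreadPool(processes=n_workers) as pool:
        return pool.map(run_alignment, jobs)


if __name__ == "__main__":
    import os
    import sys
    import tempfile

    # Stand-in "aligner": a Python one-liner, so the sketch runs anywhere.
    out_dir = tempfile.mkdtemp()
    jobs = [
        ([sys.executable, "-c", f"print('segment {i}')"], os.path.join(out_dir, f"seg{i}.out"))
        for i in range(4)
    ]
    print(align_chunks_in_parallel(jobs, n_workers=os.cpu_count() or 2))
```

Setting `n_workers` to the number of cores keeps every core busy without oversubscribing the machine. The per-segment outputs still have to be merged afterwards.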
I wrote some preliminary test code on my desktop, and it looks promising. The code should be finished in 3 days if everything goes well.
Yesterday I was working on test code to parallelize the execution of