Monday, February 27, 2012

Intersecting VCF

We can use vcftools (together with bgzip and tabix) to intersect VCF files. The first 4 columns for each VCF file(chr, position, ref and alt) can be easily overlapped. However it is undefined on how to "intersecting scores" in VCF files because it is not a simple arithmetic operation. To generate a VCF format compatible file, vcftools used the scores from the first VCF file only when intersecting multiple VCF files. This is simple but not a perfect solution.

Another possible solution is to merge the BAM firstly before calling variances. Therefore we will have only one VCF file after calling variances.

Is this perfect solution? Maybe. Merging alignment BAM files from different samples (individuals) is not well studied. We need a well known dataset to benchmark these two methods.

No comments:

Post a Comment