Thursday, August 23, 2012

dbSNP137 for hg19


 1. wget ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz

 2. gunzip 00-All.vcf.gz

 3. awk '/^#/ {print $0}' 00-All.vcf > head.txt

 4. sed -i 's/chrMT/chrM/g' head.txt

 5. awk '/^#/ {next}{print $0}' 00-All.vcf |  sed 's/^/chr/' > 1.vcf

 6. sed -i 's/^chrMT/chrM/' 1.vcf
 
 7. cat head.txt 1.vcf > hg19.dbsnp.vcf 
 
 8. IGVTools/igvtools index hg19.dbsnp.vcf

 9. awk '/^#/ {next}{print $1}' hg19.dbsnp.vcf | sort |uniq
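
Steps 3 to 7 can probably be collapsed into one pass over the file. The following is only a sketch under a few assumptions: the function name `add_chr_prefix` and the toy file names are made up for illustration, the VCF is tab-delimited, and the mitochondrial contig is named `MT` in the input. Header lines are passed through untouched, and only the first column of data lines is rewritten.

```shell
# Hypothetical helper: one awk pass instead of head.txt / 1.vcf / cat.
# Header lines (starting with '#') are printed unchanged; data lines get
# a "chr" prefix on column 1, with MT renamed to chrM.
add_chr_prefix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/  { print; next }
       { $1 = ($1 == "MT") ? "chrM" : "chr" $1; print }' "$1"
}

# Toy input with made-up records, only to show the effect.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t100\trs1\tA\tG\nMT\t5\trs2\tC\tT\n' > toy.vcf
add_chr_prefix toy.vcf > toy.hg19.vcf

# Same check as step 9: list the contig names that ended up in the output.
awk '/^#/ {next} {print $1}' toy.hg19.vcf | sort | uniq
```

On the real file this would be called as `add_chr_prefix 00-All.vcf > hg19.dbsnp.vcf`, followed by the indexing in step 8. Because only column 1 is touched, a stray "chrMT" elsewhere in a record cannot be rewritten by accident.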

1 comment:

  1. Many thanks for this point-by-point list of commands.
    I tried indexing with samtools index and got this error:
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    [bam_index_core] Invalid BAM header.[bam_index_build2] fail to index the BAM file.
    Does the IGVTools index have the same problem?
    Thanks in advance,

    Francesco Maria Calabrese
    University of Naples Federico II

    Dept. of Molecular Medicine and Medical Biotechnologies

    CEINGE Biotecnologie Avanzate
    Via Gaetano Salvatore, 486
    (già Via Comunale Margherita, 482)
    80145 Napoli Italy
