Monday, October 31, 2011
Thursday, October 27, 2011
Known SNPs
Three VCF Files for known SNPs from GATK resource
1000G_omni2.5.hg19.sites.vcf -> hg19.1000G_omni2.5.vcf
hapmap_3.3.hg19.sites.vcf -> hg19.hapmap_3.3.vcf
dbsnp_132.hg19.vcf -> hg19.dbsnp_132.vcf
Also copy corresponding ".idx" files to DATA_PATH
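The copy-and-rename above can be sketched as a small shell helper. This is only a sketch under assumptions: the arrows are read as "rename to", the function name stage_known_snps and its two arguments are hypothetical, and each index is assumed to sit next to its VCF as <name>.vcf.idx and to follow the VCF's new name.

```shell
#!/bin/sh
# Hypothetical helper: stage_known_snps SRC_DIR DEST_DIR
# SRC_DIR: wherever the GATK resource files were unpacked (assumption).
# DEST_DIR: plays the role of DATA_PATH from the note above.
stage_known_snps() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Each line is "old-name new-name", taken from the list above.
    while read -r old new; do
        cp "$src/$old" "$dest/$new" || return 1
        # Assumption: the index is <vcf>.idx and is renamed to match the VCF.
        cp "$src/$old.idx" "$dest/$new.idx" || return 1
    done <<'EOF'
1000G_omni2.5.hg19.sites.vcf hg19.1000G_omni2.5.vcf
hapmap_3.3.hg19.sites.vcf hg19.hapmap_3.3.vcf
dbsnp_132.hg19.vcf hg19.dbsnp_132.vcf
EOF
}
```

Usage would be something like: stage_known_snps /path/to/gatk_resources "$DATA_PATH"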
Friday, October 14, 2011
I am running out of disk space
again.
Multiple hits? All or None?
#-r All
57G Oct 14 09:44 1.sam
65G Oct 14 10:06 2.sam
65G Oct 14 10:26 3.sam
45G Oct 14 10:46 4.sam
56G Oct 14 11:09 5.sam
#-r None
56G Oct 11 02:10 1.sam
64G Oct 11 02:32 2.sam
63G Oct 11 02:53 3.sam
44G Oct 11 03:07 4.sam
54G Oct 11 03:25 5.sam
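To put a number on the difference, the sizes from the two listings can simply be added up. A minimal sketch, with the sizes typed in by hand from the listings above (the helper name total_gb is made up for illustration):

```shell
#!/bin/sh
# Sizes in G, copied from the two listings above.
all_sizes="57 65 65 45 56"
none_sizes="56 64 63 44 54"

# Hypothetical helper: add up a list of whole numbers.
total_gb() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# Unquoted on purpose so each size becomes its own argument.
all_total=$(total_gb $all_sizes)
none_total=$(total_gb $none_sizes)

echo "-r All:  ${all_total}G"
echo "-r None: ${none_total}G"
echo "difference: $((all_total - none_total))G"
```

With these numbers, -r All comes to 288G and -r None to 281G, so reporting all hits costs about 7G more across the five files.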
Thursday, October 13, 2011
pre-processing of masked FASTA for novoalign
1. Convert lower case to upper case
sed -i 's/\(.*\)/\U\1/' hg19.fasta
>chr1
NNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNN
ggggggttttTGaaaaaaaCCC
will become
>CHR1
NNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNN
GGGGGGTTTTTGAAAAAAACCC
2. Convert '>CHR' back to '>chr'
sed -i -e 's/>CHR/>chr/g' hg19.fasta
3. convert N to n
sed -i -e 's/N/n/g' hg19.masked.fasta
4. Mask these UTR (novoindex -m leaves lower-case sequence out of the index)
novoindex -m hg19.nix hg19.fasta
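Steps 1 to 3 can also be done in one pass. The sketch below is a variant, not the exact commands above: it writes to a new file instead of editing in place, it uses sed's y command rather than the GNU-only \U, and it leaves header lines alone so no '>CHR' fix-up is needed afterwards. The function name prep_fasta and the file names in the usage note are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prep_fasta IN.fasta OUT.fasta
# On sequence lines only (lines not starting with '>'):
#   1. upper-case every letter, which removes the existing soft-masking
#   2. turn N into n, so only the N runs end up lower case
# Header lines are copied through unchanged.
prep_fasta() {
    in=$1
    out=$2
    sed -e '/^>/!y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' \
        -e '/^>/!s/N/n/g' \
        "$in" > "$out"
}

# Step 4 is unchanged and is not run here:
#   novoindex -m hg19.nix <prepared fasta>
```

For example, prep_fasta hg19.fasta hg19.prepared.fasta would turn the line ggggggttttTGaaaaaaaCCC into GGGGGGTTTTTGAAAAAAACCC and each NNNN run into nnnn, while >chr1 stays as it is.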