Thursday, October 13, 2011

pre-processing of masked FASTA for novoalign

1. Convert lower case to upper case (\U is a GNU sed extension)
sed -i 's/\(.*\)/\U\1/' hg19.fasta

>chr1
NNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNN
ggggggttttTGaaaaaaaCCC

will become

>CHR1
NNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNN
GGGGGGTTTTTGAAAAAAACCC

2. Convert '>CHR' back to '>chr' (step 1 also upper-cases the header lines)
sed -i -e 's/>CHR/>chr/g' hg19.fasta

3. Convert N to n (sequence lines only, so an N inside a header name is left alone)
sed -i -e '/^>/!s/N/n/g' hg19.masked.fasta
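
Steps 1 to 3 could probably also be done in one pass that never touches the header lines, so the '>chr' names would not need repairing afterwards. This is only a sketch: normalize_fasta is a made-up helper name and hg19.prepared.fasta is a made-up output file, neither comes with novoalign.

```shell
# Sketch: steps 1-3 in a single pass over sequence lines only.
# normalize_fasta is a hypothetical helper, not a novoalign tool.
normalize_fasta() {
    awk '
        /^>/ { print; next }        # header line: copy through unchanged
        {
            line = toupper($0)      # step 1: upper-case the sequence
            gsub(/N/, "n", line)    # step 3: N -> n
            print line
        }
    '
}

# Usage (writes a new file instead of editing in place):
# normalize_fasta < hg19.fasta > hg19.prepared.fasta
```

Writing to a new file keeps the original FASTA around in case something goes wrong.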

4. Mask these UTRs while indexing (-m tells novoindex not to index lower-case sequence)
novoindex -m hg19.nix hg19.fasta
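
Before running novoindex it may be worth checking that the file really looks the way the steps above intend. As far as I understand -m, everything in lower case is left out of the index, so the sequence lines should contain only upper-case A, C, G, T and lower-case n. A rough check, where check_fasta is a made-up helper name:

```shell
# Sketch: count sequence lines that contain anything other than A, C, G, T or n.
# check_fasta is a hypothetical helper, not a novoalign tool.
check_fasta() {
    awk '
        !/^>/ && /[^ACGTn]/ { bad++ }   # skip headers, flag unexpected characters
        END { print bad + 0 }           # print 0 when nothing was flagged
    ' "$1"
}

# Usage:
# check_fasta hg19.fasta
```

A result of 0 means no unexpected characters were found. IUPAC ambiguity codes or Windows line endings would also be counted, so a non-zero result is a hint to look closer, not proof of an error.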
