Friday, April 8, 2011

merge sam files by "cat"

tail -n +29 7550X2_1.sam > 1.sam
tail -n +29 7550X2_2.sam > 2.sam
tail -n +29 7550X2_3.sam > 3.sam
tail -n +29 7550X2_4.sam > 4.sam
tail -n +29 7550X2_5.sam > 5.sam
head -n 28 7550X2_1.sam > header.txt
cat header.txt 1.sam 2.sam 3.sam 4.sam 5.sam > hello.sam
java -jar picard-tools-1.42/SortSam.jar INPUT=hello.sam OUTPUT=hello.sorted.sam CREATE_INDEX=false SO=coordinate COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=7500000 TMP_DIR=.
java -jar picard-tools-1.42/SortSam.jar INPUT=hello.sorted.sam OUTPUT=hello.sorted.bam CREATE_INDEX=true SO=coordinate COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=7500000 TMP_DIR=.
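
The line counts 28 and 29 above are specific to these files. A minimal sketch that selects header lines by their leading "@" instead, so the header length does not have to be known. merge_sam is a hypothetical helper name, and it assumes every input carries the same header (same reference, same read group).

```shell
# merge_sam OUT IN1 [IN2 ...]
# Hypothetical helper: header from the first input, alignment records from all.
merge_sam() {
    out=$1
    shift
    # SAM header lines start with '@'; keep them from the first file only.
    grep '^@' "$1" > "$out"
    for f in "$@"; do
        # Append every line that is not a header line.
        grep -v '^@' "$f" >> "$out"
    done
}
```

Usage would look like: merge_sam hello.sam 7550X2_1.sam 7550X2_2.sam 7550X2_3.sam 7550X2_4.sam 7550X2_5.sam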

Wednesday, April 6, 2011

novoalign without "-k" (left) and with "-k" (right)

#                       without -k          with -k
# Hash length:          13                  13
# Step size:            1                   1
# Paired Reads:         96409627            96409627
# Pairs Aligned:        62936012            62971552
# Read Sequences:       192819254           192819254
# Aligned:              128634931           128720259
# Unique Alignment:     126260501           126343058
# Gapped Alignment:     2033737             2039288
# Quality Filter:       1530071             1530624
# Homopolymer Filter:   48808               48964
# Elapsed Time:         9737.411 (sec.)     10373.065 (sec.)
# CPU Time:             3654.4 (min.)       3922.4 (min.)
# Fragment Length Distribution
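
To put the two runs on the same scale, the aligned counts can be turned into a percentage of all read sequences. A small sketch using awk for the arithmetic; pct is a hypothetical helper name, the numbers are the ones from the logs above.

```shell
# pct PART TOTAL: print PART as a percentage of TOTAL with two decimals.
pct() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * p / t }'
}

pct 128634931 192819254   # aligned reads, without -k
pct 128720259 192819254   # aligned reads, with -k
```

This prints 66.71 and 66.76, so in this run "-k" changes the aligned fraction by about 0.05 percentage points.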

Tuesday, April 5, 2011

The ZS field (and other Z tags) in SAM files from novoalign

ZS:Z: Novoalign alignment status. Not present for unique alignments.
ZS:Z:R Multiple alignments with similar score were found.
ZS:Z:QC The read was not aligned as its base qualities were too low or it was a homopolymer read.
ZS:Z:NM No alignment was found.
ZS:Z:QL An alignment was found but it was below the quality threshold.

ZN:i: Number of alignments to read. Only present if there was more than one alignment.
ZO:Z: Indicates long or short insert fragment for mate pair alignments when short insert has been enabled.
'+-' Indicates pair was aligned as a short insert fragment.
'-+' Pair was aligned as a long insert fragment.
This tag is only present for Illumina mate pairs when a short fragment length has been specified with the -i option and the reads are aligned as a proper pair.
ZH:i: Hairpin score for miRNA a
ZL:i: In miRNA mode this is the alignment location for adjacent opposite strand alignment.


ZS:Z:R ZN:i:2 Indicates a read aligned to two locations.
ZS:Z:NM No alignment was found.
ZS:Z:QC The read failed quality checks and was not aligned.
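
A quick way to see how these statuses are distributed in a file is to tally the ZS:Z values over all records. A minimal sketch; zs_counts is a hypothetical helper name, and the label "unique" for records without a ZS tag is an assumption based on the note above that ZS is not present for unique alignments.

```shell
# zs_counts FILE: tally ZS:Z status values over alignment records.
zs_counts() {
    awk -F'\t' '
        /^@/ { next }                       # skip header lines
        {
            status = "unique"               # assumed label when no ZS tag is present
            for (i = 12; i <= NF; i++)      # optional tags start at column 12
                if ($i ~ /^ZS:Z:/) status = substr($i, 6)
            n[status]++
        }
        END { for (s in n) print s "\t" n[s] }
    ' "$1" | sort
}
```

Usage would look like: zs_counts 7550x1.merged.sam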

Friday, April 1, 2011


sed -i 's/YING\../YING/g' 7550x1.merged.sam
sed -i 's/novoalign\../novoalign/g' 7550x1.merged.sam
time java -jar -Xmx30g ~/tool/picard-tools-1.41/SortSam.jar INPUT=7550x1.merged.sam OUTPUT=7550X1.merged.bam CREATE_INDEX=true SO=coordinate COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=7500000 TMP_DIR=.
time java -jar -Xmx30g ~/tool/gatk/dist/GenomeAnalysisTK.jar -T UnifiedGenotyper -R chr20.fa -B:dbsnp,VCF dbsnp132.human.vcf -o 7550X1.snps.raw.vcf -I 7550X1.merged.bam -stand_call_conf 50.0 -stand_emit_conf 20.0 -dcov 300
time java -jar -Xmx32g ~/tool/gatk/dist/GenomeAnalysisTK.jar -T VariantFiltration -R chr20.fa -B:variant,VCF 7550X1.snps.raw.moab.vcf -o 7550X1.snps.raw.moab.homosnp.vcf --maskName InDel --clusterWindowSize 10 --filterExpression "AF<1.0 || DP<10 || DP>100" --filterName "AF<1.0 or DP<10"
grep -c "PASS" 7550X1.snps.raw.moab.homosnp.vcf
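
grep -c "PASS" counts every line that contains the string anywhere, header lines included. A sketch that instead tallies the FILTER column (column 7 of a VCF record) over data lines only; filter_counts is a hypothetical helper name.

```shell
# filter_counts FILE: count VCF records per FILTER value (column 7).
filter_counts() {
    awk -F'\t' '
        /^#/ { next }          # skip meta-information and column header lines
        { n[$7]++ }            # FILTER is the 7th tab-separated column
        END { for (f in n) print f "\t" n[f] }
    ' "$1" | sort
}
```

Usage would look like: filter_counts 7550X1.snps.raw.moab.homosnp.vcf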


# list all unique IP addresses connected to the server (sort first: uniq only collapses adjacent duplicates)
netstat -nat | awk '{ print $5}' | cut -d: -f1 | sed -e '/^$/d' | sort -u

# count connections per remote IP, e.g. if your server is under a DoS attack
netstat -anp |grep 'tcp\|udp' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n