Awk programs

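An awk program is a sequence of pattern-action pairs. awk reads its input line by line and runs every action whose pattern matches the current line. An action without a pattern runs on every line, and the special patterns BEGIN and END run their actions before the first line and after the last line. This example sums all cell counts in the table: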

$ cat << EOF > testfile
"promoter" "head" "body" "tail"
"FBgn0001168" 0 0 2 0
"FBgn0032600" 0 0 2 0
"FBgn0039536" 0 0 2 0
"FBgn0052816" 0 0 2 0
"FBgn0085819" 0 0 1 0
"FBgn0263993" 0 1 0 0
EOF

$ awk '{print $1}' testfile

$ awk '
BEGIN {
  a = 0    # optional: uninitialized awk variables already start as 0
}
NR > 1 {   # skip the first row (column header)
  a = a + $2 + $3 + $4 + $5
}
END {
  print a
}
' testfile
10
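The same pattern-action structure extends to per-column totals. The following is a sketch that assumes the first row holds the column names and each later row is an ID followed by one count per named column. column_totals is a hypothetical helper name, and the block recreates testfile so it can run on its own:

```shell
# Sketch: per-column totals. Assumes row 1 holds the column names and every
# later row is "<id> <count> <count> ...", so data field i+1 belongs to
# header field i. column_totals is a hypothetical helper name.
cat << 'EOF' > testfile
"promoter" "head" "body" "tail"
"FBgn0001168" 0 0 2 0
"FBgn0032600" 0 0 2 0
"FBgn0039536" 0 0 2 0
"FBgn0052816" 0 0 2 0
"FBgn0085819" 0 0 1 0
"FBgn0263993" 0 1 0 0
EOF

column_totals() {
  awk '
  NR == 1 {                     # remember the column names
    for (i = 1; i <= NF; i++) name[i] = $i
    ncol = NF
    next
  }
  {                             # add each count to its column total
    for (i = 2; i <= NF; i++) sum[i - 1] += $i
  }
  END {
    for (i = 1; i <= ncol; i++) print name[i], sum[i] + 0
  }
  ' "$1"
}

column_totals testfile
```

For the table above this prints "promoter" 0, "head" 1, "body" 9 and "tail" 0, one per line. Looping up to NF instead of spelling out $2 + $3 + ... keeps the program correct if columns are added later.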

Parameters

Field separators (-F)

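The -F option sets the field separator. A separator longer than one character is treated as an extended regular expression, so to use both ‘=’ and ‘;’ as field separators, give a bracket expression: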

awk -F'[=;]' '{print $2}' infile.txt
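A short demonstration on an invented one-line input (sep-demo.txt is a made-up file name). The same regular expression can also be assigned to the FS variable, either in a BEGIN block or with -v:

```shell
# Sketch: three equivalent ways to split on "=" or ";".
# sep-demo.txt holds one invented sample line.
printf 'key=value;other=x\n' > sep-demo.txt

awk -F'[=;]' '{ print $2 }' sep-demo.txt              # prints: value
awk 'BEGIN { FS = "[=;]" } { print $2 }' sep-demo.txt  # prints: value
awk -v FS='[=;]' '{ print NF, $4 }' sep-demo.txt       # prints: 4 x
```

Assigning FS inside a regular action only takes effect from the next record on, which is why BEGIN (or -v, which is processed before BEGIN) is the usual place to set it.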

Examples

Filter a TSV file by column 2 == "foo"

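## $0 means the whole line; -F'\t' splits on tabs only, so fields may contain spaces
awk -F'\t' '{ if ($2 == "foo") { print $0 } }' inputfile.tsv

## or, for csv (simple CSV only: quoted fields containing commas are not handled):
awk -F, '{ if ($2 == "foo") { print $0 } }' inputfile.csv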
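A sketch of why a tab separator matters for TSV, using invented sample data in example.tsv where the second column of one row contains a space. It also uses the shorthand in which a pattern with no action prints the whole line ($0):

```shell
# Sketch with invented data: in example.tsv, row id2 has a space inside
# its second column.
printf 'id1\tfoo\tx\nid2\tfoo bar\ty\nid3\tbar\tz\n' > example.tsv

# default splitting (runs of blanks and tabs): id2 gets $2 == "foo" too,
# so both id1 and id2 are printed
awk '$2 == "foo"' example.tsv

# tab-only splitting: id2 gets $2 == "foo bar", so only id1 is printed
awk -F'\t' '$2 == "foo"' example.tsv
```

With default splitting the "foo bar" row is wrongly matched; with -F'\t' only the id1 row is printed.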

How many reads are mapped [awk, wc]

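The third bit of the FLAG field (value 4, "read unmapped") has to be 0. Header lines (starting with @) must be excluded: their second field is not a number, so and() treats it as 0 and they would be counted. Note that and() is a GNU awk (gawk) extension, and that this counts alignment lines, so a read with secondary or supplementary alignments is counted more than once:

awk '!/^@/ && !and($2, 0x004)' infile.sam | wc -l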
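If your awk lacks and(), a single flag bit can be tested with plain arithmetic: for a non-negative integer x and a power of two v, int(x / v) % 2 is 1 exactly when bit v is set in x. A sketch, assuming invented test data in demo.sam; bit and count_mapped are hypothetical helper names:

```shell
# Sketch, assuming a POSIX awk without gawk and(). bit and count_mapped
# are hypothetical helper names; demo.sam is invented test data.
{
  printf '@HD\tVN:1.6\tSO:unsorted\n'
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf 'r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'     # primary
  printf 'r1\t256\tchr1\t500\t0\t4M\t*\t0\t0\tACGT\tIIII\n'    # secondary
  printf 'r1\t2048\tchr1\t900\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  # supplementary
  printf 'r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'            # unmapped
  printf 'r3\t16\tchr1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\n'    # primary, reverse strand
} > demo.sam

count_mapped() {
  awk '
  # int(x / v) % 2 is 1 exactly when the power-of-two value v is set in x
  function bit(x, v) { return int(x / v) % 2 }
  /^@/ { next }     # skip header lines
  !bit($2, 4)       # FLAG 0x4 unset: print the line
  ' "$@"
}

count_mapped demo.sam | wc -l
```

This prints 4: read r1 contributes three mapped lines (primary, secondary and supplementary), which is why counting reads rather than alignment lines also requires excluding secondary and supplementary lines.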

Get only aligned and only primary lines

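A read's primary line is the one with flags 256 (0x100, secondary alignment) and 2048 (0x800, supplementary alignment) both off. To keep only aligned lines, flag 4 (0x4, unmapped read) has to be off too, giving the mask 0x904:

grep -v '^@' m850-short.sam | awk '!and($2, 0x904)'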
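To see how the flag bits partition a file, here is a one-pass tally sketch that uses an arithmetic bit test instead of and(). sam_flag_tally and bit are hypothetical helper names, and the input is invented: SAM lines cut down to their first two columns (QNAME and FLAG), which is all this program reads:

```shell
# Sketch: classify each SAM line by its FLAG bits in one pass.
# sam_flag_tally and bit are hypothetical helper names.
sam_flag_tally() {
  awk '
  # int(x / v) % 2 is 1 exactly when the power-of-two value v is set in x
  function bit(x, v) { return int(x / v) % 2 }
  /^@/          { n["header"]++; next }
  bit($2, 4)    { n["unmapped"]++; next }
  bit($2, 256)  { n["secondary"]++; next }
  bit($2, 2048) { n["supplementary"]++; next }
                { n["primary_mapped"]++ }    # flags 4, 256 and 2048 all off
  END {
    print "header", n["header"] + 0
    print "unmapped", n["unmapped"] + 0
    print "secondary", n["secondary"] + 0
    print "supplementary", n["supplementary"] + 0
    print "primary_mapped", n["primary_mapped"] + 0
  }
  ' "$@"
}

# invented input, truncated to QNAME and FLAG
printf '@HD\tVN:1.6\nr1\t0\nr1\t256\nr1\t2048\nr2\t4\nr3\t16\n' | sam_flag_tally
```

For that input it prints header 1, unmapped 1, secondary 1, supplementary 1 and primary_mapped 2. The primary_mapped count is the number of lines where flags 4, 256 and 2048 are all unset.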