What is AWK & its Tutorial with Variables, Arrays and Regular Expressions and Blocks - Cloud Network

Networking | Support | Tricks | Troubleshoot | Tips

Buymecoffe

Buy Me A Coffee

Thursday, August 28, 2014

What is AWK & its Tutorial with Variables, Arrays and Regular Expressions and Blocks

Awk Variable

Why awk? 

The Awk text-processing programming language and is a useful tool for manipulating text.
Awk recognizes the concepts of "file", "record", and "field".
 A file consists of records, which by default are the lines of the file. One line becomes one record.  Awk operates on one record at a time.
A record consists of fields, which by default are separated by any number of spaces or tabs.
Field number 1 is accessed with $1, field 2 with $2, and so forth. $0 refers to the whole record.
[awkuser@p3nlh096 ~]$ awk -help
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options: GNU long options:
-f progfile --file=progfile
-F fs --field-separator=fs
-v var=val --assign=var=val
-m[fr] val
-W compat --compat
-W copyleft --copyleft
-W copyright --copyright
-W dump-variables[=file] --dump-variables[=file]
-W exec=file --exec=file
-W gen-po --gen-po
-W help --help
-W lint[=fatal] --lint[=fatal]
-W lint-old --lint-old
-W non-decimal-data --non-decimal-data
-W profile[=file] --profile[=file]
-W posix --posix
-W re-interval --re-interval
-W source=program-text --source=program-text
-W traditional --traditional
-W usage --usage
-W version --version

To report bugs, see node `Bugs' in `gawk.info', which is
section `Reporting Problems and Bugs' in the printed version.

gawk is a pattern scanning and processing language.
By default it reads standard input and writes standard output.

Examples:
gawk '{ sum += $1 }; END { print sum }' file
gawk -F: '{ print $1 }' /etc/passwd


Now, for an explanation of the { print } code block. In awk, curly braces are used to group blocks of code together, similar to C. Inside our block of code, we have a single print command. In awk, when a print command appears by itself, the full contents of the current line are printed.
$ awk '{ print $0 }' /etc/passwd

output
-------
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
...

In awk, the $0 variable represents the entire current line, so print and print $0 do exactly the same thing.
$ awk '{ print "" }' /etc/passwd

$ awk '{ print "hello" }' /etc/passwd
Running this script will fill your screen with hello's.
AWK Variables 
awk variables are initialized to either zero or the empty string the first time they are used.
Variables
  Variable declaration is not required
  May contain any type of data, their data type may change over the life of the program
  Must begin with a letter and continuing with letters, digits and underscores
  Are case senstive
  Some of the commonly used built-in variables are:
  • NR -- The current line's sequential number
  • NF -- The number of fields in the current line
  • FS -- The input field separator; defaults to whitespace and is reset by the -F command line parameter
/test$ cat calc
3 56
567 89
/test$ awk '{d=($2-($1-4));s=($2+$1);print d/sqrt(s),d*d/s }' calc
7.42077 55.0678
-18.5066 342.494
/test$
in above example we have a file calc with two rows and two columns. Note that the final statement, a "print" in this case, does not need a semicolon. It doesn't hurt to put it in, though.

Integer variables can be used to refer to fields. If one field contains information about which other field is important, this script will print only the important field:
$ awk '{imp=$1; print $imp }' calc

The special variable NF tells you how many fields are in this record. This script prints the first and last field from each record, regardless of how many fields there are:
if now calc file is
3 56 abd
567 89 xyz
$ awk '{print $1,$NF }' calc
3 abd
567 xyz

Begin and End
Any action associated with the BEGIN pattern will happen before any line-by-line processing is done. Actions with the END pattern will happen after all lines are processed.
1.One is to just mash them together, like so:
>

awk 'BEGIN{print"fee"} $1=="foo"{print"fi"}
END{print"fo fum"}' filename

AWK Arrays
awk has arrays, but they are only indexed by strings. This can be very useful, but it can also be annoying. For example, we can count the frequency of words in a document (ignoring the icky part about printing them out):
$ awk '{for(i=1;i <=NF;i++) freq[$i]++ }' filename

The array will hold an integer value for each word that occurred in the file. Unfortunately, this treats "foo", "Foo", and "foo," as different words. Oh well. How do we print out these frequencies? awk has a special "for" construct that loops over the values in an array. This script is longer than most command lines, so it will be expressed as an executable script:
#!/usr/bin/awk -f
{for(i=1;i <=NF;i++) freq[$i]++ }
END{for(word in freq) print word, freq[word]
}

AWK Regular expressions and blocks

awk '/pattern_to_match/ {actions}' input_file

awk '/foo/ { print }' abc.txt

cat abc.txt|awk '/[0-9]+.[0-9]*/ { print }'


Expressions and blocks
fredprint

$1 == "fred" { print $3 }

root

$5 ~ /root/ { print $3 }
AWK Conditional statements
awk '{
if ( $1 ~ /root/ )
{
print $1
}
}' /etc/passwd

Both scripts function identically. In the first example, the boolean expression is placed outside the block, while in the second example, the block is executed for every input line, and we selectively perform the print command by using an if statement. Both methods are available, and you can choose the one that best meshes with the other parts of your script.
if
{
if ( $1 == "foo" ) {
if ( $2 == "foo" ) {
print "uno"
} else {
print "one"
}
} else if ($1 == "bar" ) {
print "two"
} else {
print "three"
}
}


if
! /matchme/ { print $1 $3 $4 }
{
if ( $0 !~ /matchme/ ) {
print $1 $3 $4
}
}

Both scripts will output only those lines that don't contain a matchme character sequence. Again, you can choose the method that works best for your code. They both do the same thing.

( $1 == "foo" ) && ( $2 == "bar" ) { print }

This example will print only those lines where field one equals foo and field two equals bar.


Thanking You
Hope U like it....

Ubuntu 14.04

Linux Mint 17

Ubuntu Root Password Reset

Lamp in Ubuntu 14.04

Redhat Linux 7 server
Gentoo Linux 12.1