In the last tutorial, we covered the basics of Linux commands. Now we will look at some more tools for data processing: the for loop, the while loop, the awk command, and the grep command. The awk command is especially useful for processing a big file in a few simple steps. We will look at it in detail and cover many of its applications.
1. For loop and while loop in Linux
#For loop
for i in {1..5}; do echo "i is: $i"; done
>>i is: 1
i is: 2
i is: 3
i is: 4
i is: 5
#seq command to print the sequence of numbers
seq 1 3 20
>>1
4
7
10
13
16
19
#Using seq command in for loop
for i in `seq 1 3 20`; do echo "i is $i"; done
>>i is 1
i is 4
i is 7
i is 10
i is 13
i is 16
i is 19
#Defining a variable in bash and using it for arithmetic calculation
val=1000
(( newval = val*10+90))
echo $newval
>>10090
#Using for loop and printing a sequence of numbers after arithmetic operation
for i in `seq 1 3 20`; do ((i=i*10));echo "i is $i"; done
>>i is 10
i is 40
i is 70
i is 100
i is 130
i is 160
i is 190
#Defining an array in bash
arrays=(val1 val2 val3)
echo ${arrays[0]} #printing first element of an array
>>val1
#Printing whole array in bash
echo ${arrays[@]}
>>val1 val2 val3
length=${#arrays[@]} #defining the length of the array
echo $length
>>3
#Printing the values of an array using for loop
for vals in ${arrays[@]}; do echo "value is $vals"; done
>>value is val1
value is val2
value is val3
#While loop in bash
s=1
e=5
while [[ $s -le $e ]]; do echo $s; (( s = s + 2 )); done
>>1
3
5
#Using shell variables in for loop
s=1
e=5
ds=1
for i in `seq $s $ds $e`; do echo "i is $i"; done
>>i is 1
i is 2
i is 3
i is 4
i is 5
#Using for loop (other way)
s=1
e=5
for ((i=$s; i<=$e; i++)); do echo "i is $i"; done
>>i is 1
i is 2
i is 3
i is 4
i is 5
#Using special characters in echo command
echo -e "this is \t AS \n Coding Club \a"
>>this is      AS
 Coding Club
#Making a text file using cat command
cat >inputfile<< eof
1 2 3
10 20 30
100 200 300
eof
#Reading a text file using the read command and while loop
while read aa bb cc; do echo "aa is $aa, bb is $bb, cc is $cc"; done<inputfile
>>aa is 1, bb is 2, cc is 3
aa is 10, bb is 20, cc is 30
aa is 100, bb is 200, cc is 300
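A while loop that reads a file can also accumulate a value as it goes. The following is a minimal sketch, assuming a three-column numeric file like the one used in this section; the file name loopdemo.txt and the variable names sum, first, and rest are chosen here only for illustration.

```shell
# Create a small three-column file (hypothetical name: loopdemo.txt)
cat > loopdemo.txt << eof
1 2 3
10 20 30
100 200 300
eof

# Add up the first column; "rest" soaks up the remaining columns of each line
sum=0
while read -r first rest; do
    sum=$(( sum + first ))
done < loopdemo.txt
echo "sum of column 1 is $sum"
```

With the three rows above, this should print "sum of column 1 is 111".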
2. Awk Basics
#printing the contents of a file
cat inputfile
>>1 2 3
10 20 30
100 200 300
#Printing the first column of a file
awk '{print $1}' inputfile
>>1
10
100
#Printing the 1st, 2nd, and 3rd columns of a file
awk '{print $1,$2,$3}' inputfile
>>1 2 3
10 20 30
100 200 300
#Printing all the columns of a file
awk '{print $0}' inputfile
>>1 2 3
10 20 30
100 200 300
awk '{print}' inputfile
>>1 2 3
10 20 30
100 200 300
#Printing the 1st and 2nd columns, then the 3rd column on a new line
awk '{print $1,$2 "\n" $3}' inputfile
>>1 2
3
10 20
30
100 200
300
#Doing arithmetic operations on a column
awk '{print $1*1,$2*2,$3*3}' inputfile
>>1 4 9
10 40 90
100 400 900
3. Using shell variables in awk command
#Defining the shell variables var1 and var2 and passing them to the awk command with -v
var1=20
var2=30
awk -v var1=$var1 -v var2=$var2 '{print var1,var2}' inputfile
>>20 30
20 30
20 30
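A variable passed with -v can also be used inside a condition, not just printed. The sketch below assumes a numeric three-column file; the file name vardemo.txt and the variable names limit and big_rows are made up for this example.

```shell
# Create a small numeric file (hypothetical name: vardemo.txt)
cat > vardemo.txt << eof
1 2 3
10 20 30
100 200 300
eof

limit=10
# Keep only the rows whose first column is at least the shell value of limit
big_rows=$(awk -v limit="$limit" '$1 >= limit {print $0}' vardemo.txt)
echo "$big_rows"
```

Here the first row is dropped, so the output should be the lines "10 20 30" and "100 200 300".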
4. Reading a string/word from a text file using the awk command
#Re-creating inputfile using the cat command
cat >inputfile<< eof
1 2 3 abc
10 20 30 def
100 200 300 ghi
eof
# Printing the 4th column of line 2 from inputfile
awk 'NR==2{print $4}' inputfile
>>def
5. Reading the values from a simple text file using awk
#Creating the file wiseOldOwl.txt using cat command
cat >wiseOldOwl.txt<<eof
A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
eof
#Printing the number of fields (or columns) in each line and the line contents
awk '{print NF, $0}' wiseOldOwl.txt
>>8 A wise old owl lived in an oak.
8 The more he saw the less he spoke.
8 The less he spoke the more he heard.
10 Why can't we all be like that wise old bird?
#Printing the line having 10 columns or fields
awk 'NF==10{print NF, $0}' wiseOldOwl.txt
>>10 Why can't we all be like that wise old bird?
#Printing the last word of the line having 10 columns or fields
awk 'NF==10{print $NF}' wiseOldOwl.txt
>>bird?
#Printing the second last or penultimate word of line containing 10 fields
awk 'NF==10{print $(NF-1)}' wiseOldOwl.txt
>>old
#Print the line containing the matched word in the file
awk '/heard/{print $0}' wiseOldOwl.txt
>>The less he spoke the more he heard.
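Because NF holds the number of fields on each line, adding it up over all lines gives a word count for the whole file. This is a small sketch under that assumption; the file name owl_demo.txt and the variable names count and words are illustrative only.

```shell
# Create a two-line text file (hypothetical name: owl_demo.txt)
cat > owl_demo.txt << eof
A wise old owl lived in an oak.
The more he saw the less he spoke.
eof

# Sum NF over every line; the END block runs once after the last line
words=$(awk '{count += NF} END {print count}' owl_demo.txt)
echo "total words: $words"
```

Each of the two lines has 8 fields, so this should print "total words: 16".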
6. If-else statement in awk
#Creating a new input file using the cat command
cat >inputfile1<<eof
34 25 78 abc
67 89 67 def
90 34 78 ghi
eof
#Checking whether the 1st column is at least 40: if it is, print passed,
#otherwise print failed in subject 1
awk '{if ($1 >=40){ print $4, "passed in subject 1"} else { print $4, "failed in subject 1"}}' inputfile1
>>abc failed in subject 1
def passed in subject 1
ghi passed in subject 1
#Checking whether the values of columns 1, 2 and 3 are all at least 40
#If all of them are, print passed, otherwise print failed
awk '{if ($1 >=40 && $2>=40 && $3>=40){ print $4,$1,$2,$3 " => passed"} else { print $4,$1,$2,$3 " => failed"}}' inputfile1
>>abc 34 25 78 => failed
def 67 89 67 => passed
ghi 90 34 78 => failed
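An if-else can be extended with "else if" when more than two outcomes are needed. The following is only a sketch: the file name marks_demo.txt, the grade letters, and the thresholds 80 and 40 are assumptions made for illustration.

```shell
# Create a file with a score and a name per line (hypothetical name: marks_demo.txt)
cat > marks_demo.txt << eof
34 abc
67 def
90 ghi
eof

# Pick a grade from the first column using an if / else if / else chain
grades=$(awk '{
    if ($1 >= 80) { grade = "A" }
    else if ($1 >= 40) { grade = "B" }
    else { grade = "F" }
    print $2, grade
}' marks_demo.txt)
echo "$grades"
```

With these scores the output should be "abc F", "def B", and "ghi A", one per line.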
7. for/while loop in awk
#Printing the square of the numbers from 1 to 5
awk 'BEGIN { for (i = 1; i <= 5; i++) print i*i }'
>>1
4
9
16
25
#Printing the numbers from 1 to 5
awk 'BEGIN {i = 1; while (i <= 5) { print "i is:", i; i++ } }'
>>i is: 1
i is: 2
i is: 3
i is: 4
i is: 5
#Checking the contents of a file
cat inputfile1
>>34 25 78 abc
67 89 67 def
90 34 78 ghi
#Printing the first three fields of each line, one per row, under a header with the line number
awk '{print "line:", NR; for (i=1; i<= 3; i++) print $i}' inputfile1
>>line: 1
34
25
78
line: 2
67
89
67
line: 3
90
34
78
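When the number of columns is not known in advance, the loop can run up to NF instead of a fixed number. This is a minimal sketch, assuming every field is numeric; the file name rows_demo.txt and the variable names sum and row_sums are illustrative.

```shell
# Create a file whose lines have different numbers of fields (hypothetical name: rows_demo.txt)
cat > rows_demo.txt << eof
34 25 78
67 89
90 34 78 10
eof

# For each line, loop over all NF fields and add them up
row_sums=$(awk '{
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i
    print "line " NR ": " sum
}' rows_demo.txt)
echo "$row_sums"
```

This should print "line 1: 137", "line 2: 156", and "line 3: 212".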
8. Concatenation of two columns using awk command
#Printing the contents of file wiseOldOwl.txt
cat wiseOldOwl.txt
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
#Concatenating the 1st and 2nd columns of a file
awk '{print $1 $2}' wiseOldOwl.txt
>>Awise
Themore
Theless
Whycan't
9. Reading delimited file (or comma separated file) using awk command
#Creating a file testdata where the columns are separated by commas
cat >testdata<<eof
1,apple,rick
5,mango,rosita
10,pineapple,sasha
eof
#Printing column 1, 2 and 3 of testdata
awk -F, '{print $1,$2,$3}' testdata
>>1 apple rick
5 mango rosita
10 pineapple sasha
#Printing the columns by setting the field separator to ","
awk 'BEGIN {FS=","}{print $1,$2,$3}' testdata
>>1 apple rick
5 mango rosita
10 pineapple sasha
10. Using Output Field Separator (OFS) in awk
#Printing the contents of file testdata
cat testdata
>>1,apple,rick
5,mango,rosita
10,pineapple,sasha
#Printing the columns of file testdata separated by !
awk 'BEGIN {FS=",";OFS="!"}{print $1,$2,$3}' testdata
>>1!apple!rick
5!mango!rosita
10!pineapple!sasha
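OFS is applied when fields are printed with commas between them, but a bare print of the whole line keeps the original separators unless the record is rebuilt first. Assigning a field to itself is a common way to force that rebuild. The sketch below uses a hypothetical file csv_demo.txt and ";" as the new separator.

```shell
# Create a small comma-separated file (hypothetical name: csv_demo.txt)
cat > csv_demo.txt << eof
1,apple,rick
5,mango,rosita
eof

# $1=$1 changes nothing in the data but makes awk rebuild $0 using OFS,
# so a plain print then shows every column with the new separator
converted=$(awk 'BEGIN {FS=","; OFS=";"} {$1=$1; print}' csv_demo.txt)
echo "$converted"
```

This should print "1;apple;rick" and "5;mango;rosita", and it works for any number of columns without listing them.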
11. Using the Record Separator in awk
#Creating a file with name, address, and other information.
#Records are separated by a blank line
cat >nameAddress.txt<<eof
rick
10 22
address 1

Lara
4 33 43
address 2
eof
#Printing the 1st field of each record
awk 'BEGIN {RS="\n\n";FS="\n"}{print $1}' nameAddress.txt
>>rick
Lara
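Not every awk implementation is guaranteed to handle a record separator longer than one character in the same way. A more portable option for blank-line-separated records is to set RS to the empty string, which switches awk to paragraph mode. The sketch below assumes one field per line; the file name records_demo.txt and the variable name names are made up for this example.

```shell
# Create a file with two records separated by a blank line (hypothetical name: records_demo.txt)
cat > records_demo.txt << eof
rick
10 22
address 1

Lara
4 33 43
address 2
eof

# RS="" means "records are separated by blank lines"; FS="\n" makes each line a field
names=$(awk 'BEGIN {RS=""; FS="\n"} {print $1}' records_demo.txt)
echo "$names"
```

This should print "rick" and "Lara" on separate lines.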
12. Calculating average of the column data using awk command
#Printing the contents of file testdata
cat testdata
>>1,apple,rick
5,mango,rosita
10,pineapple,sasha
#Calculating the average of the first column of testdata
awk -F, '{total+=$1} END {print total/NR}' testdata
>>5.33333
13. Printing columns to fixed decimal places or fixing the precision of data using awk command
#Printing the contents of file testdata
cat testdata
>>1,apple,rick
5,mango,rosita
10,pineapple,sasha
#Calculating the average of the first column of testdata to 2 decimal places
awk -F, '{total+=$1} END {printf "%.2f \n", total/NR}' testdata
>>5.33
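printf can also fix the width of each column, which helps when lining up a table. This is a small sketch under assumed widths: the file name fruit_demo.txt, the 10-character name column, and the 5-character number column are choices made only for illustration.

```shell
# Create a small comma-separated file (hypothetical name: fruit_demo.txt)
cat > fruit_demo.txt << eof
1,apple,rick
5,mango,rosita
10,pineapple,sasha
eof

# %-10s left-aligns the name in 10 characters;
# %5.1f right-aligns the number in 5 characters with 1 decimal place
table=$(awk -F, '{printf "%-10s|%5.1f\n", $2, $1}' fruit_demo.txt)
echo "$table"
```

Unlike print, printf does not add a newline by itself, which is why the format string ends with "\n".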
14. Matching/finding pattern in a text file using awk command
#Printing the contents of file wiseOldOwl.txt
cat wiseOldOwl.txt
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
#Finding and printing the lines containing the word "heard" or "oak"
awk '/heard|oak/{print $0}' wiseOldOwl.txt
>>A wise old owl lived in an oak.
The less he spoke the more he heard.
#Printing the output with line number
awk '/heard|oak/{print NR,$0}' wiseOldOwl.txt
>>1 A wise old owl lived in an oak.
3 The less he spoke the more he heard.
#Creating the file testdata2 using the cat command
cat >testdata2<<eof
1 apple rick
rosita 2 mango
pineapple sasha 10
eof
#Printing the lines that start with a lowercase letter
awk '/^[a-z]/{print $0}' testdata2
>>rosita 2 mango
pineapple sasha 10
#Printing the lines that end with a lowercase letter
awk '/[a-z]$/{print $0}' testdata2
>>1 apple rick
rosita 2 mango
#Printing the lines that end with a digit
awk '/[0-9]$/{print $0}' testdata2
>>pineapple sasha 10
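A pattern can also be tested against a single column instead of the whole line by using the ~ (matches) and !~ (does not match) operators. The sketch below assumes a three-column file; the file name match_demo.txt and the variable names numeric_first and text_last are illustrative.

```shell
# Create a small three-column file (hypothetical name: match_demo.txt)
cat > match_demo.txt << eof
1 apple rick
rosita 2 mango
pineapple sasha 10
eof

# Lines whose 1st column consists only of digits
numeric_first=$(awk '$1 ~ /^[0-9]+$/ {print $0}' match_demo.txt)
echo "$numeric_first"

# Lines whose 3rd column does NOT consist only of digits
text_last=$(awk '$3 !~ /^[0-9]+$/ {print $0}' match_demo.txt)
echo "$text_last"
```

The first command should print only "1 apple rick"; the second should print "1 apple rick" and "rosita 2 mango".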
15. Grep command for searching a string or word
#Printing the contents of file testdata2
cat testdata2
>>1 apple rick
rosita 2 mango
pineapple sasha 10
#Searching for the string rick
grep rick testdata2
>>1 apple rick
#Finding all the lines which don't contain the string "rick"
grep -v rick testdata2
>>rosita 2 mango
pineapple sasha 10
#Printing the contents of file wiseOldOwl.txt
cat wiseOldOwl.txt
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
#Printing 2 lines after each search result of string wise
grep -A 2 wise wiseOldOwl.txt
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
#Printing 1 line before each search result of string wise
grep -B 1 wise wiseOldOwl.txt
>>A wise old owl lived in an oak.
--
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
#Printing 1 line before and 1 line after each search result of string wise
grep -A 1 -B 1 wise wiseOldOwl.txt
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
#Finding the string "th" in the file
grep th wiseOldOwl.txt
>>The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
#Finding the exact word "that" in the file
grep -w that wiseOldOwl.txt
>>Why can't we all be like that wise old bird?
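A few other grep options are often handy alongside the ones above: -c counts matching lines, -n prefixes each match with its line number, and -i ignores case. The following is a short sketch; the file name grep_demo.txt and the variable names count, numbered, and nocase are chosen only for illustration.

```shell
# Create a three-line text file (hypothetical name: grep_demo.txt)
cat > grep_demo.txt << eof
A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
eof

# Count the lines containing the string "the"
count=$(grep -c the grep_demo.txt)
echo "$count"

# Show the matching line together with its line number
numbered=$(grep -n heard grep_demo.txt)
echo "$numbered"

# Search without regard to upper or lower case
nocase=$(grep -i WISE grep_demo.txt)
echo "$nocase"
```

With this file, the three commands should print "2", "3:The less he spoke the more he heard.", and "A wise old owl lived in an oak." respectively.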