Linux Tutorial (Data Processing)

In the last tutorial, we saw the basics of Linux commands. Now we will learn some more tools that we can use for data processing: the for loop, the while loop, the awk command, and the grep command. The awk command is especially useful for processing a large file in a few simple steps, so we will look at it in detail and try to cover many of its applications.

1. For loop and while loop in Linux
#For Loop
for i in {1..5}; do echo "i is: $i"; done
>>i is: 1
i is: 2
i is: 3
i is: 4
i is: 5

#seq command to print a sequence of numbers (arguments: start, step, end)
seq 1 3 20
>>1
4
7
10
13
16
19

#using seq command in for loop
for i in `seq 1 3 20`; do echo "i is $i"; done
>>i is 1
i is 4
i is 7
i is 10
i is 13
i is 16
i is 19

#Defining a variable in bash and using it for arithmetic calculation
val=1000
(( newval = val*10+90))
echo $newval
>>10090
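
The same calculation can also be written with arithmetic expansion, $(( )), which substitutes the result in place so that it can be assigned or printed directly. A minimal sketch, assuming bash (the variable half is only for illustration); note that bash arithmetic works on integers only, so a division is truncated:

```shell
val=1000
newval=$(( val * 10 + 90 ))   # arithmetic expansion: the result is substituted in place
echo "$newval"                # prints 10090

half=$(( 7 / 2 ))             # integer division: the fractional part is dropped
echo "$half"                  # prints 3
```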

#Using for loop and printing a sequence of numbers after arithmetic operation
for i in `seq 1 3 20`; do ((i=i*10));echo "i is $i"; done
>>i is 10
i is 40
i is 70
i is 100
i is 130
i is 160
i is 190

#Defining an array in bash
arrays=(val1 val2 val3)
echo ${arrays[0]} #printing first element of an array
>>val1

#Printing whole array in bash
echo ${arrays[@]}
>>val1 val2 val3

length=${#arrays[@]} #defining the length of the array
echo $length
>>3

#Printing the values of an array using for loop
for vals in ${arrays[@]}; do echo "value is $vals"; done
>>value is val1
value is val2
value is val3
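
If an array element contains a space, the unquoted ${arrays[@]} above would split it into separate words. A small sketch, using a hypothetical array named names, of how quoting the expansion keeps every element intact:

```shell
names=("val 1" "val 2" "val3")   # hypothetical array; two elements contain a space
count=0
for v in "${names[@]}"; do       # the quotes keep "val 1" as a single element
    echo "value is $v"
    count=$(( count + 1 ))
done
echo "loop ran $count times"     # one iteration per element
```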

#While loop in bash
s=1
e=5
while [[ $s -le $e ]]; do echo $s; (( s = s + 2 )); done
>>1
3
5

#Using shell variables in for loop
s=1
e=5
ds=1
for i in `seq $s $ds $e`; do echo "i is $i"; done
>>i is 1
i is 2
i is 3
i is 4
i is 5

#Using for loop (other way)
s=1
e=5
for ((i=$s; i<=$e; i++)); do echo "i is $i"; done
>>i is 1
i is 2
i is 3
i is 4
i is 5

#Using special characters in echo command
echo -e "this is \t AS \n Coding Club \a"
>>this is    AS 
 Coding Club

#Making a text file using cat command
cat >inputfile<< eof
1 2 3
10 20 30
100 200 300
eof

#Reading a text file using the read command and while loop
while read aa bb cc; do echo "aa is $aa, bb is $bb, cc is $cc"; done<inputfile
>>aa is 1, bb is 2, cc is 3
aa is 10, bb is 20, cc is 30
aa is 100, bb is 200, cc is 300
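
The fields picked up by read are ordinary shell variables, so they can be used in calculations inside the loop. The sketch below sums the first column; it feeds the same three lines through a here-document instead of inputfile so that it runs on its own, and it adds the -r option (an addition, not used above) so that read does not interpret backslashes:

```shell
total=0
while read -r aa bb cc; do
    total=$(( total + aa ))        # add the first column of each line
done <<'EOF'
1 2 3
10 20 30
100 200 300
EOF
echo "sum of column 1 is $total"   # 1 + 10 + 100 = 111
```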

2. Awk Basics

#printing the contents of a file
cat inputfile
>>1 2 3
10 20 30
100 200 300

#Printing the first column of a file
awk '{print $1}' inputfile
>>1
10
100

#Printing the 1st, 2nd, and 3rd columns of a file
awk '{print $1,$2,$3}' inputfile
>>1 2 3
10 20 30
100 200 300

#Printing all the columns of a file
awk '{print $0}' inputfile
>>1 2 3
10 20 30
100 200 300

awk '{print}' inputfile
>>1 2 3
10 20 30
100 200 300

#Printing the 1st and 2nd column then the 3rd column in new line
awk '{print $1,$2,"\n",$3}' inputfile
>>1 2 
 3
10 20 
 30
100 200 
 300

#Doing arithmetic operations on a column
awk '{print $1*1,$2*2,$3*3}' inputfile 
>>1 4 9
10 40 90
100 400 900

3. Using shell variables in awk command

#Defining variable var1 and var2 and using that in the awk command
var1=20
var2=30
awk -v var1=$var1 -v var2=$var2 '{print var1,var2}' inputfile
>>20 30
20 30
20 30
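
A variable passed with -v can also be used in a condition, for example to filter rows against a threshold set in the shell. A minimal sketch; the names threshold and t are chosen here only for illustration, and the data is piped in with printf so that the example is self-contained:

```shell
threshold=20
# keep the rows whose 2nd column is at least $threshold and print their 1st column
result=$(printf '1 2 3\n10 20 30\n100 200 300\n' |
         awk -v t="$threshold" '$2 >= t {print $1}')
echo "$result"
```

Here the first row is dropped because its second column (2) is below the threshold.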

4. Reading a string/word from a text file using the awk command

#Re-creating inputfile using the cat command
cat >inputfile<< eof
1 2 3 abc
10 20 30 def
100 200 300 ghi
eof

# Printing the 4th column of line 2 from inputfile
awk 'NR==2{print $4}' inputfile
>>def

5. Reading the values from a simple text file using awk

#Creating the file wiseOldOwl.txt using cat command
cat >wiseOldOwl.txt<<eof
A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?
eof

#Printing the number of fields (or columns) in each line and the line contents
awk '{print NF, $0}' wiseOldOwl.txt
>>8 A wise old owl lived in an oak.
8 The more he saw the less he spoke.
8 The less he spoke the more he heard.
10 Why can't we all be like that wise old bird?

#Printing the line having 10 columns or fields
awk 'NF==10{print NF, $0}' wiseOldOwl.txt
>>10 Why can't we all be like that wise old bird?

#Printing the last word of the line having 10 columns or fields
awk 'NF==10{print $NF}' wiseOldOwl.txt
>>bird?

#Printing the second last or penultimate word of line containing 10 fields
awk 'NF==10{print $(NF-1)}' wiseOldOwl.txt
>>old

#Print the line containing the matched word in the file
awk '/heard/{print $0}' wiseOldOwl.txt
>>The less he spoke the more he heard.

6. If-else statement in awk

#Creating a new input file using cat command
cat >inputfile1<<eof
34 25 78 abc
67 89 67 def
90 34 78 ghi
eof

#Checking if the 1st column is greater than or equal to 40; if it is, then
#print passed, otherwise print failed in subject 1
awk '{if ($1 >=40){ print $4, "passed in subject 1"} else { print $4, "failed in subject 1"}}' inputfile1
>>abc failed in subject 1
def passed in subject 1
ghi passed in subject 1


#Checking if the values of columns 1, 2 and 3 are all greater than or equal to 40
#If all of them are, then print passed, otherwise print failed
awk '{if ($1 >=40 && $2>=40 && $3>=40){ print $4,$1,$2,$3,"=> passed"} else { print $4,$1,$2,$3,"=> failed"}}' inputfile1
>>abc 34 25 78 => failed
def 67 89 67 => passed
ghi 90 34 78 => failed
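
The same check can be written more compactly with awk's conditional operator (condition ? a : b), which avoids repeating the print statement. A sketch on the same data, piped in with printf so that it runs on its own; the variable name verdict is an arbitrary choice:

```shell
result=$(printf '34 25 78 abc\n67 89 67 def\n90 34 78 ghi\n' |
         awk '{ verdict = ($1 >= 40 && $2 >= 40 && $3 >= 40) ? "passed" : "failed"
                print $4, "=>", verdict }')
echo "$result"
```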

 

7. for/while loop in awk

#Printing the square of the numbers from 1 to 5
awk 'BEGIN { for (i = 1; i <= 5; i++) print i*i }'
>>1
4
9
16
25

#Printing the numbers from 1 to 5 using a while loop
awk 'BEGIN {i = 1; while (i <= 5) { print "i is:",i; i++ } }'
>>i is: 1
i is: 2
i is: 3
i is: 4
i is: 5

#Checking the contents of a file
cat inputfile1
>>34 25 78 abc
67 89 67 def
90 34 78 ghi

#Printing the first three fields of each line, one per row, under a header with the line number
awk '{print "line:", NR; for (i=1; i<= 3; i++) print $i}' inputfile1
>>line: 1
34
25
78
line: 2
67
89
67
line: 3
90
34
78

8. Concatenation of two columns using awk command

#Printing the contents of file wiseOldOwl.txt
cat wiseOldOwl.txt 
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?

#Concatenating the 1st and 2nd column of a file
awk '{print $1 $2}' wiseOldOwl.txt
>>Awise
Themore
Theless
Whycan't
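
Because awk joins adjacent strings without any separator, a literal string can be placed between the two fields when a delimiter is wanted. A short sketch on two of the lines, piped in with printf; the underscore is an arbitrary choice:

```shell
joined=$(printf 'A wise old owl\nThe more he saw\n' | awk '{print $1 "_" $2}')
echo "$joined"
```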

9. Reading delimited file (or comma separated file) using awk command

#Creating a file testdata where the columns are separated by commas
cat >testdata<<eof
1,apple,rick
5,mango,rosita
10,pineapple,sasha
eof

#Printing column 1, 2 and 3 of testdata
awk -F, '{print $1,$2,$3}' testdata
>>1 apple rick
5 mango rosita
10 pineapple sasha

#Printing the columns by setting the field separator to ","
awk 'BEGIN {FS=","}{print $1,$2,$3}' testdata
>>1 apple rick
5 mango rosita
10 pineapple sasha

10. Using Output Field Separator (OFS) in awk

#Printing the contents of file testdata
cat testdata
>>1,apple,rick
5,mango,rosita
10,pineapple,sasha

#Printing the columns of file testdata separated by !
awk 'BEGIN {FS=",";OFS="!"}{print $1,$2,$3}' testdata
>>1!apple!rick
5!mango!rosita
10!pineapple!sasha
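
OFS is only applied when print receives comma-separated arguments, as above; a plain print of $0 keeps the original commas. Assigning to any field, even to itself, makes awk rebuild the record with OFS. A minimal sketch of that idiom, with the data piped in by printf:

```shell
unchanged=$(printf '1,apple,rick\n' | awk 'BEGIN {FS=","; OFS="!"} {print}')
rebuilt=$(printf '1,apple,rick\n' | awk 'BEGIN {FS=","; OFS="!"} {$1 = $1; print}')
echo "$unchanged"   # 1,apple,rick  ($0 is printed untouched)
echo "$rebuilt"     # 1!apple!rick  ($0 was rebuilt using OFS)
```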

11. Using the Record Separator in awk

#Creating a file with name, address, and other information.
#Each record is separated by a blank line
cat >nameAddress.txt<<eof
rick
10 22
address 1

Lara
4 33 43
address 2

eof

#Printing the 1st field of each record (here, the name)
awk 'BEGIN {RS="\n\n";FS="\n"}{print $1}' nameAddress.txt
>>rick
Lara
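
A multi-character RS such as "\n\n" is an extension provided by gawk and some other implementations; POSIX leaves it unspecified. The portable way to read blank-line-separated records is paragraph mode, selected by setting RS to the empty string. A sketch on the same records, piped in with printf so that it does not depend on nameAddress.txt:

```shell
names=$(printf 'rick\n10 22\naddress 1\n\nLara\n4 33 43\naddress 2\n\n' |
        awk 'BEGIN {RS=""; FS="\n"} {print $1}')
echo "$names"
```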

12. Calculating average of the column data using awk command

#Printing the contents of file testdata
cat testdata
>>1,apple,rick
5,mango,rosita
10,pineapple,sasha

#Calculating the average of first column of testdata
awk -F, '{total+=$1} END {print total/NR}' testdata
>>5.33333
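
The same single pass can collect other summary values as well. The sketch below tracks the sum, minimum and maximum of the first column alongside the mean, and prints nothing if the input is empty so that there is no division by zero; the variable names are illustrative:

```shell
stats=$(printf '1,apple,rick\n5,mango,rosita\n10,pineapple,sasha\n' |
        awk -F, 'NR == 1 { min = max = $1 + 0 }       # start from the first value
                 { total += $1
                   if ($1 + 0 < min) min = $1 + 0
                   if ($1 + 0 > max) max = $1 + 0 }
                 END { if (NR > 0) print total, min, max, total / NR }')
echo "$stats"   # sum, minimum, maximum, mean
```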

13. Printing columns to fixed decimal places or fixing the precision of data using awk command

#Printing the contents of file testdata
cat testdata
>>1,apple,rick
5,mango,rosita
10,pineapple,sasha

#Calculating the average of first column of testdata to 2 decimal places
awk -F, '{total+=$1} END {printf "%.2f \n", total/NR}' testdata
>>5.33
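
printf can also set column widths, which is handy for aligned tables. In the sketch below, %-10s left-justifies the fruit name in 10 characters and %6.2f right-justifies the number in 6 characters with 2 decimals; the widths are arbitrary choices:

```shell
table=$(printf '1,apple,rick\n5,mango,rosita\n10,pineapple,sasha\n' |
        awk -F, '{printf "%-10s %6.2f\n", $2, $1}')
echo "$table"
```

Every output row has the same length, so the numbers line up on the decimal point.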

14. Matching/finding pattern in a text file using awk command

#Printing the contents of file wiseOldOwl.txt
cat wiseOldOwl.txt 
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?

#Finding and printing the line containing the word "heard" or "oak"
awk '/heard|oak/{print $0}' wiseOldOwl.txt 
>>A wise old owl lived in an oak.
The less he spoke the more he heard.

#Printing the output with line number
awk '/heard|oak/{print NR,$0}' wiseOldOwl.txt 
>>1 A wise old owl lived in an oak.
3 The less he spoke the more he heard.


#Creating the file testdata2 using cat command
cat >testdata2<<eof
1 apple rick
rosita 2 mango
pineapple sasha 10
eof

#Printing the lines that start with a lowercase letter
awk '/^[a-z]/{print $0}' testdata2
>>rosita 2 mango
pineapple sasha 10


#Printing the lines that end with a lowercase letter
awk '/[a-z]$/{print $0}' testdata2
>>1 apple rick
rosita 2 mango

#Printing the lines that end with a digit
awk '/[0-9]$/{print $0}' testdata2
>>pineapple sasha 10
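
The patterns above are tested against the whole line. To test a single field, use the match operator ~ with that field. A minimal sketch that prints the lines whose second field is a whole number, with the data piped in by printf:

```shell
matched=$(printf '1 apple rick\nrosita 2 mango\npineapple sasha 10\n' |
          awk '$2 ~ /^[0-9]+$/ {print NR, $0}')
echo "$matched"
```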

15. Grep command for searching a string or word

#Printing the contents of file testdata2
cat testdata2
>>1 apple rick
rosita 2 mango
pineapple sasha 10

#Searching for the string rick
grep rick testdata2
>>1 apple rick

#Finding all the lines which don't contain the string "rick"
grep -v rick testdata2
>>rosita 2 mango
pineapple sasha 10

#Printing the contents of file wiseOldOwl.txt
cat wiseOldOwl.txt 
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?

#Printing each line that matches the string wise, together with the 2 lines after it
grep -A 2 wise wiseOldOwl.txt
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?

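#Printing each line that matches the string wise, together with the 1 line before it
#(grep prints -- between groups of lines that are not adjacent in the file)
grep -B 1 wise wiseOldOwl.txt
>>A wise old owl lived in an oak.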
--
The less he spoke the more he heard.
Why can't we all be like that wise old bird?

#Printing each line that matches the string wise, together with 1 line before and 1 line after it
grep -A 1 -B 1 wise wiseOldOwl.txt
>>A wise old owl lived in an oak.
The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?

#Finding the string "th" in the file
grep th wiseOldOwl.txt 
>>The more he saw the less he spoke.
The less he spoke the more he heard.
Why can't we all be like that wise old bird?

#Finding the exact word "that" in the file
grep -w that wiseOldOwl.txt 
>>Why can't we all be like that wise old bird?
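
A few other commonly used grep options are -c (count the matching lines), -n (prefix each match with its line number) and -i (ignore case). A short sketch; the helper function owl is hypothetical and simply prints the four lines of the rhyme so that the example does not depend on the file:

```shell
# hypothetical helper that prints the same four lines as wiseOldOwl.txt
owl() {
    printf '%s\n' \
        "A wise old owl lived in an oak." \
        "The more he saw the less he spoke." \
        "The less he spoke the more he heard." \
        "Why can't we all be like that wise old bird?"
}

count=$(owl | grep -c wise)          # number of lines containing "wise"
numbered=$(owl | grep -n heard)      # matching line prefixed with its line number
nocase=$(owl | grep -i -c '^the')    # lines starting with "the", ignoring case
echo "$count"
echo "$numbered"
echo "$nocase"
```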

Linux Tutorials (For beginners)

Linux is a very powerful platform that can be used for almost anything in the computing world. It is a free and open-source operating system. There are various distributions of Linux, including Ubuntu, Fedora, Debian, Mint, and openSUSE. Even the Android operating system, which has revolutionised the phone industry, is built on the Linux kernel. About 90% of the world's fastest computers use Linux. Linux is also lightweight, so it can often get more out of a weak processor than the Windows operating system can.

Here, we will start with the basics of the navigational commands and then gradually move towards scientific programming in bash.

1. Introduction to Linux OS, simple navigational commands, pwd (print working directory), ls (list directory), rm (remove), rmdir (remove directory), mkdir (make directory)

2. cp (copy files), mv (move or rename files), file structure in linux, root directory

3. Manipulate the directory stack: pushd (saves the current working directory in memory so it can be returned to at any time), popd (returns to the path at the top of the directory stack)
Size of the directory: du
Check disk space: df

4. find (finding files on system), finding and removing, finding and executing some operation on it.

5. cat (for displaying text file on screen, reading text file), more (Displays text, one screen at a time), less (program similar to more, but has many more features), head (display first few lines of the file), tail (display last few lines of the file)

6. Managing and editing .bashrc, .bash_profile (these files run every time you refresh or start a terminal). alias, functions in linux

7. for loop in Linux

8. while loop, if-elif-else statement in linux

9. Defining arrays in linux, using echo commands
echo -n
echo -e

10. Using bash/linux to read a file, assigning each column of a file to a different variable, Internal Field Separator (IFS)

  • Utpal Kumar (IES, Academia Sinica)