Saturday, May 4, 2013

[SED]: Remove repeated/duplicate words from a file in Linux

In this post we will see how to delete repeated words from a file. When we write fast, we tend to repeat words, and and when we review the text we find the same word twice, side by side. Notice that I wrote "and" two times in the previous sentence. This happens because the mind runs ahead of the hand while we write. Duplicate words are hard to spot by reading a big file, and skimming it only makes us miss more of them. A better approach is to use a tool such as SED, Perl or Python together with regular expressions.

I have a file abc.txt with the following data.

cat abc.txt
Output:

This is is how it works buddy
What else else you want

Remove the repeated words with SED as shown below.

sed -ri 's/(.*\ )\1/\1/g'  abc.txt

cat abc.txt

Output:

This is how it works buddy
What else you want

Let me explain the sed command we used.

-r enables extended regular expressions, which support grouping with ( ) parentheses.
-i edits the file in place, writing the changes back to the original file. Be careful with this option, as the original content is gone once the file is modified. With GNU sed you can write -i.bak instead to keep a backup copy as abc.txt.bak.
(.*\ ) matches any run of characters that ends with a space and remembers it as the first group. \1 is a back reference: it matches the exact text captured by the first ( ). So (.*\ )\1 matches a piece of text that appears twice in a row, and the replacement \1 writes it back only once.
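
One thing to watch with the pattern above: (.*\ ) is not tied to whole words, so a line like "this is a test" becomes "this a test", because the "is " at the end of "this " is followed by another "is ". Below is a minimal sketch of a stricter variant. It assumes GNU sed (for the \b word-boundary escape), and the function name dedupe_words is only an illustration, not part of the original command.

```shell
# Collapse runs of the same whole word ("is is", "else else else") into one copy.
# \b is a GNU sed word-boundary escape; [[:alnum:]]+ matches the word itself.
dedupe_words() {
    sed -r 's/\b([[:alnum:]]+)( \1\b)+/\1/g'
}

printf 'This is is how it works buddy\n' | dedupe_words
printf 'What else else you want\n' | dedupe_words

# Left alone: the "is" inside "this" is not a whole word
printf 'this is a test\n' | dedupe_words
```

Here the command reads from standard input, so nothing is modified on disk; add -i only once you are happy with the result.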




 

Friday, May 3, 2013

Vi editor: Delete matched search pattern from a file

1) How can I search for a word and delete that matched word in vi editor?

This is a bit of a tricky question. With SED it is easy to do, but in the vi editor too we can search for a word and delete it with a small trick.

Delete matched search term from a file

Step1: Press Esc to go to command mode, then type : to get the command prompt at the bottom of the screen.

Step2: Now search for your term and replace it with nothing

:%s/searchterm//g

This deletes all the occurrences of your search term. Let me explain the above syntax.

:%s runs the search and replace over the entire file; if you want to work on the current line only, just use :s.
/searchterm// replaces 'searchterm' with nothing, which means it is removed from the line.
g stands for global: the search and replace applies to all the occurrences of the search term in a given line, not just the first one.

Delete matched search term line from a file 

Sometimes we need to delete the entire line that contains the search term. Use the command below once you are in command mode.

:g/searchterm/d

Deleting lines that do not contain the search term (inverse match)


We can even delete all the lines which do not contain our search term with the command below.

:g!/searchterm/d
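
Since SED was mentioned as the easy route, here is a rough sketch of the sed counterparts of the three vi commands above. The sample text is made up for the illustration, and "searchterm" stands for whatever word you are looking for.

```shell
# Made-up sample text for the illustration
sample='keep this line
drop searchterm here
another searchterm line searchterm
last line'

# Like :%s/searchterm//g  -> remove every occurrence of the word
printf '%s\n' "$sample" | sed 's/searchterm//g'

# Like :g/searchterm/d    -> delete the lines that contain the word
printf '%s\n' "$sample" | sed '/searchterm/d'

# Like :g!/searchterm/d   -> delete the lines that do NOT contain the word
printf '%s\n' "$sample" | sed '/searchterm/!d'
```

These commands only print the result; the input is not changed.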

Hope this helps. Check our other vi editor posts as well.

Thursday, April 25, 2013

AWK Scripting: Learn AWK Built-in variables with examples

AWK inbuilt variables: FS, OFS, RS, ORS, NR, NF, FNR, FILENAME


AWK comes with a good number of built-in variables which are handy when working with data files. We will look at each AWK built-in variable with one or two examples. Without these built-in variables even simple AWK code is difficult to write. They are used to set the input field separator, to format the output of an AWK command, and to get at details such as the current input file name inside the script. Some of the AWK concepts already covered are:

AWK scripting: What is an AWK and how to use it?

AWK built-in variables:

  • NR: Number of input records read so far (the current record number).
  • NF: Number of fields in the current record.
  • FILENAME: The name of the current input file.
  • FNR: Record number within the current input file.
  • FS: The input "field separator".
  • RS: The input "record separator" or Row Separator.
  • OFS: The "output field separator".
  • ORS: The "output record separator" or Output RS.

Our sample DB file for this post is db.txt

cat db.txt

John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single
Mitch,33,BS,BofA,M,Single
Tim,39,Phd,DELL,M,Married
Lisa,22,BS,SmartDrive,F,Married

To keep it simple, we can divide the above built-in variables into groups based on what they do.

Group1: FS(input field separator) and OFS(output field separator)
Group2: RS(Row separator) and ORS(Output record separator)
Group3: NR, NF and FNR
Group4: FILENAME variable

Group1: FS(input field separator), OFS


Let us start with FS and OFS built-in variables.

FS AWK variable: This variable stores the input field separator. By default AWK splits each line into fields on whitespace (spaces and tabs) and separates output fields with a single space. If your file uses some other character as its separator, AWK will not split it correctly on its own. An example is the Linux password file, which uses ':' as a separator. To tell AWK what the input field separator is, we set this built-in variable.

Let us see what goes wrong if we don't specify the field separator for our db.txt.

Example1: Print first column data from db.txt file.

awk '{print $1}' db.txt

Output:

John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single
Mitch,33,BS,BofA,M,Single
Tim,39,Phd,DELL,M,Married
Lisa,22,BS,SmartDrive,F,Married

As you can see, the entire file is displayed, which shows that AWK does not treat "," as the field separator of db.txt: with no spaces in a line, the whole line is field $1. We have to tell AWK what the field separator is.

Example2: List only the first column of the db.txt file, which uses ',' as its field separator.

awk 'BEGIN{FS=","}{print $1}' db.txt

Output:

John
Barbi
Mitch
Tim
Lisa

Example3: We can also use the AWK option -F to specify the input field separator, as shown in the example below, which prints the 4th column.

awk -F',' '{print $4}' db.txt

Output:

IBM
JHH
BofA
DELL
SmartDrive

OFS AWK variable: This variable sets the output field separator, the string AWK prints between fields in its output.

Example4: Display only the 1st and 4th columns, separated by $ in the output.

awk 'BEGIN{FS=",";OFS=" $ "}{print $1,$4}' db.txt

Output:

John $ IBM
Barbi $ JHH
Mitch $ BofA
Tim $ DELL
Lisa $ SmartDrive

Note: I put a space before and after $ in the OFS variable to make the output easier to read. You can remove the spaces if required.

As an exercise, print only the first and fourth columns without using OFS and see the issue.
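
A related point that is easy to miss: OFS is only inserted where print has a comma between its arguments. The small sketch below uses two rows in the same layout as db.txt, fed through printf so it runs without the file.

```shell
# Two rows in the same comma-separated layout as db.txt
rows='John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single'

# No comma between $1 and $4: the fields are glued together, OFS is not used
printf '%s\n' "$rows" | awk 'BEGIN{FS=","; OFS=" $ "} {print $1 $4}'

# Comma between $1 and $4: awk puts OFS between them
printf '%s\n' "$rows" | awk 'BEGIN{FS=","; OFS=" $ "} {print $1, $4}'

# OFS left at its default, a single space
printf '%s\n' "$rows" | awk 'BEGIN{FS=","} {print $1, $4}'
```

The first command prints JohnIBM and BarbiJHH, the second John $ IBM and Barbi $ JHH, and the third John IBM and Barbi JHH.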

Group2: RS(Row separator) and ORS(Output record separator)



RS AWK Variable: The row (record) separator defines what separates rows in the input. By default AWK takes a newline as the row separator. We can change this with the RS built-in variable.

Example5: I want to convert a sentence to one word per line. We can use the RS variable to do it.

echo "This is how it works" | awk 'BEGIN{RS=" "}{print $0}'

Output:

This
is
how
it
works

ORS(Output Record Separator): This variable defines the record separator for the output of an AWK command. By default ORS is a newline.

Example6: Print all the company names, which are in the 4th column, on a single line.

awk -F',' 'BEGIN{ORS=" "}{print $4}' db.txt

Output:

IBM JHH BofA DELL SmartDrive

Group3: NF, NR and FNR

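NF AWK variable: This variable holds the total number of fields in the current row. The last field of a row can be referred to as $NF.

Note: The remaining examples use a space-separated version of the data file, so no FS setting is needed. Its content is:

Jones 2143 78 84 77
Gondrol 2321 56 58 45
RinRao 2122234 38 37
Edwin 253734 87 97 95
Dayan 24155 30 47

Example7: Print the number of fields in each row of the db.txt file.

awk '{print NF}' db.txt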




Output:

5
5
4
5
4

Example8: Print the last field of each row of the db.txt file.

awk '{print $NF}' db.txt



Output:

77
45
37
95
47

Note: In the above two examples we used plain NF to get the count of fields in a row and $NF to display the last field of each row. $NF comes in handy when you are not sure what the number of your last column is.

NR AWK variable: This variable holds the number of the current line (record). It comes in handy when you want to print line numbers for a file.

Example9: Print the line number for each line in a given file.

awk '{print NR, $0}' db.txt

Output:

1 Jones 2143 78 84 77
2 Gondrol 2321 56 58 45
3 RinRao 2122234 38 37
4 Edwin 253734 87 97 95
5 Dayan 24155 30 47

This is equivalent to the -n option of the cat command, which displays line numbers for a file.

FNR AWK variable: This variable holds the number of the current line within the file being processed, so it starts again at 1 for every input file. In the END block it holds the number of lines in the last file read, which for a single file gives the same count as the wc -l command.

Example10: Print the total number of lines in a given file.

awk 'END{print FNR}' db.txt

Output:

5

From the above output we can conclude that the db.txt file has 5 lines.
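
With a single input file NR and FNR always show the same number, so the difference only becomes visible with two or more files. Here is a small sketch of that; the two throwaway files and their names are made up for the illustration.

```shell
# Two small throwaway files (names are arbitrary)
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/first.txt"
printf 'x\ny\n'    > "$tmpdir/second.txt"

# NR keeps counting across files, FNR restarts at 1 for each file
nr_fnr=$(cd "$tmpdir" && awk '{print FILENAME, NR, FNR, $0}' first.txt second.txt)
printf '%s\n' "$nr_fnr"

# In the END block NR is the total count, FNR the count of the last file only
totals=$(cd "$tmpdir" && awk 'END{print NR, FNR}' first.txt second.txt)
printf '%s\n' "$totals"

rm -rf "$tmpdir"
```

The last command prints 5 2: five lines in total, two of them in second.txt. So END{print FNR} matches wc -l only when a single file is given.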

Group4: FILENAME variable



FILENAME AWK variable: This variable contains the name of the file the awk command is currently processing.

Example11: Print the file name with each line of a given file.

awk '{print FILENAME, NR, $0}' abc.txt

Output:

abc.txt 1 Jones 2143 78 84 77
abc.txt 2 Gondrol 2321 56 58 45
abc.txt 3 RinRao 2122234 38 37
abc.txt 4 Edwin 253734 87 97 95
abc.txt 5 Dayan 24155 30 47

In our next post we will see how to use arrays in AWK scripting.

Wednesday, April 24, 2013

AWK Scripting: How to define awk variables

AWK variables: This is part of our ongoing series of tutorials on AWK scripting. As we mentioned earlier, AWK is a full-fledged language with statements, arrays, control structures, functions etc. Today we will see how to define a variable in AWK and use it when required. We have already covered the following AWK concepts:

AWK scripting: What is an AWK and how to use it?
AWK scripting: 14 AWK print statement examples
AWK scripting: 8 AWK printf statements examples
AWK scripting: 10 BEGIN and END block examples

What is a Variable?

A variable is a named storage location that holds a value so that we can use it in a program. Variables save us from hardcoding a value into the program everywhere it is used. We can define a variable at the start of the program and use it throughout; if we want to change its value, we change it where it is defined and the new value applies wherever the variable is used.

Defining a variable in AWK

We can define a variable in AWK wherever we need it. Variables can be used for
  • Initializing values
  • Arithmetic operations
  • And many more.

For this concept we will use the db.txt file below.

cat db.txt
 
Jones 2143 78 84 77
Gondrol 2321 56 58 45
RinRao 2122234 38 37
Edwin 253734 87 97 95
Dayan 24155 30 47

 

Let's learn AWK variables with examples


Example1: Keep a running total of the values in column 3 of our db.txt file and print it after each row. The first value printed should be the first value in column 3, the second should be the first value + the second value, and so on.

awk 'BEGIN{a=0}{a=a+$3;print a}' db.txt

Output:

78
134
172
259
289

In the above example we defined a variable a, initialized it to zero in the BEGIN block and did the arithmetic in the main block.

Example2: I don't want the above output; I just want to print the final value, the sum of all the values in the 3rd column. We can use the END block to print once at the end instead of printing on every iteration.

awk 'BEGIN{a=0}{a=a+$3}END{print a}' db.txt

Output:

289

Example3: Print all the lines whose column 4 contains a number greater than VAR1.

awk 'BEGIN{VAR1=57}(VAR1<$4){print $0}' db.txt
 
Output:

Jones 2143 78 84 77
Gondrol 2321 56 58 45
Edwin 253734 87 97 95
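
Instead of fixing the value in the BEGIN block, the threshold can also come from the shell. The sketch below uses awk's -v option for that; the shell variable name threshold is only an illustration, and the rows are the same as in db.txt, fed through printf so the snippet runs without the file.

```shell
# Same space-separated rows as db.txt in this post
rows='Jones 2143 78 84 77
Gondrol 2321 56 58 45
RinRao 2122234 38 37
Edwin 253734 87 97 95
Dayan 24155 30 47'

# threshold is an ordinary shell variable (illustrative name)
threshold=57

# -v copies the shell value into the awk variable VAR1 before BEGIN runs
matches=$(printf '%s\n' "$rows" | awk -v VAR1="$threshold" '$4 > VAR1 {print $1}')
printf '%s\n' "$matches"
```

This prints Jones, Gondrol and Edwin, the same rows that Example3 selects.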


These are simple examples to get started with variables. In our next post we will see how to get inputs from the user.

PERL: Difference between Chop and Chomp functions explained

This is our first post on the Perl programming language. The chop and chomp functions are useful for removing the last character of a string, of each element of an array, of each value of a hash etc.

Before going into detail, let us see why we need the chop and chomp functions. If you know other languages such as Shell and Python, the shell's read and Python's input functions return what the user typed without the trailing newline. Perl takes input from the user differently: when Perl reads a line, the variable holds every character the user entered, including the newline character at the end.

Save the code below in a file called xyz.pl and run it.

#!/usr/bin/perl
$VAR1=<>;
print "$VAR1"

Sample output:
$ perl xyz.pl
vaares

vaares
$

As you can see, we never put \n in the $VAR1 variable; Perl picked it up from the input. Sometimes we need only the characters 'vaares' instead of 'vaares\n'. This can be achieved with the chop or chomp functions, which remove the last character of a string, of each array element or of each hash value.

The only difference between chop and chomp is that chop removes whatever character is at the end of the string, while chomp removes only a trailing newline (more precisely, the input record separator stored in $/, which is a newline by default).

Learn chop() functions with examples:

chop() function syntax:

chop($somevarname)

Example1: Chopping a string/variable


perl -e '$VAR1=abc;chop($VAR1);print "$VAR1\n";'
ab


Example2: Chopping a perl Array


perl -e '@ARR1=(abc, cde, efg);@ARR2=chop(@ARR1);print "@ARR2\n@ARR1\n";'
g
ab cd ef

As you can see, @ARR1 is modified to ab cd ef, and @ARR2 stores the return value of chop, the last character removed, i.e. the g from efg.


Example3: Chopping a Perl hash.


perl -e '%ages = ('Martin' => 28, 'Sharon' => 35, 'Re' => 29,);chop(%ages);print (values %ages)'
223

As you can see, the values have lost their last characters: 28, 35 and 29 became 2, 3 and 2. Hash order is not fixed, so the digits may come out in a different order.

Note: the chop function chops only the values of a hash, not its keys.

Want to know more about the chop and chomp functions? Use the perldoc commands below.

perldoc -f chop
perldoc -f chomp

Comment your thoughts on this.

Friday, April 12, 2013

Python: Convert list to string and vice versa

Python has many data types such as string, Boolean, number, list, tuple, dictionary etc. We cannot combine one data type with another directly; if we do, we get an error. For example, take the list and string data types and try to concatenate them: as the data types are different, Python will not allow it and raises an error as shown below.

TypeError: can only concatenate list (not "str") to list

But sometimes we need to combine these two types. What is the solution?

The solution is to convert one data type to the other before combining them.

How to convert a list data type to string data type in Python?

Use the built-in string method join to convert a list to a string.

Syntax for join function:

separator.join(sequence)

"separator" above can be any string you choose to place between the items, and "sequence" is the sequence of strings passed to the join function.

Join function examples:

I have the following list

list1 = ["surendra", "is", "a", "good", "programmer"]

and the string to separate the items once the list is converted to a string is '-'. Now join the above list using the join function:

Str1 = '-'.join(list1)

print(Str1)

Output:

surendra-is-a-good-programmer

You can change the separator to whatever you want, even a group of characters.

How to convert a string data type to list data type in Python?

Use the built-in string method "split".

Syntax for split function:

string.split('separator')

"string" in the above syntax is the string you want to split, and "separator" is the string on which it is split into list elements.

Example:

Str1='surendra-is-a-good-programmer'
list1 = Str1.split('-')
print(list1)

Output:

['surendra', 'is', 'a', 'good', 'programmer']

Hope it helps. Please share your thoughts on this.

Wednesday, April 10, 2013

Linux/Unix PS4 prompt explained with examples

PS4 is one of the prompts available in Linux/Unix shells. The other prompts are PS1, PS2 and PS3. PS4 is very useful when debugging shell scripts with the -x option of the set command: the shell prints it in front of every command it traces. Set it at the start of the script so that it is in effect throughout the script.
To see the default PS4 prompt, use the echo command to print what is stored in it.

Display the default PS4 prompt

echo $PS4

Sample output:
+

We can change the PS4 prompt to something more meaningful so that it is useful for debugging.

Changing PS4 prompt

PS4='at Line number:${LINENO} #'

If you set the above prompt in a shell script, the trace shows which line of the script each command comes from. For example, take the script below, where we set PS4 to 'at Line number:${LINENO} #'.

#!/bin/bash
PS4='at Line number:${LINENO} #'
read -p "testing the test: " VAR1 VAR2
echo "VAR1 value is $VAR1"
echo "VAR2 value is $VAR2"

Save the above file as testsh.sh and run it with debugging enabled.

bash -x testsh.sh
+ PS4='at Line number:${LINENO} #'
at Line number:3 #read -p 'testing the test: ' VAR1 VAR2
testing the test: asdfas asdfasd
at Line number:4 #echo 'VAR1 value is asdfas'
VAR1 value is asdfas
at Line number:5 #echo 'VAR2 value is asdfasd'
VAR2 value is asdfasd
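
The same idea works when tracing is switched on inside the script with set -x instead of bash -x. Below is a minimal sketch of that; the temporary script and the wording of the prompt are made up for the illustration.

```shell
# Write a three-line script to a temporary file (the name is arbitrary)
script=$(mktemp)
cat > "$script" <<'EOF'
PS4='+ line ${LINENO}: '
set -x
echo hello
EOF

# The trace goes to stderr; capture it and discard the normal output
trace=$(bash "$script" 2>&1 >/dev/null)
printf '%s\n' "$trace"

rm -f "$script"
```

The trace contains a single line, "+ line 3: echo hello", because tracing only starts after set -x has run.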

As you can see from the above example, this is very useful when running scripts. PS4 supports shell variables in its prompt as well as the special characters shown below.

\d - the date in "Weekday Month Date" format (e.g., "Tue May 26")
\e - an ASCII escape character (033)

\h - the hostname up to the first .
\H - the full hostname
\j - the number of jobs currently run in background
\l - the basename of the shells terminal device name
\n - newline
\r - carriage return
\s - the name of the shell, the basename of $0 (the portion following the final slash)
\t - the current time in 24-hour HH:MM:SS format
\T - the current time in 12-hour HH:MM:SS format
\@ - the current time in 12-hour am/pm format
\A - the current time in 24-hour HH:MM format
\u - the username of the current user
\v - the version of bash (e.g., 4.00)
\V - the release of bash, version + patch level (e.g., 4.00.0)
\w - Complete path of current working directory
\W - the basename of the current working directory
\! - the history number of this command
\# - the command number of this command
\$ - if the effective UID is 0, a #, otherwise a $
\nnn - the character corresponding to the octal number nnn
\\ - a backslash
\[ - begin a sequence of non-printing characters, which could be used to embed a terminal control sequence into the prompt
\] - end a sequence of non-printing characters
We can do many more things with the PS4 prompt. We will see them in detail in our coming posts.

Tuesday, April 9, 2013

PS3 prompt explained with examples in Linux/Unix


PS3(Prompt String 3) is one of the shell prompts available in Linux. The other prompts are PS1, PS2 and PS4. The PS3 prompt is used in shell scripts together with the select command to show a custom prompt when the user has to select a value.

When using the select command it is better to set the PS3 prompt so that the user gets meaningful information.
Note that bash prints PS3 literally. The backslash escapes listed below are decoded in the PS1, PS2 and PS4 prompts but not in PS3; they are listed here for reference.

\d - the date in "Weekday Month Date" format (e.g., "Tue May 26")
\e - an ASCII escape character (033)
\h - the hostname up to the first .
\H - the full hostname
\j - the number of jobs currently run in background
\l - the basename of the shells terminal device name
\n - newline
\r - carriage return
\s - the name of the shell, the basename of $0 (the portion following the final slash)
\t - the current time in 24-hour HH:MM:SS format
\T - the current time in 12-hour HH:MM:SS format
\@ - the current time in 12-hour am/pm format
\A - the current time in 24-hour HH:MM format
\u - the username of the current user
\v - the version of bash (e.g., 4.00)
\V - the release of bash, version + patch level (e.g., 4.00.0)
\w - Complete path of current working directory
\W - the basename of the current working directory
\! - the history number of this command
\# - the command number of this command
\$ - if the effective UID is 0, a #, otherwise a $
\nnn - the character corresponding to the octal number nnn
\\ - a backslash
\[ - begin a sequence of non-printing characters, which could be used to embed a terminal control sequence into the prompt
\] - end a sequence of non-printing characters

Below is a select command script which does not set the PS3 prompt.

#!/bin/bash
select var1 in abc ced efg hij
do
echo "Present value of var1 is $var1"
done


Save the above file as selectexe.sh and start executing above script as shown below.

bash selectexe.sh
1) abc
2) ced
3) efg
4) hij
#? 1
Present value of var1 is abc
#? 2
Present value of var1 is ced
#? 3
Present value of var1 is efg
#? 4
Present value of var1 is hij
#?


As you can see, you are prompted with '#?' to enter a choice. This is the default prompt of the select command, taken from the PS3 variable. To change the default prompt from #? to something else, define PS3 before the select command, either at the prompt or in the script, as shown in the script below.

#!/bin/bash
PS3='Please enter a number from above list: '
select var1 in abc ced efg hij
do
echo "Present value of var1 is $var1"
done


Save the above file as selectexe1.sh and run it.

bash selectexe1.sh
1) abc
2) ced
3) efg
4) hij
Please enter a number from above list: 2
Present value of var1 is ced
Please enter a number from above list: 3
Present value of var1 is efg
Please enter a number from above list:


As you can see, the prompt changed from the default #? to "Please enter a number from above list:".

A descriptive PS3 string like this gives users a meaningful prompt when you are taking input from them. In our next post we will see how to use the PS4 prompt.

Linux/Unix Shell scripting: Select command examples

The select command is a loop construct, similar to the for, while and until loops in Linux shell scripting.

What is a select command?

Select is a shell construct (available in bash and ksh) that repeats its body indefinitely in shell scripts. It comes in handy when you need the user to pick from a set of options. With the select command we can present data/options to the user in interactive shell scripts. Depending on the user's input, select runs the commands for that option and then shows the prompt again for another selection.

Syntax of Select command:

select VARNAME in list
do
Commands
done

In the above syntax, list can be

1) A list of file names
2) A list of values (constants)
3) A brace expansion such as {1..5}
4) The content of a file
5) The output of a Linux/Unix command

etc.

Example

#!/bin/bash
select var1 in abc ced efg hij
do
echo "Present value of var1 is $var1"
done


Save the above file as selectexe.sh and start executing above script as shown below.
bash selectexe.sh
1) abc
2) ced
3) efg
4) hij
#? 1
Present value of var1 is abc
#? 2
Present value of var1 is ced
#? 3
Present value of var1 is efg
#? 4
Present value of var1 is hij
#?


As you can see, you are prompted with '#?'. This is the default prompt of the select command, taken from the PS3 variable. To change the default prompt from #? to something else, define PS3 before the select command, either at the prompt or in the script, as shown in the script below.

#!/bin/bash
PS3='Please enter a number from above list: '
select var1 in abc ced efg hij
do
echo "Present value of var1 is $var1"
done


Save the above file as selectexe1.sh and run it.

bash selectexe1.sh
1) abc
2) ced
3) efg
4) hij
Please enter a number from above list: 2
Present value of var1 is ced
Please enter a number from above list: 3
Present value of var1 is efg
Please enter a number from above list:


As you can see, the prompt changed from the default #? to "Please enter a number from above list:".

Know more about PS3 prompt here.

As mentioned in the definition, select is an indefinite loop which does not terminate until we press Ctrl-C. Moreover, the above shell script cannot do much. We have to combine the select command with a case statement to make it really powerful and to get more control.

Combining select and case statements

Syntax:

select VARNAME in list
do
case $VARNAME in
Opt1) commands;;
Opt2) commands;;
Opt3) commands;;
*) exit;;
esac
done


Here the case statement solves the problem of having to press Ctrl-C to come out of the loop; this is done with the *) option of the case statement, which exits on any choice that is not in the list. This is a short introduction to the select command.
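
Putting the pieces together, below is a minimal sketch of a select menu with a case statement. It is a variation on the syntax above, not the only way to do it: it adds an explicit quit entry that leaves the loop with break and reports invalid choices through the REPLY variable, which select sets to whatever the user typed. The temporary file and the piped-in answers are only there so that the example can run unattended.

```shell
# Write a small menu script to a temporary file (the name is arbitrary)
menu_script=$(mktemp)
cat > "$menu_script" <<'EOF'
#!/bin/bash
PS3='Please enter a number from above list: '
select var1 in abc ced efg quit
do
    case $var1 in
        abc|ced|efg) echo "Present value of var1 is $var1" ;;
        quit)        break ;;
        *)           echo "Invalid choice: $REPLY" ;;
    esac
done
EOF

# Feed the choices 2, 9 and 4 on stdin instead of typing them.
# The menu and the PS3 prompt are written to stderr, which is discarded here.
menu_output=$(printf '2\n9\n4\n' | bash "$menu_script" 2>/dev/null)
printf '%s\n' "$menu_output"

rm -f "$menu_script"
```

Choice 2 prints the value ced, 9 is reported as invalid, and 4 (quit) ends the loop without Ctrl-C.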

Monday, April 1, 2013

Linux/Unix Shell : PS2 prompt examples

PS2(Prompt String 2) is one of the prompts available in Linux/Unix. The other prompts are PS1, PS3 and PS4. It is very useful when entering a long command on multiple lines: when you press Enter on an incomplete command, the shell shows this prompt and waits for the rest.

Check your default PS2 prompt by executing the command below:

echo $PS2

Output:
>

Example1: We can change this prompt to something more meaningful, such as "Continue here..!", as shown in the example below.


PS2='Continue here..! '

Now type an incomplete command as below.

echo "How are you 
Continue here..! and what you do
Continue here..!ok lets end it here.."

How are you
and what you do
ok lets end it here..

As you can see, the prompt changed from '>' to 'Continue here..!'. This is handy and more informative when dealing with a command which spreads over multiple lines.

Example2: The prompt below gives you more information while typing: your login name and the server you are logged in to.

PS2='\u@\h :: '

Set the above PS2 prompt and check it yourself with the following data.
Example output:

[[email protected] ~]$ PS2='\u@\h :: '
[[email protected] ~]$ echo "This is how

[email protected] :: it works
[email protected] :: buddy"
This is how
it works
buddy

\u and \h are two of the special characters available; the full list is below.


\d the date in "Weekday Month Date" format (e.g., "Tue May 26")
\e an ASCII escape character (033)
\h the hostname up to the first .
\H the full hostname
\j the number of jobs currently run in background
\l the basename of the shells terminal device name
\n newline
\r carriage return
\s the name of the shell, the basename of $0 (the portion following the final slash)
\t the current time in 24-hour HH:MM:SS format
\T the current time in 12-hour HH:MM:SS format
\@ the current time in 12-hour am/pm format
\A the current time in 24-hour HH:MM format
\u the username of the current user
\v the version of bash (e.g., 4.00)
\V the release of bash, version + patch level (e.g., 4.00.0)
\w Complete path of current working directory
\W the basename of the current working directory
\! the history number of this command
\# the command number of this command
\$ if the effective UID is 0, a #, otherwise a $
\nnn the character corresponding to the octal number nnn
\\ a backslash
\[ begin a sequence of non-printing characters, which could be used to embed a terminal control sequence into the prompt
\] end a sequence of non-printing characters


Explore the above special characters and use whichever suits you best.

Example3: Showing the line number of the command you are typing

[[email protected] ~]$PS2='${LINENO} ::'
[[email protected] ~]$echo "How are you
25 ::mand
26 ::"
How are you
mand

Combining the above two examples gives a useful prompt:

[[email protected] ~]$ PS2='\u@\h :: ${LINENO} ::'
[[email protected] ~]$ echo "how are you

[email protected] :: 29 ::buddy
[email protected] :: 30 ::this is what
[email protected] :: 31 ::I am saying"
how are you
buddy
this is what
I am saying


Make this prompt(PS2) permanent by adding it to the ~/.bashrc file for the BASH shell. In our next post we will see how to use the PS3 prompt in shell scripts.