Text Processing in Linux: Understanding Grep, sed, and AWK

The Differences Between Grep, sed, and AWK

Grep, sed, and AWK are all standard Linux tools that are able to process text. Each of these tools can read text files line-by-line and use regular expressions to perform operations on specific parts of the file. However, each tool differs in complexity and what can be accomplished. Grep is used for finding text patterns in a file and is the simplest of the three. Sed can find and modify data, however, its syntax is a bit more complex than grep. AWK is a full-fledged programming language that can process text and perform comparison and arithmetic operations on the extracted text. This guide provides an overview of each tool with examples and includes links to guides in our library that go deeper into each tool.

The Grep Command-Line Utility

Grep is a Linux utility used to to find lines of text in files or input streams using regular expressions. It’s name is short for Global Regular Expression Pattern

Grep: Search a Log File for Errors

Grep is a good tool to use when you need to search for a text pattern in a file. For example, you may need to search for specific text in a system log file.

Grep’s syntax uses the following format:

grep [OPTIONS] PATTERN [FILES...]

For example, to find the exact location of your system’s Apache error log file, use grep to search for ErrorLog in the Apache configuration file.

grep "ErrorLog" /etc/apache2/apache2.conf

# ErrorLog: The location of the error log file.
# If you do not specify an ErrorLog directive within a <VirtualHost>
ErrorLog ${APACHE_LOG_DIR}/error.log

Grep is able to find the configuration named ErrorLog in your Apache configuration file and returns the text as output. The output returned by most Linux distributions highlights the search term in red.

To search multiple files using grep, separate each filename with a whitespace character. When grep searches multiple files, its output displays the location of each file where the search term is found.

grep "rollover" /var/log/fail2ban.log /var/log/fail2ban.log.1

/var/log/fail2ban.log:2021-08-22 00:00:17,281 fail2ban.server      [2467]: INFO    rollover performed on /var/log/fail2ban.log
/var/log/fail2ban.log.1:2021-08-15 00:00:10,103 fail2ban.server    [2467]: INFO    rollover performed on /var/log/fail2ban.log

Grep: Viewing a Search Term’s Context

Grep provides several command-line options to control the amount of text surrounding your search term results. The list below contains the available options:

-B num: displays the lines before the search term match. Replace num with the number of lines to display.
-A num: displays the lines after the search term match. Replace num with the number of lines to display
-C num: displays the lines before and after the search term match. Replace num with the number of lines to display. The default value for num is 2.

For example, the command will display the 4 lines “before” the pattern match.

grep -B 4 "fonts" /var/log/apache2/access.log

The output returns the 4 lines before the search term match:

192.0.2.0 - - [17/May/2015:10:05:47 +0000] "GET /presentations/logstash-monitorama-2013/plugin/highlight/highlight.js HTTP/1.1" 200 26185 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/198.51.100.0 Safari/537.36"
192.0.2.0 - - [17/May/2015:10:05:12 +0000] "GET /presentations/logstash-monitorama-2013/plugin/zoom-js/zoom.js HTTP/1.1" 200 7697 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/198.51.100.0 Safari/537.36"
192.0.2.0 - - [17/May/2015:10:05:07 +0000] "GET /presentations/logstash-monitorama-2013/plugin/notes/notes.js HTTP/1.1" 200 2892 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/198.51.100.0 Safari/537.36"
192.0.2.0 - - [17/May/2015:10:05:34 +0000] "GET /presentations/logstash-monitorama-2013/images/sad-medic.png HTTP/1.1" 200 430406 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/198.51.100.0 Safari/537.36"
192.0.2.0 - - [17/May/2015:10:05:57 +0000] "GET /presentations/logstash-monitorama-2013/css/fonts/Roboto-Bold.ttf HTTP/1.1" 200 38720 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/198.51.100.0 Safari/537.36"

Note
To highlight your search term, add the --color option to your grep command.

Grep: List Files that Match a Pattern

To find a list of files that match a given pattern, use the -l option. The example below finds all HTML files in a directory that contain the string “copyright”. The output returns the list of filenames that contain the matched search term.

grep -l -i copyright *.html

about.html
index.html
products.html

Grep can also find files that do not match a pattern. To find all the files that don’t contain the string “copyright”, use the -L option.

grep -L -i copyright *.html

contact.html
how-to-buy.html

You can use grep’s -l option to perform more complex searches. For example, you can use it to find all files in a directory that contain the pattern “dogs” and “cats”. To do so, get a list of the HTML files that contain the word “dogs”:

grep -l "dogs" *.html

Then, search for “cats” in the existing list of files containing the word “dogs”:

grep -l cats $(grep -l dogs *.html)

The output displays a list of files that contain both.

To learn more about grep and its command-line options, see our How to Grep for Text in Files guide. The guide also shows you other useful operations, like piping command outputs to grep and how to recursively search through a directory tree.

Sed Command

Sed is short for Stream Editor and has much of the same functionality as grep. However, sed is much more flexible in how it can select lines of input. It also lets you modify the data it finds. Sed opens a file, reads it line by line, and acts on each line according to its instructions. If you are new to sed, it’s syntax can may seem difficult to understand. Once you’re used to sed’s syntax, you can accomplish powerful operations on your text files and data.

Create a Search and Replace sed Script

The most common usage of sed is to search for strings or patterns throughout a file and replace them with different strings. For instance, to replace every instance of “Copyright 2020” with “Copyright 2021” in a group of HTML files, use the following command:

sed -e' s/Copyright 2020/Copyright 2021/g' index.html > modified.html

When your sed programs become too big to easily fit on the command line, sed can accept sets of instructions from a program file. For example, you might have many replacements you want to make. Place them in a file named replacements.sed, as follows:

File: replacements.sed

1
2
3
4
s/I should of/I should have/;
s/supposably/supposedly/;
s/mute point/moot point/;
s/one in the same/one and the same/;

Apply the changes to all your text files with the following command:

sed -f replacements.sed -i.bak *.txt

sed: Prepend and Append Lines

The i option in sed inserts text before a given line. The a option places text after a given line. For example, the command below inserts a line before line number 4.

sed '4 i #This is the extra line' sedtest.txt

You should see a similar output after the command executes:

This is line #1
This is line #2
This is line #3
#This is the extra line
This is line #4
This is line #5
This is line #6
This is line #7
This is line #8
This is line #9
This is line #10

To save the output to a new file named newsedtest.txt, use a redirection operator, as follows:

sed '4 i #This is the extra line' sedtest.txt > newsedtest.txt

To insert the line before every line where a pattern match is found, use the following command:

sed '/8/ i #This line is inserted using sed' sedtest.txt

To learn more about sed, see our Manipulate Text from the Command Line with sed guide. The guide shows you how to change file extensions with sed, delete lines from files using sed, and more.

AWK Command

AWK is the most powerful of the three tools explored in this guide. It is a full-featured programming language with functions, variables, flow control, and support for different data types.

AWK’s primary use cases are the following:

Processing field-oriented data
Numeric comparisons and calculations
Modifying data based on calculations

AWK: Process Data in Multiple Fields

The examples in this section rely on the data in the names.txt file below. The data contains information about cars owned by a group of different people.

File: names.txt

1
2
3
4
Vince       Lombardi    Toyota      Fordham     1913
Betty       Ford        Chevrolet   Bennington  1918
Harrison    Ford        Toyota      Ripon       1942
Mike        Rowe        Ford        Towson      1962

You can use AWK to find all the people that drive a Toyota. AWK automatically breaks up each line into fields and columns using whitespaces as the delimiter. The first field is stored in variable $1, the second in $2, and so on. To search for the people who own a “Toyota”, use the field stored in the $3 variable, as follows:

awk '($3 == "Toyota") {print}' names.txt

You should see a similar output:

Vince       Lombardi    Toyota      Fordham     1913
Harrison    Ford        Toyota      Ripon       1942

The part between the single quotes, ($3 == "Toyota") {print} is a very simple program in the AWK language. It consists of a condition to test against, and an action to take. This is similar to how sed programs match each line against a pattern.

AWK: Numeric Comparisons and Calculations Using AWK

You can accomplish much more with AWK when searching for text patterns, because it supports comparison and arithmetic operators. For example, to print the last name of people who were born before 1945 (using the data in names.txt) use the following command:

awk '($5 < 1945) {print $2}' names.txt

You should see a similar output:

Lombardi
Ford
Ford

AWK can also perform arithmetic on the fields it works with. To find the birth year average for all people in the names.txt file, use the following AWK command:

awk '{total += $5} END {print total/NR}' names.txt

The output returns the following average:

1933.75

On each line, AWK adds the value of the fifth column to the variable total. At the end of the file, it prints total divided by NR, a special variable where AWK keeps the number of records it has read.

To take a deep dive into the AWK programming language, refer to our Learn the AWK Programming Language guide.

Conclusion

Grep, sed, and AWK can each be used to search for and process text on a Linux system. However, each tool provides its own strengths. For simple text pattern searches within a directory, grep can complete the task. It’s syntax is uncomplicated and it also provides many useful command-line options to enhance its search capabilities. Sed can search for text in files and can also replace the text. This provides you with more out-of-the-box functionality than grep. Finally, AWK is a full-fledged programming language, so you can accomplish a lot more with it. However, it does require a higher learning curve.

Featured

Why Linode

Get to Know Us Better

Featured

Compute

Storage

Databases

Services

Networking

Developer Tools

Featured

Solutions

Featured

Pricing

Free eBook

Community

Engage With Us

Search Results

No Results

Filters

Learn to Process Text in Linux using Grep, sed, and AWK

The Differences Between Grep, sed, and AWK

The Grep Command-Line Utility

Grep: Search a Log File for Errors

Grep: Viewing a Search Term’s Context

Grep: List Files that Match a Pattern

Sed Command

Create a Search and Replace sed Script

sed: Prepend and Append Lines

AWK Command

AWK: Process Data in Multiple Fields

AWK: Numeric Comparisons and Calculations Using AWK

Conclusion

Your Feedback Is Important