9.3. awk & sed

9.3.1. Common Pitfalls

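1. Piping grep into sed when sed alone can filter and modify

# Multi-step: filter then modify
grep pattern file.txt | sed 's/old/new/'

# Better: do both in sed. A separate p command prints every line that
# matches /pattern/, just like the pipeline; the s///p flag alone would
# print only the lines where the substitution happened.
sed -n '/pattern/{s/old/new/;p;}' file.txt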

2. Forgetting that awk fields are numbered from 1; $0 is the whole line

# Not the first field: $0 is the entire input line
awk '{print $0}' file.txt

# First field:
awk '{print $1}' file.txt

3. Not escaping special regex characters

# Wrong: treats . as any character instead of literal dot
sed 's/file.txt/file_txt/' file.txt

# Right: escape the dot
sed 's/file\.txt/file_txt/' file.txt

4. Using sed on binary files, or misunderstanding the -i flag

# Safe preview: without -i, sed writes to stdout and leaves file.txt unchanged
sed 's/old/new/g' file.txt

# In-place edit: -i rewrites the file, so test first and keep a backup
sed -i.bak 's/old/new/g' file.txt  # Creates file.txt.bak
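
The four pitfalls can be checked on throwaway data. The following is a minimal sketch, not a definitive test suite: the directory, the file name sample.txt, and all sample values are made up for illustration. The sed filter uses a separate p command so that lines matching the address are printed even when no substitution happens.

```shell
# Throwaway directory; every file name and value below is made up.
dir=$(mktemp -d)
printf '%s\n' 'pattern old' 'other old' 'pattern only' > "$dir/sample.txt"

# Pitfall 1: one sed process gives the same result as grep | sed.
two_step=$(grep 'pattern' "$dir/sample.txt" | sed 's/old/new/')
one_step=$(sed -n '/pattern/{s/old/new/;p;}' "$dir/sample.txt")

# Pitfall 2: $0 is the whole record, $1 is the first field.
whole=$(printf 'alpha beta\n' | awk '{print $0}')
first=$(printf 'alpha beta\n' | awk '{print $1}')

# Pitfall 3: an unescaped dot also rewrites "fileXtxt"; an escaped dot does not.
unescaped=$(printf 'fileXtxt\n' | sed 's/file.txt/file_txt/')
escaped=$(printf 'fileXtxt\n' | sed 's/file\.txt/file_txt/')

# Pitfall 4: without -i the file is untouched; -i.bak rewrites it and keeps a backup.
preview=$(sed 's/old/new/g' "$dir/sample.txt" | head -n 1)
untouched=$(head -n 1 "$dir/sample.txt")
sed -i.bak 's/old/new/g' "$dir/sample.txt"
edited=$(head -n 1 "$dir/sample.txt")
backup=$(head -n 1 "$dir/sample.txt.bak")

printf '%s\n' "$one_step" "$whole" "$first" "$unescaped" "$escaped" \
  "$preview" "$untouched" "$edited" "$backup"
rm -rf "$dir"
```

Capturing each result in a variable keeps the comparison explicit: two_step and one_step hold the same text, and untouched still reads "pattern old" after the preview run.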

9.3.2. Real-World Example: Log File Analysis

#!/bin/bash

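# Analyze Apache access logs (assumes the common/combined log format)

# Abort with a usage message if no log file is given
log_file="${1:?usage: $0 <access_log>}"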

echo "=== Log Analysis for $log_file ==="

# Count total requests
echo -n "Total requests: "
wc -l < "$log_file"

# Top 10 IPs
echo -e "\nTop 10 requesting IPs:"
awk '{print $1}' "$log_file" | sort | uniq -c | sort -rn | head -10

# Response codes breakdown
echo -e "\nResponse code distribution:"
awk '{print $9}' "$log_file" | sort | uniq -c | sort -rn

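# Error requests (4xx and 5xx), matched on the status field only so that
# three-digit runs in IPs, paths, or byte counts are not counted
echo -e "\nError requests (4xx, 5xx):"
awk '$9 ~ /^[45][0-9][0-9]$/ {n++} END {print n+0}' "$log_file"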

# Most accessed resources
echo -e "\nTop 10 resources:"
awk '{print $7}' "$log_file" | sort | uniq -c | sort -rn | head -10

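# Requests per hour ($4 looks like "[10/Oct/2023:13:55:36", so the hour
# is the two characters starting at position 14)
echo -e "\nRequests per hour:"
awk '{print substr($4,14,2)}' "$log_file" | sort | uniq -c | sort -n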
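
The field numbers used in the script depend on how awk splits a log line on whitespace. The sketch below shows this on a single line; the line itself is made up, and it assumes the Apache combined log format, so a custom LogFormat may shift the fields.

```shell
# One made-up line in Apache combined log format (illustrative only).
line='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 404 512 "-" "curl/8.0"'

# Whitespace splitting gives: $1 = client IP, $4 = "[day/month/year:HH:MM:SS",
# $7 = request path, $9 = status code.
ip=$(printf '%s\n' "$line" | awk '{print $1}')
path=$(printf '%s\n' "$line" | awk '{print $7}')
status=$(printf '%s\n' "$line" | awk '{print $9}')

# The hour starts at character 14 of $4, right after "[10/Oct/2023:".
hour=$(printf '%s\n' "$line" | awk '{print substr($4, 14, 2)}')

# Testing $9 directly avoids counting 4xx/5xx-looking digits elsewhere in the line.
errors=$(printf '%s\n' "$line" | awk '$9 ~ /^[45][0-9][0-9]$/ {n++} END {print n+0}')

printf '%s %s %s %s %s\n' "$ip" "$path" "$status" "$hour" "$errors"
```

Running this against one real line from your own log is a quick way to confirm the field positions before trusting the totals.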

9.3.3. Grep: Pattern Matching

grep finds lines matching a pattern:

9.3.3.1. Basic grep Usage

# Find pattern in file
grep "pattern" file.txt

# Case-insensitive search (-i)
grep -i "pattern" file.txt

# Invert match - show non-matching lines (-v)
grep -v "pattern" file.txt

# Count matching lines (-c)
grep -c "pattern" file.txt

# Show line numbers (-n)
grep -n "pattern" file.txt

# Extended regex (-E)
grep -E "pattern1|pattern2" file.txt
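
To see what these options return, here is a small sketch on a made-up four-line file; the file contents and variable names are assumptions for illustration only.

```shell
# Made-up sample file for illustration.
tmp=$(mktemp)
printf '%s\n' 'INFO start' 'ERROR disk full' 'info retry' 'ERROR timeout' > "$tmp"

count=$(grep -c 'ERROR' "$tmp")            # number of matching lines
numbered=$(grep -n 'disk' "$tmp")          # match prefixed with its line number
nocase=$(grep -ci 'info' "$tmp")           # case-insensitive count
either=$(grep -cE 'start|timeout' "$tmp")  # extended regex alternation
inverted=$(grep -cv 'ERROR' "$tmp")        # lines that do not match

printf '%s\n' "$count" "$numbered" "$nocase" "$either" "$inverted"
rm -f "$tmp"
```

Options combine freely, as in -ci and -cE above.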

9.3.3.2. grep Advanced Options

# Only show the matched part (-o)
grep -o "pattern" file.txt

# Recursive search in directory (-r)
grep -r "pattern" directory/

# Show context: lines before and after (-B, -A, -C)
grep -B 2 -A 2 "pattern" file.txt
grep -C 3 "pattern" file.txt  # 3 lines before and after
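
A short sketch of -o and -A on made-up data follows. The sample lines are assumptions, and both options are widely supported (GNU and BSD grep) but are not part of POSIX.

```shell
# Made-up sample file for illustration.
tmp=$(mktemp)
printf '%s\n' 'INFO start' 'ERROR disk full' 'INFO retry' 'ERROR timeout' > "$tmp"

# -o prints each match on its own line instead of the whole line.
matches=$(grep -o 'ERROR [a-z]*' "$tmp")

# -A 1 adds one line of trailing context after each match.
context=$(grep -A 1 'start' "$tmp")

printf '%s\n--\n%s\n' "$matches" "$context"
rm -f "$tmp"
```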

9.3.4. Awk: Text Processing and Data Extraction

awk is a powerful language for pattern scanning and processing:

9.3.4.1. Basic awk Patterns

# Print entire file
awk '{print}' file.txt

# Print specific fields (default delimiter: whitespace)
awk '{print $1, $3}' file.txt

# Print with condition
awk '$1 == "John" {print $0}' file.txt

# Using -F for custom field separator
awk -F: '{print $1, $3}' /etc/passwd  # Print username and UID
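
The pattern-plus-action form and -F can be tried without touching the real /etc/passwd. In this sketch the two records are made up, in the same colon-separated layout.

```shell
# Made-up colon-separated records in the style of /etc/passwd.
data=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
                     'alice:x:1000:1000:Alice:/home/alice:/bin/bash')

# -F: splits on ":", so $1 is the user name and $3 the UID.
users=$(printf '%s\n' "$data" | awk -F: '{print $1, $3}')

# A pattern before the action selects records; here, UIDs of 1000 or more.
regular=$(printf '%s\n' "$data" | awk -F: '$3 >= 1000 {print $1}')

printf '%s\n--\n%s\n' "$users" "$regular"
```

The comma in print $1, $3 inserts the output field separator, a single space by default.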

9.3.4.2. awk Field and Record Operations

# NF = number of fields in the current record, NR = current record (line) number
awk '{print NR, NF, $0}' file.txt

# Print last field
awk '{print $NF}' file.txt

# Print second-to-last field
awk '{print $(NF-1)}' file.txt

# Sum a column
awk '{sum += $2} END {print sum}' data.txt

# Count matching lines (count+0 prints 0 instead of an empty line when nothing matches)
awk '/pattern/ {count++} END {print count+0}' file.txt
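
Here is a small worked sketch of NR, NF, a column sum, and a match count. The three data lines are made up for illustration.

```shell
# Made-up data: a name and an amount per line.
data=$(printf '%s\n' 'apples 3' 'pears 4' 'apples 5')

# NR is the current record number, NF the field count, $NF the last field.
numbered=$(printf '%s\n' "$data" | awk '{print NR, NF, $NF}')

# Accumulate column 2; the END block runs once after the last record.
total=$(printf '%s\n' "$data" | awk '{sum += $2} END {print sum}')

# No line matches /kiwi/, so count is unset; count+0 forces a numeric 0.
none=$(printf '%s\n' "$data" | awk '/kiwi/ {count++} END {print count+0}')

printf '%s\n--\n%s %s\n' "$numbered" "$total" "$none"
```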

9.3.4.3. awk BEGIN and END Blocks

# Initialize before processing
awk 'BEGIN {sum=0} {sum+=$1} END {print "Total:", sum}' numbers.txt

# Print header and footer
awk 'BEGIN {print "Name\tAge"} {print $1 "\t" $3} END {print "---end---"}' people.txt
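
BEGIN is also a convenient place to set the separators before any input is read. The sketch below assumes made-up comma-separated input and sets FS and OFS, two standard awk variables.

```shell
# Made-up comma-separated records: first name, last name, age.
data=$(printf '%s\n' 'Ann,Smith,31' 'Bob,Jones,45')

# BEGIN runs before the first record: set input/output separators, print a header.
# END runs after the last record; NR then holds the total number of records.
report=$(printf '%s\n' "$data" | awk '
  BEGIN { FS = ","; OFS = "\t"; print "Name", "Age" }
        { print $1, $3 }
  END   { print NR " rows" }')

printf '%s\n' "$report"
```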

9.3.5. Sed: Stream Editor

sed performs text transformations on a stream:

9.3.5.1. Basic sed Substitution

# Replace first occurrence on each line
sed 's/old/new/' file.txt

# Replace all occurrences on each line (g flag)
sed 's/old/new/g' file.txt

# Replace and save to file (in-place with -i; BSD/macOS sed needs a suffix argument, e.g. -i '')
sed -i 's/old/new/g' file.txt

# Case-insensitive replacement (i flag; a GNU sed extension, not POSIX)
sed 's/old/new/gi' file.txt

9.3.5.2. sed Address Ranges

# Replace only on line 5
sed '5s/old/new/' file.txt

# Replace on lines 2-10
sed '2,10s/old/new/g' file.txt

# Replace from pattern to pattern
sed '/start/,/end/s/old/new/g' file.txt

# Replace on lines matching pattern
sed '/pattern/s/old/new/g' file.txt
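
Addresses limit where a command applies. This sketch shows a line-number address and a pattern range on made-up input.

```shell
# Made-up input: five lines, with "start" and "end" marking a block.
input=$(printf '%s\n' 'old 1' 'start old' 'old 3' 'end old' 'old 5')

# Line address: substitute on line 1 only.
line_only=$(printf '%s\n' "$input" | sed '1s/old/new/')

# Pattern range: substitute from the first /start/ line through the next /end/ line.
range=$(printf '%s\n' "$input" | sed '/start/,/end/s/old/new/')

printf '%s\n--\n%s\n' "$line_only" "$range"
```

Lines outside the address pass through unchanged, which is why "old 1" and "old 5" survive the range example.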

9.3.5.3. Advanced sed Operations

# Delete lines
sed '/pattern/d' file.txt

# Print only matching lines
sed -n '/pattern/p' file.txt

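# Insert text before matching line (one-line form is a GNU sed extension;
# POSIX sed expects the text on a new line after i\)
sed '/pattern/i\ New text here' file.txt

# Append text after matching line (same GNU one-line form)
sed '/pattern/a\ New text here' file.txt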

# Multiple operations
sed -e 's/old1/new1/g' -e 's/old2/new2/g' file.txt
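
The following sketch runs delete, print, chained -e expressions, and insert on made-up input. The insert uses the multi-line i\ form, which should work with both GNU and BSD sed.

```shell
# Made-up input with one comment line.
input=$(printf '%s\n' 'keep one' '# comment' 'keep two')

# d deletes matching lines; -n with p prints only matching lines.
without_comments=$(printf '%s\n' "$input" | sed '/^#/d')
only_comments=$(printf '%s\n' "$input" | sed -n '/^#/p')

# Several -e expressions are applied in order to each line.
chained=$(printf '%s\n' "$input" | sed -e '/^#/d' -e 's/keep/kept/')

# Portable insert: i\ followed by a newline, then the text to insert.
inserted=$(printf '%s\n' "$input" | sed '/^#/i\
NOTE')

printf '%s\n--\n%s\n' "$chained" "$inserted"
```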

9.3.6. Regular Expressions Basics

Regular expressions match patterns in text. Key elements:

Syntax	Meaning
`.`	Any single character
`*`	Zero or more of preceding
`+`	One or more of preceding (extended regex)
`?`	Zero or one of preceding (extended regex)
`^`	Start of line
`$`	End of line
`[abc]`	Any of a, b, or c
`[^abc]`	Not a, b, or c
`[a-z]`	Any lowercase letter
`\(pattern\)`	Group/capture (basic regex)
`(pattern)`	Group/capture (extended regex)
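
The basic/extended split in the table is easiest to see side by side. This is a minimal sketch using grep (basic) and grep -E (extended) on three made-up lines.

```shell
# Made-up input lines.
input=$(printf '%s\n' 'ab' 'abab' 'a+b')

# Basic regex (plain grep): "+" is a literal character, groups are \( \).
bre_plus=$(printf '%s\n' "$input" | grep 'a+b')
bre_group=$(printf '%s\n' "$input" | grep -c '^\(ab\)*$')

# Extended regex (grep -E): "+" is a quantifier, groups are ( ).
ere_plus=$(printf '%s\n' "$input" | grep -cE '^a+b$')
ere_group=$(printf '%s\n' "$input" | grep -E '^(ab){2}$')

printf '%s %s %s %s\n' "$bre_plus" "$bre_group" "$ere_plus" "$ere_group"
```

The same split applies to sed, where -E switches to extended syntax in both GNU and BSD versions.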

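9.3.6.1. Common Patterns

# Email-like pattern (extended regex: use with grep -E or sed -E)
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

# IP address pattern (basic regex; loose, so it also matches 999.999.999.999)
[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}

# Date pattern (YYYY-MM-DD, basic regex; does not validate month or day)
[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}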
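
These patterns can be used as-is to extract values with grep -o. The sketch below runs them on a made-up sentence; the email pattern needs -E because it uses + and {2,}, while the other two are written in basic syntax. All three are loose matches, not validators.

```shell
# Made-up text containing one email address, one IP address, and one date.
text='user alice@example.com logged in from 192.168.1.10 on 2024-03-15'

# Extended regex: + and {2,} require grep -E.
email=$(printf '%s\n' "$text" | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

# Basic regex: interval bounds are written \{m,n\}.
ip=$(printf '%s\n' "$text" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}')
found_date=$(printf '%s\n' "$text" | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}')

printf '%s\n' "$email" "$ip" "$found_date"
```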