9.3. awk & sed
9.3.1. Common Pitfalls
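1. Using grep when sed could do both filter and modify
```bash
# Multi-step: filter then modify
grep pattern file.txt | sed 's/old/new/'
# Better: one sed process. -n suppresses auto-printing, and the separate p
# prints every matching line, whether or not the substitution happened
sed -n '/pattern/{s/old/new/;p;}' file.txt
```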
2. Forgetting that awk fields are numbered from 1 ($0 is the whole line)
```bash
# Pitfall: $0 is the entire line, not the first field
awk '{print $0}' file.txt   # prints every line unchanged
# Fields start at $1
awk '{print $1}' file.txt   # first field
```
3. Not escaping special regex characters
```bash
# Wrong: treats . as any character instead of literal dot
sed 's/file.txt/file_txt/' file.txt
# Right: escape the dot
sed 's/file\.txt/file_txt/' file.txt
```
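4. Using sed on binary files, or misunderstanding the -i flag
```bash
# Often misunderstood: the result goes to stdout, file.txt is NOT changed
sed 's/old/new/g' file.txt
# Dangerous: the shell truncates file.txt before sed reads it, leaving it empty
sed 's/old/new/g' file.txt > file.txt
# Better: edit in place with -i and keep a backup (test without -i first!)
sed -i.bak 's/old/new/g' file.txt   # original saved as file.txt.bak
```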
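Pitfalls 1 and 4 are easy to check on throwaway data. The sketch below uses made-up sample lines and a temporary file (it assumes `mktemp` is available, as it is on Linux and macOS). It contrasts the separate `p` command with the `p` flag on `s`, and confirms that an attached backup suffix keeps the original.

```bash
# Hypothetical sample: line 2 matches "pattern" but contains no "old"
sample=$(printf '%s\n' 'pattern old' 'pattern only' 'old elsewhere')

# Pitfall 1: both commands print "pattern new" and "pattern only"
printf '%s\n' "$sample" | grep pattern | sed 's/old/new/'
printf '%s\n' "$sample" | sed -n '/pattern/{s/old/new/;p;}'

# The p *flag* on s prints only lines where a replacement happened,
# so "pattern only" is silently dropped
printf '%s\n' "$sample" | sed -n '/pattern/{s/old/new/p;}'

# Pitfall 4: in-place edit with a backup, on a throwaway file
tmp=$(mktemp)
trap 'rm -f "$tmp" "$tmp.bak"' EXIT
echo 'old text' > "$tmp"
sed -i.bak 's/old/new/' "$tmp"   # attached suffix: accepted by GNU and BSD sed
cat "$tmp"       # new text
cat "$tmp.bak"   # old text
```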
9.3.2. Real-World Example: Log File Analysis
```bash
#!/bin/bash
# Analyze Apache access logs in Common/Combined Log Format:
# $1 = client IP, $4 = [day/month/year:hour:min:sec, $7 = path, $9 = status
log_file=${1:?usage: $0 access.log}
echo "=== Log Analysis for $log_file ==="
# Count total requests
echo -n "Total requests: "
wc -l < "$log_file"
# Top 10 IPs
echo -e "\nTop 10 requesting IPs:"
awk '{print $1}' "$log_file" | sort | uniq -c | sort -rn | head -10
# Response codes breakdown
echo -e "\nResponse code distribution:"
awk '{print $9}' "$log_file" | sort | uniq -c | sort -rn
# Error requests (4xx and 5xx): test the status field only; grepping the whole
# line for [45][0-9]{2} would also match byte counts and URLs
echo -e "\nError requests (4xx, 5xx):"
awk '$9 ~ /^[45][0-9][0-9]$/' "$log_file" | wc -l
# Most accessed resources
echo -e "\nTop 10 resources:"
awk '{print $7}' "$log_file" | sort | uniq -c | sort -rn | head -10
# Requests per hour, listed by hour: the hour is the 2nd ":"-separated part
# of $4 (substr($4,2,2) would return the day of the month instead)
echo -e "\nRequests per hour:"
awk '{split($4, t, ":"); print t[2]}' "$log_file" | sort | uniq -c
```
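To see why the error count and the hour extraction need care, the sketch below runs a whole-line grep and `substr($4,2,2)` next to field-based alternatives on two invented log lines in Common Log Format (IPs from the 192.0.2.0/24 documentation range; all values are hypothetical).

```bash
# Hypothetical sample: two lines in Apache Common Log Format
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 4512
192.0.2.7 - - [10/Oct/2023:14:02:11 +0000] "GET /missing HTTP/1.1" 404 512
EOF

# Whole-line grep: "451" inside the byte count 4512 looks like an error code
grep -cE '[45][0-9]{2}' "$log"                              # 2
# Status field only: just the 404 line
awk '$9 ~ /^[45][0-9][0-9]$/ {n++} END {print n+0}' "$log"   # 1

# substr($4,2,2) yields the day; splitting $4 on ":" yields the hour
awk '{print substr($4,2,2)}' "$log"                         # 10 and 10
awk '{split($4, t, ":"); print t[2]}' "$log"                # 13 and 14
```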
9.3.3. Grep: Pattern Matching
grep prints every line of its input that matches a regular expression:
9.3.3.1. Basic grep Usage
```bash
# Find pattern in file
grep "pattern" file.txt
# Case-insensitive search (-i)
grep -i "pattern" file.txt
# Invert match - show non-matching lines (-v)
grep -v "pattern" file.txt
# Count matching lines (-c)
grep -c "pattern" file.txt
# Show line numbers (-n)
grep -n "pattern" file.txt
# Extended regex (-E)
grep -E "pattern1|pattern2" file.txt
```
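The basic options are easiest to compare on one small file. This sketch writes a hypothetical four-line log to a temporary file and shows how each flag changes the count or output.

```bash
# Hypothetical four-line log used only for this demo
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'Error: disk full' 'info: started' 'error: timeout' 'warning: slow' > "$f"

grep -c 'error' "$f"            # 1: matching is case-sensitive by default
grep -ci 'error' "$f"           # 2: -i also matches "Error"
grep -vc 'error' "$f"           # 3: lines that do NOT contain "error"
grep -n 'warning' "$f"          # 4:warning: slow
grep -cE 'error|warning' "$f"   # 2: an unescaped | is alternation only with -E
```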
9.3.3.2. grep Advanced Options
```bash
# Only show the matched part (-o)
grep -o "pattern" file.txt
# Recursive search in directory (-r)
grep -r "pattern" directory/
# Show context: lines before and after (-B, -A, -C)
grep -B 2 -A 2 "pattern" file.txt
grep -C 3 "pattern" file.txt # 3 lines before and after
```
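A short sketch of -o and the context flags on a hypothetical five-line file where the match sits in the middle; -A and -C are not in POSIX grep but are supported by both GNU and BSD grep.

```bash
# Hypothetical sample: the match sits on line 3 of 5
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'a' 'b' 'MATCH id=42' 'c' 'd' > "$f"

grep -o 'id=[0-9]*' "$f"   # id=42 (only the matched text)
grep -A 1 'MATCH' "$f"     # MATCH id=42, then c
grep -C 1 'MATCH' "$f"     # b, MATCH id=42, c
```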
9.3.4. Awk: Text Processing and Data Extraction
awk is a small programming language for pattern scanning and processing: it splits each input line into fields ($1, $2, and so on) and runs the action of every pattern {action} rule whose pattern matches the line:
9.3.4.1. Basic awk Patterns
```bash
# Print entire file
awk '{print}' file.txt
# Print specific fields (default delimiter: runs of spaces and tabs)
awk '{print $1, $3}' file.txt
# Print only lines whose first field is "John"
awk '$1 == "John" {print $0}' file.txt
# Using -F for custom field separator
awk -F: '{print $1, $3}' /etc/passwd # Print username and UID
```
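The -F example can be tried without touching the real /etc/passwd. This sketch feeds two hypothetical passwd-style records to awk and adds a numeric condition in front of the action.

```bash
# Hypothetical passwd-style records (name:pw:uid:gid:gecos:home:shell)
users=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
                      'alice:x:1000:1000:Alice:/home/alice:/bin/sh')

# -F: splits on colons; $1 is the user name, $3 the UID
printf '%s\n' "$users" | awk -F: '{print $1, $3}'
# root 0
# alice 1000

# A condition before the action: only accounts with UID >= 1000
printf '%s\n' "$users" | awk -F: '$3 >= 1000 {print $1}'
# alice
```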
9.3.4.2. awk Field and Record Operations
```bash
# NF = number of fields in the current line, NR = current record (line) number
awk '{print NR, NF, $0}' file.txt
# Print last field
awk '{print $NF}' file.txt
# Print second-to-last field
awk '{print $(NF-1)}' file.txt
# Sum a column
awk '{sum += $2} END {print sum}' data.txt
# Count matching lines (count+0 prints 0 instead of an empty line if nothing matches)
awk '/pattern/ {count++} END {print count+0}' file.txt
```
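A quick sketch of NR, NF, and $NF on hypothetical "name score" data where one line has an extra field, which is exactly when $NF is safer than a fixed column number.

```bash
# Hypothetical "name score" data; the third line has an extra field
data=$(printf '%s\n' 'ann 10' 'bob 20' 'cy dee 30')

printf '%s\n' "$data" | awk '{print NR, NF, $NF}'
# 1 2 10
# 2 2 20
# 3 3 30

# Sum the LAST column, so the extra field in line 3 doesn't shift it
printf '%s\n' "$data" | awk '{sum += $NF} END {print sum}'          # 60

# No line matches, so count was never set; count+0 still prints 0
printf '%s\n' "$data" | awk '/zzz/ {count++} END {print count+0}'   # 0
```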
9.3.4.3. awk BEGIN and END Blocks
```bash
# Initialize before processing
awk 'BEGIN {sum=0} {sum+=$1} END {print "Total:", sum}' numbers.txt
# Print header and footer
awk 'BEGIN {print "Name\tAge"} {print $1 "\t" $3} END {print "---end---"}' people.txt
```
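The order in which the three kinds of blocks run can be seen in one program. The sketch below wraps it in a hypothetical shell function, report (not a standard command), that prints a header, echoes each value, and prints an average in END, guarding against empty input where NR is 0.

```bash
# report: hypothetical helper, a header, each value, then the average
report() {
  awk '
    BEGIN { print "value" }                   # once, before any input is read
          { sum += $1; print $1 }             # once per input line
    END   { if (NR) print "avg:", sum / NR }  # once at the end; NR = lines read
  '
}

printf '%s\n' 5 10 15 | report
# value
# 5
# 10
# 15
# avg: 10
```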
9.3.5. Sed: Stream Editor
sed reads its input line by line, applies editing commands to each line, and writes the result to stdout:
9.3.5.1. Basic sed Substitution
```bash
# Replace first occurrence on each line
sed 's/old/new/' file.txt
# Replace all occurrences on each line (g flag)
sed 's/old/new/g' file.txt
# Edit the file in place (-i); GNU sed syntax, BSD/macOS sed needs -i ''
sed -i 's/old/new/g' file.txt
# Case-insensitive replacement (I/i flag, a GNU sed extension)
sed 's/old/new/gi' file.txt
```
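The effect of the g flag, and the default case sensitivity, show up on a single line of input. The last command is a sketch of a portable alternative to the GNU-only I flag for one word, using a bracket expression.

```bash
echo 'old old old' | sed 's/old/new/'    # new old old  (first match per line)
echo 'old old old' | sed 's/old/new/g'   # new new new  (every match)
echo 'Old old' | sed 's/old/new/g'       # Old new      (matching is case-sensitive)
echo 'Old old' | sed 's/[Oo]ld/new/g'    # new new      (portable, no GNU I flag needed)
```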
9.3.5.2. sed Address Ranges
```bash
# Replace only on line 5
sed '5s/old/new/' file.txt
# Replace on lines 2-10
sed '2,10s/old/new/g' file.txt
# Replace from a line matching "start" through the next line matching "end"
sed '/start/,/end/s/old/new/g' file.txt
# Replace on lines matching pattern
sed '/pattern/s/old/new/g' file.txt
```
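The following sketch runs a pattern range and a numeric range over the same hypothetical input, so it is clear which lines each address selects; $ as an address means the last line.

```bash
# Hypothetical input: only the line between the markers should change
text=$(printf '%s\n' 'x=old' '[start]' 'x=old' '[end]' 'x=old')

printf '%s\n' "$text" | sed '/start/,/end/s/old/new/'
# x=old
# [start]
# x=new
# [end]
# x=old

printf '%s\n' "$text" | sed '3,$s/old/new/'   # line 3 through the last line
# x=old, [start], x=new, [end], x=new
```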
9.3.5.3. Advanced sed Operations
```bash
# Delete lines
sed '/pattern/d' file.txt
# Print only matching lines
sed -n '/pattern/p' file.txt
# Insert text before matching line (GNU one-line form)
sed '/pattern/i New text here' file.txt
# Append text after matching line (GNU one-line form)
sed '/pattern/a New text here' file.txt
# Multiple operations
sed -e 's/old1/new1/g' -e 's/old2/new2/g' file.txt
```
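Several of these commands combine naturally when cleaning up a config-like file. This sketch uses hypothetical input, chains two -e expressions, and shows the POSIX form of i (backslash, newline, then the text), which both GNU and BSD sed accept.

```bash
# Hypothetical config with a comment and a blank line
conf=$(printf '%s\n' '# comment' 'keep 1' '' 'keep 2')

# Two -e expressions in one pass: drop comment lines, then blank lines
printf '%s\n' "$conf" | sed -e '/^#/d' -e '/^$/d'
# keep 1
# keep 2

# -n with p behaves like grep
printf '%s\n' "$conf" | sed -n '/keep 2/p'   # keep 2

# Portable insert: i\ ends the line, the text follows on the next one
printf '%s\n' 'b' | sed '/b/i\
a'
# a
# b
```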
9.3.6. Regular Expressions Basics
Regular expressions match patterns in text. Key elements:

| Syntax | Meaning |
|---|---|
| `.` | Any single character |
| `*` | Zero or more of preceding |
| `+` | One or more of preceding (extended regex) |
| `?` | Zero or one of preceding (extended regex) |
| `^` | Start of line |
| `$` | End of line |
| `[abc]` | Any of a, b, or c |
| `[^abc]` | Not a, b, or c |
| `[a-z]` | Any lowercase letter |
| `\(\)` | Group/capture (basic regex) |
| `()` | Group/capture (extended regex) |
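The basic/extended distinction in the table matters in practice because grep and sed default to basic regex. The sketch below writes the same patterns in both flavors; POSIX basic regex has no + operator, so the class is repeated with *, and \1, \2, \3 in a replacement refer back to the captured groups. sed -E is assumed available (GNU and BSD sed both accept it).

```bash
# "One or more digits followed by a literal dot" in both flavors
echo 'v12.5' | grep -o '[0-9][0-9]*\.'   # basic: 12.
echo 'v12.5' | grep -oE '[0-9]+\.'       # extended: 12.

# Capture groups: \( \) in basic regex (sed default), ( ) in extended (sed -E)
echo '2024-01-31' | sed 's/\([0-9]*\)-\([0-9]*\)-\([0-9]*\)/\3.\2.\1/'   # 31.01.2024
echo '2024-01-31' | sed -E 's/([0-9]+)-([0-9]+)-([0-9]+)/\3.\2.\1/'       # 31.01.2024
```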
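9.3.6.1. Common Patterns
```bash
# Email-like pattern (extended regex: use grep -E)
grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
# IPv4-like pattern (basic regex; checks shape only, so 999.999.999.999 also matches)
grep '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
# Date pattern YYYY-MM-DD (basic regex)
grep '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}' file.txt
```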
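Combined with grep -o, these patterns pull the matching pieces out of a line. The sketch below applies each one to a single hypothetical log line (IP from the 192.0.2.0/24 documentation range).

```bash
# Hypothetical log line containing one date, one email address, and one IP
line='2024-03-05 login alice@example.com from 192.0.2.10'

printf '%s\n' "$line" | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
# alice@example.com
printf '%s\n' "$line" | grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
# 192.0.2.10
printf '%s\n' "$line" | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}'
# 2024-03-05
```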