4.3. Searching & Filtering
Now that you can view files, you need to find specific data within them. This section covers grep and find, two of the most powerful search tools in Unix, along with sort, uniq, and cut for ordering and slicing their output.
4.3.1. Common Pitfalls with grep
4.3.1.1. ❌ Not considering case sensitivity
# WRONG: Matches "error" but misses "ERROR" and "Error"
$ grep "error" log.txt
# RIGHT for logs: -i ignores case, so all three match
$ grep -i "error" log.txt
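To see the difference concretely, here is a small self-contained script. It builds a throwaway sample log (the file name and its three lines are made up for illustration) and counts matches with and without -i:

```shell
# Hypothetical sample log: the same word in three different cases
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' "ERROR disk full" "Error retrying" "error: timeout" > "$tmp/log.txt"

# Case-sensitive search: only the all-lowercase line matches
plain=$(grep -c "error" "$tmp/log.txt")
# Case-insensitive search: all three lines match
nocase=$(grep -ci "error" "$tmp/log.txt")
echo "without -i: $plain, with -i: $nocase"
```

This prints `without -i: 1, with -i: 3`.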
4.3.1.2. ❌ Misreading permission modes in find security checks
# WRONG: -perm 644 matches only files whose mode is exactly 644
$ find . -perm 644
# Misses world-readable files such as 664 or 755
# RIGHT: Use - (all of these bits set) or / (any of these bits set)
$ find . -perm -004 # Others can read
$ find . -perm /077 # Group or others have any permission
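As a sketch of how the three forms differ, the script below creates three hypothetical files with known modes and counts what each test selects. It assumes GNU find, where `/mode` means "any of these bits":

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Three hypothetical files with different modes
touch "$tmp/a" "$tmp/b" "$tmp/c"
chmod 644 "$tmp/a"   # rw-r--r--
chmod 664 "$tmp/b"   # rw-rw-r--
chmod 600 "$tmp/c"   # rw-------

# Exact match: only a
exact=$(find "$tmp" -type f -perm 644 | wc -l)
# Others-read bit set: a and b
others_read=$(find "$tmp" -type f -perm -004 | wc -l)
# Any group or others bit set: a and b (c has none)
group_or_other=$(find "$tmp" -type f -perm /077 | wc -l)
echo "exact=$exact others_read=$others_read group_or_other=$group_or_other"
```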
4.3.1.3. ❌ Forgetting to escape regex characters
# WRONG: . matches any character
$ grep "1.0" version.txt
# Matches "1.0", "1X0", "1a0", etc.
# RIGHT: Escape the dot
$ grep "1\.0" version.txt
# Matches only a literal "1.0" (grep -F "1.0" treats the whole pattern as literal)
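A minimal sketch of the three variants side by side, using a hypothetical version.txt whose lines mirror the examples above:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical version strings; only the first contains a literal "1.0"
printf '%s\n' "v1.0" "v1X0" "v1a0" > "$tmp/version.txt"

loose=$(grep -c "1.0" "$tmp/version.txt")     # unescaped dot matches any character
escaped=$(grep -c "1\.0" "$tmp/version.txt")  # escaped dot matches only "."
fixed=$(grep -cF "1.0" "$tmp/version.txt")    # -F: the whole pattern is literal
echo "$loose $escaped $fixed"
```

This prints `3 1 1`.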
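4.3.2. Common Pitfalls
4.3.3. Quick Reference
| Tool | Purpose | Example |
|---|---|---|
| `grep` | Find pattern in file | `grep "ERROR" app.log` |
| `grep -E` | Extended regex | `grep -E "[0-9]{3}" file.txt` |
| `find` | Find files | `find . -name "*.txt"` |
| `cut` | Extract columns | `cut -d: -f1 /etc/passwd` |
| `sort` | Sort lines | `sort -n numbers.txt` |
| `uniq` | Find/remove duplicates | `sort file.txt \| uniq -c` |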
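These tools are usually chained together. Here is a sketch that uses a made-up app.log whose lines each begin with a log level. It combines grep, cut, sort, and uniq to count the non-INFO entries per level:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical log: one "LEVEL message" entry per line
printf '%s\n' "ERROR disk full" "INFO started" "ERROR timeout" \
  "WARN slow query" "ERROR disk full" "INFO stopped" > "$tmp/app.log"

# grep -v drops INFO lines, cut keeps field 1 (the level),
# sort | uniq -c counts each level, sort -rn puts the most frequent first
summary=$(grep -v "^INFO" "$tmp/app.log" | cut -d' ' -f1 | sort | uniq -c | sort -rn)
echo "$summary"
```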
4.3.3.1. sort and uniq: Order and Deduplicate
# Sort lines alphabetically
$ sort file.txt
# Sort numerically
$ sort -n numbers.txt
# Find unique lines (uniq only collapses adjacent duplicates, so sort first)
$ sort file.txt | uniq
# Count occurrences, most frequent first
$ sort file.txt | uniq -c | sort -rn
42 error
15 warning
8 info
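Because uniq only removes repeats that sit next to each other, the sort step matters. Here is a minimal sketch that uses a hypothetical levels.txt where the duplicates are not adjacent:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical input where the repeated lines are not adjacent
printf '%s\n' error warning error info error > "$tmp/levels.txt"

# No two identical lines are adjacent, so uniq removes nothing
unsorted=$(uniq "$tmp/levels.txt" | wc -l)
# After sorting, identical lines sit together and 3 distinct lines remain
sorted=$(sort "$tmp/levels.txt" | uniq | wc -l)
# Most frequent line with its count
top=$(sort "$tmp/levels.txt" | uniq -c | sort -rn | head -n 1)
echo "$unsorted $sorted"
```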
4.3.3.2. cut: Extract Columns
Extract specific fields from structured data:
# Extract usernames from /etc/passwd (colon-separated)
$ cut -d: -f1 /etc/passwd
root
daemon
nobody
alice
# Extract fields 1-5 from a CSV (cut splits on every comma, even inside quoted values)
$ cut -d, -f1-5 data.csv
# Extract character positions 1-10
$ cut -c1-10 file.txt
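The sketch below runs cut on two small hypothetical files: a passwd-style sample and a CSV row with a quoted comma. It shows that cut has no notion of CSV quoting:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical passwd-style and CSV samples
printf '%s\n' "root:x:0:0:root:/root:/bin/bash" \
  "alice:x:1000:1000:Alice:/home/alice:/bin/bash" > "$tmp/passwd"
printf '%s\n' 'id,name,city' '1,"Smith, Jo",Oslo' > "$tmp/data.csv"

# Field 1 of each colon-separated line: the username
users=$(cut -d: -f1 "$tmp/passwd")
# Field 2 of the data row is cut at the comma inside the quotes
second=$(sed -n 2p "$tmp/data.csv" | cut -d, -f2)
echo "$users"
echo "$second"
```

The second `echo` prints `"Smith`, not `"Smith, Jo"`. For real CSV with quoted fields, a CSV-aware tool is the safer choice.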
4.3.4. Other Useful Commands
4.3.5. Combining grep and find
Combine file discovery with content matching:
# Find all Python files, then list those that import requests
$ find . -name "*.py" -type f -exec grep -l "import requests" {} \;
# -l = list filenames only
# Find Java files, count uses of System.out in each
$ find . -name "*.java" -type f -exec grep -Hc "System\.out" {} \;
# -H = print the filename even though grep sees one file at a time
# Find config files, show lines mentioning passwords (for audit)
$ find . -name "*.conf" -type f -exec grep -Hn "password" {} \;
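Why -H matters with `{} \;`: find starts grep once per file, and grep omits filenames when it has only one file to search. Here is a sketch with a hypothetical two-file project:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/pkg"
# Hypothetical project: two Python files, only one imports requests
printf 'import requests\n' > "$tmp/pkg/client.py"
printf 'import os\n' > "$tmp/pkg/util.py"

# -l prints each matching file name once
matching=$(find "$tmp" -name "*.py" -type f -exec grep -l "import requests" {} \;)
# Without -H, the matched line arrives with no hint of which file it came from
bare=$(find "$tmp" -name "*.py" -type f -exec grep "import requests" {} \;)
# -H prefixes each match with its file name
labeled=$(find "$tmp" -name "*.py" -type f -exec grep -H "import requests" {} \;)
echo "$bare"
echo "$labeled"
```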
4.3.5.1. Real-World Examples
# Find Python files modified in the last 24 hours
$ find . -name "*.py" -mtime 0
# Find large log files to clean up
$ find /var/log -type f -size +100M
# Find files with world-readable permissions (potential security issue)
$ find . -perm -004
# Returns files where "others" can read
# Count total files
$ find . -type f | wc -l
# Find and list details of matches
$ find . -name "*.py" -exec ls -lh {} \;
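Here is a sketch of the time and count examples on hypothetical files. `touch -t` backdates one file so that only the freshly created one counts as recent:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/new.py"                   # modified just now
touch -t 202001010000 "$tmp/old.py"   # timestamp set to 2020-01-01 00:00
touch "$tmp/notes.txt"

# -mtime 0: age in whole days, rounded down, is 0 (modified < 24 hours ago)
recent=$(find "$tmp" -name "*.py" -mtime 0)
# -type f skips the directory itself, so only the 3 files are counted
total=$(find "$tmp" -type f | wc -l)
echo "$recent"
echo "$total"
```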
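4.3.5.2. Useful Options
# Find files larger than 100MB
$ find . -size +100M
# Find files smaller than 1KB (1024 bytes; find rounds sizes up, so "-size -1k" matches only empty files)
$ find . -size -1024c
# Find files whose permissions are exactly 644
$ find . -perm 644
# Find files owned by specific user
$ find . -user alice
# Find and execute a command on each result
$ find . -name "*.log" -type f -exec rm {} \;
# Removes every .log file below the current directory; preview first by running the find without -exec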
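The rounding rule behind the `-size -1024c` fix can be checked directly. Here is a sketch that assumes GNU find, with one empty and one tiny hypothetical file:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/empty.txt"                 # 0 bytes
printf 'hello\n' > "$tmp/tiny.txt"   # 6 bytes

# Sizes are rounded UP to whole units: 6 bytes counts as 1k,
# and 1k is not less than 1k, so only the empty file matches
k_units=$(find "$tmp" -type f -size -1k | wc -l)
# The c suffix means bytes, so this really is "smaller than 1024 bytes"
bytes=$(find "$tmp" -type f -size -1024c | wc -l)
echo "-size -1k: $k_units, -size -1024c: $bytes"
```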
4.3.5.3. Basic Usage
# Find files by name
$ find . -name "*.txt"
./file1.txt
./docs/readme.txt
./archive/old.txt
# Find files by name (case-insensitive)
$ find . -iname "*.TXT"
# Matches *.txt, *.TXT, *.Txt, etc.
# Find directories only
$ find . -type d
./src
./tests
./data
# Find files only
$ find . -type f
# Find files modified in last 24 hours
$ find . -mtime -1
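Here is a sketch of -name, -iname, and -type on a small hypothetical tree (all paths are made up). `-mindepth 1` keeps the starting directory out of the directory count:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/docs" "$tmp/src"
touch "$tmp/notes.txt" "$tmp/docs/README.TXT" "$tmp/src/main.py"

by_name=$(find "$tmp" -name "*.txt" | wc -l)       # notes.txt only
by_iname=$(find "$tmp" -iname "*.txt" | wc -l)     # notes.txt and README.TXT
dirs=$(find "$tmp" -mindepth 1 -type d | wc -l)    # docs and src
echo "$by_name $by_iname $dirs"
```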
4.3.6. find: Search for Files
find walks a directory tree and selects files by name, type, size, permissions, owner, or modification time.
4.3.6.1. Extended Regex (-E flag)
For more powerful patterns, use extended regex:
# Find email addresses (-o prints only the matching text)
$ grep -oE "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" file.txt
alice@example.com
bob.smith@company.org
# Find repeated words (-w stops "this is" from matching; backreferences with -E are a GNU grep extension)
$ grep -Ew "([a-z]+) \1" file.txt
# Matches "the the", "and and"
# Find dates (YYYY-MM-DD format)
$ grep -oE "[0-9]{4}-[0-9]{2}-[0-9]{2}" log.txt
2025-01-15
2025-01-16
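Here is a sketch of -o and the repeated-word pattern on hypothetical sample text. It assumes GNU grep for the backreference. The last two text lines are traps that a pattern without -w would match:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' "2025-01-15 started" "no date here" "done on 2025-01-16" > "$tmp/log.txt"
printf '%s\n' "the the cat" "this is fine" "the theory" > "$tmp/text.txt"

# -o prints each match on its own line instead of the whole line
dates=$(grep -oE "[0-9]{4}-[0-9]{2}-[0-9]{2}" "$tmp/log.txt")
# -w rejects matches that start or end inside a word
repeats=$(grep -Ew "([a-z]+) \1" "$tmp/text.txt")
echo "$dates"
echo "$repeats"
```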
4.3.6.2. Regex Examples
# Find lines starting with "ERROR"
$ grep "^ERROR" app.log
# Find lines ending with a number
$ grep "[0-9]$" data.txt
# Find 3-digit numbers (without -w, any run of 3 or more digits matches)
$ grep -Ew "[0-9]{3}" file.txt
# -E = extended regex, -w = whole word only
# Find IPv4 addresses (basic pattern; also accepts invalid ones like 999.1.1.1)
$ grep -oE "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" log.txt
192.168.1.1
10.0.0.1
172.16.0.5
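Here is a sketch that counts what each anchor and quantifier selects in a hypothetical four-line app.log. The last two counts show why -w changes the meaning of `[0-9]{3}`:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' "ERROR disk full" "retry after ERROR" "code 404" "id 12345" > "$tmp/app.log"

starts=$(grep -c "^ERROR" "$tmp/app.log")        # only the first line
ends_digit=$(grep -c "[0-9]$" "$tmp/app.log")    # "code 404" and "id 12345"
any_run=$(grep -cE "[0-9]{3}" "$tmp/app.log")    # 12345 contains a run of 3 digits too
exact3=$(grep -cEw "[0-9]{3}" "$tmp/app.log")    # only 404 is exactly 3 digits
echo "$starts $ends_digit $any_run $exact3"
```

This prints `1 2 2 1`.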
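4.3.6.3. Basic Patterns
| Pattern | Matches | Example |
|---|---|---|
| `.` | Any character | `h.t` matches "hat", "hot" |
| `*` | 0 or more of previous | `ab*c` matches "ac", "abc", "abbc" |
| `+` | 1 or more of previous (needs -E) | `ab+c` matches "abc", "abbc" |
| `?` | 0 or 1 of previous (needs -E) | `colou?r` matches "color", "colour" |
| `^` | Start of line | `^ERROR` |
| `$` | End of line | `[0-9]$` |
| `[abc]` | Character class | `[0-9]` matches any digit |
| `[^abc]` | Not in class | `[^0-9]` matches any non-digit |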
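The quantifiers in this table can be checked against a hypothetical word list. `-x` makes each pattern match a whole line, so the counts reflect exactly what each pattern accepts:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' color colour colouur ac abc abbc > "$tmp/words.txt"

optional=$(grep -cxE "colou?r" "$tmp/words.txt")   # color, colour
star=$(grep -cx "ab*c" "$tmp/words.txt")           # ac, abc, abbc
plus=$(grep -cxE "ab+c" "$tmp/words.txt")          # abc, abbc
negated=$(grep -cx "[^0-9]*" "$tmp/words.txt")     # all 6 lines contain no digits
echo "$optional $star $plus $negated"
```

This prints `2 3 2 6`.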
4.3.7. Regular Expressions (regex)
grep treats its pattern as a regular expression, so it can match much more than literal strings.
4.3.7.1. Practical Examples
# List users whose login shell is bash
$ grep "/bash$" /etc/passwd | cut -d: -f1
root
user1
user2
# Find failed login attempts
$ grep "Failed password" /var/log/auth.log
# Find files under src/ that import requests
$ grep -r "import requests" src/
# -r = recursive (search subdirectories)
# Count lines containing TODO in Python files
$ grep -r --include="*.py" "TODO" . | wc -l
42
# Find "error" as a whole word (not "errors" or "terror")
$ grep -w "error" file.txt
# -w = whole word only
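Here is a sketch of the recursive search and the whole-word match on a hypothetical mini-project. Putting the options before the pattern keeps the command portable to greps that stop parsing options at the first operand:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/src"
printf '%s\n' "# TODO: refactor" "x = 1  # TODO" > "$tmp/src/app.py"
printf '%s\n' "TODO: not Python" > "$tmp/src/notes.md"
printf '%s\n' "error" "errors found" "terror" > "$tmp/words.txt"

# -r searches subdirectories; --include limits the search to *.py files
todos=$(grep -r --include="*.py" "TODO" "$tmp/src" | wc -l)
# Without -w, "errors" and "terror" also match
loose=$(grep -c "error" "$tmp/words.txt")
whole=$(grep -cw "error" "$tmp/words.txt")
echo "$todos $loose $whole"
```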
4.3.7.2. Basic Usage
# Find lines containing "ERROR" (case-sensitive)
$ grep "ERROR" app.log
[ERROR] Database connection failed
[ERROR] Timeout waiting for response
# Find lines containing "error" in any case
$ grep -i "error" app.log
[ERROR] Database connection failed
[ERROR] error in configuration
error: Invalid request
# Count matching lines
$ grep -c "ERROR" app.log
42
# Show line numbers
$ grep -n "ERROR" app.log
15:[ERROR] Database connection failed
237:[ERROR] Timeout waiting for response
# Invert match (lines NOT containing pattern)
$ grep -v "INFO" app.log
[ERROR] Database connection failed
[DEBUG] Starting module X
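Here is a sketch of -c, -n, and -v on a short hypothetical app.log whose lines echo the examples above:

```shell
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' "[INFO] Server started" "[ERROR] Database connection failed" \
  "[DEBUG] Starting module X" "[ERROR] Timeout waiting for response" > "$tmp/app.log"

count=$(grep -c "ERROR" "$tmp/app.log")                 # lines 2 and 4
first=$(grep -n "ERROR" "$tmp/app.log" | head -n 1)     # line number, colon, line text
not_info=$(grep -vc "INFO" "$tmp/app.log")              # every line except the first
echo "$count"
echo "$first"
echo "$not_info"
```

The second `echo` prints `2:[ERROR] Database connection failed`.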
4.3.8. grep: Find Patterns
grep prints every line of its input that matches a pattern.