4.5. Pipes & Redirection#
This is where Unix power comes together. Pipes connect tools to one another, and redirection connects them to files. Master these, and you can accomplish complex tasks by combining simple commands.
4.5.1. Output Redirection#
Redirect stdout to files:
4.5.1.1. The Basics#
# Send output to file (overwrites)
$ echo "Hello" > output.txt
# Creates/overwrites output.txt with "Hello"
# Append to file (doesn't overwrite)
$ echo "World" >> output.txt
# Appends "World" to output.txt
# File now contains:
# Hello
# World
# Redirect to /dev/null (throw away)
$ command > /dev/null
# stdout is discarded; stderr still appears on screen
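A minimal sketch of `>` versus `>>`, run inside a throwaway directory from `mktemp -d` so no real files are touched. The file name `demo.txt` is purely illustrative.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"                 # scratch directory; demo.txt is a hypothetical name

echo "first"  > demo.txt          # > creates or truncates: demo.txt = first
echo "second" > demo.txt          # > truncates again:      demo.txt = second
echo "third" >> demo.txt          # >> appends:             demo.txt = second, third

# > /dev/null discards stdout only; ls's complaint about the
# missing file still goes to stderr (the terminal)
ls demo.txt no_such_file > /dev/null || true

content=$(cat demo.txt)
printf '%s\n' "$content"          # second, then third
```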
4.5.1.2. Multiple Outputs#
# Redirect different streams
$ command > out.txt 2> err.txt
# stdout → out.txt
# stderr → err.txt
# Combine stdout and stderr
$ command > output.txt 2>&1
# Both go to output.txt (order matters: "2>&1 > output.txt" leaves stderr on screen)
# Redirect stderr only
$ command 2> error.txt
# stderr → error.txt
# stdout still displays on screen
# Save stdout to a file while still seeing it on screen; stderr goes to another file
$ command 2> err.txt | tee out.txt
# stdout → out.txt and the screen
# stderr → err.txt
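To see where each stream lands, the sketch below uses a small helper function, `both` (hypothetical, defined only for this demo), that writes one line to stdout and one to stderr. The last redirection shows why `2>&1` must come after `> file`: redirections are applied left to right, so `2>&1 > file` points stderr at wherever stdout was before the file redirection, which is the terminal.

```shell
#!/usr/bin/env bash
set -u
cd "$(mktemp -d)"                    # scratch directory for the demo files

both() {                             # hypothetical helper: one line per stream
  echo "to stdout"
  echo "to stderr" >&2
}

both > out.txt 2> err.txt            # split: each stream to its own file
both > all.txt 2>&1                  # merge: stderr joins stdout in all.txt
both 2>&1 > only_out.txt             # wrong order: stderr still goes to the
                                     # terminal; only stdout reaches the file

cat all.txt                          # to stdout, then to stderr
```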
4.5.2. Input Redirection#
Redirect stdin from files:
# Read from file instead of stdin
$ sort < unsorted.txt
# Equivalent to: sort unsorted.txt
# Combine input and output redirection
$ cat < file1.txt > output.txt
# Reads file1.txt on stdin and writes its contents to output.txt
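One practical difference between `sort < file` and `sort file`: with `<`, the shell opens the file and the command only sees a stream, so it never learns the file name. That matters for `wc`, which omits the name when reading stdin. A small sketch with made-up sample data:

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
printf 'banana\napple\ncherry\n' > unsorted.txt   # hypothetical sample input

sorted=$(sort < unsorted.txt)       # the shell opens the file; sort reads stdin
printf '%s\n' "$sorted"             # apple, banana, cherry

wc -l unsorted.txt                  # wc was given a name, so it prints it
count=$(wc -l < unsorted.txt | tr -d ' ')   # via stdin, only the number
echo "$count"                       # 3
```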
4.5.3. Pipes: The Magic of Unix#
Connect stdout of one command to stdin of another:
$ command1 | command2 | command3
# command1's stdout → command2's stdin; command2's stdout → command3's stdin
4.5.3.1. Simple Example#
# Without pipe (intermediate file)
$ grep "error" app.log > temp.txt
$ wc -l temp.txt
$ rm temp.txt
# With pipe (clean, fast)
$ grep "error" app.log | wc -l
42
# Multiple pipes
$ cat data.csv | grep "error" | wc -l
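A quick check that the temp-file version and the pipe version count the same lines. The four-line `app.log` is invented for the comparison, and the script runs in a scratch directory.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
# Hypothetical four-line log used only for this comparison
printf '%s\n' "INFO start" "error: disk full" "INFO ok" "error: timeout" > app.log

# Temp-file version
grep "error" app.log > temp.txt
via_file=$(wc -l < temp.txt | tr -d ' ')
rm temp.txt

# Pipe version: same result, nothing to create or clean up
via_pipe=$(grep "error" app.log | wc -l | tr -d ' ')

echo "$via_file $via_pipe"          # 2 2
```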
4.5.3.2. Real-World Pipelines#
# Find the 10 largest files under the current directory (size, path)
# Note: $9 truncates file names that contain spaces
$ find . -type f -exec ls -l {} + | awk '{print $5, $9}' | sort -rn | head -10
# Count word frequency in documents (-s squeezes runs of whitespace, so no empty "words")
$ cat *.txt | tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn | head -20
# Monitor system resource use (top 5 memory hogs, sorted by the %MEM column)
$ ps aux | sort -k4 -rn | head -5
# Extract and analyze Apache access logs
$ cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
# Output: request count, then IP address (top 10 clients)
# Find most common error types
$ grep "ERROR" app.log | sed 's/.*ERROR: //' | cut -d' ' -f1 | sort | uniq -c | sort -rn
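A small sketch of the word-frequency pipeline on a made-up file. Without `tr -s`, the double space below turns into an empty line that would be counted as a "word". Reading the file with `<` instead of `cat` also saves a process.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
printf 'the  cat\nand the hat\n' > sample.txt     # note the double space

# Without -s: the double space yields one empty line
blank_words=$(tr '[:space:]' '\n' < sample.txt | grep -c '^$')

# With -s: one word per line, then count and rank
top=$(tr -s '[:space:]' '\n' < sample.txt | sort | uniq -c | sort -rn | head -1)

echo "blank lines without -s: $blank_words"       # 1
echo "$top"                                       # "the" appears 2 times
```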
4.5.4. Combining Input and Output Redirection#
Pipes and redirection work together: read from a file, transform the data through a pipeline, and write the result to another file.
# Read from file, process, write to another
$ grep "pattern" input.txt | sed 's/old/new/g' > output.txt
# Use tee to save intermediate results AND continue pipe
$ cat data.csv | tee intermediate.txt | awk -F, '{print $1}' > final.txt
# intermediate.txt has the raw data
# final.txt has the first comma-separated column
# Process, show on screen, and save
$ ./generate_report.sh | tee report.txt
# Shows on screen AND saves to file
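A minimal sketch of `tee` saving an intermediate copy while the data keeps flowing, using a hypothetical two-row CSV. It also shows `tee -a`, which appends instead of overwriting.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
printf 'alice,30\nbob,25\n' > data.csv          # hypothetical CSV

# tee copies its stdin to intermediate.txt AND passes it down the pipe;
# -F, tells awk that fields are comma-separated
tee intermediate.txt < data.csv | awk -F, '{print $1}' > final.txt

cmp -s data.csv intermediate.txt && echo "intermediate.txt is an exact copy"
cat final.txt                                   # alice, bob

# tee -a appends to data.csv instead of overwriting it
echo "carol,41" | tee -a data.csv > /dev/null
```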
4.5.5. Here Documents and Here Strings#
Pass multi-line input without creating a file:
4.5.5.1. Here Document (<<)#
# Send multiple lines to command
$ cat << EOF
This is line 1
This is line 2
This is line 3
EOF
# Use in scripts for file creation; quoting the delimiter ('EOF') turns off $variable expansion in the body
$ cat > myfile.txt << 'EOF'
Line 1
Line 2
Line 3
EOF
# Creates myfile.txt with the content
# Send data to command
$ wc -l << EOF
First line
Second line
Third line
EOF
# Output: 3
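The difference between `<< EOF` and `<< 'EOF'` is easy to miss, so here is a small sketch; `name` is just a sample variable. With an unquoted delimiter the shell expands variables in the body; with a quoted one the body is taken literally.

```shell
#!/usr/bin/env bash
set -eu
name="world"                      # sample variable for the demo

# Unquoted delimiter: $name is expanded
unquoted=$(cat << EOF
Hello, $name
EOF
)

# Quoted delimiter: the body is taken literally
quoted=$(cat << 'EOF'
Hello, $name
EOF
)

echo "$unquoted"                  # Hello, world
echo "$quoted"                    # Hello, $name
```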
4.5.5.2. Here String (<<<)#
# Send single string to command
$ wc -w <<< "one two three four"
# Output: 4
# Common in scripts: search a variable's contents
$ grep "pattern" <<< "$variable"
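One place where here strings matter is `read`. A sketch, assuming bash: a here string feeds `read` in the current shell, so the variables survive, whereas piping into `read` runs it in a subshell (unless `shopt -s lastpipe` is set) and the values vanish. The record `"alice 42"` is made up.

```shell
#!/usr/bin/env bash
set -u
line="alice 42"                         # hypothetical input record

# A here string feeds read in the current shell, so the variables persist
read -r user score <<< "$line"
echo "user=$user score=$score"          # user=alice score=42

# Piping into read: in bash without 'shopt -s lastpipe', the pipeline
# stage runs in a subshell, so 'piped' is not set afterwards
piped=""
echo "$line" | read -r piped _
echo "piped='$piped'"                   # piped=''
```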
4.5.6. Process Substitution#
Pass command output as if it were a file (bash and zsh; not available in POSIX sh):
# Compare output of two commands
$ diff <(sort file1.txt) <(sort file2.txt)
# Feed multiple sources to one command
$ cat <(echo "file1 content") <(echo "file2 content") | sort
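A sketch of process substitution in bash: `<(cmd)` expands to a path (often under `/dev/fd`) that reads `cmd`'s output, so `diff` can compare sorted data without temporary files. The three sample files are invented.

```shell
#!/usr/bin/env bash
set -u
cd "$(mktemp -d)"
printf 'b\na\nc\n' > file1.txt          # same lines as file2.txt, different order
printf 'c\nb\na\n' > file2.txt
printf 'a\nb\nd\n' > file3.txt          # one line differs

echo <(true)                            # prints the substituted path

if diff <(sort file1.txt) <(sort file2.txt) > /dev/null; then same12=yes; else same12=no; fi
if diff <(sort file1.txt) <(sort file3.txt) > /dev/null; then same13=yes; else same13=no; fi
echo "$same12 $same13"                  # yes no
```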
4.5.7. Common Patterns#
4.5.7.1. Extract-Transform-Load (ETL)#
# Extract (read), Transform (process), Load (write)
$ cat raw_data.csv | \
sed 's/bad_value/good_value/g' | \
awk -F, '{print $1, $3, $5}' | \
sort > processed_data.txt
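The same ETL shape on a tiny, hypothetical five-column file. Two small choices are assumptions rather than requirements: `-v OFS=,` keeps the output comma-separated, and `LC_ALL=C` makes the sort order independent of the locale. Each line ends in `|`, so no backslashes are needed, and comments can annotate the stages.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
# Hypothetical five-column input; "bad_value" is the placeholder to fix
printf '%s\n' \
  '3,carol,bad_value,x,7' \
  '1,alice,ok,y,9' \
  '2,bob,bad_value,z,8' > raw_data.csv

sed 's/bad_value/good_value/g' raw_data.csv |   # Extract + Transform: fix values
  awk -F, -v OFS=, '{print $1, $3, $5}' |       # keep columns 1, 3, 5 as CSV
  LC_ALL=C sort > processed_data.txt            # Load: sorted output file

cat processed_data.txt
```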
4.5.7.2. Filter-Aggregate-Report#
# Filter logs for specific type, aggregate counts, show report
$ grep "ERROR" app.log | \
sed 's/.*ERROR: //' | \
awk '{print $1}' | \
sort | uniq -c | \
sort -rn | \
awk '{print $2 ": " $1 " occurrences"}'
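Running the filter-aggregate-report pipeline on a few invented log lines makes each stage concrete. The log format `<date> LEVEL: <message>` is an assumption for the demo; the pipeline treats the first word of each message as the error type.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
# Hypothetical log lines in "<date> LEVEL: <message>" form
printf '%s\n' \
  '2024-01-01 ERROR: Timeout calling db' \
  '2024-01-01 INFO: started' \
  '2024-01-01 ERROR: Timeout on retry' \
  '2024-01-01 ERROR: DiskFull on /var' > app.log

report=$(grep "ERROR" app.log |
  sed 's/.*ERROR: //' |            # keep only the message text
  awk '{print $1}' |               # first word = error type
  sort | uniq -c |                 # count each type
  sort -rn |                       # most frequent first
  awk '{print $2 ": " $1 " occurrences"}')

printf '%s\n' "$report"
```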
4.5.7.3. Data Validation#
# Check data quality before processing: flag rows after the header that don't have exactly 5 fields
$ awk -F, 'NR>1 && NF != 5 {print "Bad line " NR ": " $0}' data.csv
# Find missing values
$ awk -F, '{for (i=1; i<=NF; i++) if ($i == "") print "Empty field in row " NR}' data.csv
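Both checks applied to a small, hypothetical CSV: a header, one good row, one short row (line 3), and one row with an empty field (line 4). Note that `NR` counts the header, so line numbers match what an editor shows.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
# Hypothetical CSV: header, good row, short row, row with an empty field
printf '%s\n' 'id,a,b,c,d' '1,2,3,4,5' '2,2,3,4' '3,,3,4,5' > data.csv

bad=$(awk -F, 'NR>1 && NF != 5 {print "Bad line " NR ": " $0}' data.csv)
empty=$(awk -F, '{for (i=1; i<=NF; i++) if ($i == "") print "Empty field in row " NR}' data.csv)

echo "$bad"       # Bad line 3: 2,2,3,4
echo "$empty"     # Empty field in row 4
```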
4.5.8. Special Files#
4.5.8.1. /dev/null: The Bit Bucket#
# Discard all output; only the exit status remains
$ grep "debug" app.log > /dev/null 2>&1
# Output and errors are discarded; $? tells you whether "debug" was found
# Silence error messages
$ command 2>/dev/null
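When everything is sent to `/dev/null`, the exit status is what is left to act on. A sketch with an invented `app.log`: grep returns 0 for a match, 1 for no match, and 2 for an error such as a missing file.

```shell
#!/usr/bin/env bash
set -u
cd "$(mktemp -d)"
printf 'debug: cache warmed\ninfo: ready\n' > app.log   # hypothetical log

# All output is discarded; only grep's exit status is kept
grep "debug" app.log > /dev/null 2>&1; found=$?
grep "trace" app.log > /dev/null 2>&1; missing=$?
grep "debug" no_such.log > /dev/null 2>&1; broken=$?

echo "$found $missing $broken"     # 0 1 2
```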
4.5.8.2. /dev/stdin, /dev/stdout, /dev/stderr#
# Explicitly reference standard streams
$ tee /dev/stderr < input.txt
# Copies input to stdout and also to stderr, e.g. to watch data mid-pipeline
# Useful in scripts for error messages (">&2" does the same and is more portable)
$ echo "Error message" > /dev/stderr
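A sketch of a script-style error helper. `log_error` is a hypothetical name; it writes to stderr with `>&2`, so a caller capturing stdout gets nothing unless it explicitly merges stderr in.

```shell
#!/usr/bin/env bash
set -u
# Hypothetical helper that reports errors on stderr
log_error() { echo "error: $*" >&2; }

out=$(log_error "disk full" 2> /dev/null)   # only stdout is captured: empty
err=$(log_error "disk full" 2>&1)           # stderr merged into the capture

echo "out='$out' err='$err'"                # out='' err='error: disk full'
```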
4.5.9. Debugging Pipelines#
When a pipeline isn’t working:
# Add tee at each step to inspect data
$ cat file.txt | tee debug1.txt | \
grep "pattern" | tee debug2.txt | \
awk '{print $1}' | tee debug3.txt | \
sort > final.txt
# Check debug files to see where data is lost
$ wc -l debug*.txt final.txt
100 debug1.txt (100 input lines)
50 debug2.txt (50 matched pattern)
50 debug3.txt (50 fields extracted)
50 final.txt
250 total
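The same tee-at-each-stage technique on generated input (the numbers 1 to 10), so the counts are predictable: the `grep` stage keeps the five even numbers, and the later stages keep all five.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 > file.txt   # sample input: 10 lines

tee debug1.txt < file.txt |
  grep '[02468]$' | tee debug2.txt |          # keep even numbers
  awk '{print $1 * 10}' | tee debug3.txt |
  sort -n > final.txt

# Line counts show at which stage data shrinks
for f in debug1.txt debug2.txt debug3.txt final.txt; do
  printf '%s %s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
done
```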
4.5.10. Quick Reference#
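| Operator | Name | Function | Example |
|---|---|---|---|
| `>` | Redirect | Send stdout to file | `echo "Hello" > output.txt` |
| `>>` | Append | Add stdout to file | `echo "World" >> output.txt` |
| `<` | Input | Read from file | `sort < unsorted.txt` |
| `\|` | Pipe | Connect commands | `grep "error" app.log \| wc -l` |
| `2>` | Stderr redirect | Send errors to file | `command 2> error.txt` |
| `2>&1` | Merge streams | Combine stdout/stderr | `command > output.txt 2>&1` |
| `\|&` | Pipe all streams | Pipe stdout and stderr (bash 4+, zsh; same as `2>&1 \|`) | `command \|& grep "error"` |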
4.5.11. Pitfalls#
4.5.11.1. ❌ Forgetting about buffering#
# Output can arrive late if a program buffers it
$ python my_script.py | some_command
# Python block-buffers stdout when it is a pipe, so output arrives in chunks
# Use unbuffered mode (or set PYTHONUNBUFFERED=1)
$ python -u my_script.py | some_command
4.5.11.2. ❌ Cramming long pipelines onto one line#
# Hard to read and debug
$ cat file.txt | grep a | awk '{print $1}' | sort | uniq | wc -l
# Better: Use line continuation
$ cat file.txt | \
grep a | \
awk '{print $1}' | \
sort | uniq | wc -l
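In bash (and POSIX sh), a line that ends with `|` continues onto the next line automatically, so the backslashes are optional and each stage can carry a trailing comment. A small sketch with made-up input words:

```shell
#!/usr/bin/env bash
set -eu
# A line ending in | continues onto the next line; no backslash needed
count=$(printf '%s\n' apple avocado banana apple |
  grep a |                 # all four lines contain "a"
  awk '{print $1}' |
  sort | uniq |            # apple, avocado, banana
  wc -l | tr -d ' ')

echo "$count"              # 3
```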
4.5.11.3. ❌ Assuming order in parallel redirection#
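# Both jobs run concurrently: neither order is guaranteed, and the shell moves on without waiting
$ (command1 > file1.txt) & (command2 > file2.txt) &
# Record each PID and wait for both before using the files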
$ command1 > file1.txt & pid1=$!
$ command2 > file2.txt & pid2=$!
$ wait $pid1 $pid2
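A sketch of waiting on each background job separately, which also collects each job's own exit status. The two subshells stand in for hypothetical commands; the first sleeps so it finishes last.

```shell
#!/usr/bin/env bash
set -u
cd "$(mktemp -d)"

# Two hypothetical jobs; the first deliberately finishes later
( sleep 1; echo "slow" ) > file1.txt & pid1=$!
( echo "fast" ) > file2.txt & pid2=$!

# Wait for each job separately to get its own exit status
wait "$pid1"; status1=$?
wait "$pid2"; status2=$?

echo "status1=$status1 status2=$status2"   # status1=0 status2=0
cat file1.txt file2.txt                    # both files are complete after wait
```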
4.5.11.4. ❌ Using > in loops without appending#
# WRONG: Overwrites previous iterations
$ for i in 1 2 3; do echo $i > output.txt; done
# output.txt only contains "3"
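# RIGHT: Append (note: this also keeps anything already in output.txt)
$ for i in 1 2 3; do echo $i >> output.txt; done
# output.txt contains 1, 2, 3 (one per line)
# ALSO RIGHT: Redirect the whole loop once (starts from an empty file)
$ for i in 1 2 3; do echo $i; done > output.txt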
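A sketch of redirecting the whole loop: the file is truncated once when the loop starts, and every iteration writes to the same open file. The leftover `stale` line simulates output from an earlier run and does not survive.

```shell
#!/usr/bin/env bash
set -eu
cd "$(mktemp -d)"
echo "stale" > output.txt          # leftover content from an earlier run

# Redirect the whole loop once: one truncation, then all iterations write
for i in 1 2 3; do echo "$i"; done > output.txt

cat output.txt                     # 1, 2, 3 (no "stale")
```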