4.5. Pipes & Redirection#

This is where Unix power comes together: pipes connect tools, and redirection moves data to and from files. Master these, and you can accomplish complex tasks with simple commands.

4.5.1. Output Redirection#

Redirect stdout to files:

4.5.1.1. The Basics#

# Send output to file (overwrites)
$ echo "Hello" > output.txt
# Creates/overwrites output.txt with "Hello"

# Append to file (doesn't overwrite)
$ echo "World" >> output.txt
# Appends "World" to output.txt
# File now contains:
# Hello
# World

# Redirect to /dev/null (throw away)
$ command > /dev/null
# Output is discarded

4.5.1.2. Multiple Outputs#

# Redirect different streams
$ command > out.txt 2> err.txt
# stdout → out.txt
# stderr → err.txt

# Combine stdout and stderr
$ command > output.txt 2>&1
# Both go to output.txt
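The position of 2>&1 matters: redirections are processed left to right. A minimal sketch, using a throwaway emit function (hypothetical, standing in for any command that writes to both streams):

```shell
# Redirections are processed left to right, so where 2>&1 sits matters.
# emit is a throwaway demo function, not a real command.
emit() { echo "to stdout"; echo "to stderr" >&2; }

# Correct: stdout goes to the file first, THEN stderr is pointed
# at wherever stdout points now (the file)
emit > both.txt 2>&1        # both.txt gets both lines

# Wrong order: stderr is pointed at the terminal (where stdout
# pointed at that moment), THEN stdout moves to the file
emit 2>&1 > stdout_only.txt # "to stderr" still appears on screen
```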

# Redirect stderr only
$ command 2> error.txt
# stderr → error.txt
# stdout still displays on screen

# Append stderr to a log that survives across runs
$ command 2>> error.log
# Errors accumulate in error.log; stdout still displays on screen

4.5.2. Input Redirection#

Redirect stdin from files:

# Read from file instead of stdin
$ sort < unsorted.txt
# Equivalent to: sort unsorted.txt

# Combine input and output redirection
$ cat < file1.txt > output.txt
# Reads file1.txt, writes output.txt (effectively a copy)
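Input redirection earns its keep with commands like tr, which take no filename arguments and read only from stdin:

```shell
# tr takes no filename arguments -- it reads only stdin --
# so input redirection is the natural way to feed it a file
printf 'hello\nworld\n' > demo.txt
tr 'a-z' 'A-Z' < demo.txt
# HELLO
# WORLD
```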

4.5.3. Pipes: The Magic of Unix#

Connect stdout of one command to stdin of another:

$ command1 | command2 | command3
    stdout → stdin    stdout → stdin

4.5.3.1. Simple Example#

# Without pipe (intermediate file)
$ grep "error" app.log > temp.txt
$ wc -l temp.txt
$ rm temp.txt

# With pipe (clean, fast)
$ grep "error" app.log | wc -l
42

# Multiple pipes
$ cat data.csv | grep "error" | wc -l
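One thing the clean version hides: by default a pipeline's exit status is that of its last command only. In bash you can inspect every stage via the PIPESTATUS array or make any stage's failure fatal with set -o pipefail (both bash-specific; a sketch):

```shell
#!/usr/bin/env bash
# By default a pipeline reports only the LAST command's exit status,
# so an upstream failure can slip by unnoticed
false | true
echo "status: $?"               # status: 0 -- the failure is hidden

# bash keeps each stage's status in the PIPESTATUS array
false | true
echo "stages: ${PIPESTATUS[*]}" # stages: 1 0

# pipefail: the pipeline fails if ANY stage fails
set -o pipefail
false | true
echo "status: $?"               # status: 1
```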

4.5.3.2. Real-World Pipelines#

# Find largest files in a directory
$ find . -type f -exec ls -l {} \; | awk '{print $5, $9}' | sort -rn | head -10

# Count word frequency in documents (-s squeezes runs of whitespace
# so empty lines don't pollute the counts)
$ cat *.txt | tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn | head -20

# Monitor system resource use (top 5 memory hogs)
$ ps aux | sort -k4 -rn | head -5
# (the header line sorts to the bottom after the numeric sort,
# so head -5 really shows 5 processes)

# Extract and analyze Apache access logs
$ cat access.log | awk '{print $1}' | sort | uniq -c | sort -rn | head -10
# IP address, request count

# Find most common error types
$ grep "ERROR" app.log | sed 's/.*ERROR: //' | cut -d' ' -f1 | sort | uniq -c | sort -rn

4.5.4. Combining Input and Output Redirection#

Use them together for powerful operations:

# Read from file, process, write to another
$ grep "pattern" input.txt | sed 's/old/new/g' > output.txt

# Use tee to save intermediate results AND continue pipe
$ cat data.csv | tee intermediate.txt | awk '{print $1}' > final.txt
# intermediate.txt has raw data
# final.txt has processed output

# Process, show on screen, and save
$ ./generate_report.sh | tee report.txt
# Shows on screen AND saves to file
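tee can also append with -a and fan out to several files at once:

```shell
# tee -a appends instead of overwriting, so repeated runs accumulate
echo "run 1" | tee -a history.log > /dev/null
echo "run 2" | tee -a history.log > /dev/null
cat history.log
# run 1
# run 2

# tee accepts several files: same output to both, plus the screen
echo "hello" | tee copy1.txt copy2.txt
# hello
```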

4.5.5. Here Documents and Here Strings#

Pass multi-line input without creating a file:

4.5.5.1. Here Document (<<)#

# Send multiple lines to command
$ cat << EOF
This is line 1
This is line 2
This is line 3
EOF

# Use in scripts for file creation
$ cat > myfile.txt << 'EOF'
Line 1
Line 2
Line 3
EOF
# Creates myfile.txt with the content
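Why the quotes around 'EOF' above? Quoting the delimiter turns off variable and command expansion inside the document:

```shell
name="Alice"   # example variable

# Unquoted delimiter: $variables and $(commands) expand
cat << EOF
Hello, $name
EOF
# Hello, Alice

# Quoted delimiter: the body is taken literally
cat << 'EOF'
Hello, $name
EOF
# Hello, $name
```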

# Send data to command
$ wc -l << EOF
First line
Second line
Third line
EOF
# Output: 3

4.5.5.2. Here String (<<<)#

# Send single string to command
$ wc -w <<< "one two three four"
# Output: 4

# Use in scripts
$ grep "pattern" <<< "$variable"

4.5.6. Process Substitution#

Pass command output as if it were a file:

# Compare output of two commands
$ diff <(sort file1.txt) <(sort file2.txt)

# Feed multiple sources to one command
$ cat <(echo "file1 content") <(echo "file2 content") | sort
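Process substitution shines with commands that insist on filename arguments, such as paste (a bash/zsh feature; a sketch):

```shell
#!/usr/bin/env bash
# paste wants filenames, not pipes; <(...) supplies a pipe that
# looks like a file (it expands to a path such as /dev/fd/63)
paste -d, <(seq 1 3) <(seq 4 6)
# 1,4
# 2,5
# 3,6
```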

4.5.7. Common Patterns#

4.5.7.1. Extract-Transform-Load (ETL)#

# Load (read), Transform (process), Load (write)
$ cat raw_data.csv | \
    sed 's/bad_value/good_value/g' | \
    awk -F, '{print $1, $3, $5}' | \
    sort > processed_data.txt

4.5.7.2. Filter-Aggregate-Report#

# Filter logs for specific type, aggregate counts, show report
$ grep "ERROR" app.log | \
    sed 's/.*ERROR: //' | \
    awk '{print $1}' | \
    sort | uniq -c | \
    sort -rn | \
    awk '{print $2 ": " $1 " occurrences"}'

4.5.7.3. Data Validation#

# Check data quality before processing
$ awk -F, 'NR>1 && NF != 5 {print "Bad line " NR ": " $0}' data.csv

# Find missing values
$ awk -F, '{for (i=1; i<=NF; i++) if ($i == "") print "Empty field in row " NR}' data.csv

4.5.8. Special Files#

4.5.8.1. /dev/null: The Bit Bucket#

# Discard unwanted output
$ grep "debug" app.log > /dev/null 2>&1
# Output and errors are discarded

# Silence error messages
$ command 2>/dev/null

4.5.8.2. /dev/stdin, /dev/stdout, /dev/stderr#

# Explicitly reference standard streams
$ tee /dev/stderr < input.txt
# The input is copied to stderr as well as stdout

# Useful in scripts for clarity
$ echo "Error message" > /dev/stderr

4.5.9. Debugging Pipelines#

When a pipeline isn’t working:

# Add tee at each step to inspect data
$ cat file.txt | tee debug1.txt | \
    grep "pattern" | tee debug2.txt | \
    awk '{print $1}' | tee debug3.txt | \
    sort > final.txt

# Check debug files to see where data is lost
$ wc -l debug*.txt final.txt
  100 debug1.txt   (100 input lines)
   50 debug2.txt   (50 matched the pattern)
   50 debug3.txt   (50 first fields extracted)
   50 final.txt
  250 total

4.5.10. Quick Reference#

Operator   Name               Function                 Example
--------   ----------------   ----------------------   ------------------
>          Redirect           Send stdout to file      cmd > file.txt
>>         Append             Add stdout to file       cmd >> file.txt
<          Input              Read from file           cmd < input.txt
|          Pipe               Connect commands         cmd1 | cmd2
2>         Stderr redirect    Send errors to file      cmd 2> err.txt
2>&1       Merge streams      Combine stdout/stderr    cmd > out.txt 2>&1
|&         Pipe all streams   Pipe stdout and stderr   cmd1 |& cmd2

4.5.11. Pitfalls#

4.5.11.1. ❌ Forgetting about buffering#

# Output is delayed if a program buffers when writing to a pipe
$ python my_script.py | some_command
# If my_script.py block-buffers, its output arrives late, in large chunks

# Use unbuffered mode
$ python -u my_script.py | some_command
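When a program has no unbuffered flag of its own, GNU coreutils' stdbuf can adjust its buffering from the outside, and grep has --line-buffered for the same reason (both GNU-specific; a sketch):

```shell
# stdbuf (GNU coreutils) adjusts a program's buffering from outside.
# -oL = line-buffer stdout, so each line crosses the pipe as soon as
# it is printed instead of waiting for a 4-8 KB block to fill
stdbuf -oL seq 1 3 | head -2
# 1
# 2

# grep also buffers when its stdout is a pipe;
# --line-buffered flushes each match immediately (useful with tail -f)
printf 'ERROR one\nINFO two\n' | grep --line-buffered "ERROR"
# ERROR one
```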

4.5.11.2. ❌ Cramming long pipelines onto one line#

# Hard to read and debug
$ cat file.txt | grep a | awk '{print $1}' | sort | uniq | wc -l

# Better: Use line continuation
$ cat file.txt | \
    grep a | \
    awk '{print $1}' | \
    sort | uniq | wc -l

4.5.11.3. ❌ Assuming order in parallel redirection#

# Which runs first?
$ (command1 > file1.txt) & (command2 > file2.txt) &

# Be explicit with wait
$ command1 > file1.txt & pid1=$!
$ command2 > file2.txt & pid2=$!
$ wait $pid1 $pid2

4.5.11.4. ❌ Using > in loops without appending#

# WRONG: Overwrites previous iterations
$ for i in 1 2 3; do echo $i > output.txt; done
# output.txt only contains "3"

# RIGHT: Append
$ for i in 1 2 3; do echo $i >> output.txt; done
# output.txt contains 1, 2, 3
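An even cleaner option is to redirect the loop as a whole: the file is opened once and every iteration writes into that single open stream, with no risk of forgetting >>:

```shell
# BEST: redirect the whole loop; output.txt is opened exactly once
for i in 1 2 3; do
    echo "$i"
done > output.txt

cat output.txt
# 1
# 2
# 3
```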