9.2. Advanced Redirection

9.2.1. Common Pitfalls

1. Using cat unnecessarily

# Inefficient: useless use of cat
cat file.txt | grep pattern

# Better: let grep read the file
grep pattern file.txt

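2. Forgetting that a pipeline reports only the last command's exit status

# $? reflects tee, so a failure in command goes unnoticed
command | tee output.log
echo $?  # Shows status of tee, not command!

# Better: check PIPESTATUS (bash) immediately after the pipeline
command | tee output.log
echo "${PIPESTATUS[0]}"  # Status of first command

# Alternative: make the pipeline fail if any command in it fails
set -o pipefail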

3. Expecting xargs to preserve argument grouping

# Problem: xargs splits its input on whitespace
echo "arg1 arg2" | xargs command  # Runs: command arg1 arg2

# Fix: use -I so each input line is passed as a single argument
echo "arg1 arg2" | xargs -I {} command {}  # Runs: command "arg1 arg2"

4. Forgetting null delimiter with filenames

# Breaks on filenames with spaces
find . -name "*.txt" | xargs rm

# Better: use null delimiter
find . -name "*.txt" -print0 | xargs -0 rm
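The exit-status pitfall (item 2) is easy to reproduce. The following is a minimal sketch, assuming bash; `false` stands in for a failing command and `/dev/null` for the log file, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# 'false' plays the role of a failing command; /dev/null replaces output.log.

false | tee /dev/null
last_status=$?                    # status of tee, the last command

false | tee /dev/null
first_status=${PIPESTATUS[0]}     # status of 'false'; must be read right away

# With pipefail, the pipeline as a whole reports the failure.
set -o pipefail
pipefail_status=0
false | tee /dev/null || pipefail_status=$?
set +o pipefail

echo "last=$last_status first=$first_status pipefail=$pipefail_status"
# prints: last=0 first=1 pipefail=1
```

PIPESTATUS is overwritten by the next command that runs, which is why the sketch copies it into a variable on the very next line.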

9.2.2. Real-World Example: Data Processing Pipeline

#!/bin/bash

# Process a web server log file
process_logs() {
  local log_file=$1
  local output_dir=$2
  
  mkdir -p "$output_dir"
  
  # Extract and analyze various metrics in one pass over the log.
  # Note: the >(...) branches run asynchronously, so their output files
  # may still be incomplete for a moment after this pipeline returns.
  tee >(awk '{print $1}' | sort | uniq -c | sort -rn > "$output_dir/ip_count.txt") \
    >(grep -c "error" > "$output_dir/error_count.txt") \
    >(awk '{print $9}' | sort | uniq -c > "$output_dir/http_codes.txt") \
    < "$log_file" | \
    awk '{print $4}' | \
    cut -d: -f1 | \
    sort | uniq -c > "$output_dir/request_time_count.txt"
  
  # Summary
  echo "Log processing complete:"
  echo "  Total requests: $(wc -l < "$log_file")"
  echo "  Unique IPs: $(wc -l < "$output_dir/ip_count.txt")"
  echo "  Errors: $(cat "$output_dir/error_count.txt")"
}

process_logs access.log analysis/
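Because bash runs process substitutions asynchronously, the summary in the function above can read a result file before its writer has finished. One way to sidestep this is a sequential variant that trades the single pass for predictable ordering. The sketch below is not the original function; it assumes a log in the common access-log layout (client IP in field 1, status code in field 9), and the three sample lines and the `work_dir` name are made up for illustration.

```shell
#!/usr/bin/env bash
work_dir=$(mktemp -d)
log_file="$work_dir/access.log"

# Hypothetical sample data in the assumed access-log layout
cat > "$log_file" <<'EOF'
192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512
192.0.2.2 - - [10/Oct/2023:13:55:37 +0000] "GET /missing HTTP/1.1" 404 128
192.0.2.1 - - [11/Oct/2023:08:00:00 +0000] "GET /error HTTP/1.1" 500 64
EOF

# Each pipeline reads the log itself and finishes before the next one starts
awk '{print $1}' "$log_file" | sort | uniq -c | sort -rn > "$work_dir/ip_count.txt"
awk '{print $9}' "$log_file" | sort | uniq -c > "$work_dir/http_codes.txt"
grep -c "error" "$log_file" > "$work_dir/error_count.txt"

# The result files are complete by the time they are read here
unique_ips=$(wc -l < "$work_dir/ip_count.txt" | tr -d ' ')
error_lines=$(cat "$work_dir/error_count.txt")
echo "Unique IPs: $unique_ips, lines containing \"error\": $error_lines"
```

The cost is reading the log three times, which is usually acceptable for moderately sized files; the tee-based version remains attractive when the input is a stream that can only be read once.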

9.2.3. Tee: Splitting Output to Multiple Destinations

The tee command copies its stdin to stdout and to every file named on its command line:

9.2.3.1. Basic tee Usage

# Display and save output
ls /tmp | tee file_list.txt

# Append instead of overwriting
du -sh * | tee -a disk_usage.log

# Write to multiple files
date | tee log1.txt log2.txt

# To save output without displaying it, skip tee and redirect directly
long_command > output.log
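To see the difference between overwriting and appending, here is a small sketch that writes to a temporary directory. The file names and the `work_dir` variable are illustrative assumptions; stdout is sent to /dev/null only to keep the demonstration quiet.

```shell
#!/usr/bin/env bash
work_dir=$(mktemp -d)
log="$work_dir/run.log"

printf 'first\n'  | tee "$log" > /dev/null      # creates run.log
printf 'second\n' | tee "$log" > /dev/null      # overwrites it
printf 'third\n'  | tee -a "$log" > /dev/null   # -a appends instead

# Several file arguments: each one receives an identical copy
printf 'same\n' | tee "$work_dir/copy1.txt" "$work_dir/copy2.txt" > /dev/null

cat "$log"
# prints:
# second
# third
```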

9.2.3.2. Pipe to Multiple Commands

# Process output in parallel paths
# (>(...) inherits tee's stdout, so send the count to stderr to keep it out of the pipe)
tee >(sort > sorted.txt) >(wc -l >&2) < data.txt | grep pattern

# Save raw output and processed versions
command | tee raw.log | grep "ERROR" > errors.log
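One subtlety when tee with `>(...)` sits in the middle of a pipeline: in bash, the substituted command inherits tee's stdout, so anything it prints flows into the next pipeline stage. The sketch below illustrates this under that assumption; `cat` stands in for whatever command consumes the pipe.

```shell
#!/usr/bin/env bash
# wc -l writes its count to the inherited stdout, which is the pipe to cat,
# so the count is mixed into the data stream.
leaked=$(printf 'a\nb\n' | tee >(wc -l) | cat)

# Redirecting the side branch to stderr keeps the data stream clean.
clean=$(printf 'a\nb\n' | tee >(wc -l >&2) | cat)

leaked_lines=$(printf '%s\n' "$leaked" | wc -l | tr -d ' ')
clean_lines=$(printf '%s\n' "$clean" | wc -l | tr -d ' ')
echo "lines seen downstream: leaked=$leaked_lines clean=$clean_lines"
```

Writing the side branch to a file works equally well; stderr is used here only so the sketch needs no extra files.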

9.2.4. Xargs: Passing Output as Arguments

The xargs command reads items from stdin and passes them as arguments to another command:

9.2.4.1. Basic xargs Usage

# Pass all matching files to rm, in as few invocations as possible
find . -name "*.log" -print0 | xargs -0 rm

# Rename files, including names that contain spaces
find . -name "*.txt" -print0 | xargs -0 -I {} mv {} {}.bak

# Parallel processing (run 4 jobs at a time)
find . -name "*.jpg" -print0 | xargs -0 -P 4 -I {} convert {} {}.png

9.2.4.2. Handling Special Characters

# Use null delimiter for filenames with spaces
find . -name "*.txt" -print0 | xargs -0 wc -l

# Print each command line (to stderr) before executing it
find . -name "*.old" -print0 | xargs -0 -t rm

# Confirm before each execution
find . -name "*.tmp" -print0 | xargs -0 -p rm
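The effect of the null delimiter can be checked safely by counting arguments instead of deleting anything. This is a minimal sketch; it assumes the temporary directory path itself contains no whitespace, and the file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
work_dir=$(mktemp -d)
touch "$work_dir/plain.txt" "$work_dir/with space.txt"

# Whitespace-delimited input: "with space.txt" is split into two arguments.
# printf '%s\n' prints one line per argument it receives.
split_count=$(find "$work_dir" -name '*.txt' | xargs printf '%s\n' | wc -l | tr -d ' ')

# NUL-delimited input: exactly one argument per file.
nul_count=$(find "$work_dir" -name '*.txt' -print0 | xargs -0 printf '%s\n' | wc -l | tr -d ' ')

echo "arguments seen: split=$split_count nul=$nul_count"
```

With two files on disk, the first count comes out higher than the second, which is exactly the mismatch that makes a command such as rm fail or, worse, act on the wrong path.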

9.2.4.3. Common xargs Options

# -I {} : Run the command once per input line, replacing {} with that line
printf '%s\n' file1 file2 file3 | xargs -I {} echo "Processing: {}"

# -n 2 : Pass 2 arguments per execution
echo -e "1\n2\n3\n4\n5" | xargs -n 2 echo

# -P 0 : No limit on parallel jobs
find . -name "*.txt" | xargs -P 0 -I {} process_file {}
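The grouping behaviour of -n and -I is easiest to understand by capturing what echo actually receives. A short sketch, with sample inputs chosen only for illustration:

```shell
#!/usr/bin/env bash
# -n 2: at most two arguments per invocation, so echo runs three times
pairs=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)
echo "$pairs"
# prints:
# 1 2
# 3 4
# 5

# -I {}: one invocation per input line; the whole line replaces {},
# so embedded spaces stay inside a single argument
per_line=$(printf '%s\n' 'file one' 'file two' | xargs -I {} echo "Processing: {}")
echo "$per_line"
# prints:
# Processing: file one
# Processing: file two
```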

9.2.5. Process Substitution

Process substitution lets you use a command's output (or input) wherever a command expects a file name. It is a feature of bash, zsh, and ksh, not of POSIX sh:

9.2.5.1. Using <() for Input

# Compare output of two commands
diff <(ls dir1) <(ls dir2)

# Combine multiple outputs into one command
sort -m <(sort file1.txt) <(sort file2.txt)

# Pass command output where a file name is expected (here, tar's -T file list)
tar -czf backup.tar.gz -T <(find . -name "*.txt" -o -name "*.md")
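It helps to see what the shell actually hands to the command. The sketch below, assuming bash, prints the substituted path and then repeats the merge and diff patterns on two small throwaway files; all file names and contents are illustrative.

```shell
#!/usr/bin/env bash
work_dir=$(mktemp -d)
printf '%s\n' pear apple  > "$work_dir/one.txt"
printf '%s\n' plum cherry > "$work_dir/two.txt"

# The shell replaces <(...) with a path that the command opens like a file
fd_path=$(echo <(true))
echo "substituted path: $fd_path"   # typically something like /dev/fd/63

# Merge two sorted streams without creating temporary files
merged=$(sort -m <(sort "$work_dir/one.txt") <(sort "$work_dir/two.txt"))
echo "$merged"

# diff exits 0 when the two outputs are identical and 1 when they differ
diff_status=0
diff <(sort "$work_dir/one.txt") <(sort "$work_dir/two.txt") > /dev/null || diff_status=$?
echo "diff exit status: $diff_status"
```

Because the substituted path refers to a pipe, it can be read only once and cannot be seeked, so it suits commands that read their input sequentially.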

9.2.5.2. Using >() for Output

# Feed one input to several commands at once (tee also copies it to stdout)
tee >(command1) >(command2) < input.txt

# Example: display output while also saving a compressed copy
long_running_command | tee >(gzip > output.log.gz)

9.2.5.3. Practical Use Case

# Compare server configurations
diff <(ssh server1 "cat /etc/config.conf") <(ssh server2 "cat /etc/config.conf")

9.2.6. Pipes: Connecting Commands

The pipe operator | sends the stdout of one command to the stdin of another.

9.2.6.1. Basic Piping

# Count the number of files in /tmp
ls /tmp | wc -l

# Find lines containing "error"
cat logfile.txt | grep "error"

# Sort and count unique values
cat data.txt | sort | uniq -c | sort -rn

9.2.6.2. Chaining Multiple Pipes

# Extract usernames from /etc/passwd, sort, remove duplicates
cut -d: -f1 /etc/passwd | sort | uniq

# Find the 10 largest files, showing size and name
# (size must come first for sort -rn; names containing spaces are cut to their last word)
ls -lR / | grep "^-" | awk '{print $5, $NF}' | sort -rn | head -10

# Count occurrences of each word
cat document.txt | tr ' ' '\n' | grep -v '^$' | sort | uniq -c | sort -rn
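The word-count chain can be tried on a short string to watch each stage contribute. This sketch uses a made-up sentence in place of document.txt and adds a final awk step, which is not part of the original pipeline, to print the most frequent word first.

```shell
#!/usr/bin/env bash
text='the cat and the dog and the bird'

# tr puts one word per line, grep drops empty lines, sort groups equal words,
# uniq -c counts each group, sort -rn orders by count (highest first)
counts=$(printf '%s\n' "$text" | tr ' ' '\n' | grep -v '^$' | sort | uniq -c | sort -rn)

# Take the top entry and print it as "word count"
top_word=$(printf '%s\n' "$counts" | head -n 1 | awk '{print $2, $1}')
echo "$top_word"
# prints: the 3
```

The sort before uniq -c matters: uniq only collapses adjacent duplicates, so unsorted input would produce several separate counts for the same word.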

9.2.6.3. Avoiding Unnecessary Pipes

# Inefficient: cat and wc wrapped around grep
cat file.txt | grep pattern | wc -l

# Better: let grep do the counting
grep -c pattern file.txt

# Inefficient: unnecessary cat
cat file.txt | sort

# Better: redirect directly
sort < file.txt
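One caveat when replacing a pipeline with grep -c: it counts matching lines, not individual matches, just as `grep pattern | wc -l` does. When occurrences are what you need, a pipe is still the right tool. A small sketch with made-up sample text:

```shell
#!/usr/bin/env bash
sample=$(printf 'error error\nok\nerror\n')

# Two lines contain "error"
line_count=$(printf '%s\n' "$sample" | grep -c 'error')

# Three occurrences in total: -o prints each match on its own line
match_count=$(printf '%s\n' "$sample" | grep -o 'error' | wc -l | tr -d ' ')

echo "lines=$line_count occurrences=$match_count"
# prints: lines=2 occurrences=3
```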