9.5. Lab: Parsing Structured Data
9.5.1. Exercise 6: Database Query Result Formatter
Create a tool to format SQL query results into various output formats.
Requirements:
- Accept SQL query results from stdin (CSV-like format)
- Support multiple output formats:
  - Plain text table (aligned columns)
  - CSV export
  - JSON export
  - HTML table
  - Markdown table
- Include options for:
  - Column selection/reordering
  - Row filtering
  - Sorting by column
  - Header row with separators
  - Summary statistics (counts, sums, averages)
Example Input (from SQL query):
id,name,department,salary
1,Alice,Engineering,95000
2,Bob,Sales,65000
3,Carol,Engineering,98000
4,Dave,HR,60000
Example Outputs:
Plain Text:
ID Name Department Salary
-- ---- ---------- ------
1 Alice Engineering 95000
2 Bob Sales 65000
Markdown:
| ID | Name | Department | Salary |
|----|-------|--------------|--------|
| 1 | Alice | Engineering | 95000 |
Hints:
- Use separate functions for each output format
- Use `awk` for parsing and calculations
- Use `printf` for aligned column output
- Consider using the `column` command for table formatting
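As a starting point, the Markdown formatter could look like this minimal sketch. It assumes simple comma-separated input with a header row and no quoted fields; `to_markdown` is an illustrative name, not part of the exercise spec:

```shell
#!/usr/bin/env bash
# Sketch: render CSV query results from stdin as a Markdown table.
# Assumes unquoted, comma-separated input with a header row.
to_markdown() {
    awk -F, '
        NR == 1 {
            # header row, then one dashed separator cell per column
            printf "|"; for (i = 1; i <= NF; i++) printf " %s |", $i; print ""
            printf "|"; for (i = 1; i <= NF; i++) printf "----|"; print ""
            next
        }
        {
            printf "|"; for (i = 1; i <= NF; i++) printf " %s |", $i; print ""
        }'
}

printf 'id,name\n1,Alice\n' | to_markdown
```

The same structure extends to the other formats: each output function reads the same parsed fields and only the `printf` templates change.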
9.5.2. Exercise 5: DNS and Network Log Analyzer
Create a tool to analyze DNS query and network traffic logs.
Requirements:
- Parse DNS query logs (or create a sample)
- Find the most queried domains
- Identify suspicious patterns (rapid queries, failed lookups)
- Filter by query type (A, AAAA, MX, TXT)
- Generate statistics on query response times
- Alert on anomalies
Sample DNS Log Format:
2024-12-19 14:23:45 query domain=example.com type=A client=192.168.1.1 response_time=12ms result=OK
2024-12-19 14:23:46 query domain=example.com type=A client=192.168.1.2 response_time=11ms result=OK
2024-12-19 14:23:47 query domain=malicious.com type=A client=192.168.1.3 response_time=5000ms result=TIMEOUT
Analysis Output:
Top Queried Domains:
example.com: 1523 queries
google.com: 892 queries
...
Slow Queries (>1000ms):
malicious.com: 5 queries
...
Suspicious Clients (>100 queries/minute):
192.168.1.100: 250 queries
Hints:
- Use `awk` to extract fields and calculate statistics
- Use `sort -rn` for ranking by frequency
- Use `awk` with time calculations for rate detection
- Create functions for each analysis type
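The "Top Queried Domains" piece of the report can be sketched as a single `awk | sort` pipeline over the sample log format shown above (`top_domains` is an assumed helper name):

```shell
#!/usr/bin/env bash
# Sketch: count queries per domain for the sample log format above.
# Assumes each log line carries a whitespace-delimited "domain=..." field.
top_domains() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^domain=/) { sub(/^domain=/, "", $i); count[$i]++ }
    }
    END { for (d in count) print count[d], d }' | sort -rn
}
```

The slow-query and rate-based checks follow the same pattern: extract a field with `awk`, aggregate into an array, and print the counts in `END`.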
9.5.3. Exercise 4: Configuration File Validator
Create a tool to validate and extract configuration from files.
Requirements:
- Parse key=value configuration files
- Validate that required keys exist
- Check value formats (e.g., integer, email, IP, URL)
- Generate a report of valid/invalid entries
- Export validated config to JSON or CSV format
Sample Config File (app.conf):
app_name=MyApp
app_port=8080
db_host=localhost
db_port=5432
admin_email=admin@example.com
api_timeout=30
Validation Rules:
- app_port must be numeric, in the range 1-65535
- admin_email must match an email pattern
- db_host must be non-empty
- api_timeout must be numeric
Output:
Configuration Validation Report
================================
✓ app_name=MyApp
✓ app_port=8080 (valid: 1-65535)
✓ db_host=localhost
✗ admin_email=invalid (not an email)
Hints:
- Use `grep` and regex for pattern matching
- Use bash regex matching with `[[ =~ ]]`
- Separate validation logic from output formatting
- Create lookup tables for required vs. optional keys
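A per-entry validator using `[[ =~ ]]` might be sketched like this. The rules shown are illustrative rather than the complete set, and `validate_entry` is an assumed name:

```shell
#!/usr/bin/env bash
# Sketch: validate a single key=value pair with bash regex matching.
validate_entry() {
    local key=$1 value=$2
    case $key in
        app_port|db_port)
            # numeric and within the valid TCP port range
            [[ $value =~ ^[0-9]+$ ]] && (( value >= 1 && value <= 65535 )) ;;
        admin_email)
            # loose email check: something@something.tld
            [[ $value =~ ^[^@[:space:]]+@[^@[:space:]]+\.[^@[:space:]]+$ ]] ;;
        db_host)
            [[ -n $value ]] ;;
        *) return 0 ;;   # keys without a rule pass in this sketch
    esac
}

validate_entry app_port 8080 && echo "✓ app_port=8080 (valid: 1-65535)"
validate_entry admin_email invalid || echo "✗ admin_email=invalid (not an email)"
```

Keeping the checks in one function that returns an exit status makes it easy to drive both the report output and the JSON/CSV export from the same logic.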
9.5.4. Exercise 3: Text Transformation Pipeline
Build a multi-stage text processing script.
Requirements:
- Accept input file and output file as arguments
- Stage 1: Remove empty lines and leading/trailing whitespace
- Stage 2: Convert all text to lowercase
- Stage 3: Replace multiple spaces with a single space
- Stage 4: Number all lines
- Stage 5: Sort and remove duplicates
- Stage 6: Generate a report showing:
  - Original line count
  - Final line count
  - Most frequent lines
Example:
$ ./transform.sh input.txt output.txt
Processing input.txt...
Original lines: 1000
After cleanup: 950
After dedup: 823
Output saved to output.txt
Hints:
- Chain `sed` commands for multiple transformations
- Use `uniq -c | sort -rn` for frequency analysis
- Preserve output at each stage for debugging
- Use pipes and temporary files strategically
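Stages 1-3 and 5 can be chained into one pipeline as a starting sketch (`clean_text` is an assumed helper name; line numbering and the report are left as exercises):

```shell
#!/usr/bin/env bash
# Sketch: cleanup stages as a single sed | tr | sort pipeline.
clean_text() {
    sed -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/  */ /g' \
    | tr '[:upper:]' '[:lower:]' \
    | sort -u
}

printf 'Hello   World\n\n  hello world \n' | clean_text   # -> hello world
```

Writing each stage's output to a temporary file instead of piping directly makes it easy to inspect intermediate results while debugging, at the cost of extra I/O.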
9.5.5. Exercise 2: Log File Parser
Create a tool to extract and analyze data from log files.
Requirements:
- Parse web server access logs (or create a sample log)
- Extract unique IP addresses
- Count requests per IP
- Find the most frequent request paths
- Filter by date range
- Generate a summary report with:
  - Total requests
  - Top 5 IPs
  - Top 5 requested resources
  - Response code distribution
Sample Log Format (Apache Combined Log):
192.168.1.1 - - [19/Dec/2024:14:23:45 +0000] "GET /index.html HTTP/1.1" 200 1234
192.168.1.2 - - [19/Dec/2024:14:24:12 +0000] "POST /api/data HTTP/1.1" 201 567
Output Example:
=== Log Analysis Report ===
Total Requests: 1500
Top IPs:
192.168.1.1: 450 requests
192.168.1.5: 320 requests
...
Hints:
- Use `awk` to extract fields from the log format
- Use `sort | uniq -c | sort -rn` for ranking
- Use `grep` and regex for date filtering
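Two report pieces can be sketched as one-liner pipelines over combined-log input on stdin. In that format the client IP is field 1 and the status code is field 9 when splitting on whitespace; `top_ips` and `status_counts` are assumed helper names:

```shell
#!/usr/bin/env bash
# Sketch: rank client IPs and status codes from an Apache combined log.
top_ips()       { awk '{ print $1 }' | sort | uniq -c | sort -rn | head -n 5; }
status_counts() { awk '{ print $9 }' | sort | uniq -c | sort -rn; }
```

The top-resources report is the same pipeline with `$7` (the request path) in place of `$1`.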
9.5.6. Exercise 1: CSV Data Parser
Create a script to parse and analyze CSV files.
Requirements:
- Accept a CSV filename as argument
- Count total rows
- Extract and display specific columns (e.g., columns 1 and 3)
- Filter rows based on a condition (e.g., values > 100 in column 2)
- Export filtered data to a new CSV file
- Handle quoted fields with commas
Example Data (data.csv):
Name,Age,Salary
Alice,30,75000
Bob,25,60000
Carol,35,95000
Dave,28,70000
Example Run:
$ ./csv_parser.sh data.csv
Total rows: 4
Columns 1 and 2:
Alice,30
Bob,25
...
Hints:
- Use `awk` with `-F,` for CSV parsing
- Handle quoted fields carefully with `sed` or `awk` regex
- Consider using `cut` for simple column extraction
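The row count and the numeric filter can be sketched with `awk -F,` like this. Note this sketch does NOT handle quoted fields containing commas, and `csv_rows`/`csv_filter` are assumed names:

```shell
#!/usr/bin/env bash
# Sketch: CSV row counting and numeric filtering (header in row 1).
csv_rows()   { awk -F, 'NR > 1 { n++ } END { print n + 0 }' "$1"; }
# keep the header, plus rows where numeric column $2 exceeds threshold $3
csv_filter() { awk -F, -v col="$2" -v min="$3" 'NR == 1 || $col + 0 > min + 0' "$1"; }
```

Usage: `csv_filter data.csv 3 70000 > filtered.csv` would keep rows whose third column exceeds 70000. For quoted fields, a simple `-F,` split breaks, which is why the hint suggests extra regex handling.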