9.5. Lab: Parsing Structured Data

9.5.1. Exercise 6: Database Query Result Formatter

Create a tool to format SQL query results into various output formats.

Requirements:

  • Accept SQL query results from stdin (CSV-like format)

  • Support multiple output formats:

    • Plain text table (aligned columns)

    • CSV export

    • JSON export

    • HTML table

    • Markdown table

  • Include options for:

    • Column selection/reordering

    • Row filtering

    • Sorting by column

    • Header row with separators

    • Summary statistics (counts, sums, averages)

Example Input (from SQL query):

id,name,department,salary
1,Alice,Engineering,95000
2,Bob,Sales,65000
3,Carol,Engineering,98000
4,Dave,HR,60000

Example Outputs:

Plain Text:

ID   Name        Department    Salary
--   ----        ----------    ------
1    Alice       Engineering   95000
2    Bob         Sales         65000

Markdown:

| ID | Name  | Department   | Salary |
|----|-------|--------------|--------|
| 1  | Alice | Engineering  | 95000  |

Hints:

  • Use separate functions for each output format

  • Use awk for parsing and calculations

  • Use printf for aligned column output

  • Consider using the column command for table formatting
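Putting the hints together, one possible shape for a per-format function is sketched below for the Markdown output. This is only a starting point: the function name to_markdown is our choice, and it assumes unquoted, comma-separated fields.

```shell
#!/bin/bash
# Sketch of one output-format function: CSV on stdin, Markdown table
# on stdout. Assumes unquoted, comma-separated fields.
to_markdown() {
    awk -F, '
        NR == 1 {
            # Header row, then the |---| separator row
            row = "|"; sep = "|"
            for (i = 1; i <= NF; i++) { row = row " " $i " |"; sep = sep " --- |" }
            print row; print sep; next
        }
        {
            row = "|"
            for (i = 1; i <= NF; i++) row = row " " $i " |"
            print row
        }'
}
```

For example, `printf 'id,name\n1,Alice\n' | to_markdown` prints a header, a separator, and one data row; the CSV, JSON, and HTML formats follow the same per-row loop with different delimiters.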

9.5.2. Exercise 5: DNS and Network Log Analyzer

Create a tool to analyze DNS query and network traffic logs.

Requirements:

  • Parse DNS query logs (or create a sample log)

  • Find most queried domains

  • Identify suspicious patterns (rapid queries, failed lookups)

  • Filter by query type (A, AAAA, MX, TXT)

  • Generate statistics on query response times

  • Alert on anomalies

Sample DNS Log Format:

2024-12-19 14:23:45 query domain=example.com type=A client=192.168.1.1 response_time=12ms result=OK
2024-12-19 14:23:46 query domain=example.com type=A client=192.168.1.2 response_time=11ms result=OK
2024-12-19 14:23:47 query domain=malicious.com type=A client=192.168.1.3 response_time=5000ms result=TIMEOUT

Analysis Output:

Top Queried Domains:
  example.com: 1523 queries
  google.com: 892 queries
  ...

Slow Queries (>1000ms):
  malicious.com: 5 queries
  ...

Suspicious Clients (>100 queries/minute):
  192.168.1.100: 250 queries

Hints:

  • Use awk to extract fields and calculate statistics

  • Use sort -rn for ranking by frequency

  • Use awk with time calculations for rate detection

  • Create functions for each analysis type
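As a starting point, the "Top Queried Domains" report can be sketched with the awk-and-sort idiom from the hints. The function name top_domains is illustrative, and the code assumes the space-separated key=value log format shown above.

```shell
#!/bin/bash
# Count domain=... fields across the log and rank by frequency.
# Reads the log on stdin.
top_domains() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^domain=/) { sub(/^domain=/, "", $i); count[$i]++ }
    }
    END { for (d in count) print count[d], d }' |
        sort -rn | head -5 |
        awk '{ printf "  %s: %s queries\n", $2, $1 }'
}
```

Invoked as `top_domains < dns.log`, this prints the five most queried domains; the slow-query and per-client rate checks follow the same extract-count-rank pattern on the response_time= and client= fields.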

9.5.3. Exercise 4: Configuration File Validator

Create a tool to validate and extract configuration from files.

Requirements:

  • Parse key=value configuration files

  • Validate required keys exist

  • Check value formats (e.g., integer, email, IP, URL)

  • Generate a report of valid/invalid entries

  • Export validated config to JSON or CSV format

Sample Config File (app.conf):

app_name=MyApp
app_port=8080
db_host=localhost
db_port=5432
admin_email=admin@example.com
api_timeout=30

Validation Rules:

  • app_port must be numeric 1-65535

  • admin_email must match email pattern

  • db_host must be non-empty

  • api_timeout must be numeric

Output:

Configuration Validation Report
================================
✓ app_name=MyApp
✓ app_port=8080 (valid: 1-65535)
✓ db_host=localhost
✗ admin_email=invalid (not an email)

Hints:

  • Use grep and regex for pattern matching

  • Use bash's [[ =~ ]] regex matching

  • Separate validation logic from output formatting

  • Create lookup tables for required vs optional keys
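The validation core might be sketched as below. The helper names and the (deliberately simplified) email regex are our own choices; the exercise leaves the structure open, and a production email check would be stricter.

```shell
#!/bin/bash
# Sketch of per-key validation with [[ =~ ]]. Regexes are kept in
# variables, which is the safest form for bash's =~ operator.
port_re='^[0-9]+$'
email_re='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]+$'

is_port()  { [[ $1 =~ $port_re ]] && (( 10#$1 >= 1 && 10#$1 <= 65535 )); }
is_email() { [[ $1 =~ $email_re ]]; }

validate_entry() {              # validate_entry KEY VALUE -> exit status
    case $1 in
        app_port|db_port) is_port "$2" ;;
        admin_email)      is_email "$2" ;;
        db_host)          [[ -n $2 ]] ;;
        api_timeout)      [[ $2 =~ $port_re ]] ;;  # numeric
        *)                return 0 ;;              # unknown keys pass
    esac
}
```

A reporting loop can then read each line with `IFS='=' read -r key value` and print a checkmark or cross based on the exit status, keeping validation logic separate from output formatting as the hints suggest.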

9.5.4. Exercise 3: Text Transformation Pipeline

Build a multi-stage text processing script.

Requirements:

  • Accept input file and output file as arguments

  • Stage 1: Remove empty lines and leading/trailing whitespace

  • Stage 2: Convert all text to lowercase

  • Stage 3: Replace multiple spaces with single space

  • Stage 4: Number all lines

  • Stage 5: Sort and remove duplicates

  • Stage 6: Generate report showing:

    • Original line count

    • Final line count

    • Most frequent lines

Example:

$ ./transform.sh input.txt output.txt
Processing input.txt...
Original lines: 1000
After cleanup: 950
After dedup: 823
Output saved to output.txt

Hints:

  • Chain sed commands for multiple transformations

  • Use uniq -c | sort -rn for frequency analysis

  • Preserve output at each stage for debugging

  • Use pipes and temporary files strategically
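The first three stages chain naturally into one sed/tr pipeline, for instance as below. This is a sketch; the function name cleanup is our own.

```shell
#!/bin/bash
# Stages 1-3: drop blank lines, trim leading/trailing whitespace,
# squeeze runs of spaces, lowercase. Reads stdin, writes stdout.
cleanup() {
    sed -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's/  */ /g' |
    tr '[:upper:]' '[:lower:]'
}
```

Stage 5 is then a `sort -u`, and the frequency report in stage 6 is the usual `sort | uniq -c | sort -rn | head` pipeline from the hints.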

9.5.5. Exercise 2: Log File Parser

Create a tool to extract and analyze data from log files.

Requirements:

  • Parse web server access logs (or create a sample log)

  • Extract unique IP addresses

  • Count requests per IP

  • Find the most frequent request paths

  • Filter by date range

  • Generate summary report with:

    • Total requests

    • Top 5 IPs

    • Top 5 requested resources

    • Response code distribution

Sample Log Format (Apache Combined Log):

192.168.1.1 - - [19/Dec/2024:14:23:45 +0000] "GET /index.html HTTP/1.1" 200 1234
192.168.1.2 - - [19/Dec/2024:14:24:12 +0000] "POST /api/data HTTP/1.1" 201 567

Output Example:

=== Log Analysis Report ===
Total Requests: 1500
Top IPs:
  192.168.1.1: 450 requests
  192.168.1.5: 320 requests
...

Hints:

  • Use awk to extract fields from the log format

  • Use sort | uniq -c | sort -rn for ranking

  • Use grep and regex for date filtering
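The ranking idiom from the hints applies directly: in the combined log format, field 1 is the client IP, field 7 the request path, and field 9 the status code. A sketch for the "Top IPs" section (the name top_ips is illustrative):

```shell
#!/bin/bash
# Rank client IPs by request count. Reads the access log on stdin.
top_ips() {
    awk '{ print $1 }' | sort | uniq -c | sort -rn | head -5 |
        awk '{ printf "  %s: %s requests\n", $2, $1 }'
}
```

Invoked as `top_ips < access.log`; swapping `$1` for `$7` gives the top requested resources, and for `$9` the response-code distribution.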

9.5.6. Exercise 1: CSV Data Parser

Create a script to parse and analyze CSV files.

Requirements:

  • Accept a CSV filename as argument

  • Count total rows

  • Extract and display specific columns (e.g., columns 1 and 2)

  • Filter rows based on a condition (e.g., values > 100 in column 2)

  • Export filtered data to a new CSV file

  • Handle quoted fields with commas

Example Data (data.csv):

Name,Age,Salary
Alice,30,75000
Bob,25,60000
Carol,35,95000
Dave,28,70000

Example Run:

$ ./csv_parser.sh data.csv
Total rows: 4
Columns 1 and 2:
Alice,30
Bob,25
...

Hints:

  • Use awk with -F, for CSV parsing

  • Handle quoted fields carefully with sed or awk regex

  • Consider using cut for simple column extraction
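Two of the building blocks might be sketched as follows, assuming no quoted fields yet (commas inside quotes need the extra sed/awk handling the hints mention). The function names are our own; both read the CSV on stdin.

```shell
#!/bin/bash
# Starting points for the row count and column extraction.
csv_rows() {  # data rows, excluding the header
    awk -F, 'END { print NR - 1 }'
}
csv_cols() {  # print columns 1 and 2
    awk -F, 'NR > 1 { print $1 "," $2 }'
}
```

On the sample file, `csv_rows < data.csv` prints 4. A row filter such as "age greater than 28" is one more awk condition, e.g. `awk -F, 'NR > 1 && $2 > 28' data.csv`, which can be redirected to the export file.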