10.5. Lab: System Monitoring#

10.5.1. Lab Exercise 6: Resource Usage Reporter#

Create a detailed reporting tool for system administrators.

10.5.1.1. Requirements#

Write resource_report.sh that:

  1. Accepts optional time range (today/week/month/custom)

  2. Collects and aggregates resource usage data

  3. Generates comprehensive multi-section report:

    • System overview (uptime, kernel, cores)

    • Peak resource times

    • Average resource usage

    • Top processes (by CPU and memory)

    • Disk usage breakdown

    • Network statistics (if available)

    • Process lifecycle events (crashed services, restarts)

  4. Exports to multiple formats: text, CSV, HTML, JSON

  5. Includes trend analysis with recommendations

  6. Can email report to admin

  7. Can archive reports for audit trail

10.5.1.2. Report Sections#

System Resource Report
Date Range: 2024-01-01 to 2024-01-15
Generated: 2024-01-15 14:32:45

1. SYSTEM OVERVIEW
   Uptime: 45 days
   Kernel: Linux 5.15.0
   CPUs: 4 cores
   Memory: 16GB
   Root: 100GB

2. RESOURCE USAGE SUMMARY
   Avg CPU load: 1.8 (45% util)
   Peak CPU load: 3.9 (98% util) - Jan 13, 14:30
   Avg Memory: 11.2GB (70%)
   Peak Memory: 15.3GB (95%) - Jan 14, 10:15
   Avg Disk: 450GB / 500GB (90%)

3. TOP PROCESSES (by peak CPU)
   chrome (18% avg, 45% peak)
   java (12% avg, 38% peak)
   
4. RECOMMENDATIONS
   - Monitor java process (erratic CPU)
   - Clean logs (/var is at 92%)
   - Consider memory upgrade (frequent >85% states)

10.5.1.3. Hints#

  • Read from system logs or collect data periodically

  • Parse /proc for accurate process info

  • Use arrays to track historical data

  • Format output with awk, sed, printf

  • Implement multiple export formats

  • Add email capability with mail command

10.5.1.4. Testing#

bash resource_report.sh --period week --format html --output report.html
bash resource_report.sh --period today --email admin@example.com

10.5.2. Lab Exercise 5: Performance Analyzer#

Create a tool that identifies system bottlenecks.

10.5.2.1. Requirements#

Write perf_analyzer.sh that:

  1. Takes a sample period in seconds (e.g., 60)

  2. Collects CPU, memory, I/O, and process data

  3. Identifies the bottleneck type:

    • CPU-bound: High load, low idle CPU

    • Memory-bound: High memory usage, swap usage

    • I/O-bound: High disk utilization

    • Healthy: All resources normal

  4. Shows which processes contribute most to bottleneck

  5. Provides optimization suggestions

  6. Can compare before/after performance

  7. Outputs detailed report with graphs (ASCII)

10.5.2.2. Example Output#

PERFORMANCE ANALYSIS (60-second sample)
Started: 2024-01-15 14:30:00
Ended:   2024-01-15 14:31:00

System Resources:
  CPU Load: 3.2 (4 cores = 80% utilized) ↗ Increasing
  Memory:   13.2GB / 16GB (82.5%) ↗ High
  Disk I/O: 45% utilization (moderate)

🔴 BOTTLENECK: CPU-BOUND

Top CPU consumers:
  1. gcc      (38%) - Compiling source code
  2. ffmpeg   (28%) - Video encoding
  3. chrome   (18%) - Browser

Recommendations:
  - Reduce concurrent builds
  - Move transcoding to off-peak hours
  - Close unnecessary browser tabs
  - Consider parallel processing for compilation

Load trend:
  14:30 ▁▂▃▄▄▅ 14:31  (steady increase)

10.5.2.3. Hints#

  • Collect samples at regular intervals

  • Use uptime, free, iostat, ps

  • Analyze bottleneck using comparative thresholds

  • Create ASCII graphs with awk

  • Save data for trend analysis

10.5.2.4. Testing#

bash perf_analyzer.sh 60

10.5.3. Lab Exercise 4: Memory Alert System#

Create a memory monitoring daemon with adaptive thresholds.

10.5.3.1. Requirements#

Write memory_alert.sh that:

  1. Monitors system memory usage continuously

  2. Identifies memory-hungry processes

  3. Sends alerts when memory pressure builds

  4. Suggests processes to stop or kill (worst offenders)

  5. Tracks memory trends (is it getting worse?)

  6. Generates hourly reports

  7. Has tunable warning thresholds (default: 70%, 85%, 95%)

  8. Can log to syslog or email alerts to admin

10.5.3.2. Alert Levels#

70-80%:   Yellow warning (memory creeping up)
80-90%:   Orange caution (memory very high)
90-95%:   Red critical (system struggling)
95%+:     Emergency (immediate action needed)

10.5.3.3. Example Report#

MEMORY ALERT - 2024-01-15 14:32:45
Current usage: 14.2GB / 16GB (88.8%) 🟠 CAUTION

Top 5 memory consumers:
  1. chrome     (2.5GB, 15.6%) - Browser can be restarted
  2. java       (2.1GB, 13.1%) - Application restart needed
  3. postgres   (1.8GB, 11.3%) - Database (critical!)
  4. apache2    (1.3GB, 8.1%)  - Web server
  5. nodejs     (0.9GB, 5.6%)  - Service process

Trend: +200MB in last hour (↗ increasing)

Recommendation: Restart non-critical services or add RAM

10.5.3.4. Hints#

  • Use free for total memory

  • Use ps for process memory

  • Calculate trends from /proc/meminfo over time

  • Implement configurable thresholds

  • Log to file for trend analysis

10.5.3.5. Testing#

bash memory_alert.sh --threshold 70

10.5.4. Lab Exercise 3: Disk Space Monitor#

Create a monitoring service that tracks disk usage and alerts.

10.5.4.1. Requirements#

Write disk_monitor.sh that:

  1. Monitors all mounted filesystems

  2. Tracks usage percentage over time (stores in log file)

  3. Alerts when usage exceeds thresholds:

    • Yellow warning at 75%

    • Red critical at 90%

  4. Shows usage trends (increasing/stable/decreasing)

  5. Identifies largest directories on full filesystems

  6. Generates daily summary report

  7. Can run continuously (background) or on-demand

10.5.4.2. Example Alert Output#

DISK MONITOR ALERT
Generated: 2024-01-15 14:32:45

🟡 WARNING: /home at 78% (was 71% yesterday - trending up)
  Largest directories:
  1. alice/      - 450GB
  2. backup/     - 280GB
  3. shared/     - 200GB
  
  Recommendation: Clean old files or add storage

🔴 CRITICAL: /var at 92% (was 89% - rapidly increasing)
  Largest directories:
  1. log/        - 145GB
  
  Recommendation: Immediately archive or delete old logs

10.5.4.3. Hints#

  • Use df -h for filesystem info

  • Store daily readings in /tmp or log file

  • Calculate trends from historical data

  • Find large dirs with du -sh

  • Use conditional logic for thresholds

10.5.4.4. Testing#

# Test on-demand
bash disk_monitor.sh

# Test background monitoring
bash disk_monitor.sh &
sleep 10 && killall disk_monitor.sh

10.5.5. Lab Exercise 2: Process Killer Utility#

Create an interactive process selection and termination tool.

10.5.5.1. Requirements#

Write kill_menu.sh that:

  1. Lists all user processes (exclude system processes)

  2. Shows PID, user, CPU%, memory%, and command name

  3. Lets user select processes by number

  4. Offers graceful terminate (SIGTERM) or force kill (SIGKILL)

  5. Confirms before killing

  6. Reports success or failure

  7. Repeats until user chooses to exit

10.5.5.2. Example Session#

Select processes to manage:
[1] firefox    (PID: 2345, CPU: 12%, Mem: 1.5GB)
[2] node       (PID: 3456, CPU: 8%, Mem: 512MB)
[3] python     (PID: 4567, CPU: 2%, Mem: 256MB)

Enter process numbers (space-separated) or 'q' to quit: 1 3

You selected:
  - firefox (PID 2345)
  - python (PID 4567)

Kill type: (g)raceful, (f)orce, (c)ancel: g

Sending SIGTERM to:
  ✓ firefox (2345) - terminating
  ✓ python (4567) - terminating

Continue? (y/n): n

10.5.5.3. Hints#

  • Use ps aux --sort=-%mem for listings

  • Prompt user for selection

  • Use read for interactive input

  • Check if process exists before killing

  • Handle errors gracefully

10.5.5.4. Testing#

bash kill_menu.sh
# Test various inputs, verify kills work

10.5.6. Lab Exercise 1: System Health Check Script#

Create a script that generates a comprehensive system health report.

10.5.6.1. Requirements#

Write system_health.sh that outputs:

  1. Timestamp of report

  2. System uptime and load average

  3. CPU core count and current load percentage

  4. Memory usage (total, used, free, percentage)

  5. Top 3 processes by CPU usage

  6. Top 3 processes by memory usage

  7. Disk usage for all mounted filesystems (> 1GB)

  8. Number of running, sleeping, stopped, and zombie processes

  9. Alert if any resource exceeds 80% threshold

10.5.6.2. Example Output#

=== System Health Report ===
Generated: 2024-01-15 14:32:45
Uptime: 45 days, 3:12

CPU: 4 cores, Load average: 1.23 (30.8% utilized)

Memory: 16GB total | 12.3GB used (76.9%) | 3.7GB free
⚠ Memory usage > 75%

Top CPU consumers:
  1. firefox (12.3%) - 2048MB
  2. chrome (8.5%) - 1536MB
  3. node (5.2%) - 512MB

Disk Usage:
  /     : 45.2GB / 100GB (45.2%)
  /var  : 23.1GB / 50GB (46.2%)
  /home : 89.5GB / 500GB (17.9%)

Processes: Running: 185 | Sleeping: 342 | Stopped: 0 | Zombie: 0

10.5.6.3. Hints#

  • Use uptime, uname, df, ps commands

  • Format output nicely with printf or awk

  • Use /proc filesystem for detailed stats

  • Add color codes for alerts (awk with ANSI colors)

10.5.6.4. Testing#

bash system_health.sh

Verify that:

  • All values update correctly

  • Alerts appear when thresholds exceeded

  • Output is well-formatted