13.5. Lab: Remote Sync & Networking
13.5.1. Lab 6: Complete Disaster Recovery Exercise
Design and implement a complete disaster recovery procedure with testing.
Requirements:
Document complete backup and recovery procedures
Create backup of all critical data (configs, databases, application)
Test restoration to alternative environment
Create recovery runbook with step-by-step instructions
Establish RTO (Recovery Time Objective) and RPO (Recovery Point Objective)
Practice full recovery procedure from scratch
Document recovery time and validate completeness
Recovery Targets:
Application data: RPO = 24 hours, RTO = 1 hour
Database: RPO = 1 hour, RTO = 30 minutes
Configuration: RPO = near zero (every change is backed up as it is made), RTO = 15 minutes
Backup Components:
Critical Systems:
1. Application code and config
2. Database (daily snapshots plus transaction logs for point-in-time recovery within the 1-hour RPO)
3. User data (incremental backups)
4. System configuration (/etc)
5. SSL certificates
6. SSH keys and secrets
Recovery Procedure:
1. Verify backup integrity
2. Provision recovery infrastructure
3. Restore application code
4. Restore database to specific point-in-time
5. Restore user data
6. Verify all services functional
7. Run health checks
8. Document recovery metrics
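The first and last steps of the procedure (verifying backup integrity and documenting recovery metrics) can be sketched in bash as below. This is a minimal illustration, not a complete recovery script: the manifest name `SHA256SUMS` and the helper names `verify_backup` and `timed_step` are assumptions made for this example, and `sha256sum` is assumed to be the GNU coreutils version.

```shell
# verify_backup DIR
# Re-hash every file listed in DIR/SHA256SUMS (hypothetical manifest name)
# and fail if any checksum differs or a listed file is missing.
verify_backup() {
    local dir=$1
    ( cd "$dir" && sha256sum --check --quiet SHA256SUMS )
}

# timed_step LABEL CMD...
# Run one recovery step and report how many seconds it took, so the
# totals can be copied into the runbook afterwards.
timed_step() {
    local label=$1; shift
    local start=$SECONDS
    if "$@"; then
        echo "OK   $label ($(( SECONDS - start ))s)"
    else
        echo "FAIL $label ($(( SECONDS - start ))s)" >&2
        return 1
    fi
}
```

A recovery script could then chain steps such as `timed_step "verify backups" verify_backup /backups/latest` and stop at the first failure. The manifest itself would be written at backup time, for example with `sha256sum` run over the backed-up files.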
Recovery Documentation:
DISASTER RECOVERY RUNBOOK
Application: MyApp
Last Tested: 2024-01-15
Status: PASSED
Recovery Time: 32 minutes (target: 60 min) ✓
Data Loss: 15 minutes (target: 1440 min) ✓
Step-by-step recovery instructions...
Contact: dba@example.com
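The pass/fail marks in a runbook like the one above come down to comparing a measured value against its target, both in minutes. A small sketch of that comparison follows; the function name `check_target` is an assumption for illustration, and the numbers are the ones from the sample runbook (a 24-hour RPO is 1440 minutes).

```shell
# check_target NAME ACTUAL_MIN TARGET_MIN
# Print one runbook-style line; return non-zero when the target is missed.
check_target() {
    local name=$1 actual=$2 target=$3
    if (( actual <= target )); then
        printf '%s: %d minutes (target: %d min) PASS\n' "$name" "$actual" "$target"
    else
        printf '%s: %d minutes (target: %d min) FAIL\n' "$name" "$actual" "$target"
        return 1
    fi
}

check_target "Recovery Time" 32 60            # measured recovery vs. 1 h RTO
check_target "Data Loss" 15 $(( 24 * 60 ))    # measured loss vs. 24 h RPO
```

Because the function returns non-zero on a miss, an automated recovery test can use its exit status to fail the whole run.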
Validation:
Backups are restorable (tested regularly)
Recovery completes within RTO
No data loss exceeds RPO
All services operational after recovery
Runbook is current and accurate
Bonus:
Implement automated recovery testing
Geographic disaster recovery (to different region)
Ransomware recovery (air-gap backup testing)
Database point-in-time recovery to specific timestamp
Application failover testing
13.5.2. Lab 5: SSH Key Management and Access Control
Create an SSH key management system that securely distributes and rotates SSH keys across servers.
Requirements:
Generate ED25519 SSH key pairs for deployment user
Securely copy public keys to multiple remote servers
Verify key-based authentication works on all servers
Implement key rotation (replace old keys with new ones)
Audit SSH access (log all connections)
Remove/disable old keys safely
Document who has access to which servers
Key Management Operations:
Generate new deployment key pair
Copy public key to authorized_keys on all servers
Test authentication on each server
Archive old keys with deletion dates
Disable old keys by removing from authorized_keys
Audit: List all users and their keys per server
Key Rotation Workflow:
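# Servers to update (example hostnames, matching the audit report below)
SERVERS=(web1 db1)
# Step 1: Generate new key (-N "" sets an empty passphrase for unattended use;
# drop -N "" to be prompted for a passphrase instead)
ssh-keygen -t ed25519 -f ~/.ssh/deploy_key_new -N ""
# Step 2: Add new key to all servers, authenticating with the current key
for server in "${SERVERS[@]}"; do
    ssh -i ~/.ssh/deploy_key "user@$server" 'cat >> ~/.ssh/authorized_keys' \
        < ~/.ssh/deploy_key_new.pub
done
# Step 3: Test new key works on all servers
# Step 4: Remove old key from all servers
# Step 5: Archive old key with metadata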
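Steps 3 and 4 are left as comments in the workflow above. One possible shape for them is sketched here. The function names `test_new_key` and `remove_key` are hypothetical, and `remove_key` assumes it runs on the server that owns the `authorized_keys` file (for instance from a script copied there).

```shell
# test_new_key SERVER
# Succeed only if the NEW key alone can log in: IdentitiesOnly stops ssh
# from falling back to other keys, BatchMode disables password prompts.
test_new_key() {
    ssh -i ~/.ssh/deploy_key_new -o IdentitiesOnly=yes -o BatchMode=yes \
        -o ConnectTimeout=5 "user@$1" true
}

# remove_key AUTH_FILE PUB_FILE
# Drop every line of AUTH_FILE that contains the key data from PUB_FILE.
remove_key() {
    local auth_file=$1 pub_file=$2 blob tmp
    blob=$(awk '{ print $2; exit }' "$pub_file")   # field 2 = base64 key data
    [[ -n $blob ]] || return 1
    tmp=$(mktemp) || return 1
    awk -v blob="$blob" 'index($0, blob) == 0' "$auth_file" > "$tmp"
    cat "$tmp" > "$auth_file"    # overwrite in place to keep owner and mode
    rm -f -- "$tmp"
}
```

Running `test_new_key` against every server before any call to `remove_key` is what keeps the rotation free of downtime: the old key is only removed once the new one is known to work everywhere.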
Audit Report:
Access Control Report - 2024-01-15
Server: web1
deploy key (ed25519): active since 2024-01-01
backup key (rsa): active since 2023-10-15 ⚠ OLD
root login: disabled ✓
Server: db1
deploy key (ed25519): active since 2024-01-01
backup key (rsa): disabled 2023-12-31
root login: disabled ✓
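The raw data for a report like this can be read from each server's `authorized_keys` and `sshd_config`. The sketch below is one simple way to do that; the function names are hypothetical, key lines are assumed to have the plain `type data comment` form with no leading options field, and `Include` files and `Match` blocks in `sshd_config` are ignored.

```shell
# list_keys AUTH_FILE
# Print "comment key (type)" for each key, skipping blank and comment lines.
list_keys() {
    awk '!/^[ \t]*(#|$)/ {
        printf "%s key (%s)\n", ($3 == "" ? "unnamed" : $3), $1
    }' "$1"
}

# root_login_setting [CONFIG]
# Print the first PermitRootLogin value found (sshd uses the first match).
# Prints nothing if the keyword is absent, in which case sshd's default applies.
root_login_setting() {
    awk 'tolower($1) == "permitrootlogin" { print $2; exit }' \
        "${1:-/etc/ssh/sshd_config}"
}
```

For fingerprints rather than comments, `ssh-keygen -l -f ~/.ssh/authorized_keys` prints one fingerprint line per key. The activation dates shown in the report are not stored in `authorized_keys`, so they would have to be tracked separately, for example in the key archive metadata.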
Validation:
Keys are securely generated (ED25519; passphrase protected wherever a person rather than a script uses the key)
Authentication works on all servers
Rotation completes without downtime
Old keys are properly archived
Audit trail shows all access
Bonus:
Implement certificate-based SSH auth
Integration with secret management (HashiCorp Vault)
Automated audit reports with expiry warnings
Enforce key age limits
13.5.3. Lab 4: Automated Server Health Monitoring
Build a health monitoring system that periodically checks multiple servers and generates alerts.
Requirements:
Monitor CPU, memory, disk usage on 3+ servers
Check service status (web server, database, etc.)
Monitor log files for errors
Generate hourly/daily health reports
Alert on threshold violations
Track metrics over time for trend analysis
Provide dashboard showing current status
Monitored Metrics:
Per Server:
- CPU usage (alert if > 80%)
- Memory usage (alert if > 85%)
- Disk usage (alert if > 90%)
- Load average
- Process count
- Open file descriptors
Services:
- HTTP/HTTPS responsiveness
- SSH accessibility
- Application-specific health checks
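A few of the per-server metrics above can be collected with standard Linux tools and compared against the listed thresholds. The following is a sketch under those assumptions: the function names are hypothetical, `mem_usage_pct` and `load_average` rely on Linux's `/proc`, and CPU usage is left out because it needs two samples over time.

```shell
# check_threshold NAME VALUE LIMIT
# Print a status line; return non-zero when VALUE exceeds LIMIT (percent).
check_threshold() {
    local name=$1 value=$2 limit=$3
    if (( value > limit )); then
        printf '%s: %d%% ALERT (limit %d%%)\n' "$name" "$value" "$limit"
        return 1
    fi
    printf '%s: %d%% OK\n' "$name" "$value"
}

# disk_usage_pct [MOUNTPOINT]: used space as an integer percentage.
disk_usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# mem_usage_pct: share of RAM not available to new processes (Linux only).
mem_usage_pct() {
    awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
         END { printf "%d\n", (t - a) * 100 / t }' /proc/meminfo
}

# load_average: 1-minute load average (Linux only).
load_average() { cut -d' ' -f1 /proc/loadavg; }
```

On a monitored host this could be used as `check_threshold Disk "$(disk_usage_pct /)" 90 || echo "send alert"`. To check a remote server, the same commands can be run over SSH, for example `ssh web1 'df -P /'`, with the parsing done locally.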
Alert Actions:
Email to admin
Log to monitoring database
Dashboard indicator
Metrics recorded for trend analysis
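Recording metrics for trend analysis can start as simply as appending timestamped rows to a CSV file. This sketch assumes a four-column layout (`timestamp,server,metric,value`) and hypothetical function names; a real setup might use a time-series database instead.

```shell
# record_metric FILE SERVER METRIC VALUE
# Append one row with a UTC timestamp.
record_metric() {
    printf '%s,%s,%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$2" "$3" "$4" >> "$1"
}

# metric_average FILE SERVER METRIC
# Mean of all recorded values for one server and metric, rounded down.
metric_average() {
    awk -F, -v s="$2" -v m="$3" '
        $2 == s && $3 == m { sum += $4; n++ }
        END { if (n) printf "%d\n", sum / n }
    ' "$1"
}
```

Calling `record_metric` from an hourly cron job gives the history that the daily report and any later trend analysis can read back with `metric_average` or similar queries.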
Health Report Example:
=== Daily Health Report ===
Date: 2024-01-15
Server: web1
CPU: 42% ✓
Memory: 65% ✓
Disk: 72% ✓
Services: All healthy ✓
Server: db1
CPU: 78% ✓
Memory: 88% ⚠ (alert)
Disk: 45% ✓
Services: Database OK, Backup failed ✗
Alerts: 1 memory warning, 1 service failure
Validation:
Monitoring captures expected metrics
Alerts trigger at correct thresholds
Reports are generated on schedule
Historical data accumulates for trending
Bonus:
Implement trend analysis (predict capacity issues)
Add predictive alerting (alert before threshold)
Create web-based dashboard
Integration with external monitoring (Datadog, New Relic)
13.5.4. Lab 3: Data Synchronization with Rsync
Implement an automated data synchronization system using rsync with verification and conflict resolution.
Requirements:
Create daily incremental backup from remote server to local
Implement full weekly backups
Use checksums to verify file integrity
Handle partial transfers gracefully
Maintain backup history with date-based directories
Implement retention policy (keep 7 weekly full backups and 30 days of daily backups)
Generate sync report with statistics
Backup Strategy:
/backups/
weekly/
2024-W03/
2024-W02/
daily/
2024-01-15/
2024-01-14/
latest -> symlink to most recent backup
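With date-based directory names like these, the retention policy can be applied by name alone, since both `YYYY-MM-DD` and `YYYY-Www` sort chronologically. A possible sketch follows; the function names are hypothetical, and it assumes the directories contain nothing but backup snapshots.

```shell
# prune_keep_newest DIR KEEP
# Delete all but the KEEP newest subdirectories of DIR (by sorted name).
prune_keep_newest() {
    local dir=$1 keep=$2 i
    local -a entries=("$dir"/*/)            # glob results are sorted ascending
    [[ -d ${entries[0]} ]] || return 0      # no subdirectories: nothing to do
    for (( i = 0; i < ${#entries[@]} - keep; i++ )); do
        rm -rf -- "${entries[i]}"
    done
}

# prune_older_than DIR CUTOFF
# Delete subdirectories whose name sorts before CUTOFF (YYYY-MM-DD).
prune_older_than() {
    local dir=$1 cutoff=$2 path name
    for path in "$dir"/*/; do
        [[ -d $path ]] || continue
        name=$(basename "$path")
        if [[ $name < $cutoff ]]; then
            rm -rf -- "$path"
        fi
    done
}
```

For the policy in the requirements this would be `prune_keep_newest /backups/weekly 7` and, with GNU `date`, `prune_older_than /backups/daily "$(date -d '30 days ago' +%Y-%m-%d)"`. Because both functions call `rm -rf`, it is worth replacing that line with `echo` for a first trial run.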
Sync Operations:
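# Daily incremental: unchanged files are hard-linked against the previous backup
rsync -avz --checksum --partial --link-dest=/backups/latest \
    remote:/data "/backups/daily/$(date +%Y-%m-%d)"
# Weekly full with verification: preview with --dry-run first, then run for real
rsync -avz --checksum --dry-run remote:/data "/backups/weekly/$(date +%G-W%V)"
rsync -avz --checksum remote:/data "/backups/weekly/$(date +%G-W%V)"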
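To keep the `latest` symlink from the backup layout in step with the dated directories, the daily sync can be wrapped in a function. This is a sketch, not a finished tool: `daily_sync` and the `RSYNC` override variable are hypothetical names introduced here (the override only exists so the function can be tried with a stub), while `--partial`, `--link-dest`, and `--stats` are standard rsync options.

```shell
RSYNC=${RSYNC:-rsync}   # hypothetical hook: point at a stub when testing

# daily_sync SRC DEST_ROOT
# Snapshot SRC into DEST_ROOT/daily/YYYY-MM-DD and repoint DEST_ROOT/latest.
daily_sync() {
    local src=$1 root=$2
    local dest="$root/daily/$(date +%Y-%m-%d)"
    local -a opts=(-az --checksum --partial --stats)
    mkdir -p "$root/daily"
    # Hard-link unchanged files against the previous snapshot, if one exists
    # and it is not the directory we are about to write into.
    if [[ -d $root/latest && ! $root/latest -ef $dest ]]; then
        opts+=(--link-dest="$(cd "$root/latest" && pwd -P)")
    fi
    "$RSYNC" "${opts[@]}" "$src" "$dest/" || return 1
    # Only after a successful transfer: relative symlink to today's snapshot.
    ln -sfn "daily/$(basename "$dest")" "$root/latest"
}
```

A cron job could call `daily_sync remote:/data/ /backups`. Since the symlink is updated only when rsync succeeds, an interrupted run leaves `latest` pointing at the last complete snapshot, and `--partial` lets the next run pick up partially transferred files.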
Report Output:
Daily Sync Report - 2024-01-15
Source: remote:/data (125.3 MB)
Destination: /backups/daily/2024-01-15
Transfer Stats:
- Files synced: 342
- New files: 23
- Updated files: 45
- Deleted: 2
- Total transferred: 12.5 MB
- Time: 2m 15s
- Status: SUCCESS
Validation:
All files transferred correctly (checksum verified)
Partial transfers can be resumed
Old backups are cleaned up per policy
Space usage is reasonable
Bonus:
Implement differential backups (vs incremental)
Add compression for archived backups
Email detailed sync report
Implement restore testing (verify backups can be restored)
13.5.5. Lab 2: Multi-Server Configuration Deployment
Create a deployment script that safely distributes configuration files to multiple servers with rollback capability.
Requirements:
Deploy a configuration file (e.g., nginx.conf or app.config) to 3+ servers
Validate configuration on remote servers before activation
Create automatic backups of existing configs
Support rollback to previous configuration
Verify successful deployment with service health checks
Log all deployment actions
Deployment Process:
1. Pre-deployment: Check all servers reachable, disk space sufficient
2. Backup: Archive current configuration on each server
3. Deploy: Copy new configuration to all servers
4. Validate: Test configuration syntax on remote servers
5. Activate: Reload/restart service
6. Health Check: Verify service is operational
7. Rollback: On failure, restore previous configuration
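The backup, deploy, validate, and rollback steps for a single server can be sketched as one function. The name `install_with_rollback` is hypothetical, and the function is assumed to run on the target server itself (for example in a script invoked over SSH); the validation command is passed in by the caller.

```shell
# install_with_rollback NEW_FILE TARGET VALIDATE_CMD...
# Back up TARGET, install NEW_FILE over it, run the validation command,
# and restore the backup if validation fails.
install_with_rollback() {
    local new=$1 target=$2; shift 2
    local backup="${target}.bak.$(date +%Y%m%d-%H%M%S)"
    cp -p -- "$target" "$backup" || return 1     # dated backup of current config
    cp -- "$new" "$target" || return 1           # deploy
    if "$@"; then                                # validate
        echo "deployed $target (backup: $backup)"
    else
        cp -p -- "$backup" "$target"             # rollback
        echo "validation failed, restored $target from $backup" >&2
        return 1
    fi
}
```

For nginx, a call might look like `install_with_rollback nginx.new.conf /etc/nginx/nginx.conf nginx -t`, followed by a reload and a health check only when the function returns success. The pre-deployment reachability check and the multi-server loop would sit around this function.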
Test Configuration:
# Create a simple test config (a server block for conf.d/, not a complete nginx.conf)
cat > nginx.test.conf << 'EOF'
server {
    listen 80;
    server_name _;
    location / {
        return 200 "OK";
    }
}
EOF
Validation:
Deployment succeeds on all servers
Service is accessible after deployment
Rollback restores previous config and service works
Backup files are created and dated
Bonus:
Implement canary deployment (deploy to 1 server first, monitor, then deploy to rest)
Send notification email on successful/failed deployment
Compare configurations to detect drift
13.5.6. Lab 1: Network Connectivity Verification
Create a comprehensive network diagnostics script that checks connectivity to multiple hosts and services.
Requirements:
Test connectivity to 5+ different hosts using ping
Test HTTP/HTTPS connectivity using curl
Test DNS resolution
Test TCP connectivity to specific ports (SSH, HTTP, HTTPS, DNS)
Generate a report showing which services are reachable
Handle timeouts gracefully
Log results with timestamps
Test Targets:
declare -a HOSTS=("8.8.8.8" "1.1.1.1" "github.com")
declare -A SERVICES=(
["google.com:443"]="HTTPS"
["github.com:22"]="GitHub SSH"
["dns.google:53"]="DNS"
)
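Each requirement maps onto a small check function with its own timeout. The sketch below is one way to write them; all function names are hypothetical, the `ping` flags are those of Linux iputils, and `check_tcp` relies on bash's `/dev/tcp` redirection plus the coreutils `timeout` command.

```shell
# check_ping HOST: one ICMP echo request, 2-second timeout.
check_ping() { ping -c 1 -W 2 "$1" > /dev/null 2>&1; }

# check_dns NAME: resolve through the system resolver.
check_dns() { getent hosts "$1" > /dev/null; }

# check_http URL: succeed on a 2xx or 3xx status, give up after 5 seconds.
check_http() {
    local code
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1") || return 1
    (( code >= 200 && code < 400 ))
}

# check_tcp HOST PORT: try to open a TCP connection, 3-second limit.
check_tcp() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2> /dev/null
}

# check_service "host:port": split a SERVICES-style key and test the port.
check_service() {
    local host=${1%:*} port=${1##*:}
    check_tcp "$host" "$port"
}
```

With the arrays defined above, the checks can be driven by loops such as `for host in "${HOSTS[@]}"` and `for target in "${!SERVICES[@]}"`, where `${!SERVICES[@]}` expands to the `host:port` keys. Every function returns quickly on failure, which is what keeps a single unreachable host from stalling the whole report.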
Report Output:
=== Network Diagnostics Report ===
Timestamp: 2024-01-15 10:30:00
Host Connectivity:
✓ 8.8.8.8 - 25.3ms
✓ google.com - 12.1ms
✗ unreachable.host - timeout
Service Accessibility:
✓ google.com:443 - HTTPS accessible
✓ github.com:22 - SSH accessible
✗ dns.google:53 - timeout
Summary: 4/6 checks passed
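The ✓/✗ lines and the summary count can come from one wrapper that runs a check and keeps a tally. This is a sketch with hypothetical names (`report`, `log_line`, `PASSED`, `TOTAL`); the two demo calls use `true` and `false` in place of real network checks.

```shell
PASSED=0
TOTAL=0

# report LABEL CMD...
# Run one check quietly, print a ✓ or ✗ line, and update the tally.
report() {
    local label=$1; shift
    TOTAL=$(( TOTAL + 1 ))
    if "$@" > /dev/null 2>&1; then
        PASSED=$(( PASSED + 1 ))
        printf '✓ %s\n' "$label"
    else
        printf '✗ %s\n' "$label"
    fi
}

# log_line TEXT...: prefix a message with a local timestamp.
log_line() { printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"; }

report "demo check that passes" true
report "demo check that fails" false
log_line "Summary: $PASSED/$TOTAL checks passed"
```

`report` must be called directly rather than inside `$(...)`, because a command substitution runs in a subshell and the counters would not be updated. Redirecting the script's output with `>> diagnostics.log` keeps a timestamped history for later comparison.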
Bonus:
Add traceroute output for unreachable hosts
Perform DNS lookups and show all returned IPs
Check latency trends over time