13.5. Lab: Remote Sync & Networking#

13.5.1. Lab 6: Complete Disaster Recovery Exercise#

Design and implement a complete disaster recovery procedure with testing.

Requirements:

  • Document complete backup and recovery procedures

  • Create backup of all critical data (configs, databases, application)

  • Test restoration to alternative environment

  • Create recovery runbook with step-by-step instructions

  • Establish RTO (Recovery Time Objective) and RPO (Recovery Point Objective)

  • Practice full recovery procedure from scratch

  • Document recovery time and validate completeness

Recovery Targets:

  • Application data: RPO = 24 hours, RTO = 1 hour

  • Database: RPO = 1 hour, RTO = 30 minutes

  • Configuration: RPO = immediate, RTO = 15 minutes

Backup Components:

Critical Systems:
1. Application code and config
2. Database (daily snapshots)
3. User data (incremental backups)
4. System configuration (/etc)
5. SSL certificates
6. SSH keys and secrets
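The configuration items above (components 4–6) can be captured with a tar-plus-checksum helper. This is a sketch, not the lab's prescribed method; the destination path and helper name are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive a set of config paths and record a checksum
# so the backup can be verified during recovery. Names are illustrative.
backup_configs() {
  local dest="$1"; shift          # e.g. /backups/config-2024-01-15.tar.gz
  tar -czpf "$dest" "$@" || return 1
  sha256sum "$dest" > "${dest}.sha256"
}

# Example (run as root for /etc, SSL certs, etc.):
# backup_configs "/backups/config-$(date +%Y-%m-%d).tar.gz" /etc /etc/ssl
```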

Recovery Procedure:

  1. Verify backup integrity

  2. Provision recovery infrastructure

  3. Restore application code

  4. Restore database to specific point-in-time

  5. Restore user data

  6. Verify all services functional

  7. Run health checks

  8. Document recovery metrics
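Step 1 can be scripted. This sketch assumes each backup directory carries a SHA256SUMS manifest written at backup time (a convention chosen for the example, not part of the lab spec):

```shell
#!/usr/bin/env bash
# Verify backup integrity against a SHA256SUMS manifest created at backup
# time. The manifest convention is an assumption for this sketch.
verify_backup() {
  local backup_dir="$1"
  [ -f "${backup_dir}/SHA256SUMS" ] || { echo "FAIL: no manifest" >&2; return 1; }
  if (cd "$backup_dir" && sha256sum --check --quiet SHA256SUMS); then
    echo "OK: ${backup_dir} verified"
  else
    echo "FAIL: checksum mismatch in ${backup_dir}" >&2
    return 1
  fi
}
```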

Recovery Documentation:

DISASTER RECOVERY RUNBOOK
Application: MyApp
Last Tested: 2024-01-15
Status: PASSED

Recovery Time: 32 minutes (target: 60 min) ✓
Data Loss: 15 minutes (target: 1440 min) ✓

Step-by-step recovery instructions...
Contact: dba@example.com

Validation:

  • Backups are restorable (tested regularly)

  • Recovery completes within RTO

  • No data loss exceeds RPO

  • All services operational after recovery

  • Runbook is current and accurate

Bonus:

  • Implement automated recovery testing

  • Geographic disaster recovery (to different region)

  • Ransomware recovery (air-gap backup testing)

  • Database point-in-time recovery to specific timestamp

  • Application failover testing

13.5.2. Lab 5: SSH Key Management and Access Control#

Create an SSH key management system that securely distributes and rotates SSH keys across servers.

Requirements:

  • Generate ED25519 SSH key pairs for deployment user

  • Securely copy public keys to multiple remote servers

  • Verify key-based authentication works on all servers

  • Implement key rotation (replace old keys with new ones)

  • Audit SSH access (log all connections)

  • Remove/disable old keys safely

  • Document who has access to which servers

Key Management Operations:

  1. Generate new deployment key pair

  2. Copy public key to authorized_keys on all servers

  3. Test authentication on each server

  4. Archive old keys with deletion dates

  5. Disable old keys by removing from authorized_keys

  6. Audit: List all users and their keys per server
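Operation 6 (the per-server key audit) reduces to parsing each server's authorized_keys file; fetching the file over scp first is assumed. A minimal local parser:

```shell
#!/usr/bin/env bash
# List key type and comment for each entry in an authorized_keys file.
# (Entries with option prefixes are not handled in this minimal sketch.)
audit_keys() {
  awk '!/^(#|$)/ { print $1, (NF >= 3 ? $3 : "(no comment)") }' "$1"
}

# Typical use after fetching the file from a server:
# scp "user@${server}:.ssh/authorized_keys" "audit/${server}.keys"
# audit_keys "audit/${server}.keys"
```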

Key Rotation Workflow:

# Step 1: Generate new key (set a passphrase when prompted;
# the validation below expects passphrase-protected keys)
ssh-keygen -t ed25519 -f ~/.ssh/deploy_key_new

# Step 2: Add new public key to all servers
for server in "${SERVERS[@]}"; do
  ssh -i ~/.ssh/deploy_key "user@${server}" \
    'cat >> ~/.ssh/authorized_keys' < ~/.ssh/deploy_key_new.pub
done

# Step 3: Test that the new key works on every server
for server in "${SERVERS[@]}"; do
  ssh -i ~/.ssh/deploy_key_new "user@${server}" true || echo "FAILED: ${server}"
done

# Step 4: Remove the old public key from all servers
for server in "${SERVERS[@]}"; do
  ssh -i ~/.ssh/deploy_key_new "user@${server}" \
    "grep -vF '$(cat ~/.ssh/deploy_key.pub)' ~/.ssh/authorized_keys > /tmp/ak && mv /tmp/ak ~/.ssh/authorized_keys"
done

# Step 5: Archive the old key with its retirement date
mkdir -p ~/.ssh/key-archive && mv ~/.ssh/deploy_key{,.pub} ~/.ssh/key-archive/
date +%Y-%m-%d > ~/.ssh/key-archive/deploy_key.retired

Audit Report:

Access Control Report - 2024-01-15

Server: web1
  deploy key (ed25519): active since 2024-01-01
  backup key (rsa): active since 2023-10-15 ⚠ OLD
  root login: disabled ✓

Server: db1
  deploy key (ed25519): active since 2024-01-01
  backup key (rsa): disabled 2023-12-31
  root login: disabled ✓

Validation:

  • Keys are securely generated (passphrase protected)

  • Authentication works on all servers

  • Rotation completes without downtime

  • Old keys are properly archived

  • Audit trail shows all access

Bonus:

  • Implement certificate-based SSH auth

  • Integration with secret management (HashiCorp Vault)

  • Automated audit reports with expiry warnings

  • Enforce key age limits

13.5.3. Lab 4: Automated Server Health Monitoring#

Build a health monitoring system that periodically checks multiple servers and generates alerts.

Requirements:

  • Monitor CPU, memory, disk usage on 3+ servers

  • Check service status (web server, database, etc.)

  • Monitor log files for errors

  • Generate hourly/daily health reports

  • Alert on threshold violations

  • Track metrics over time for trend analysis

  • Provide dashboard showing current status

Monitored Metrics:

Per Server:
- CPU usage (alert if > 80%)
- Memory usage (alert if > 85%)
- Disk usage (alert if > 90%)
- Load average
- Process count
- Open file descriptors

Services:
- HTTP/HTTPS responsiveness
- SSH accessibility
- Application-specific health checks
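The threshold logic above can be sketched as a small comparison helper; gathering the numbers over ssh is left to the lab, and the local df example is for illustration only:

```shell
#!/usr/bin/env bash
# Compare a metric (integer percent) against its alert threshold.
check_metric() {
  local name="$1" value="$2" threshold="$3"
  if [ "$value" -gt "$threshold" ]; then
    echo "ALERT: ${name} at ${value}% (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: ${name} at ${value}%"
}

# Example: check local root-filesystem usage against the 90% threshold
disk_used=$(df --output=pcent / | tail -1 | tr -dc '0-9')
check_metric "disk" "$disk_used" 90 || true
```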

Alert Actions:

  • Email to admin

  • Log to monitoring database

  • Dashboard indicator

  • Metrics recorded for trend analysis

Health Report Example:

=== Daily Health Report ===
Date: 2024-01-15

Server: web1
  CPU: 42% ✓
  Memory: 65% ✓
  Disk: 72% ✓
  Services: All healthy ✓

Server: db1
  CPU: 78% ✓
  Memory: 88% ⚠ (alert)
  Disk: 45% ✓
  Services: Database OK, Backup failed ✗

Alerts: 1 memory warning, 1 service failure

Validation:

  • Monitoring captures expected metrics

  • Alerts trigger at correct thresholds

  • Reports are generated on schedule

  • Historical data accumulates for trending

Bonus:

  • Implement trend analysis (predict capacity issues)

  • Add predictive alerting (alert before threshold)

  • Create web-based dashboard

  • Integration with external monitoring (Datadog, New Relic)

13.5.4. Lab 3: Data Synchronization with Rsync#

Implement an automated data synchronization system using rsync with verification and conflict resolution.

Requirements:

  • Create daily incremental backup from remote server to local

  • Implement full weekly backups

  • Use checksums to verify file integrity

  • Handle partial transfers gracefully

  • Maintain backup history with date-based directories

  • Implement retention policy (keep 7 full backups, 30 days of daily)

  • Generate sync report with statistics

Backup Strategy:

/backups/
  weekly/
    2024-W03/
    2024-W02/
  daily/
    2024-01-15/
    2024-01-14/
  latest -> symlink to most recent backup
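The retention policy over this layout can be sketched as a prune step (directory names as in the tree above; GNU find and ls are assumed):

```shell
#!/usr/bin/env bash
# Apply the retention policy: 30 days of daily backups, 7 weekly backups.
prune_backups() {
  local root="$1"
  # Daily: delete date-named directories older than 30 days
  find "${root}/daily" -mindepth 1 -maxdepth 1 -type d -mtime +30 \
    -exec rm -rf {} +
  # Weekly: names sort chronologically, so keep only the last 7
  ls -1d "${root}"/weekly/*/ 2>/dev/null | sort | head -n -7 | xargs -r rm -rf
}
```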

Sync Operations:

# Daily incremental (hard-link unchanged files against the previous run)
rsync -avz --checksum --link-dest=/backups/latest \
  remote:/data "/backups/daily/$(date +%Y-%m-%d)"

# Weekly full with verification: preview with --dry-run, then run for real
rsync -avz --checksum --dry-run remote:/data "/backups/weekly/$(date +%G-W%V)"
rsync -avz --checksum remote:/data "/backups/weekly/$(date +%G-W%V)"

Report Output:

Daily Sync Report - 2024-01-15
Source: remote:/data (125.3 MB)
Destination: /backups/daily/2024-01-15

Transfer Stats:
- Files synced: 342
- New files: 23
- Updated files: 45
- Deleted: 2
- Total transferred: 12.5 MB
- Time: 2m 15s
- Status: SUCCESS

Validation:

  • All files transferred correctly (checksum verified)

  • Partial transfers can be resumed

  • Old backups are cleaned up per policy

  • Space usage is reasonable

Bonus:

  • Implement differential backups (vs incremental)

  • Add compression for archived backups

  • Email detailed sync report

  • Implement restore testing (verify backups can be restored)

13.5.5. Lab 2: Multi-Server Configuration Deployment#

Create a deployment script that safely distributes configuration files to multiple servers with rollback capability.

Requirements:

  • Deploy a configuration file (e.g., nginx.conf or app.config) to 3+ servers

  • Validate configuration on remote servers before activation

  • Create automatic backups of existing configs

  • Support rollback to previous configuration

  • Verify successful deployment with service health checks

  • Log all deployment actions

Deployment Process:

  1. Pre-deployment: Check all servers reachable, disk space sufficient

  2. Backup: Archive current configuration on each server

  3. Deploy: Copy new configuration to all servers

  4. Validate: Test configuration syntax on remote servers

  5. Activate: Reload/restart service

  6. Health Check: Verify service is operational

  7. Rollback: On failure, restore previous configuration
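Steps 2 and 7 can be sketched locally as a pair of helpers (in the lab they run on each server over ssh; the .bak naming convention is an assumption):

```shell
#!/usr/bin/env bash
# Step 2: timestamped backup of the current config before deploying
backup_config() {
  local cfg="$1"
  cp -p "$cfg" "${cfg}.bak.$(date +%Y%m%d%H%M%S)"
}

# Step 7: restore the most recent backup on failure
rollback_config() {
  local cfg="$1" last
  last=$(ls -1 "${cfg}".bak.* 2>/dev/null | sort | tail -1)
  [ -n "$last" ] && cp -p "$last" "$cfg"
}
```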

Test Configuration:

# Create simple test config
cat > nginx.test.conf << 'EOF'
server {
  listen 80;
  server_name _;
  location / {
    return 200 "OK";
  }
}
EOF

Validation:

  • Deployment succeeds on all servers

  • Service is accessible after deployment

  • Rollback restores previous config and service works

  • Backup files are created and dated

Bonus:

  • Implement canary deployment (deploy to 1 server first, monitor, then deploy to rest)

  • Send notification email on successful/failed deployment

  • Compare configurations to detect drift

13.5.6. Lab 1: Network Connectivity Verification#

Create a comprehensive network diagnostics script that checks connectivity to multiple hosts and services.

Requirements:

  • Test connectivity to 5+ different hosts using ping

  • Test HTTP/HTTPS connectivity using curl

  • Test DNS resolution

  • Test TCP connectivity to specific ports (SSH, HTTP, HTTPS, DNS)

  • Generate a report showing which services are reachable

  • Handle timeouts gracefully

  • Log results with timestamps

Test Targets:

declare -a HOSTS=("8.8.8.8" "1.1.1.1" "github.com")
declare -A SERVICES=(
  ["google.com:443"]="HTTPS"
  ["github.com:22"]="GitHub SSH"
  ["dns.google:53"]="DNS"
)
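The SERVICES map above can drive a TCP check. This sketch uses bash's /dev/tcp plus the coreutils timeout command (the 5-second timeout is an arbitrary choice, and a trimmed copy of the map is repeated to keep the example self-contained):

```shell
#!/usr/bin/env bash
# TCP reachability check: succeed if a connection opens within 5 seconds.
check_port() {
  local host="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

declare -A SERVICES=(["google.com:443"]="HTTPS" ["github.com:22"]="GitHub SSH")
for target in "${!SERVICES[@]}"; do
  host="${target%%:*}" port="${target##*:}"
  if check_port "$host" "$port"; then
    echo "✓ ${target} - ${SERVICES[$target]} accessible"
  else
    echo "✗ ${target} - timeout"
  fi
done
```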

Report Output:

=== Network Diagnostics Report ===
Timestamp: 2024-01-15 10:30:00

Host Connectivity:
✓ 8.8.8.8 - 25.3ms
✓ google.com - 12.1ms
✗ unreachable.host - timeout

Service Accessibility:
✓ google.com:443 - HTTPS accessible
✓ github.com:22 - SSH accessible
✗ dns.google:53 - timeout

Summary: 4/6 targets reachable

Bonus:

  • Add traceroute output for unreachable hosts

  • Perform DNS lookups and show all returned IPs

  • Check latency trends over time