How Nested Infrastructure Changes the Way You Monitor Systems
Modern infrastructure is hierarchical: services run on servers, servers in clusters, clusters in regions. Flat monitoring tools can't represent this. Here's why structure matters.
Wakestack Team
Engineering Team
Infrastructure isn't flat—it's nested. Services run on servers. Servers live in clusters. Clusters exist in regions. But most monitoring tools show everything in one flat list: 50 monitors, no structure, no relationships.
When something fails, you're left correlating manually: "Which of these 12 failing monitors are actually the same problem?"
The Flat Monitoring Problem
Traditional Approach
Monitors (flat list):
├── API health check
├── API server CPU
├── API server memory
├── Database connection check
├── Database server CPU
├── Database server disk
├── Worker queue depth
├── Worker server CPU
├── Redis ping
├── Redis server memory
└── ... 40 more monitors
When something fails:
Alerts:
├── API health check: TIMEOUT
├── API server CPU: 98%
├── Worker queue depth: BACKING UP
├── Database connection check: TIMEOUT
Question: What's the root cause?
Answer: You have to figure it out manually.
The Mental Work Required
With flat monitoring, during every incident you must:
- See which monitors are failing
- Remember which services run where
- Manually correlate failures
- Deduce the root cause
This works with 10 monitors. It breaks at 50.
Infrastructure Is Hierarchical
Real infrastructure has structure:
Production Environment
├── Region: US-East
│ ├── Server: api-prod-01
│ │ ├── Service: API (port 3000)
│ │ └── Service: Background Worker
│ │
│ ├── Server: api-prod-02
│ │ └── Service: API (port 3000)
│ │
│ └── Server: db-prod-01
│ └── Service: PostgreSQL
│
└── Region: EU-West
├── Server: api-eu-01
│ └── Service: API
└── Server: cache-eu-01
└── Service: Redis
Failures cascade down this hierarchy:
- If api-prod-01 fails → API and Worker are both affected
- If US-East network fails → Everything in that region is affected
- If db-prod-01 fails → All services depending on it are affected
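Under illustrative assumptions (a minimal `Host` class and hypothetical host names, not Wakestack's actual data model), that cascade is a simple tree walk: a host is affected if it failed or if any ancestor did.

```python
# Minimal sketch of failure cascading down an infrastructure tree.
# The Host class and host names are illustrative, not a real API.

class Host:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []
        self.failed = False

def affected_hosts(host, ancestor_failed=False):
    """List hosts taken out by their own failure or an ancestor's."""
    down = ancestor_failed or host.failed
    hit = [host.name] if down else []
    for child in host.children:
        hit += affected_hosts(child, down)
    return hit

# api-prod-01 fails -> both services running on it are affected,
# while db-prod-01 and its service stay up
server = Host("api-prod-01", [Host("API"), Host("Background Worker")])
server.failed = True
region = Host("US-East", [server, Host("db-prod-01", [Host("PostgreSQL")])])
print(affected_hosts(region))
# ['api-prod-01', 'API', 'Background Worker']
```

Flat monitoring has no `children` relationship to walk, which is exactly why it can only show you three unrelated failures.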
Nested Monitoring Structure
Nested monitoring represents this hierarchy in your monitoring tool:
Wakestack Dashboard:
Production
├── 🖥️ api-prod-01 (Server)
│ ├── CPU: 45%
│ ├── Memory: 62%
│ ├── Disk: 55%
│ ├── 🌐 API Health (/health) - 200 OK
│ └── 🌐 Worker Health (/worker/health) - 200 OK
│
├── 🖥️ api-prod-02 (Server)
│ ├── CPU: 38%
│ ├── Memory: 58%
│ └── 🌐 API Health (/health) - 200 OK
│
└── 🖥️ db-prod-01 (Server)
├── CPU: 22%
├── Memory: 78%
├── Disk: 65%
└── 🌐 PostgreSQL (port 5432) - Connected
What This Enables
Immediate root cause visibility:
Before (flat):
├── API /health: TIMEOUT
├── Worker /health: TIMEOUT
├── Some CPU metric: HIGH
After (nested):
├── 🖥️ api-prod-01: ⚠️ WARNING
│ ├── CPU: 98% ← Root cause visible
│ ├── 🌐 API Health: TIMEOUT
│ └── 🌐 Worker Health: TIMEOUT
One server problem, affecting two services.
Clear immediately.
Benefits of Hierarchical Monitoring
1. Instant Correlation
When failures are grouped by their host:
Incident: Multiple services down
Flat view:
├── API timeout
├── Worker timeout
├── Cache errors
└── "3 unrelated failures?"
Nested view:
├── 🖥️ api-prod-01: DOWN
│ ├── API: TIMEOUT (caused by server)
│ └── Worker: TIMEOUT (caused by server)
└── 🖥️ cache-prod-01: OK
"1 server down, 2 services affected."
2. Cascading Status
The parent reflects child status:
🖥️ api-prod-01: ⚠️ WARNING
├── CPU: 92% ← This triggers server warning
├── Memory: 60%
├── API: OK
└── Worker: OK
Server shows warning even though services still respond.
You see the problem before it becomes an outage.
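One way to sketch that rollup, with made-up thresholds (Wakestack's actual rules may differ): the server's status is the worst of its own metric checks and its child monitors, so a hot CPU flags the server even while every service still answers.

```python
# Sketch of parent status rollup: a server's status is the worst of
# its own metric checks and its child monitors' statuses.
# Thresholds and the data shape are illustrative.

SEVERITY = {"ok": 0, "warning": 1, "critical": 2}

def worst(statuses):
    """Return the most severe status in the list ('ok' if empty)."""
    return max(statuses, key=lambda s: SEVERITY[s], default="ok")

def cpu_status(percent):
    if percent >= 95:
        return "critical"
    if percent >= 90:
        return "warning"
    return "ok"

server = {
    "cpu": 92,
    "monitors": {"API": "ok", "Worker": "ok"},
}

status = worst([cpu_status(server["cpu"]), *server["monitors"].values()])
print(status)
# warning -- CPU at 92% flags the server before its services fail
```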
3. Organized at Scale
At 100+ monitors, hierarchy is essential:
Production
├── US-East (12 hosts, all healthy)
├── US-West (8 hosts, 1 warning)
│ └── cache-west-02: Memory 88%
├── EU-West (6 hosts, all healthy)
└── Asia (4 hosts, all healthy)
Collapse regions that are healthy.
Expand the one with issues.
Scale from overview to detail.
4. Meaningful Status Pages
Hierarchy maps to status page components:
Internal structure → Public status page
🖥️ api-prod-01 → "API"
🖥️ api-prod-02 →
🖥️ db-prod-01 → "Database"
🖥️ cache-prod-01 → "Core Services"
🖥️ worker-prod-01 →
Aggregate related infrastructure into customer-facing components.
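A sketch of that aggregation, using a hypothetical host-to-component mapping: each public component shows the worst status among the internal hosts behind it. Whether one dead replica means "down" or merely "degraded" is a policy choice; this sketch simply takes the worst.

```python
# Sketch: roll internal host statuses up into public status-page
# components. The mapping and status names are illustrative.

COMPONENT_OF = {
    "api-prod-01": "API",
    "api-prod-02": "API",
    "db-prod-01": "Database",
    "cache-prod-01": "Core Services",
    "worker-prod-01": "Core Services",
}

SEVERITY = {"ok": 0, "degraded": 1, "down": 2}

def public_status(host_statuses):
    """Each component shows the worst status among its hosts."""
    components = {}
    for host, status in host_statuses.items():
        comp = COMPONENT_OF[host]
        if comp not in components or SEVERITY[status] > SEVERITY[components[comp]]:
            components[comp] = status
    return components

statuses = {"api-prod-01": "down", "api-prod-02": "ok",
            "db-prod-01": "ok", "cache-prod-01": "ok",
            "worker-prod-01": "ok"}
print(public_status(statuses))
# "API" shows down even though api-prod-02 is healthy
```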
5. Smarter Alerting
Alert on the root cause, not symptoms:
Without hierarchy:
Alert 1: API timeout
Alert 2: Worker timeout
Alert 3: CPU high on api-prod-01
With hierarchy:
Alert: api-prod-01 CPU critical
(API and Worker failures are symptoms, not separate alerts)
Fewer alerts, clearer signal.
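The suppression logic behind that signal can be sketched as follows, with an illustrative alert shape and parent map (not Wakestack's alerting internals): a monitor's alert is folded in as a symptom whenever its parent host is already alerting.

```python
# Sketch of root-cause alert suppression: if a server-level alert is
# firing, its child monitors' failures become symptoms instead of
# separate pages. Alert shapes and names are illustrative.

def dedupe(alerts, parent_of):
    """Split alerts into pages and suppressed symptoms."""
    alerting = {a["source"] for a in alerts}
    kept, suppressed = [], []
    for alert in alerts:
        parent = parent_of.get(alert["source"])
        # Suppress when the alert's parent host is itself alerting
        (suppressed if parent in alerting else kept).append(alert)
    return kept, suppressed

parent_of = {"API": "api-prod-01", "Worker": "api-prod-01"}
alerts = [
    {"source": "api-prod-01", "message": "CPU critical"},
    {"source": "API", "message": "timeout"},
    {"source": "Worker", "message": "timeout"},
]
kept, suppressed = dedupe(alerts, parent_of)
print(len(kept), len(suppressed))
# 1 page instead of 3
```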
Common Patterns
Pattern 1: Service per Server
Simple deployments where each server runs one thing:
Production
├── 🖥️ web-server
│ └── 🌐 Website health
├── 🖥️ api-server
│ └── 🌐 API health
└── 🖥️ database
└── 🌐 PostgreSQL connection
Pattern 2: Multiple Services per Server
Common for smaller teams:
Production
└── 🖥️ main-server
├── 🌐 Website (/health)
├── 🌐 API (/api/health)
├── 🌐 Worker (/worker/health)
└── 🌐 PostgreSQL (port 5432)
Pattern 3: Load-Balanced Services
Multiple servers behind a load balancer:
Production
├── 🌐 Load Balancer (external check)
├── 🖥️ api-01
│ └── 🌐 API direct health
├── 🖥️ api-02
│ └── 🌐 API direct health
└── 🖥️ api-03
└── 🌐 API direct health
Pattern 4: Regional Deployment
Multi-region with location-based grouping:
Production
├── 📍 US-East
│ ├── 🖥️ api-us-east-01
│ └── 🖥️ api-us-east-02
├── 📍 EU-West
│ ├── 🖥️ api-eu-west-01
│ └── 🖥️ api-eu-west-02
└── 📍 Asia
└── 🖥️ api-asia-01
Pattern 5: Kubernetes-Style
Cluster → Namespace → Workload:
Production Cluster
├── 📦 Namespace: api
│ ├── Deployment: api-server
│ └── Deployment: api-worker
├── 📦 Namespace: data
│ ├── Deployment: postgres
│ └── Deployment: redis
└── 📦 Namespace: monitoring
└── Deployment: prometheus
How Wakestack Implements This
Wakestack uses nested hosts:
Create Parent Hosts
Parent: "Production API"
├── Type: Group
└── Children: api-01, api-02, api-03
Add Server Hosts with Agent
Host: api-01
├── Type: Server
├── Agent: Installed
├── Metrics: CPU, Memory, Disk
└── Parent: Production API
Add Monitors Under Hosts
Host: api-01
├── Monitor: HTTP /health
├── Monitor: HTTP /api/status
└── Monitor: TCP 5432 (database)
Result: Structured View
Dashboard:
Production API ✓
├── 🖥️ api-01 ✓
│ ├── CPU: 42% | Memory: 58% | Disk: 45%
│ ├── 🌐 /health → 200 OK (145ms)
│ └── 🌐 /api/status → 200 OK (89ms)
├── 🖥️ api-02 ✓
│ └── ...
└── 🖥️ api-03 ⚠️
├── CPU: 89% ← Warning
└── ...
Try nested monitoring — organize your infrastructure the way it actually works.
Migration from Flat Monitoring
Step 1: Identify Your Hierarchy
Map your actual infrastructure:
- What servers do you have?
- What services run on each?
- Are there logical groupings?
Step 2: Create Structure
Build from bottom up:
- Create server hosts
- Assign monitors to their servers
- Group servers into logical parents
Step 3: Verify Relationships
Check that failures correlate correctly:
- Server issue → its monitors affected
- All child monitors up → parent shows healthy
Step 4: Update Alerting
Take advantage of hierarchy:
- Alert on server status, not individual monitor failures
- Reduce duplicate alerts
Key Takeaways
- Infrastructure is naturally hierarchical
- Flat monitoring loses relationships
- Nested monitoring shows root causes immediately
- Hierarchy enables smarter alerting (root cause, not symptoms)
- Scale requires structure—flat breaks at 50+ monitors
- Status pages benefit from component grouping
Frequently Asked Questions
What is nested infrastructure monitoring?
Nested infrastructure monitoring organizes hosts and monitors hierarchically, matching how systems actually deploy: services run on servers, servers in clusters, clusters in regions. This structure shows relationships and enables cascading status—when a server fails, its services are automatically affected.
Why does infrastructure hierarchy matter for monitoring?
Hierarchy matters because failures cascade. When a server goes down, every service on it is affected. Flat monitoring lists show unrelated failures; hierarchical monitoring shows root cause relationships. You see 'server failed' rather than '12 unrelated service alerts.'
How do I organize monitors for microservices?
Organize by deployment location: group monitors under the server or container where services run. For Kubernetes, organize by cluster → namespace → pod. This structure lets you quickly identify whether issues are service-specific or infrastructure-wide.
Related Articles
Infrastructure-Aware Uptime Monitoring: Beyond Simple Checks
Learn how infrastructure-aware monitoring combines uptime checks with server metrics. Understand why knowing your endpoints isn't enough without knowing your infrastructure.
Nested Host Monitoring: Organize Monitors by Infrastructure
Learn how nested host monitoring helps you understand infrastructure relationships. Group monitors by server, see impact at a glance, and diagnose issues faster.
Server Monitoring: Complete Guide to Infrastructure Visibility
Learn how to monitor your servers effectively - CPU, memory, disk, and processes. Understand why server monitoring matters and how it complements uptime monitoring.
Ready to monitor your uptime?
Start monitoring your websites, APIs, and services in minutes. Free forever for small projects.