
How Nested Infrastructure Changes the Way You Monitor Systems

Modern infrastructure is hierarchical: services run on servers, servers in clusters, clusters in regions. Flat monitoring tools can't represent this. Here's why structure matters.

Wakestack Team

Engineering Team

7 min read

Infrastructure isn't flat—it's nested. Services run on servers. Servers live in clusters. Clusters exist in regions. But most monitoring tools show everything in one flat list: 50 monitors, no structure, no relationships.

When something fails, you're left correlating manually: "Which of these 12 failing monitors are actually the same problem?"

The Flat Monitoring Problem

Traditional Approach

Monitors (flat list):
├── API health check
├── API server CPU
├── API server memory
├── Database connection check
├── Database server CPU
├── Database server disk
├── Worker queue depth
├── Worker server CPU
├── Redis ping
├── Redis server memory
└── ... 40 more monitors

When something fails:

Alerts:
├── API health check: TIMEOUT
├── API server CPU: 98%
├── Worker queue depth: BACKING UP
├── Database connection check: TIMEOUT

Question: What's the root cause?
Answer: You have to figure it out manually.

The Mental Work Required

With flat monitoring, during every incident you must:

  1. See which monitors are failing
  2. Remember which services run where
  3. Manually correlate failures
  4. Deduce the root cause

This works with 10 monitors. It breaks at 50.

Infrastructure Is Hierarchical

Real infrastructure has structure:

Production Environment
├── Region: US-East
│   ├── Server: api-prod-01
│   │   ├── Service: API (port 3000)
│   │   └── Service: Background Worker
│   │
│   ├── Server: api-prod-02
│   │   └── Service: API (port 3000)
│   │
│   └── Server: db-prod-01
│       └── Service: PostgreSQL
│
└── Region: EU-West
    ├── Server: api-eu-01
    │   └── Service: API
    └── Server: cache-eu-01
        └── Service: Redis

Failures cascade down this hierarchy:

  • If api-prod-01 fails → API and Worker are both affected
  • If US-East network fails → Everything in that region is affected
  • If db-prod-01 fails → All services depending on it are affected
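The cascade described above can be sketched as a small tree model. This is a minimal illustration, not Wakestack's actual data model; the class and node names are hypothetical.

```python
# Minimal sketch: a parent failure cascades to every descendant.
# Names are illustrative.

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.up = True
        self.children = children or []

    def fail(self):
        """Mark this node down; the outage cascades to all descendants."""
        self.up = False
        for child in self.children:
            child.fail()

    def affected(self):
        """All nodes in this subtree that are currently down."""
        hits = [] if self.up else [self.name]
        for child in self.children:
            hits.extend(child.affected())
        return hits

api = Node("API")
worker = Node("Background Worker")
server = Node("api-prod-01", [api, worker])

server.fail()
print(server.affected())  # the server outage takes both services with it
```

One server failure, and the model immediately reports every service it took down.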

Nested Monitoring Structure

Nested monitoring represents this hierarchy in your monitoring tool:

Wakestack Dashboard:

Production
├── 🖥️ api-prod-01 (Server)
│   ├── CPU: 45%
│   ├── Memory: 62%
│   ├── Disk: 55%
│   ├── 🌐 API Health (/health) - 200 OK
│   └── 🌐 Worker Health (/worker/health) - 200 OK
│
├── 🖥️ api-prod-02 (Server)
│   ├── CPU: 38%
│   ├── Memory: 58%
│   └── 🌐 API Health (/health) - 200 OK
│
└── 🖥️ db-prod-01 (Server)
    ├── CPU: 22%
    ├── Memory: 78%
    ├── Disk: 65%
    └── 🌐 PostgreSQL (port 5432) - Connected

What This Enables

Immediate root cause visibility:

Before (flat):
├── API /health: TIMEOUT
├── Worker /health: TIMEOUT
├── Some CPU metric: HIGH

After (nested):
├── 🖥️ api-prod-01: ⚠️ WARNING
│   ├── CPU: 98% ← Root cause visible
│   ├── 🌐 API Health: TIMEOUT
│   └── 🌐 Worker Health: TIMEOUT

One server problem, affecting two services.
Clear immediately.

Benefits of Hierarchical Monitoring

1. Instant Correlation

When failures are grouped by their host:

Incident: Multiple services down

Flat view:
├── API timeout
├── Worker timeout
├── Cache errors
└── "3 unrelated failures?"

Nested view:
├── 🖥️ api-prod-01: DOWN
│   ├── API: TIMEOUT (caused by server)
│   └── Worker: TIMEOUT (caused by server)
└── 🖥️ cache-prod-01: OK

"1 server down, 2 services affected."

2. Cascading Status

The parent reflects child status:

🖥️ api-prod-01: ⚠️ WARNING
├── CPU: 92% ← This triggers server warning
├── Memory: 60%
├── API: OK
└── Worker: OK

Server shows warning even though services still respond.
You see the problem before it becomes an outage.
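One way to compute this cascading status: a host's status is the worst of its own metric checks and its children's statuses. A sketch with assumed thresholds (85% warning, 95% critical):

```python
# Sketch: a parent's status is the worst of its own metric checks and
# its children's statuses. Thresholds and values are illustrative.

OK, WARNING, DOWN = 0, 1, 2
LABEL = {OK: "OK", WARNING: "WARNING", DOWN: "DOWN"}

def metric_status(value, warn=85, crit=95):
    if value >= crit:
        return DOWN
    return WARNING if value >= warn else OK

def host_status(metrics, child_statuses):
    """Worst severity wins, so a hot CPU surfaces even while services respond."""
    severities = [metric_status(v) for v in metrics.values()] + list(child_statuses)
    return max(severities, default=OK)

status = host_status({"cpu": 92, "memory": 60}, [OK, OK])  # API and Worker OK
print(LABEL[status])  # CPU at 92% pushes the server to WARNING
```

Because "worst severity wins", the server goes to WARNING on CPU alone, matching the diagram above.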

3. Organized at Scale

At 100+ monitors, hierarchy is essential:

Production
├── US-East (12 hosts, all healthy)
├── US-West (8 hosts, 1 warning)
│   └── cache-west-02: Memory 88%
├── EU-West (6 hosts, all healthy)
└── Asia (4 hosts, all healthy)

Collapse regions that are healthy.
Expand the one with issues.
Scale from overview to detail.

4. Meaningful Status Pages

Hierarchy maps to status page components:

Internal structure → Public status page

🖥️ api-prod-01      →  "API"
🖥️ api-prod-02      →

🖥️ db-prod-01       →  "Database"

🖥️ cache-prod-01    →  "Core Services"
🖥️ worker-prod-01   →

Aggregate related infrastructure into customer-facing components.

5. Smarter Alerting

Alert on the root cause, not symptoms:

Without hierarchy:
Alert 1: API timeout
Alert 2: Worker timeout
Alert 3: CPU high on api-prod-01

With hierarchy:
Alert: api-prod-01 CPU critical
(API and Worker failures are symptoms, not separate alerts)

Fewer alerts, clearer signal.
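The suppression logic is a short rule: drop any alert whose parent is also alerting, since the parent is the likelier root cause. A sketch with an illustrative parent map:

```python
# Sketch: suppress symptom alerts whose parent host is already alerting.
# The parent map and alert names are illustrative.

parent_of = {"API": "api-prod-01", "Worker": "api-prod-01"}

raw_alerts = ["API", "Worker", "api-prod-01"]  # everything that tripped

alerting = set(raw_alerts)
paged = [a for a in raw_alerts if parent_of.get(a) not in alerting]
print(paged)  # only the server pages; the service timeouts are symptoms
```

Three raw alerts collapse into one page, which is exactly the "fewer alerts, clearer signal" trade.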

Common Patterns

Pattern 1: Service per Server

Simple deployments where each server runs one thing:

Production
├── 🖥️ web-server
│   └── 🌐 Website health
├── 🖥️ api-server
│   └── 🌐 API health
└── 🖥️ database
    └── 🌐 PostgreSQL connection

Pattern 2: Multiple Services per Server

Common for smaller teams:

Production
└── 🖥️ main-server
    ├── 🌐 Website (/health)
    ├── 🌐 API (/api/health)
    ├── 🌐 Worker (/worker/health)
    └── 🌐 PostgreSQL (port 5432)

Pattern 3: Load-Balanced Services

Multiple servers behind a load balancer:

Production
├── 🌐 Load Balancer (external check)
├── 🖥️ api-01
│   └── 🌐 API direct health
├── 🖥️ api-02
│   └── 🌐 API direct health
└── 🖥️ api-03
    └── 🌐 API direct health

Pattern 4: Regional Deployment

Multi-region with location-based grouping:

Production
├── 📍 US-East
│   ├── 🖥️ api-us-east-01
│   └── 🖥️ api-us-east-02
├── 📍 EU-West
│   ├── 🖥️ api-eu-west-01
│   └── 🖥️ api-eu-west-02
└── 📍 Asia
    └── 🖥️ api-asia-01

Pattern 5: Kubernetes-Style

Cluster → Namespace → Workload:

Production Cluster
├── 📦 Namespace: api
│   ├── Deployment: api-server
│   └── Deployment: api-worker
├── 📦 Namespace: data
│   ├── Deployment: postgres
│   └── Deployment: redis
└── 📦 Namespace: monitoring
    └── Deployment: prometheus

How Wakestack Implements This

Wakestack uses nested hosts:

Create Parent Hosts

Parent: "Production API"
├── Type: Group
└── Children: api-01, api-02, api-03

Add Server Hosts with Agent

Host: api-01
├── Type: Server
├── Agent: Installed
├── Metrics: CPU, Memory, Disk
└── Parent: Production API

Add Monitors Under Hosts

Host: api-01
├── Monitor: HTTP /health
├── Monitor: HTTP /api/status
└── Monitor: TCP 5432 (database)

Result: Structured View

Dashboard:
Production API ✓
├── 🖥️ api-01 ✓
│   ├── CPU: 42% | Memory: 58% | Disk: 45%
│   ├── 🌐 /health → 200 OK (145ms)
│   └── 🌐 /api/status → 200 OK (89ms)
├── 🖥️ api-02 ✓
│   └── ...
└── 🖥️ api-03 ⚠️
    ├── CPU: 89% ← Warning
    └── ...

Migration from Flat Monitoring

Step 1: Identify Your Hierarchy

Map your actual infrastructure:

  • What servers do you have?
  • What services run on each?
  • Are there logical groupings?

Step 2: Create Structure

Build from bottom up:

  1. Create server hosts
  2. Assign monitors to their servers
  3. Group servers into logical parents
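The bottom-up steps above can be sketched as folding a flat monitor list into per-server groups, then grouping the servers under one logical parent. The monitor-to-server assignments here are illustrative.

```python
# Sketch: fold a flat monitor list into a bottom-up hierarchy.
# Monitor-to-server assignments are illustrative.

flat_monitors = [
    ("API /health", "api-01"),
    ("API /status", "api-01"),
    ("PostgreSQL 5432", "db-01"),
]

# Steps 1-2: create a host per server and assign each monitor to it
hosts = {}
for monitor, server in flat_monitors:
    hosts.setdefault(server, []).append(monitor)

# Step 3: group the servers under one logical parent
tree = {"Production": hosts}
print(tree)
```

The output mirrors the dashboards shown earlier: a parent, its servers, and each server's monitors.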

Step 3: Verify Relationships

Check that failures correlate correctly:

  • Server issue → its monitors affected
  • All child monitors up → parent shows healthy

Step 4: Update Alerting

Take advantage of hierarchy:

  • Alert on server status, not individual monitor failures
  • Reduce duplicate alerts

Key Takeaways

  • Infrastructure is naturally hierarchical
  • Flat monitoring loses relationships
  • Nested monitoring shows root causes immediately
  • Hierarchy enables smarter alerting (root cause, not symptoms)
  • Scale requires structure—flat breaks at 50+ monitors
  • Status pages benefit from component grouping

About the Author

Wakestack Team

Engineering Team

Frequently Asked Questions

What is nested infrastructure monitoring?

Nested infrastructure monitoring organizes hosts and monitors hierarchically, matching how systems actually deploy: services run on servers, servers in clusters, clusters in regions. This structure shows relationships and enables cascading status—when a server fails, its services are automatically affected.

Why does infrastructure hierarchy matter for monitoring?

Hierarchy matters because failures cascade. When a server goes down, every service on it is affected. Flat monitoring lists show unrelated failures; hierarchical monitoring shows root cause relationships. You see 'server failed' rather than '12 unrelated service alerts.'

How do I organize monitors for microservices?

Organize by deployment location: group monitors under the server or container where services run. For Kubernetes, organize by cluster → namespace → pod. This structure lets you quickly identify whether issues are service-specific or infrastructure-wide.

Ready to monitor your uptime?

Start monitoring your websites, APIs, and services in minutes. Free forever for small projects.