Infrastructure-Aware Uptime Monitoring: Beyond Simple Checks

Learn how infrastructure-aware monitoring combines uptime checks with server metrics, and why watching your endpoints isn't enough without visibility into the infrastructure behind them.

Wakestack Team

Engineering Team

6 min read

Who This Is For

This guide is for SREs, DevOps engineers, and platform teams who want to understand not just when services fail, but why. If you're tired of knowing something is broken without knowing the cause, infrastructure-aware monitoring is built to close that gap.

What Is Infrastructure-Aware Uptime Monitoring?

Traditional uptime monitoring answers: "Is it up?"

Infrastructure-aware monitoring answers: "Is it up, and if not, why?"

Traditional Approach

┌──────────────────────┐
│   Uptime Monitor     │
│                      │
│   ✗ API is down      │
│                      │
│   (Why? No idea)     │
└──────────────────────┘

Infrastructure-Aware Approach

┌──────────────────────────────────────────┐
│   Infrastructure-Aware Monitor           │
│                                          │
│   ✗ API is down                          │
│                                          │
│   Server Metrics:                        │
│   • CPU: 98% ← Likely cause              │
│   • Memory: 85%                          │
│   • Disk: 72%                            │
│   • Process: node (consuming 95% CPU)    │
└──────────────────────────────────────────┘
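
The "likely cause" hint in the box above can be approximated by comparing each metric against a threshold and reporting the worst offender. The sketch below is illustrative only: the `HostSnapshot` type, the `THRESHOLDS` values, and the function names are assumptions made for this example, not Wakestack's actual logic.

```python
from dataclasses import dataclass

# Hypothetical thresholds (percent used); tune these for your own hosts.
THRESHOLDS = {"cpu": 90.0, "memory": 90.0, "disk": 90.0}


@dataclass
class HostSnapshot:
    cpu: float     # percent used
    memory: float  # percent used
    disk: float    # percent used


def likely_causes(snapshot: HostSnapshot) -> list[str]:
    """Return metric names at or above threshold, largest overshoot first."""
    over = {
        name: getattr(snapshot, name) - limit
        for name, limit in THRESHOLDS.items()
        if getattr(snapshot, name) >= limit
    }
    return sorted(over, key=over.get, reverse=True)


def describe(check_name: str, snapshot: HostSnapshot) -> str:
    """Attach a resource hint to a plain 'is down' message."""
    causes = likely_causes(snapshot)
    hint = f"likely cause: {causes[0]}" if causes else "no resource metric over threshold"
    return f"{check_name} is down ({hint})"


# Values taken from the diagram above.
print(describe("API", HostSnapshot(cpu=98, memory=85, disk=72)))
# prints: API is down (likely cause: cpu)
```

A threshold comparison like this only suggests where to look first; a metric under its threshold does not rule that resource out.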

The Problem with Traditional Uptime Monitoring

You Know THAT, Not WHY

When you get an alert "API is down," you start guessing:

  • Did we deploy bad code?
  • Is the server overloaded?
  • Did the database crash?
  • Is it a network issue?

Then you SSH in and start investigating.

No Context = Slow Resolution

Without infrastructure context:

  1. Get alert (T+0)
  2. Log into server (T+2 min)
  3. Run diagnostic commands (T+5 min)
  4. Find the problem (T+10 min)
  5. Start fixing (T+10 min)

Time to diagnose: 10+ minutes before any fix begins

With infrastructure context:

  1. Get alert with server metrics (T+0)
  2. See CPU at 98% (T+0)
  3. Start fixing (T+1 min)

Time to diagnose: under 2 minutes

Separate Tools = Context Switching

Many teams use:

  • UptimeRobot or Pingdom for uptime
  • Datadog or New Relic for infrastructure
  • A status page tool for communication

During incidents, you end up switching between tabs and correlating timestamps by hand, which costs time.

Wakestack's Infrastructure-Aware Approach

Wakestack combines:

1. Uptime Monitoring

  • HTTP/HTTPS checks
  • TCP port monitoring
  • DNS verification
  • SSL certificate checks
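
For readers who want to see what these four check types involve, here is a minimal sketch using only the Python standard library. It is not how Wakestack implements its checks; the function names, default timeouts, and return shapes are assumptions chosen for illustration.

```python
import socket
import ssl
import urllib.error
import urllib.request
from datetime import datetime, timezone
from typing import Optional


def check_tcp(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_http(url: str, timeout: float = 5.0) -> tuple[bool, Optional[int]]:
    """Return (healthy, status_code). 4xx/5xx and connection errors are unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return False, exc.code
    except OSError:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return False, None


def check_dns(hostname: str) -> list[str]:
    """Return the sorted set of addresses the name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def cert_days_left(host: str, port: int = 443, timeout: float = 5.0) -> int:
    """Days until the server's TLS certificate expires (raises on connection errors)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            not_after = tls.getpeercert()["notAfter"]
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - datetime.now(timezone.utc)).days
```

A call such as `check_http("https://api.example.com/health")` (a placeholder URL) returns a pair like `(False, 503)` when the endpoint answers with an error, which is the kind of detail the dashboard view later in this guide shows next to each failing check.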

2. Server Monitoring

  • CPU usage and load
  • Memory consumption
  • Disk space and I/O
  • Process monitoring

3. Nested Organization

  • Group monitors by server
  • See relationships
  • Understand blast radius
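
One way to picture nested organization is a host object that owns its monitors. The sketch below uses hypothetical `Host` and `Monitor` types (not Wakestack's data model) to show how grouping makes blast radius and host-level outages easy to compute.

```python
from dataclasses import dataclass, field


@dataclass
class Monitor:
    name: str
    up: bool = True


@dataclass
class Host:
    name: str
    monitors: list[Monitor] = field(default_factory=list)


def blast_radius(host: Host) -> list[str]:
    """Names of every monitor that depends on this host."""
    return [m.name for m in host.monitors]


def host_level_outage(host: Host) -> bool:
    """True when every monitor under the host fails at once,
    which points at the server rather than a single endpoint."""
    return bool(host.monitors) and all(not m.up for m in host.monitors)


api = Host("api-prod-01", [
    Monitor("/api/health", up=False),
    Monitor("/api/users", up=False),
    Monitor("/api/orders", up=False),
])
print(blast_radius(api))       # ['/api/health', '/api/users', '/api/orders']
print(host_level_outage(api))  # True
```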

4. Status Pages

  • Communicate with users
  • Automatic component status
  • Incident management

All In One Dashboard

Production API Server (api-prod-01)
├── Uptime Checks
│   ├── ✗ /api/health (503 error)
│   ├── ✗ /api/users (timeout)
│   └── ✗ /api/orders (timeout)
│
├── Server Metrics
│   ├── CPU: 98% ⚠️ CRITICAL
│   ├── Memory: 72%
│   └── Disk: 45%
│
└── Processes
    ├── node (CPU: 95%)  ← Found it
    ├── nginx (CPU: 1%)
    └── postgres (CPU: 2%)

How Infrastructure-Aware Monitoring Works

Step 1: Install Server Agent

Deploy Wakestack's lightweight Go agent:

curl -sSL https://wakestack.co.uk/install.sh | bash

The agent collects:

  • System metrics every 30 seconds
  • Process list and resource usage
  • Disk I/O and network stats
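
To make the collection loop concrete, here is a rough Python sketch of sampling a few system metrics on a fixed interval. The real agent is written in Go and collects more than this; the `sample` and `run` functions, the field names, and the choice of load average and disk usage are assumptions for illustration. `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import shutil
import time


def sample(path: str = "/") -> dict:
    """Take one snapshot of load average and disk usage for `path`."""
    load1, _load5, _load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage(path)
    return {
        "timestamp": time.time(),
        "load_1m": load1,
        "load_per_cpu": load1 / (os.cpu_count() or 1),
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


def run(interval: float = 30.0, iterations: int = 3, sleep=time.sleep) -> list[dict]:
    """Collect `iterations` samples, pausing `interval` seconds between them."""
    samples = []
    for i in range(iterations):
        samples.append(sample())
        if i < iterations - 1:
            sleep(interval)
    return samples


if __name__ == "__main__":
    print(sample())
```

Passing `sleep` as a parameter keeps the loop testable without waiting 30 seconds per sample.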

Step 2: Create Uptime Monitors

Add monitors for your endpoints:

  • API health checks
  • Website availability
  • Database ports

Associate monitors with their servers:

api.example.com/health → API Server Host

Step 3: See the Combined View

When issues occur, you see everything together:

  • Which endpoints are affected
  • What server resources look like
  • Which processes are consuming resources
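
The combined view amounts to joining three data sources on the host. As a hedged sketch (the `Check` type, the `combined_view` function, and the dictionary layout are invented for this example), using the numbers from the dashboard shown earlier:

```python
from typing import NamedTuple, Optional


class Check(NamedTuple):
    endpoint: str
    ok: bool
    detail: str


def combined_view(
    host: str,
    checks: list[Check],
    metrics: dict[str, float],
    process_cpu: dict[str, float],
) -> dict:
    """Merge failing checks, host metrics, and the busiest process into one record."""
    failing = [c for c in checks if not c.ok]
    top: Optional[str] = max(process_cpu, key=process_cpu.get) if process_cpu else None
    return {
        "host": host,
        "affected_endpoints": [f"{c.endpoint} ({c.detail})" for c in failing],
        "metrics": metrics,
        "top_process": top,
    }


view = combined_view(
    "api-prod-01",
    [
        Check("/api/health", False, "503 error"),
        Check("/api/users", False, "timeout"),
        Check("/api/orders", False, "timeout"),
    ],
    {"cpu": 98, "memory": 72, "disk": 45},
    {"node": 95, "nginx": 1, "postgres": 2},
)
print(view["top_process"])  # node
```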

Real-World Examples

Example 1: Memory Leak

Alert: API response time degraded

Traditional approach:

  1. Check uptime monitor: "Yes, it's slow"
  2. SSH into server
  3. Run free -h, see low memory
  4. Run top, find memory-heavy process
  5. Restart or fix

Infrastructure-aware approach:

  1. Check dashboard: See memory at 95%
  2. See process list: node at 8GB RSS
  3. Restart or fix

Time saved: 5-10 minutes
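
The manual step of running top and hunting for the memory-heavy process can be scripted. The sketch below parses `ps` output (resident set size in KiB plus command name) and ranks processes by memory. The function names are illustrative, and this is a stand-in for what a process list in a dashboard gives you, not a description of the Wakestack agent.

```python
import subprocess


def parse_ps(output: str) -> list[tuple[str, int]]:
    """Parse lines of '<rss_kib> <command>' into (command, rss_kib) pairs."""
    rows = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((parts[1].strip(), int(parts[0])))
    return rows


def top_memory(rows: list[tuple[str, int]], n: int = 3) -> list[tuple[str, int]]:
    """Return the n processes with the largest resident set size."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:n]


def live_top_memory(n: int = 3) -> list[tuple[str, int]]:
    """Run ps on this machine (Unix-like systems) and rank processes by RSS."""
    out = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],  # empty headers, one -o per column
        capture_output=True, text=True, check=True,
    ).stdout
    return top_memory(parse_ps(out), n)


# Sample data mirroring the example: node holding 8 GiB (8388608 KiB) of RSS.
sample_output = "8388608 node\n  20480 nginx\n 102400 postgres\n"
print(top_memory(parse_ps(sample_output), 1))  # [('node', 8388608)]
```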

Example 2: Disk Full

Alert: Database connection failing

Traditional approach:

  1. Check uptime monitor: "TCP 5432 not responding"
  2. SSH into database server
  3. Try psql, see errors
  4. Check logs, see disk errors
  5. Run df -h, see disk full
  6. Clear logs

Infrastructure-aware approach:

  1. Check dashboard: See disk at 100%
  2. Clear logs

Time saved: 5-8 minutes

Example 3: Traffic Spike

Alert: Multiple endpoints slow

Traditional approach:

  1. Check multiple monitors individually
  2. Notice they're all on the same server
  3. SSH in, see high CPU
  4. Check whether it's an attack or legitimate traffic
  5. Scale or block

Infrastructure-aware approach:

  1. See server group showing high CPU
  2. All child monitors affected
  3. Network stats show traffic spike
  4. Scale or block

Time saved: 5-10 minutes

Comparing Approaches

Single-Purpose Uptime Tools

Tools: UptimeRobot, Pingdom, Better Stack

What they do:

  • Check endpoints externally
  • Alert when down
  • Some offer status pages

What they don't do:

  • Server resource monitoring
  • Root cause visibility
  • Infrastructure relationships

Full Observability Platforms

Tools: Datadog, New Relic, Dynatrace

What they do:

  • Everything (APM, logs, metrics, traces, synthetics)
  • Deep infrastructure visibility
  • Complex dashboards

What they cost:

  • $15-50+ per host per month
  • Often thousands monthly

Infrastructure-Aware Uptime (Wakestack)

What it does:

  • Uptime monitoring
  • Essential server metrics
  • Nested host organization
  • Status pages

What it costs:

  • $29/month (Pro)

Trade-off: Less deep than Datadog, more context than UptimeRobot

Who Needs Infrastructure-Aware Monitoring?

You Need It If:

  • ✅ You manage your own servers
  • ✅ You SSH into boxes during incidents
  • ✅ You have 10+ monitors to organize
  • ✅ You want faster incident resolution
  • ✅ You don't need full APM/tracing

You Don't Need It If:

  • ❌ You use serverless/PaaS exclusively
  • ❌ You already have Datadog/New Relic
  • ❌ You only have 1-2 simple endpoints
  • ❌ You never need to know why things fail

Setting Up Infrastructure-Aware Monitoring

Step 1: Sign Up

Create your free account at wakestack.co.uk/signup

Step 2: Add Your Servers as Hosts

  1. Create a host for each server
  2. Install the agent
  3. Verify that metrics are flowing

Step 3: Add Monitors Under Hosts

  1. Create uptime monitors
  2. Link to parent hosts
  3. See combined view

Step 4: Configure Alerts

Set thresholds for:

  • Endpoint failures
  • High CPU (>80%)
  • High memory usage (>85% used)
  • Low disk space (<20% free)
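
Expressed as code, the thresholds above reduce to a handful of comparisons. This is a sketch of the rule logic only; the `evaluate` function and its parameter names are hypothetical, and in practice you would set these values in the alert settings rather than in a script.

```python
def evaluate(
    endpoint_up: bool,
    cpu_pct: float,
    mem_used_pct: float,
    disk_free_pct: float,
) -> list[str]:
    """Return the alerts that fire for one snapshot, using the thresholds listed above."""
    alerts = []
    if not endpoint_up:
        alerts.append("endpoint failure")
    if cpu_pct > 80:
        alerts.append("high CPU")
    if mem_used_pct > 85:
        alerts.append("high memory usage")
    if disk_free_pct < 20:
        alerts.append("low disk space")
    return alerts


print(evaluate(endpoint_up=False, cpu_pct=98, mem_used_pct=72, disk_free_pct=55))
# ['endpoint failure', 'high CPU']
```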

Step 5: Create Status Page

Add components that auto-update based on monitors.
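
Auto-updating components boil down to deriving one status from several monitor states. The mapping below is an assumption made for illustration: the status labels and the `component_status` function are not taken from Wakestack, and real status pages may use different rules.

```python
def component_status(monitor_states: list[bool]) -> str:
    """Derive a status page label from the up/down state of a component's monitors."""
    if not monitor_states:
        return "unknown"
    up = sum(monitor_states)
    if up == len(monitor_states):
        return "operational"
    if up == 0:
        return "major outage"
    return "degraded"


print(component_status([True, True, True]))     # operational
print(component_status([True, False, True]))    # degraded
print(component_status([False, False, False]))  # major outage
```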

Try Infrastructure-Aware Monitoring

See how combining uptime and infrastructure changes incident response.

  • Free tier to try it
  • Server agent included
  • Status pages included
  • No credit card required

Frequently Asked Questions

What is infrastructure-aware uptime monitoring?

Infrastructure-aware monitoring combines traditional uptime checks (is the endpoint responding?) with server metrics (CPU, memory, disk). This gives you both the what and the why when issues occur.

Why isn't regular uptime monitoring enough?

Regular uptime monitoring tells you THAT something is down. Infrastructure-aware monitoring helps you see WHY: was it a server resource issue, a network problem, or an application bug?

Do I need separate tools for uptime and infrastructure monitoring?

Traditionally yes, but tools like Wakestack combine both. This integration reduces context switching and provides faster root cause analysis.
