
Why Agentless Monitoring Fails at Scale

Agentless monitoring seems simpler, but it creates blind spots as infrastructure grows. Learn why agent-based monitoring becomes essential at scale.

Wakestack Team

Engineering Team

7 min read

Agentless monitoring looks simpler—no installation, no agent updates, no extra processes. But that simplicity has a cost: blind spots that grow with your infrastructure. What works for 5 servers becomes a liability at 50.

Here's why agentless approaches hit a wall, and when you should make the switch.

How Agentless Monitoring Works

Agentless monitoring checks systems from outside:

Monitoring Server → Target System
        │
        ├── HTTP check: GET /health
        ├── TCP check: Connect to port 5432
        ├── SNMP poll: Request system metrics
        └── SSH probe: Run remote command

No software installed on the target. Everything happens over the network.
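A minimal sketch of the first two probe types above, in Python. The host, path, and port are placeholders for your own endpoints, not real services:

```python
# Minimal agentless probes: an HTTPS health check and a TCP port check.
# Everything happens over the network; nothing runs on the target.
import http.client
import socket

def http_check(host: str, path: str = "/health", timeout: float = 5.0) -> bool:
    """Succeeds only if the endpoint answers with a 2xx status."""
    try:
        conn = http.client.HTTPSConnection(host, timeout=timeout)
        conn.request("GET", path)
        ok = 200 <= conn.getresponse().status < 300
        conn.close()
        return ok
    except OSError:
        return False

def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """Only learns 'open or closed' -- nothing about the service behind it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note how little each probe can report: a boolean, plus maybe a latency figure if you time it.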

Common Agentless Methods

Method        What It Checks      Limitation
HTTP/HTTPS    Web endpoints       Only sees HTTP responses
TCP           Port availability   Only sees "open or closed"
SNMP          Device metrics      Limited metrics, security concerns
SSH           Remote commands     Requires credentials, adds latency
Cloud APIs    Provider metrics    Limited to what the cloud exposes

Why Agentless Seems Attractive

For small deployments, agentless monitoring wins on:

  • No installation — Just point at endpoints
  • No maintenance — No agent updates
  • Quick setup — Monitoring in minutes
  • No footprint — Nothing running on target systems

At 5 servers, these benefits are real.

Where Agentless Breaks Down

Problem 1: Limited Visibility

Agentless can only see what's exposed externally:

What agentless sees:

Server: api-prod-01
├── HTTP /health: 200 OK
├── Port 443: Open
└── Ping: 15ms

What's actually happening:

Server: api-prod-01
├── HTTP /health: 200 OK
├── CPU: 94% ← Problem
├── Memory: 88% ← Problem
├── Disk: 92% ← Problem
├── Swap: Active ← Big problem
├── Load: 12.5 (8 cores)
└── Processes:
    ├── node: 85% CPU
    ├── postgres: Waiting on I/O
    └── zombie workers: 15 ← Problem

The server looks "up" but is actually in trouble.

Problem 2: Polling Overhead Scales Linearly

With agentless monitoring, every check is a network request:

5 servers × 10 metrics × 1 check/minute = 50 requests/minute
50 servers × 10 metrics × 1 check/minute = 500 requests/minute
500 servers × 10 metrics × 1 check/minute = 5,000 requests/minute

This creates:

  • Network overhead on monitoring server
  • Load on target systems (handling probe requests)
  • Latency in metric collection
  • Credential management at scale
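The linear growth behind those numbers is plain multiplication; a quick sketch makes the trend explicit:

```python
# Agentless polling volume: one network request per metric, per server,
# per check interval -- so probe traffic grows linearly with fleet size.
def probes_per_minute(servers: int, metrics_per_server: int,
                      checks_per_minute: int = 1) -> int:
    return servers * metrics_per_server * checks_per_minute

for fleet in (5, 50, 500):
    print(f"{fleet} servers -> {probes_per_minute(fleet, 10)} requests/minute")
```

This reproduces the 50 / 500 / 5,000 figures above; every one of those requests is a connection the monitoring server must open and a target must answer.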

Problem 3: Network Dependency

Agentless monitoring fails when:

  • Network path is congested
  • Firewall rules change
  • Target network is isolated
  • DNS fails

Scenario: Network hiccup

Agentless result:
├── Server 1: Timeout (actually fine)
├── Server 2: Timeout (actually fine)
├── Server 3: Timeout (actually fine)
└── Alert storm: "3 servers down!"

Reality: Network switch flapped for 30 seconds

Problem 4: Security at Scale

Agentless methods need access to targets:

Method       Requires
SNMP         Community strings (often insecure)
SSH          Credentials on the monitoring server
Cloud APIs   IAM keys/tokens

At scale, managing these credentials becomes a security concern. A compromised monitoring server could access everything.

Problem 5: Internal Services Are Invisible

Services behind firewalls can't be reached agentlessly:

Internet
    │
    │   Firewall
    │   ─────────────────────
    │
    ├── Internal API: Not reachable
    ├── Database: Not reachable
    ├── Cache: Not reachable
    └── Workers: Not reachable

You'd need to punch firewall holes—bad for security.

Agent-Based Monitoring at Scale

Agents flip the model:

Target System
    │
    Agent (runs locally)
    │
    ├── Collects metrics internally
    ├── Full system visibility
    └── Pushes to → Monitoring Server

Why This Scales Better

1. Metric collection is local

Agent on server:
├── Read /proc/stat (CPU) - instant
├── Read /proc/meminfo (Memory) - instant
├── Read /sys/block/*/stat (Disk) - instant
└── Local process list - instant

vs

Agentless from monitoring server:
├── SSH connection - 50-200ms
├── Run command - 100-500ms
├── Parse output - variable
└── Per metric, per server
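Those /proc reads are plain file parsing. Here is a sketch of what an agent does with /proc/meminfo, using a captured sample (with made-up numbers) so the example is self-contained; a real agent would open the live file on Linux:

```python
# Local metric collection, agent-style: parse the /proc/meminfo format.
# SAMPLE_MEMINFO stands in for open("/proc/meminfo").read() on Linux;
# the values are illustrative.
SAMPLE_MEMINFO = """\
MemTotal:       16384000 kB
MemAvailable:    1966080 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""

def parse_meminfo(text: str) -> dict[str, int]:
    """Return meminfo fields in kB, keyed by field name."""
    fields = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        fields[name] = int(rest.split()[0])
    return fields

def memory_used_percent(info: dict[str, int]) -> float:
    return 100.0 * (1 - info["MemAvailable"] / info["MemTotal"])

info = parse_meminfo(SAMPLE_MEMINFO)
print(f"memory used: {memory_used_percent(info):.0f}%")
```

No network round trip, no credentials, no probe for the target to answer: the data is already on the box.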

2. Network efficient

Agent: Collect 50 metrics locally → Send 1 payload → Monitoring server

Agentless: 50 separate network requests per server
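A batched push might look like this sketch; the payload field names are illustrative, not Wakestack's actual wire format:

```python
# An agent gathers many metrics locally, then makes a single outbound
# request: one JSON body replaces dozens of inbound probes.
# The payload shape ("host", "ts", "metrics") is illustrative only.
import json
import time

def build_payload(host: str, metrics: dict[str, float]) -> bytes:
    return json.dumps({
        "host": host,
        "ts": int(time.time()),
        "metrics": metrics,  # all collected metrics travel in this one field
    }).encode()

body = build_payload("api-prod-01", {"cpu_pct": 42.0, "mem_pct": 68.0,
                                     "disk_pct": 55.0})
# `body` would be POSTed once over HTTPS to the monitoring server.
```

One outbound HTTPS request per interval, regardless of how many metrics it carries.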

3. No credential sprawl

Agent: One API key per agent → Pushes outbound
Agentless: Monitor needs credentials to every system

4. Works behind firewalls

Agent: Initiates outbound HTTPS → Works through firewalls
Agentless: Requires inbound access → Firewall holes needed

The Scale Tipping Point

Scale            Recommendation
1-5 servers      Agentless is fine
5-20 servers     Consider agents for deeper visibility
20-100 servers   Agents strongly recommended
100+ servers     Agents essential

Signs You've Outgrown Agentless

  • Frequent "false positive" alerts from network issues
  • Can't diagnose WHY servers are slow
  • Missing metrics for internal services
  • Credential management is painful
  • Alert storms during network hiccups

The Hybrid Approach

Best practice: combine both methods.

External (Agentless)

  • HTTP checks from outside your network
  • SSL certificate monitoring
  • DNS verification
  • TCP port checks

Purpose: "Can users reach us?"

Internal (Agent-Based)

  • Server metrics (CPU, memory, disk)
  • Process monitoring
  • Internal service health
  • Application metrics

Purpose: "Are our systems healthy?"

Complete picture:
├── External: Can users reach the API? ✓
└── Internal: Is the server healthy?
    ├── CPU: 45%
    ├── Memory: 62%
    ├── Disk: 55%
    └── API process: Running

Common Objections to Agents

"Agents are overhead"

Modern agents are lightweight:

  • Wakestack agent: ~10MB memory, negligible CPU
  • Collects metrics every 30 seconds
  • Less overhead than answering SSH probes

"More things to manage"

Agents are simpler than credential management at scale:

  • Install once
  • Auto-updates
  • No firewall changes
  • No credential rotation

"What if the agent crashes?"

What if SSH access fails? What if SNMP times out?

Both approaches have failure modes. Agents failing is visible and recoverable. Network issues create ambiguous states.

"We use cloud provider monitoring"

Cloud monitoring (CloudWatch, GCP Monitoring) is useful but:

  • Metrics can lag 5+ minutes
  • Limited to what the provider exposes
  • Doesn't cover non-cloud resources
  • Often expensive at scale

Wakestack's Approach

Wakestack uses both approaches:

External Monitoring (Agentless)

  • HTTP/HTTPS, TCP, DNS, Ping
  • Multi-region verification
  • SSL monitoring
  • No installation needed

Server Agent (Agent-Based)

  • Lightweight Go binary
  • CPU, memory, disk, process metrics
  • 30-second granularity
  • Outbound-only communication

Combined View

Production Server
├── External: HTTP check → 200 OK (145ms)
└── Agent: Server metrics
    ├── CPU: 42%
    ├── Memory: 68%
    ├── Disk: 55%
    └── Processes: All running

Both views, one dashboard.

See the difference — Try agent-based monitoring alongside your uptime checks.

Migration Path

If you're currently agentless-only:

Step 1: Keep External Checks

Don't remove HTTP/TCP monitoring. It provides the user perspective.

Step 2: Add Agents to Critical Servers

Start with:

  • Production application servers
  • Database servers
  • Any server where you've had "mystery" slowdowns

Step 3: Correlate Data

When alerts fire, check both:

  • External: Is it down?
  • Agent: What's the system state?

Step 4: Expand Coverage

Add agents to remaining servers as you see value.

Key Takeaways

  • Agentless monitoring works at small scale
  • It fails at scale due to: limited visibility, polling overhead, network dependency
  • Agent-based monitoring scales efficiently with local collection
  • Best practice: use both—external for user perspective, agents for system health
  • The tipping point is usually 5-20 servers
  • Modern agents are lightweight—overhead concerns are outdated

Frequently Asked Questions

What is agentless monitoring?

Agentless monitoring checks systems from outside without installing software on them. It uses protocols like HTTP, SNMP, SSH, or APIs to gather data remotely. Examples include HTTP uptime checks and cloud provider API monitoring.

Why does agentless monitoring fail at scale?

Agentless monitoring fails at scale because: it can't see inside systems deeply, polling overhead increases linearly with hosts, it depends on network availability, and it misses granular metrics like process health or disk I/O. It becomes both less effective and less efficient as you grow.

When should I use agent-based monitoring?

Use agent-based monitoring when: you have more than 5-10 servers, you need process-level visibility, you're monitoring systems behind firewalls, or you need detailed metrics beyond basic availability. The upfront installation cost is offset by better visibility and more efficient data collection.
