What Is an SLA vs SLO vs SLI? (Clear Comparison)
SLAs, SLOs, and SLIs are related but different. SLIs measure, SLOs target, and SLAs promise. Learn the differences with clear examples and when to use each.
Wakestack Team
Engineering Team
The Quick Answer
| Term | What It Is | Example |
|---|---|---|
| SLI | A measurement | "Availability is 99.95%" |
| SLO | A target | "Availability should be ≥ 99.9%" |
| SLA | A promise with consequences | "If availability < 99.5%, customer gets 10% credit" |
- SLI = What you measure
- SLO = What you aim for internally
- SLA = What you promise externally
SLI: Service Level Indicator
An SLI is a metric that quantifies some aspect of your service.
Common SLIs
Availability
SLI = (Successful requests / Total requests) × 100
Example: 99.95% of requests succeeded
Latency
SLI = Response time at a percentile
Example: p99 latency is 180ms
Error Rate
SLI = (Failed requests / Total requests) × 100
Example: 0.1% of requests failed
Throughput
SLI = Requests per second
Example: System handles 5,000 RPS
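To make the four formulas above concrete, here is a minimal Python sketch that computes them from one window of request records. The `Request` record, the function names, and the rule that only 5xx responses count as failures are assumptions for illustration. Your own definition of a successful request may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    status: int          # HTTP status code
    latency_ms: float    # response time in milliseconds


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def compute_slis(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Availability, error rate, p99 latency, and throughput for one window."""
    total = len(requests)
    if total == 0:
        raise ValueError("no requests in window")
    # Assumption: only 5xx responses count as failed; adjust to your own definition.
    failed = sum(1 for r in requests if r.status >= 500)
    return {
        "availability_pct": (total - failed) / total * 100,
        "error_rate_pct": failed / total * 100,
        "p99_latency_ms": percentile([r.latency_ms for r in requests], 99),
        "throughput_rps": total / window_seconds,
    }
```

The percentile here uses the exact nearest-rank method. Monitoring systems often estimate percentiles from histogram buckets or interpolate between samples, so their p99 may differ slightly.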
What Makes a Good SLI?
Good SLIs are:
- Measurable: You can collect the data
- Meaningful: They reflect user experience
- Actionable: You can improve them
- Specific: Clear definition, no ambiguity
Bad SLI: "The system is fast"
Good SLI: "p95 response time in milliseconds"
SLIs Measure User Experience
The best SLIs measure what users actually experience:
- Can they reach the service?
- Is it responding quickly?
- Are requests succeeding?
Internal metrics (CPU, memory) aren't SLIs—they're diagnostic data. SLIs measure outcomes, not internals.
SLO: Service Level Objective
An SLO is a target value for an SLI. It defines "good enough."
SLO Examples
| SLI | SLO |
|---|---|
| Availability | ≥ 99.9% |
| p99 latency | ≤ 200ms |
| Error rate | ≤ 0.1% |
| Data freshness | ≤ 5 minutes stale |
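Writing SLOs down as data, together with the direction of each comparison, removes any doubt about whether a target is a floor or a ceiling. Below is a minimal sketch of the table above. The SLI key names and the `evaluate` helper are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SLO:
    sli: str                                # name of the SLI this objective targets
    target: float                           # threshold value
    meets: Callable[[float, float], bool]   # operator.ge for "≥", operator.le for "≤"


# The four objectives from the table above (units: %, ms, %, minutes).
SLOS = [
    SLO("availability_pct", 99.9, operator.ge),
    SLO("p99_latency_ms", 200, operator.le),
    SLO("error_rate_pct", 0.1, operator.le),
    SLO("data_staleness_min", 5, operator.le),
]


def evaluate(slis: dict[str, float]) -> dict[str, bool]:
    """Return, per SLO, whether the measured SLI meets its target."""
    return {slo.sli: slo.meets(slis[slo.sli], slo.target) for slo in SLOS}
```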
SLOs Are Internal Targets
SLOs are for your team, not your customers. They define:
- When to prioritise reliability work
- When to slow down feature development
- When the service is "healthy enough"
Setting Good SLOs
Too aggressive: 99.99% availability when you can barely hit 99.5%
Too loose: 95% availability when users expect 99.9%
Good SLOs are:
- Achievable with current architecture
- Aligned with user expectations
- Ambitious enough to drive improvement
Error Budgets
If your SLO is 99.9% availability, you have an error budget of 0.1%.
Over a 30-day month (43,200 minutes):
- 0.1% budget = 43.2 minutes of allowed downtime
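The same arithmetic as a small helper. This sketch assumes a 30-day period by default, and the function names are illustrative rather than taken from any particular tool.

```python
def error_budget_minutes(slo_pct: float, period_days: float = 30) -> float:
    """Allowed downtime in minutes: (1 - SLO) × period length."""
    period_minutes = period_days * 24 * 60  # 30 days -> 43,200 minutes
    return (1 - slo_pct / 100) * period_minutes


def budget_remaining_minutes(slo_pct: float, downtime_minutes: float,
                             period_days: float = 30) -> float:
    """Budget left after the downtime observed so far (negative means overspent)."""
    return error_budget_minutes(slo_pct, period_days) - downtime_minutes
```

For example, `error_budget_minutes(99.99)` leaves only about 4.3 minutes per 30 days, which shows how quickly each extra nine shrinks the budget.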
Error budgets let you:
- Ship features (spending budget on risk)
- Prioritise reliability (when budget is exhausted)
- Have objective conversations about trade-offs
SLA: Service Level Agreement
An SLA is a contract that specifies consequences for missing service levels.
SLA Examples
"If monthly availability falls below 99.5%, affected customers receive a 10% service credit."
"If p95 latency exceeds 500ms for more than 1 hour, the customer may terminate the contract without penalty."
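Evaluating a credit clause like the first example is a threshold lookup. A minimal sketch, assuming credits are written as (threshold, credit) pairs and that only the largest applicable credit is paid. The function also accepts several tiers, which is a hypothetical extension: the example above has only one.

```python
def service_credit_pct(monthly_availability_pct: float,
                       tiers: list[tuple[float, float]]) -> float:
    """Credit owed for one month under a tiered SLA.

    tiers: (threshold_pct, credit_pct) pairs. A tier applies when availability
    is strictly below its threshold; the largest applicable credit is returned.
    """
    owed = [credit for threshold, credit in tiers
            if monthly_availability_pct < threshold]
    return max(owed, default=0.0)


# The example clause above: below 99.5% monthly availability -> 10% credit.
EXAMPLE_SLA = [(99.5, 10.0)]
```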
SLAs vs SLOs
| Aspect | SLO | SLA |
|---|---|---|
| Audience | Internal team | External customers |
| Consequences | Prioritisation decisions | Financial/legal penalties |
| Typical level | Stricter | Looser |
| Negotiation | Engineering decision | Business/legal decision |
Why SLAs Are Looser Than SLOs
Providers typically set their SLA thresholds below their SLOs:
- SLO: 99.9% availability (internal target)
- SLA: 99.5% availability (external promise)
This buffer means:
- You can miss your SLO without SLA violations
- Customers still get reliable service
- You have room for unexpected issues
SLAs Need Teeth
An SLA without consequences isn't an agreement—it's marketing.
Real SLAs define:
- What's measured and how
- The threshold for violation
- What happens when violated (credits, refunds, termination rights)
- How violations are reported and claimed
How They Work Together
SLI (Measurement)
↓
SLO (Target)
↓
SLA (Promise)
Example: An API Service
SLI Definition:
- Availability = (2xx responses) / (total responses)
- Latency = p99 response time
- Measured every minute, aggregated monthly
SLO Targets:
- Availability ≥ 99.9%
- p99 latency ≤ 150ms
SLA Promise:
- Availability ≥ 99.5% or 10% credit
- p99 latency ≤ 300ms or 5% credit
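Because the SLI is measured every minute but reported monthly, the per-minute counts have to be combined. One reasonable approach, sketched below with a hypothetical `MinuteSample` record, is to sum the counts rather than average per-minute percentages, so each minute counts in proportion to its traffic. p99 latency cannot be aggregated by averaging per-minute p99 values either; that needs the raw samples or a histogram.

```python
from dataclasses import dataclass


@dataclass
class MinuteSample:
    """Counts collected for one minute (hypothetical shape)."""
    ok_2xx: int    # responses with a 2xx status
    total: int     # all responses


def monthly_availability_pct(samples: list[MinuteSample]) -> float:
    """Availability = (2xx responses) / (total responses), summed over the month.

    Summing counts weights each minute by its traffic; averaging per-minute
    percentages would treat a quiet minute the same as a busy one.
    """
    ok = sum(s.ok_2xx for s in samples)
    total = sum(s.total for s in samples)
    if total == 0:
        raise ValueError("no traffic recorded")
    return ok / total * 100
```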
The Flow
- You measure availability (SLI): Currently 99.85%
- You compare to target (SLO): Below 99.9%, need attention
- You check against promise (SLA): Above 99.5%, no violation
Even though you missed your internal target, there is no SLA violation and no credits are owed.
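The three-step flow can be written as a single check. A minimal sketch for availability only; the function name and the returned messages are illustrative.

```python
def assess(sli_pct: float, slo_pct: float, sla_pct: float) -> str:
    """Classify a measured availability against the internal SLO and external SLA."""
    if sla_pct > slo_pct:
        raise ValueError("SLA threshold should sit below the SLO")
    if sli_pct >= slo_pct:
        return "healthy: SLO met"
    if sli_pct >= sla_pct:
        return "SLO missed, SLA met: prioritise reliability, no credits owed"
    return "SLA breached: credits owed"
```

With the example numbers, `assess(99.85, 99.9, 99.5)` lands in the middle case: the SLO is missed, but the SLA still holds.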
Common Mistakes
Mistake 1: No SLIs
Setting SLOs without clear measurement definitions leads to arguments about whether you're meeting them.
Fix: Define exactly how each SLI is calculated before setting SLOs.
Mistake 2: SLOs = SLAs
If your SLO equals your SLA, every near-miss is an SLA violation.
Fix: Build in a buffer. Your SLA should still be met even when you miss your SLO.
Mistake 3: Too Many SLOs
Tracking 50 SLOs means none get focus.
Fix: Choose 3-5 SLOs that capture user experience. Track everything else as metrics, not objectives.
Mistake 4: SLOs Without Error Budgets
SLOs without error budgets are just numbers. There's no framework for decisions.
Fix: Calculate error budgets. Use them to balance reliability and velocity.
Mistake 5: Ignoring the User
SLOs based on internal metrics (CPU, memory) miss the point.
Fix: Base SLOs on what users experience—availability, latency, correctness.
Practical Implementation
Step 1: Choose Your SLIs
Start with availability and latency. Add more only if needed.
Step 2: Measure Baseline
What are your current SLI values? You need this before setting targets.
Step 3: Set SLOs
Based on:
- Current performance
- User expectations
- Business requirements
Step 4: Calculate Error Budgets
Error budget = (1 - SLO) × time period
For 99.9% availability over 30 days:
Budget = 0.1% × 43,200 minutes = 43.2 minutes
Step 5: Define SLAs (If Needed)
Only needed if you have external customers. Set SLA thresholds lower than your SLOs.
Step 6: Monitor and Alert
- Dashboard showing SLI values
- Alerts when approaching SLO thresholds
- Error budget burn rate tracking
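Burn rate expresses how fast the error budget is being spent: the observed error ratio divided by the allowed error ratio (1 − SLO). A burn rate of 1 uses up the budget exactly at the end of the period. The sketch below is minimal; the function names and the paging threshold of 14.4 are assumptions, chosen because a burn rate of 14.4 sustained for one hour spends 2% of a 30-day budget (14.4 × 1 h / 720 h = 0.02).

```python
def burn_rate(failed: int, total: int, slo_pct: float) -> float:
    """How fast the error budget is being spent in a window.

    1.0 uses the budget up exactly at the end of the SLO period;
    values above 1.0 exhaust it early.
    """
    if total == 0:
        return 0.0
    observed_error_ratio = failed / total
    allowed_error_ratio = 1 - slo_pct / 100
    return observed_error_ratio / allowed_error_ratio


def budget_fraction_consumed(rate: float, window_hours: float,
                             period_hours: float = 30 * 24) -> float:
    """Fraction of the whole period's budget spent if `rate` holds for `window_hours`."""
    return rate * window_hours / period_hours


def should_page(failed: int, total: int, slo_pct: float,
                threshold: float = 14.4) -> bool:
    # Assumption: page when the last hour's burn rate would spend >= 2% of a
    # 30-day budget (14.4 × 1 h / 720 h = 0.02). Tune to your own policy.
    return burn_rate(failed, total, slo_pct) >= threshold
```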
Summary
SLI (Service Level Indicator): A measurement of service behaviour.
- Example: "Availability is 99.95%"
SLO (Service Level Objective): An internal target for an SLI.
- Example: "Availability should be ≥ 99.9%"
SLA (Service Level Agreement): An external promise with consequences.
- Example: "If availability < 99.5%, customer gets credit"
The hierarchy:
- SLIs measure what matters
- SLOs set targets for measurements
- SLAs make promises based on those targets
Start with SLIs, set realistic SLOs, and only make SLA promises you can keep.
Frequently Asked Questions
What is an SLI?
An SLI (Service Level Indicator) is a metric that measures your service's behaviour. Examples include availability percentage, response time, and error rate.
What is an SLO?
An SLO (Service Level Objective) is a target value for an SLI. For example, 'availability should be at least 99.9%' or 'p99 latency should be under 200ms.'
What is an SLA?
An SLA (Service Level Agreement) is a contract with consequences. It defines what happens (refunds, credits, penalties) if you fail to meet certain service levels.
What's the relationship between SLA, SLO, and SLI?
SLIs measure service behaviour. SLOs set internal targets for those measurements. SLAs make external promises with consequences for missing them. SLAs are typically looser than SLOs to allow a safety buffer.