Catchpoint unveiled its
annual site reliability engineering (SRE) report for 2025. The
industry-leading report offers unique insights from over 300
professionals spanning the global IT and reliability community,
including engineers, managers, architects, and executives.
Now in its seventh year, the SRE Report is widely considered the
authentic, independent voice of the reliability community and
underscores the role of SRE as an indispensable practice in maintaining
high-performing, resilient digital services and applications. This
year's report offers insights into the challenges and
opportunities facing SRE teams in an era marked by rapid technological
advancement and escalating performance expectations.
"Success starts with individuals owning their role in the bigger
picture, and that starts with embracing SRE as more than a technical
enhancement," said Mehdi Daoudi, CEO and co-founder of Catchpoint. "When
teams understand how their work drives outcomes, it becomes easier to
align around the opportunities that matter and the steps to seize them,
and what's a major concern this year is that organizations are feeling
pressured to prioritize release schedules over reliability."
Key findings from the report include:
- Slow is the new down: 53% of organizations agree that poor performance is as harmful as downtime, elevating user experience to a key reliability metric.
- Toil levels rise despite AI: After five years of steady decline,
the median reported percentage of work spent on toil has increased to
30% from 25% in 2024, raising questions about AI's impact on daily
workloads.
- Organizational priorities under pressure: More than two-thirds of
respondents acknowledge frequently feeling pressured to prioritize
release schedules over reliability, reflecting the ongoing struggle
between agility and stability.
- Multiple monitoring tools are the norm: Most organizations use
between 2 and 10 monitoring or observability tools, emphasizing a "value
over cost" mindset for effective monitoring across technology stacks.
- AI training in demand, but time-constrained: 30% of respondents
prioritized technical training on AI. As the second-most-selected
sentiment, this highlights a strong desire for upskilling, even as the
top sentiment (37%) reflects a cautious approach to AI implementations.
- Incidents as a certainty: 40% of respondents reported handling
between 1 and 5 incidents in the last 30 days. Notably, incident
response is a shared responsibility across all levels, with higher-level
managers as involved as individual contributors.
- Continued misalignment on reliability priorities: While overall
responses paint a positive picture of reliability practices, significant
differences emerge when analyzed by managerial responsibility,
highlighting a gap in alignment on priorities and approaches.
"What was most eye opening from our report findings this year was that,
for most teams, it seems the burden of operational tasks has grown for
the first time in five years," said Leo Vasiliou, Director, Product
Marketing at Catchpoint and author of the SRE Report. "The expectation
was that AI would reduce toil, not exacerbate it."