System status.
Last checked: updated automatically on page load.
Incident history.
Webhook delivery delays
PagerDuty and Slack webhooks were delayed by 8–20 minutes due to a queue backup on the delivery worker. Duration: 15 minutes. No data loss. Resolved by restarting the delivery consumer and increasing queue concurrency.
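For illustration only, here is a minimal sketch of what "increasing queue concurrency" means for a delivery consumer. It is not the actual delivery worker: the `drain` function, the `deliver` callback, and the `concurrency` parameter are hypothetical names, and an in-memory `asyncio.Queue` stands in for whatever queue the service really uses.

```python
import asyncio


async def drain(events, deliver, concurrency):
    """Deliver every queued event using `concurrency` parallel workers.

    Returns the list of events whose delivery raised an exception.
    All names here are illustrative, not taken from the real service.
    """
    queue = asyncio.Queue()
    for event in events:
        queue.put_nowait(event)
    failed = []

    async def worker():
        while True:
            event = await queue.get()
            try:
                await deliver(event)
            except Exception:
                # Record the failure and keep consuming so one bad
                # delivery does not stall the rest of the backlog.
                failed.append(event)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(concurrency)]
    await queue.join()  # wait until every event has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return failed
```

With a higher `concurrency`, more slow deliveries can be in flight at once, so a backlog drains faster as long as the downstream endpoints can absorb the extra load.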
Dashboard latency spike
Dashboard P99 load time rose to 4.2 seconds (normal: under 800 ms) due to an unoptimized query introduced in v0.9.0. Duration: 8 minutes. Fixed by adding an index and rewriting the query in hotfix v0.9.0-patch1.
Ingestion API elevated error rate
The OTLP gRPC endpoint returned 5xx errors for approximately 12% of requests during a burst-load period. Duration: 22 minutes. Root cause: connection pool exhaustion on the ingestion database. Fixed by increasing the pool size and adding a circuit breaker.
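As a rough illustration of the circuit breaker pattern mentioned in the fix, the sketch below rejects calls quickly after repeated failures instead of letting them queue up on an exhausted pool. It is not the production implementation: the class name, `failure_threshold`, and `reset_timeout` are assumptions chosen for the example, and the values shown are arbitrary.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Minimal circuit breaker sketch (hypothetical names and defaults)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open immediately
            # if the half-open trial call failed.
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()
            raise
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each database operation, for example `breaker.call(write_batch, batch)` with a hypothetical `write_batch` function, and treat `CircuitOpenError` as a signal to fail fast or shed load rather than wait on a connection.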