Replies: 1 comment
-
A good article. I felt the same during development: time-consuming loaders and laggy animations are irritating to users.
Mobile app reliability is more nuanced than many of us realise. While we obsess over crash-free rates and ANRs, users face many other kinds of reliability issues that traditional metrics often miss.
Think about reliability from a user's perspective. When does an app feel unreliable? It's not just when it crashes. It's when:
- screens freeze or respond slowly to taps
- content takes seconds to load, with no error in sight
- animations stutter and frames drop during scrolling
- network requests fail silently, leaving stale or empty screens
Each of these scenarios represents a different type of reliability failure. Yet traditional monitoring often misses these "soft failures" entirely.
Moving Beyond Binary Metrics
Traditional crash monitoring is too simplistic. It only tells you if your app is running or crashed. But real app reliability is more complex.
Consider this: Your app might be "running" but taking 10 seconds to load content. No crash reported, but users are frustrated.
And that is just one example. Your app might have twenty such issues, a few of which are listed above.
Here's the real problem: These issues rarely affect the same users. While each problem might impact only 5% of your users, different groups experience different issues. The result? A much larger percentage of your user base is having a poor experience.
It's death by a thousand paper cuts – each issue seems small in isolation, but together they create a significant reliability problem.
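The "thousand paper cuts" effect is easy to quantify. The sketch below uses hypothetical numbers (five independent issues, each hitting a random 5% of a 10,000-user base) to show how small per-issue impact compounds into a much larger share of users having a bad experience:

```python
import random

random.seed(0)
USERS = 10_000

# Hypothetical: five independent issues, each affecting ~5% of users at random.
issues = [{u for u in range(USERS) if random.random() < 0.05} for _ in range(5)]

# Union of all affected users across every issue.
affected = set().union(*issues)
share = len(affected) / USERS
print(f"Users hit by at least one issue: {share:.1%}")
```

With independent issues the expected union is 1 - 0.95^5, roughly 23% of users, more than four times the impact of any single issue.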
This is where Service Level Indicators (SLIs) come in. Rather than tracking simple up/down states, effective SLIs measure the success rates of specific user interactions: the share of screen loads that complete within a latency budget, taps that get a timely response, scrolls that render without dropped frames.
The key insight? Track what users actually experience.
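An interaction-level SLI can be computed from raw events. In this sketch the event shape, the `screen_load` interaction, and the 2-second latency budget are all illustrative assumptions; the point is that a slow-but-successful interaction counts against the SLI just like a hard failure:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    name: str        # e.g. "screen_load" (hypothetical interaction name)
    duration_ms: int
    succeeded: bool

# Hypothetical events captured from clients.
events = [
    Interaction("screen_load", 420, True),
    Interaction("screen_load", 3100, True),   # slow: a "soft failure"
    Interaction("screen_load", 95, True),
    Interaction("screen_load", 500, False),   # hard failure
]

LATENCY_BUDGET_MS = 2000  # illustrative budget, not a prescribed value

def sli(events, budget_ms):
    """Fraction of interactions that both succeeded and met the latency budget."""
    good = sum(1 for e in events if e.succeeded and e.duration_ms <= budget_ms)
    return good / len(events)

print(f"screen_load SLI: {sli(events, LATENCY_BUDGET_MS):.0%}")  # 2 of 4 good -> 50%
```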
How do you do that? By connecting three critical data streams:
- performance telemetry (load times, frame rates, request latencies)
- device and OS context (model, OS version, network conditions)
- user interaction events (taps, scrolls, screen navigations)
When these data streams align, you get a complete picture of user experience. For instance, you might discover that users on specific Android devices experience frame drops during scroll animations, or iOS users on older devices face longer load times for image-heavy screens.
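Once performance samples carry device context, spotting device-specific patterns is a simple group-and-aggregate. A minimal sketch with made-up records (device names and frame counts are hypothetical):

```python
from collections import defaultdict

# Hypothetical joined records: one row per scroll session, with device context attached.
samples = [
    {"device": "Pixel 4a",   "os": "Android 13", "dropped_frames": 12},
    {"device": "Pixel 4a",   "os": "Android 13", "dropped_frames": 9},
    {"device": "Galaxy S23", "os": "Android 14", "dropped_frames": 1},
    {"device": "Galaxy S23", "os": "Android 14", "dropped_frames": 0},
]

# Group dropped-frame counts by device model.
drops_by_device = defaultdict(list)
for s in samples:
    drops_by_device[s["device"]].append(s["dropped_frames"])

averages = {d: sum(v) / len(v) for d, v in drops_by_device.items()}
worst = max(averages, key=averages.get)
print(f"Average dropped frames per scroll: {averages}")
print(f"Device to investigate first: {worst}")
```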
The challenge? Traditionally, this data lives in different tools and dashboards. Consider consolidating your monitoring stack to see these patterns more clearly. Modern monitoring solutions can capture this data automatically, helping you spot reliability issues before users report them.
Setting Meaningful Reliability Goals
Once we have meaningful measurements, we can set realistic Service Level Objectives (SLOs). Target metrics that actually matter: the percentage of sessions free of crashes and ANRs, the share of screen loads completing within your latency budget, and the fraction of interactions rendered without dropped frames.
These goals directly reflect user experience and give us actionable targets for improvement.
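An SLO becomes actionable through its error budget: the fraction of "bad" sessions you are willing to tolerate. The target and session counts below are hypothetical, but the arithmetic is the standard error-budget calculation:

```python
SLO_TARGET = 0.995  # hypothetical: 99.5% of sessions should be free of reliability failures

total_sessions = 200_000   # made-up numbers for illustration
bad_sessions = 1_400       # crashes, ANRs, and slow loads combined

sli = 1 - bad_sessions / total_sessions     # observed fraction of good sessions
error_budget = 1 - SLO_TARGET               # allowed fraction of bad sessions
budget_spent = (bad_sessions / total_sessions) / error_budget

print(f"SLI: {sli:.1%}, error budget spent: {budget_spent:.0%}")
```

Here 0.7% of sessions were bad against a 0.5% budget, so 140% of the budget is spent: a clear, numeric signal to prioritise reliability work.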
The Role of Client-Side Telemetry
Server logs can't tell us if a button press felt responsive or if an animation stuttered. Only client-side telemetry can capture these crucial user experience metrics.
The key areas to monitor: app startup time, screen load times, frame rendering during scrolls and animations, network request latency and failure rates, and crashes and ANRs.
Remember to respect user privacy when collecting this data – only gather what you need, and be transparent about it.
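Client-side timing can be very lightweight. The sketch below (class name and event shape are hypothetical) times a screen load on the client and records only the screen name and duration, in the privacy-minimal spirit described above; a real app would batch and upload the events:

```python
import time

class ScreenLoadTimer:
    """Minimal client-side timing sketch: measure how long a screen takes to load."""

    def __init__(self, screen_name, sink):
        self.screen_name = screen_name
        self.sink = sink        # anything with .append; a real client would batch/upload
        self._start = None

    def start(self):
        self._start = time.monotonic()

    def stop(self):
        elapsed_ms = (time.monotonic() - self._start) * 1000
        # Only the screen name and duration are collected -- no user identifiers.
        self.sink.append({"screen": self.screen_name, "load_ms": round(elapsed_ms)})

events = []
timer = ScreenLoadTimer("home", events)
timer.start()
time.sleep(0.01)  # stand-in for real loading work
timer.stop()
print(events[0])
```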
Making Data-Driven Reliability Decisions
The real power of good reliability metrics is in decision making. When should you pause feature development to focus on performance? When is reliability "good enough"? These decisions become much clearer with solid data.
If your SLOs consistently show high reliability, you might have room to move faster on features. If metrics are trending down, it might be time to invest in performance optimisation.
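That decision rule can be made explicit. This is a toy policy, not a recommended process: the SLO target, the four-period window, and the "strictly decreasing" trend test are all assumptions made for illustration:

```python
def next_sprint_focus(sli_history, slo_target=0.995):
    """Toy policy: prioritise reliability work when the SLI is below target
    or has been falling; otherwise keep shipping features."""
    recent = sli_history[-4:]                  # look at the last four periods
    below_target = recent[-1] < slo_target
    trending_down = all(a > b for a, b in zip(recent, recent[1:]))
    return "reliability" if below_target or trending_down else "features"

print(next_sprint_focus([0.998, 0.997, 0.996, 0.994]))  # reliability
print(next_sprint_focus([0.997, 0.998, 0.998, 0.999]))  # features
```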
What reliability metrics matter most for your app? How do you balance reliability work against new feature development?
⭐ If you like this post, please check out Measure. It's an open-source tool for monitoring mobile apps. It captures crashes, ANRs, navigation events, API requests, and much more to create detailed session timelines that help you find patterns and get to the root cause of issues. Check it out here and feel free to star it for updates!