Why sizing goes wrong in both directions
Under-provisioning causes outages and slow response times right when traffic matters most. Over-provisioning quietly burns budget every month on capacity you never use. Both mistakes come from the same root cause: sizing infrastructure by guesswork or by copying whatever instance size looked reasonable in a tutorial, instead of reasoning from your actual workload.
Getting sizing right doesn't require deep infrastructure expertise — it requires answering three questions clearly: what kind of workload is this, how many users does it actually serve, and how much redundancy does it need.
Workload type changes everything
The same user count needs very different resources depending on what the application actually does per request:
- Web app / API — request-driven, typically stateless, modest compute per request. This is the lightest workload class per user.
- Database-heavy — apps with significant query or reporting load hold larger working sets in memory and do more disk I/O, so they need more RAM and storage per user than a simple API.
- Batch / background jobs — queue workers, scheduled jobs, and ETL pipelines are bursty and less latency-sensitive, but can spike CPU hard during processing windows.
- ML inference / analytics — the heaviest class per request. Model serving and analytics workloads are CPU- or GPU-intensive, and this class often dominates the whole infrastructure budget.
Picking the wrong workload profile is the single most common sizing mistake — treating a database-heavy app like a lightweight API leads to under-provisioning that only shows up under real load.
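One way to make the workload profile explicit is to encode each class as a per-user resource profile. The sketch below is a minimal illustration that assumes resource needs scale linearly with user count. The `Workload` enum, the `PROFILES` table, and the `base_requirements` helper are hypothetical names. Every coefficient is a placeholder chosen only to reflect the relative ordering described above, not a benchmark, so replace them with numbers from your own load tests.

```python
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    WEB_API = "web_api"
    DATABASE_HEAVY = "database_heavy"
    BATCH = "batch"
    ML_INFERENCE = "ml_inference"


@dataclass(frozen=True)
class Resources:
    vcpu: float
    ram_gb: float
    storage_gb: float


# HYPOTHETICAL per-user coefficients. They only encode the ordering described in
# the text (web/API lightest, database-heavy needs more RAM and storage, batch leans
# on CPU, ML/analytics heaviest); they are not measurements of any real system.
PROFILES = {
    Workload.WEB_API: Resources(vcpu=0.002, ram_gb=0.004, storage_gb=0.01),
    Workload.DATABASE_HEAVY: Resources(vcpu=0.003, ram_gb=0.016, storage_gb=0.05),
    Workload.BATCH: Resources(vcpu=0.006, ram_gb=0.008, storage_gb=0.02),
    Workload.ML_INFERENCE: Resources(vcpu=0.02, ram_gb=0.03, storage_gb=0.02),
}


def base_requirements(users: int, workload: Workload) -> Resources:
    """Scale a per-user profile linearly by user count (a simplifying assumption)."""
    if users < 0:
        raise ValueError("users must be non-negative")
    p = PROFILES[workload]
    return Resources(users * p.vcpu, users * p.ram_gb, users * p.storage_gb)
```

With these placeholders, 1,000 users of a database-heavy app come out at roughly 3 vCPU and 16 GB of RAM, versus roughly 2 vCPU and 4 GB for a web/API app. The user count is the same, but the memory needs differ by a factor of four.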
Peak load, not average load
Average traffic is not the number that matters; peak traffic is. If your infrastructure is sized exactly to average load, any spike above that average degrades performance or causes outages. A buffer of roughly 30% above average is a reasonable default for steady-state applications. Push that buffer to 50-100% or more if you have predictable spikes (marketing campaigns, month-end batch runs) or unpredictable ones (the risk of traffic suddenly going viral).
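The buffer rule is simple arithmetic: provisioned capacity = average load × (1 + buffer). A minimal sketch, where `provisioned_capacity` and the `BUFFERS` keys are hypothetical names. Only the 30% default comes from the text; the 50% and 100% values are assumed picks from its 50-100%+ range.

```python
# Buffer (headroom) above average load, as a fraction. 0.30 is the steady-state
# default from the text; 0.50 and 1.00 are HYPOTHETICAL picks from its 50-100%+ range.
BUFFERS = {
    "steady": 0.30,
    "predictable_spikes": 0.50,
    "unpredictable_spikes": 1.00,
}


def provisioned_capacity(average_load: float, buffer: float = BUFFERS["steady"]) -> float:
    """Return the capacity to provision: average load plus fractional headroom."""
    if average_load < 0 or buffer < 0:
        raise ValueError("average_load and buffer must be non-negative")
    return average_load * (1.0 + buffer)
```

For an average of 400 requests per second, this suggests provisioning for about 520 with the steady-state buffer and about 800 for an app with unpredictable spikes. The unit is whatever you measured (requests per second, concurrent users, vCPU); the arithmetic is the same.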
Redundancy: single, HA pair, or cluster
- Single instance — no failover. Fine for development, staging, or genuinely low-stakes internal tools where downtime is an inconvenience, not an incident.
- HA pair (active/passive) — two instances for failover. This is the practical minimum for anything customer-facing in production; a single instance means a single point of failure for your entire business.
- Cluster (3+ nodes) — horizontal scale plus resilience. Needed once traffic exceeds what two nodes can comfortably handle, or when uptime SLAs are strict enough that losing one node can't mean losing service.
Redundancy roughly multiplies your compute needs (an HA pair needs close to double the compute of a single instance), but storage usually grows by less than 1:1 with node count, since it is often shared between nodes or replicated more efficiently than compute.
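The redundancy rule of thumb above can be sketched as a scaling step applied to one node's requirements. This is a minimal illustration under stated assumptions: `apply_redundancy`, `MIN_NODES`, and `EXTRA_STORAGE_FRACTION` are hypothetical names, and the 0.5 storage fraction is a placeholder, since the text only says storage grows by less than 1:1 with node count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resources:
    vcpu: float
    ram_gb: float
    storage_gb: float


# Node counts for each redundancy mode described above (a cluster has 3 or more).
MIN_NODES = {"single": 1, "ha_pair": 2, "cluster": 3}

# HYPOTHETICAL placeholder: storage each extra node adds, as a fraction of one
# node's storage. The text only says storage grows by less than 1:1.
EXTRA_STORAGE_FRACTION = 0.5


def apply_redundancy(per_node: Resources, mode: str, nodes: Optional[int] = None) -> Resources:
    """Scale one node's requirements to a whole deployment for a redundancy mode."""
    n = MIN_NODES[mode] if nodes is None else nodes
    if n < MIN_NODES[mode]:
        raise ValueError(f"{mode} needs at least {MIN_NODES[mode]} nodes, got {n}")
    if mode != "cluster" and n != MIN_NODES[mode]:
        raise ValueError(f"{mode} uses exactly {MIN_NODES[mode]} node(s)")
    return Resources(
        # Compute multiplies with node count: a passive node must be able to take
        # the full load on failover, so it is sized like the active one.
        vcpu=per_node.vcpu * n,
        ram_gb=per_node.ram_gb * n,
        # Storage grows by less than 1:1 with node count (placeholder fraction).
        storage_gb=per_node.storage_gb * (1 + EXTRA_STORAGE_FRACTION * (n - 1)),
    )
```

For a node sized at 4 vCPU, 16 GB RAM, and 100 GB storage, an HA pair comes out at 8 vCPU and 32 GB RAM but only 150 GB of storage under the placeholder fraction.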
Turning an estimate into a real instance
Once you have a recommended vCPU and RAM figure, match it to the closest general-purpose instance size from your chosen provider (AWS's m-series, GCP's e2 family, or general-purpose droplets on a budget VPS provider), then add the recommended storage as attached block storage. Pricing for the same vCPU/RAM/storage varies significantly by provider tier: budget VPS providers, major hyperscalers, and managed/premium tiers all price it very differently. The "right" tier depends on how much operational work you want the provider to handle versus how much you're willing to do yourself.
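Matching an estimate to an instance size can be sketched as picking the smallest catalog entry that meets both the vCPU and RAM targets. This sketch interprets "closest" as rounding up, never down, so the estimate isn't undercut. The catalog names and sizes are hypothetical placeholders, not any provider's real instance list; load your provider's current sizes instead.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: int
    ram_gb: int


# HYPOTHETICAL catalog: names and sizes are illustrative placeholders only.
CATALOG = [
    InstanceType("general-small", 2, 8),
    InstanceType("general-medium", 4, 16),
    InstanceType("general-large", 8, 32),
    InstanceType("general-xlarge", 16, 64),
]


def closest_instance(
    vcpu: float, ram_gb: float, catalog: Sequence[InstanceType] = CATALOG
) -> Optional[InstanceType]:
    """Return the smallest instance meeting BOTH targets, or None if none fits.

    "Smallest" is ordered by vCPU first, then RAM.
    """
    fits = [i for i in catalog if i.vcpu >= vcpu and i.ram_gb >= ram_gb]
    return min(fits, key=lambda i: (i.vcpu, i.ram_gb), default=None)
```

Requiring both constraints matters for memory-heavy workloads: an estimate of 3 vCPU and 20 GB RAM skips the 4 vCPU/16 GB size and lands on the 8 vCPU/32 GB one. A `None` result signals that the estimate needs more than one node rather than a bigger instance.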