The Five Second Contract
Queueing looks harmless at first. A slow call holds a few threads, then a few more, and if left unchecked it can tip the whole system over.
Players have an internal clock. If something they tap takes more than a few seconds it already feels uncertain. At five seconds most people abandon the flow. They do not complain. They simply adjust their behaviour. The pain shows up in conversion long before it appears in error logs.
That five second line is not guidance. It is the boundary the product has to respect. When we cross it we lose the player and we strain the backend at the same time.
The quiet cost of waiting
Most of our services still rely on synchronous calls for important work. In a synchronous model the request thread waits for the backend. Waiting is not free. It is capacity held in limbo.
The relationship is simple.
threads_in_use = RPS × request_time_seconds
A one second call at scale is a nuisance. A ten or twenty second call needs ten or twenty times as many threads for the same traffic, and holding them becomes the dominant cost. With no timeout the pressure has no ceiling and the system behaves accordingly.
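To make the relationship concrete, here is a minimal sketch. The function name and the loop values are chosen for illustration; it simply restates the formula above (Little's Law applied to request threads) and reproduces the figures in the table in the next section.

```python
# Illustrative sketch: how many threads a synchronous service holds in flight.
# threads_in_use restates the formula above (Little's Law, L = lambda * W);
# the rates and latencies are the ones used in the table in the next section.

def threads_in_use(rps: float, request_time_seconds: float) -> float:
    """Average number of request threads held waiting at a given rate and latency."""
    return rps * request_time_seconds

for rps in (50, 200, 5_000):
    for latency in (0.1, 1, 2, 4, 8, 16, 30):
        print(f"{rps:>5} RPS x {latency:>4}s -> {threads_in_use(rps, latency):>9,.0f} threads")
```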
Surges turn delay into load
Take a quiet baseline of fifty requests per second, a busier period at two hundred, and a surge of five thousand. The thread counts below show what it would take to keep up at each level.
| Timeout or latency | Threads at 50 RPS | Threads at 200 RPS | Threads at 5 000 RPS |
|---|---|---|---|
| 0.1 s | 5 | 20 | 500 |
| 1 s | 50 | 200 | 5 000 |
| 2 s | 100 | 400 | 10 000 |
| 4 s | 200 | 800 | 20 000 |
| 8 s | 400 | 1 600 | 40 000 |
| 16 s | 800 | 3 200 | 80 000 |
| 30 s | 1 500 | 6 000 | 150 000 |
| No timeout | unbounded | unbounded | unbounded |
The last row is where the picture tilts. With no timeout the system has no upper limit. Threads accumulate until the runtime slows, the autoscaler reacts, or the service folds under its own weight. In practice it goes bang long before the maths finishes.
Why long timeouts rarely help
A long timeout can feel gentle. It gives the backend more room to reply. The trouble is that the player has usually moved on long before it expires. The extra time rarely improves the experience. It does, however, hold the thread for longer and raise the load during busy periods. The return is small. The cost is not.
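For contrast, this is roughly what a deliberately short budget looks like at the call site. It is a sketch only: the client library could be anything, the URL is hypothetical, and the interesting part is that the thread comes back after a few seconds whether or not the backend answered.

```python
# Minimal sketch of a bounded backend call, assuming stdlib urllib and a
# hypothetical internal URL. The point is not the library; it is that the
# request thread is released after a small, deliberate budget instead of
# waiting out a 30 s timeout for an answer the player will never see.
import urllib.request
import urllib.error

BACKEND_URL = "https://backend.example.internal/grant-reward"  # hypothetical

def call_backend_with_budget(url: str, budget_seconds: float = 3.0) -> bytes | None:
    try:
        with urllib.request.urlopen(url, timeout=budget_seconds) as response:
            return response.read()
    except (urllib.error.URLError, TimeoutError):
        # Budget spent: give the thread back and let the caller decide how to
        # degrade (retry later, queue the work, show a fallback).
        return None
```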
A note on the five second line
It is worth being honest about this boundary. Five seconds is not a comfort zone. A one second delay goes unnoticed. Two seconds is tolerable. Around three seconds people begin to doubt the product. Four seconds erodes confidence. By five seconds the experience has already slipped.
So the five second line is not something to aim for. It is the point where we can no longer pretend the interaction is intact. We use it because it is memorable, because it exposes genuine structural issues, and because anything that approaches it deserves a design conversation, not a longer timeout.
The design question
Once an operation stretches beyond five seconds the timeout stops being the interesting part. The shape of the work becomes the question.
At that point the synchronous model shows its limits. Holding a player while a backend completes a long or unpredictable task is rarely a good trade. The experience becomes fragile, retries pile up and the backend carries threads that could serve fresh requests instead.
A better pattern is to change the choreography. Let the player tell us what they want to do. Confirm that we have received it. Move the long work into an asynchronous worker. When the job completes, update the player. The action still happens, but the wait no longer sits on the critical path of their session.
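A minimal sketch of that choreography, kept in one process with an in-memory queue so it stays self-contained. In production the queue would be durable and the worker a separate service; every name here is illustrative rather than a reference to our actual code.

```python
# Sketch of the accept-then-finish-asynchronously pattern. The queue is in-memory
# and the worker is a thread only to keep the example self-contained; in production
# the queue would be durable (SQS, Kafka, a database table) and the worker its own
# service. Every name here is illustrative.
import queue
import threading
import uuid

jobs = queue.Queue()

def handle_player_action(player_id: str, action: str) -> dict:
    """Synchronous part: record the intent and acknowledge immediately."""
    job_id = str(uuid.uuid4())
    jobs.put({"job_id": job_id, "player_id": player_id, "action": action})
    # The player gets an acknowledgement in milliseconds, not a held connection.
    return {"status": "accepted", "job_id": job_id}

def do_long_backend_work(job: dict) -> str:
    # Placeholder for the slow call that used to hold a request thread for 10-20 s.
    return f"completed {job['action']} for {job['player_id']}"

def notify_player(player_id: str, result: str) -> None:
    # Placeholder: a push message, a polled status endpoint, or next-login delivery.
    print(f"notify {player_id}: {result}")

def worker() -> None:
    """Asynchronous part: the long work happens off the player's critical path."""
    while True:
        job = jobs.get()
        notify_player(job["player_id"], do_long_backend_work(job))
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    print(handle_player_action("player-123", "claim-season-reward"))
    jobs.join()  # the demo waits for the worker; a real service just keeps running
```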
This looks like a small shift in code. It feels larger in production.
Money, Money, Money
There is a financial side to all of this. In AWS we pay for capacity whether it is doing real work or sitting blocked. A waiting thread ties up an instance even though nothing useful is happening. The platform carries that cost simply because the backend might eventually reply.
To make this concrete, consider a c6a.large in São Paulo at roughly 0.1179 USD per hour, about 86 USD per month. Assume a service can handle around one thousand waiting threads per instance before context switching and memory pressure become uncomfortable. These numbers are illustrative, but the pattern is real.
| Timeout or latency | Approx. concurrency at 5 000 RPS | Instances needed (≈ 1 000 waiting threads each) | Approx. monthly cost |
|---|---|---|---|
| 2 s | 10 000 | ≈ 10 | ≈ 860 USD |
| 4 s | 20 000 | ≈ 20 | ≈ 1 720 USD |
| 8 s | 40 000 | ≈ 40 | ≈ 3 440 USD |
| 16 s | 80 000 | ≈ 80 | ≈ 6 880 USD |
| 30 s | 150 000 | ≈ 150 | ≈ 12 900 USD |
| No timeout | unbounded | unbounded | unbounded |
These machines are not busy. They sit holding threads that wait. The successful RPS does not change. The only shift is how long each request occupies a thread. In a fleet with a steady baseline, this pattern can push compute cost toward a tenfold increase during heavy traffic.
A design that finishes quickly, or that hands long work to an asynchronous path, keeps instance counts tied to real demand and stops us paying for capacity whose main purpose is to wait politely.
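The arithmetic behind the table is short enough to check in a few lines. The price, the 730-hour month, and the one-thousand-waiting-threads-per-instance ceiling are the same illustrative assumptions as above.

```python
# Back-of-the-envelope check of the cost table above. The hourly price, the
# 730-hour month, and the 1,000-waiting-threads-per-instance ceiling are the
# same illustrative assumptions used in the text.
import math

HOURLY_PRICE_USD = 0.1179    # c6a.large, São Paulo, on-demand (illustrative)
HOURS_PER_MONTH = 730
THREADS_PER_INSTANCE = 1_000

def monthly_cost_usd(rps: float, latency_seconds: float) -> float:
    concurrency = rps * latency_seconds                        # threads held waiting
    instances = math.ceil(concurrency / THREADS_PER_INSTANCE)  # machines that exist mostly to wait
    return instances * HOURLY_PRICE_USD * HOURS_PER_MONTH

for latency in (2, 4, 8, 16, 30):
    print(f"{latency:>2} s timeout at 5,000 RPS -> ~{monthly_cost_usd(5_000, latency):,.0f} USD/month")
```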
The contract we inherit
The player never sees the thread count. They see only the spinner that pushed past their patience. The five second threshold is already part of how they interact with us. Our job is to stay comfortably on the near side of it.
The work is not in enforcing the timeout. The work is in designing and scaling services so the timeout rarely enters the conversation at all.