The Five Second Contract

Queueing looks harmless at first. A slow call holds a few threads, then a few more, and if left unchecked it can tip the whole system over.

The Five Second Contract

Players have an internal clock. If something they tap takes more than a few seconds it already feels uncertain. At five seconds most people abandon the flow. They do not complain. They simply adjust their behaviour. The pain shows up in conversion long before it appears in error logs.

That five second line is not guidance. It is the boundary the product has to respect. When we cross it we lose the player and we strain the backend at the same time.

The quiet cost of waiting

Most of our services still rely on synchronous calls for important work. In a synchronous model the request thread waits for the backend. Waiting is not free. It is capacity held in limbo.

The relationship is simple.

threads_in_use = RPS × request_time_seconds

A one second call at scale is a nuisance. A ten or twenty second call becomes the dominant cost. With no timeout the pressure has no ceiling and the system behaves accordingly.
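
To make the relationship concrete, here is a small sketch in Python (the request rates and latencies are illustrative, not measurements) applying the same formula:

    # threads_in_use = RPS × request_time_seconds
    def threads_in_use(rps: float, request_time_seconds: float) -> float:
        """Average number of request threads held while waiting on the backend."""
        return rps * request_time_seconds

    # Illustrative figures: a one second call is a nuisance,
    # a ten or twenty second call dominates.
    for latency in (1, 10, 20):
        for rps in (50, 200, 5_000):
            held = threads_in_use(rps, latency)
            print(f"{rps:>5} RPS at {latency:>2} s -> {held:>7.0f} threads held")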

Surges turn delay into load

Take a surge of five thousand requests per second. The table below shows how many threads each latency ties up at that rate, with lower request rates included for contrast.

Timeout or latency | Threads at 50 RPS | Threads at 200 RPS | Threads at 5 000 RPS
0.1 s              | 5                 | 20                 | 500
1 s                | 50                | 200                | 5 000
2 s                | 100               | 400                | 10 000
4 s                | 200               | 800                | 20 000
8 s                | 400               | 1 600              | 40 000
16 s               | 800               | 3 200              | 80 000
30 s               | 1 500             | 6 000              | 150 000
No timeout         | unbounded         | unbounded          | unbounded

The last row is where the picture tilts. With no timeout the system has no upper limit. Threads accumulate until the runtime slows, the autoscaler reacts or the service folds under its own weight. In practice it goes bang long before the maths finishes.

Why long timeouts rarely help

A long timeout can feel gentle. It gives the backend more room to reply. The trouble is that the player has usually moved on long before it expires. The extra time rarely improves the experience. It does, however, hold the thread for longer and raise the load during busy periods. The return is small. The cost is not.
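
Where a synchronous call has to stay, the minimum is an explicit, short budget rather than a default or absent one. A minimal sketch, assuming a plain HTTP call through Python's standard library; the URL and the two second budget are placeholders:

    import urllib.request
    from urllib.error import URLError

    BACKEND_URL = "https://backend.example.internal/grant-reward"  # placeholder
    TIMEOUT_SECONDS = 2.0  # an explicit budget, well inside the five second line

    try:
        # Without the timeout argument, this thread can wait for as long
        # as the backend happens to take.
        with urllib.request.urlopen(BACKEND_URL, timeout=TIMEOUT_SECONDS) as response:
            body = response.read()
    except (URLError, TimeoutError):
        # Fail fast: release the thread and give the player a retry path
        # instead of holding capacity for tens of seconds.
        body = None

The particular number matters less than the fact that the thread is released on a schedule we chose, not one the backend happens to impose.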

A note on the five second line

It is worth being honest about this boundary. Five seconds is not a comfort zone. A one second delay goes unnoticed. Two seconds is tolerable. Around three seconds people begin to doubt the product. Four seconds erodes confidence. By five seconds the experience has already slipped.

So the five second line is not something to aim for. It is the point where we can no longer pretend the interaction is intact. We use it because it is memorable, because it exposes genuine structural issues, and because anything that approaches it deserves a design conversation, not a longer timeout.

The design question

Once an operation stretches beyond five seconds the timeout stops being the interesting part. The shape of the work becomes the question.

At that point the synchronous model shows its limits. Holding a player while a backend completes a long or unpredictable task is rarely a good trade. The experience becomes fragile, retries pile up and the backend carries threads that could serve fresh requests instead.

A better pattern is to change the choreography. Let the player tell us what they want to do. Confirm that we have received it. Move the long work into an asynchronous worker. When the job completes, update the player. The action still happens, but the wait no longer sits on the critical path of their session.

This looks like a small shift in code. It feels larger in production.
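
In outline, and with the queue and helper functions as illustrative stand-ins rather than any real service of ours, the shape of the change is roughly this:

    import queue
    import threading
    import time
    import uuid

    jobs: queue.Queue = queue.Queue()

    def do_slow_backend_work(action: dict) -> dict:
        # Stand-in for the long or unpredictable backend task.
        time.sleep(10)
        return {"action": action, "done": True}

    def notify_player(player_id: str, job_id: str, result: dict) -> None:
        # Stand-in for whatever completion channel we use (push, inbox, poll endpoint).
        print(f"player {player_id}: job {job_id} finished -> {result}")

    def handle_request(player_id: str, action: dict) -> dict:
        """Accept the request, acknowledge at once, move the slow work off the request thread."""
        job_id = str(uuid.uuid4())
        jobs.put((job_id, player_id, action))
        # The player gets an answer in milliseconds, not when the backend finishes.
        return {"status": "accepted", "job_id": job_id}

    def worker() -> None:
        # Runs outside the request path and owns the long work.
        while True:
            job_id, player_id, action = jobs.get()
            try:
                notify_player(player_id, job_id, do_slow_backend_work(action))
            finally:
                jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()

    print(handle_request("player-123", {"type": "claim_reward"}))
    jobs.join()  # only for this demo; a real service simply keeps running

In production the in-process queue would be something durable, such as SQS or a database table, but the choreography is the same: acknowledge quickly, do the work elsewhere, tell the player when it lands.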

Money, Money, Money

There is a financial side to all of this. In AWS we pay for capacity whether it is doing real work or sitting blocked. A waiting thread ties up an instance even though nothing useful is happening. The platform carries that cost simply because the backend might eventually reply.

To make this concrete, consider a c6a.large in São Paulo at roughly 0.1179 USD per hour, about 86 USD per month. Assume a service can handle around one thousand waiting threads per instance before context switching and memory pressure become uncomfortable. These numbers are illustrative, but the pattern is real.

Timeout or latency | Approx. concurrency | Instances needed (≈ 1 000 waiting threads each) | Approx. monthly cost
2 s                | 10 000              | ≈ 10                                            | ≈ 860 USD
4 s                | 20 000              | ≈ 20                                            | ≈ 1 720 USD
8 s                | 40 000              | ≈ 40                                            | ≈ 3 440 USD
16 s               | 80 000              | ≈ 80                                            | ≈ 6 880 USD
30 s               | 150 000             | ≈ 150                                           | ≈ 12 900 USD
No timeout         | unbounded           | unbounded                                       | unbounded

These machines are not busy. They sit holding threads that wait. The successful RPS does not change. The only shift is how long each request occupies a thread. In a fleet with a steady baseline, this pattern can push compute cost toward a tenfold increase during heavy traffic.
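
The arithmetic behind the table is mechanical. A small sketch, reusing the same illustrative assumptions (a 5 000 RPS surge, roughly a thousand waiting threads per instance, about 86 USD per instance per month):

    import math

    INSTANCE_MONTHLY_USD = 86      # c6a.large in São Paulo, illustrative
    THREADS_PER_INSTANCE = 1_000   # assumed comfortable ceiling of waiting threads
    SURGE_RPS = 5_000

    for timeout_seconds in (2, 4, 8, 16, 30):
        waiting_threads = SURGE_RPS * timeout_seconds
        instances = math.ceil(waiting_threads / THREADS_PER_INSTANCE)
        monthly_cost = instances * INSTANCE_MONTHLY_USD
        print(f"{timeout_seconds:>2} s timeout -> {waiting_threads:>7} waiting threads, "
              f"{instances:>3} instances, ~{monthly_cost:>6} USD/month")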

A design that finishes quickly, or that hands long work to an asynchronous path, keeps instance counts tied to real demand and stops us paying for capacity whose main purpose is to wait politely.

The contract we inherit

The player never sees the thread count. They see only the spinner that pushed past their patience. The five second threshold is already part of how they interact with us. Our job is to stay comfortably on the near side of it.

The work is not in enforcing the timeout. The work is in designing and scaling services so the timeout rarely enters the conversation at all.