
Gatekeepers: The Traffic Controllers of the Gnome Village
In every gnome village you eventually discover a small, serious-looking gnome standing on a bridge, holding a sign.
You Shall Not Pass!
He is not there to forward messages. He is not there to do work. He is there to limit how much chaos reaches the rest of the village.
That gnome is the Gatekeeper.
Gatekeepers control flow: how fast, how often, and under what conditions messages move into the next part of the system. They protect scarce resources, absorb bursts, and stop cascading failures from spreading downstream.
A Gatekeeper is not a Router.
A Gatekeeper is not a Worker.
A Gatekeeper cares about when something happens, not just where it goes.
This post defines the Gatekeeper archetype, shows the common subtypes, and explains how to use them without accidentally turning them into bottlenecks.
What Is a Gatekeeper?
A Gatekeeper enforces one rule:
“Not everything gets through. And not everything gets through at once.”
Where Workers do work and Routers forward messages, Gatekeepers regulate traffic:
- They serialize or sequence flows.
- They protect slow or fragile subsystems.
- They turn unbounded input into bounded throughput.
- They absorb bursts and provide backpressure.
- They cut off failing downstream systems before everything collapses.
A Gatekeeper owns coordination, not domain state.
What Gatekeepers Are Not
To keep this archetype clean, two clarifications:
Not a Router
Gatekeepers do not:
- decide destinations
- classify messages
- route between processes
A Router shapes direction. A Gatekeeper shapes rate and order.
Not a Resource Owner
Gatekeepers do not:
- store domain data
- own sessions or connections
- maintain business invariants
A Resource Owner holds state. A Gatekeeper holds flow control.
Gatekeeper Subtypes
Gatekeepers appear in three main forms. Every one of them exists to prevent a subsystem from being overloaded or corrupted by timing.
1. Flow Controller (also called Sequencer or Coordinator)
Ensures only one message at a time enters a subsystem. Useful when the target must process tasks in strict order or must not have concurrent invocations.
Sometimes “one at a time” is not enough. You also need in-order delivery, even if messages arrive out of order.
A Sequencer Gatekeeper works a bit like TCP:
- each message carries a sequence number,
- the Sequencer keeps track of the next expected number,
- out-of-order messages are buffered,
- only in-order messages are forwarded.
The Gatekeeper does not do the work. It only decides when a message is allowed to move on.
Here is a minimal, slightly contrived example:
%% sequencer.erl
-module(sequencer).
-behaviour(gen_server).

-export([start_link/1, send/3]).
-export([init/1, handle_call/3, handle_cast/2]).

-record(state, {
    next_seq = 0,   %% next expected sequence number
    pending = #{},  %% #{Seq => Msg}
    downstream      %% pid() of the real worker
}).

%% Public API

start_link(Downstream) ->
    gen_server:start_link(?MODULE, Downstream, []).

%% Send a sequenced message
send(Pid, Seq, Msg) ->
    gen_server:cast(Pid, {seq, Seq, Msg}).

%% Callbacks

init(Downstream) ->
    {ok, #state{downstream = Downstream}}.

handle_call(_Request, _From, S) ->
    {reply, ignored, S}.

handle_cast({seq, Seq, Msg}, #state{pending = Pending} = S) ->
    %% Store the message in the pending buffer
    Pending1 = maps:put(Seq, Msg, Pending),
    %% Try to flush any in-order messages
    {noreply, flush_in_order(S#state{pending = Pending1})}.

flush_in_order(#state{next_seq = Next,
                      pending = Pending,
                      downstream = Down} = S) ->
    case maps:take(Next, Pending) of
        {Msg, Pending1} ->
            %% We have the next message: forward it
            gen_server:cast(Down, {ordered, Next, Msg}),
            flush_in_order(S#state{next_seq = Next + 1,
                                   pending = Pending1});
        error ->
            %% Missing message; stop here and wait
            S
    end.
This Sequencer:
- accepts {Seq, Msg} in any order,
- forwards them to Down in strict sequence order,
- buffers anything that arrives early,
- never touches the domain logic of Msg.
In other words, the Sequencer is a tiny user-space TCP: it uses sequence numbers and a buffer to guarantee ordered delivery, without doing the work itself.
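To see the ordering in action, here is a hedged usage sketch (the demo/0 function and the throwaway printer process are mine, not part of the archetype). The printer matches gen_server's internal {'$gen_cast', ...} envelope only to keep the sketch self-contained; a real downstream would be a gen_server handling {ordered, Seq, Msg} in handle_cast/2. It assumes the sequencer:start_link/1 from the listing above.

%% Usage sketch: messages sent out of order, delivered in order.
demo() ->
    %% Throwaway "worker": prints whatever the Sequencer forwards
    Down = spawn(fun Loop() ->
                     receive
                         {'$gen_cast', {ordered, Seq, Msg}} ->
                             io:format("delivered ~p: ~p~n", [Seq, Msg]),
                             Loop()
                     end
                 end),
    {ok, Gate} = sequencer:start_link(Down),
    sequencer:send(Gate, 2, baz),  %% early: buffered, nothing forwarded
    sequencer:send(Gate, 0, foo),  %% in order: 0 goes out immediately
    sequencer:send(Gate, 1, bar),  %% 1 goes out, then the buffered 2
    ok.
%% Prints: delivered 0: foo, then 1: bar, then 2: baz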
2. Rate Limiter
Controls how often something happens.
Examples:
- “Max 100 requests per second to this PSP”
- “Only 5 ledger writes per second per user”
- “Batch up to N messages before flushing”
Rate limiters create backpressure:
- If upstream is too fast → they wait or reject.
- If downstream is slow → they smooth the load.
Typical pattern:
-record(state, {
    tokens,       %% tokens currently available
    max_tokens,   %% bucket capacity
    refill_ms     %% interval between refills
}).

%% Start by scheduling the first refill
%% (gen_server's init/1 takes one argument, so both values arrive as a tuple)
init({MaxTokens, RefillMs}) ->
    erlang:send_after(RefillMs, self(), refill),
    {ok, #state{tokens = MaxTokens,
                max_tokens = MaxTokens,
                refill_ms = RefillMs}}.

%% A simple token bucket
handle_call({request, Msg}, _From, #state{tokens = T} = S) when T > 0 ->
    Next = S#state{tokens = T - 1},
    forward(Msg),   %% forward/1 hands the message to the protected subsystem
    {reply, ok, Next};
handle_call({request, _Msg}, _From, S) ->
    {reply, {error, rate_limited}, S}.

%% Refill on each tick
handle_info(refill, #state{max_tokens = Max, refill_ms = Ms} = S) ->
    %% Reset the token count
    S1 = S#state{tokens = Max},
    %% Schedule the next refill
    erlang:send_after(Ms, self(), refill),
    {noreply, S1}.
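For a sense of how it behaves, here is a hedged usage sketch. It assumes the bucket above runs as a gen_server registered as limiter (a made-up name), was started with {MaxTokens, RefillMs} = {2, 1000}, and that forward/1 hands messages to the protected subsystem.

gen_server:call(limiter, {request, job_a}),  %% ok  (1 token left)
gen_server:call(limiter, {request, job_b}),  %% ok  (0 tokens left)
gen_server:call(limiter, {request, job_c}),  %% {error, rate_limited}
%% ~1 second later the refill tick resets the bucket and calls pass again.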
Gatekeepers don’t try to be “helpful.” They enforce limits.
3. Circuit Breaker
A Circuit Breaker is the gnome who says:
“We tried this four times and it exploded every time. Maybe let’s not.”
A Circuit Breaker stops calls to a subsystem that is failing and only retries after a cooldown.
About the Words “Open” and “Closed”
The term circuit breaker comes from electricity. Unfortunately the metaphor brings its vocabulary with it:
- open → no current flows → no calls allowed
- closed → current flows → calls allowed
My little brain finds this backwards for software. “Open” sounds like things should pass through. “Closed” sounds like nothing should.
To avoid confusing myself, I use passing (calls allowed) and blocked (calls rejected). If you prefer open and closed, that’s fine, just be careful. The metaphor is older and stronger than you think, and it tends to drag its meaning along with it.
Example:
-record(state, {
    mode = passing,       %% passing | blocked
    failures = 0,
    threshold = 3,        %% failures before blocking
    backoff_ms = 1000,    %% how long the next block lasts
    min_backoff_ms = 1000,
    max_backoff_ms = 30000,
    retry_at_ms = 0,      %% monotonic time in ms
    target,               %% pid() or {Mod, Fun} etc.
    call_timeout = 5000   %% ms
}).

%% Public API
%% A guarded call through the breaker
call(Pid, Request) ->
    gen_server:call(Pid, {call, Request}).

handle_call({call, Request}, _From, S0) ->
    Now = now_ms(),
    case S0#state.mode of
        passing ->
            do_checked_call(Request, S0, Now);
        blocked ->
            case Now >= S0#state.retry_at_ms of
                false ->
                    %% Still blocked, do not hit downstream at all
                    {reply, {error, breaker_blocked}, S0};
                true ->
                    %% Retry window has opened: allow one trial call
                    do_checked_call(Request, S0, Now)
            end
    end.

now_ms() ->
    erlang:monotonic_time(millisecond).

do_checked_call(Request, S0, Now) ->
    case safe_call(S0#state.target, Request, S0#state.call_timeout) of
        {ok, Result} ->
            %% Success: reset failures, mode, and backoff
            S1 = S0#state{
                mode = passing,
                failures = 0,
                backoff_ms = S0#state.min_backoff_ms,
                retry_at_ms = 0
            },
            {reply, {ok, Result}, S1};
        {error, Reason} ->
            handle_failure(Reason, S0, Now)
    end.

safe_call(Target, Request, Timeout) when is_pid(Target) ->
    try gen_server:call(Target, Request, Timeout) of
        Reply -> {ok, Reply}
    catch
        Class:Error ->
            {error, {Class, Error}}
    end;
safe_call({M, F}, Request, Timeout) ->
    %% or whatever abstraction you like
    try M:F(Request, Timeout) of
        Reply -> {ok, Reply}
    catch
        Class:Error ->
            {error, {Class, Error}}
    end.

handle_failure(Reason, S0, Now) ->
    Fail1 = S0#state.failures + 1,
    case Fail1 >= S0#state.threshold of
        false ->
            %% Below threshold: still passing, but report the failure
            S1 = S0#state{failures = Fail1},
            {reply, {error, {downstream_failed, Reason}}, S1};
        true ->
            %% Threshold reached: switch to blocked and back off.
            %% Block for the current backoff, then double it (capped) for the
            %% next time, so the waits grow: min, 2*min, 4*min, ...
            Backoff = S0#state.backoff_ms,
            Max = S0#state.max_backoff_ms,
            RetryAt = Now + Backoff,
            S1 = S0#state{
                mode = blocked,
                failures = Fail1,
                backoff_ms = min(Backoff * 2, Max),
                retry_at_ms = RetryAt
            },
            {reply, {error, {downstream_failed,
                             Reason, breaker_blocked}}, S1}
    end.
We use exponential backoff with a cap: start at min_backoff_ms, double each time, and never exceed max_backoff_ms. On the first success after failures, we reset the backoff to min_backoff_ms.
Behaviour in plain words
- While things work: mode = passing, all calls go through, and failures resets to 0 on success.
- When failures pile up (≥ threshold): the breaker enters mode = blocked, sets retry_at_ms = now + backoff_ms, and returns {error, breaker_blocked} without touching the target.
- After the backoff period: the first caller past retry_at_ms triggers a trial call. If it succeeds, the breaker resets to passing; if it fails, it stays blocked, increases the backoff, and sets a new retry_at_ms.
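From the caller's side it looks something like this. This is a hedged sketch: it assumes the breaker code above lives in a module named circuit_breaker and sits in front of a payment provider; the {charge, ...} request and the fallback atoms are made up for illustration.

charge(Breaker, UserId, Amount) ->
    case circuit_breaker:call(Breaker, {charge, UserId, Amount}) of
        {ok, Receipt} ->
            %% Downstream healthy (or a trial call just succeeded)
            {ok, Receipt};
        {error, breaker_blocked} ->
            %% Downstream is known-bad: fail fast, never touch it
            {error, psp_unavailable};
        {error, {downstream_failed, _Reason}} ->
            %% Failure counted, but the breaker is still passing
            {error, psp_error};
        {error, {downstream_failed, _Reason, breaker_blocked}} ->
            %% This failure tripped the breaker; later calls fail fast
            {error, psp_unavailable}
    end.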
This is what you want in front of a flaky PSP or slow third-party service: protect the rest of the system, limit damage, and probe occasionally for recovery.
Gatekeepers and Backpressure
Backpressure is the polite way of saying:
“Slow down. Or no.”
Gatekeepers apply backpressure by:
- delaying messages
- rejecting messages
- collapsing bursts
- pushing the problem upstream instead of downstream
Routers do not do this. Workers do not do this. Only Gatekeepers regulate flow.
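The "delaying" option deserves a sketch, because gen_server makes it almost free: accept the call and simply do not reply until capacity frees up. The caller blocks, and the pressure propagates upstream instead of piling up downstream. This is a minimal sketch under assumed names (acquire/release, a count of free slots, a queue of parked callers), not code from the subtypes above:

-record(state, {
    free,      %% free slots in the protected subsystem
    waiting    %% queue of callers parked for backpressure
}).

init(MaxInFlight) ->
    {ok, #state{free = MaxInFlight, waiting = queue:new()}}.

%% Admit a caller if a slot is free; otherwise park it without replying.
handle_call(acquire, From, #state{free = 0, waiting = W} = S) ->
    {noreply, S#state{waiting = queue:in(From, W)}};
handle_call(acquire, _From, #state{free = N} = S) ->
    {reply, ok, S#state{free = N - 1}}.

%% When a job finishes downstream, unblock one parked caller (or free a slot).
handle_cast(release, #state{free = N, waiting = W} = S) ->
    case queue:out(W) of
        {{value, From}, W1} ->
            gen_server:reply(From, ok),
            {noreply, S#state{waiting = W1}};
        {empty, _} ->
            {noreply, S#state{free = N + 1}}
    end.

A producer calling gen_server:call(Gate, acquire) simply waits its turn, so a fast upstream naturally slows to the pace of the consumer; the call timeout (5 seconds by default) bounds how long it waits before giving up.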
Gatekeepers and Isolation Boundaries
Gatekeepers often sit at domain boundaries:
- API → ledger
- ledger → PSP
- ingestion → analytics
- users → sessions
Anywhere unbounded input meets bounded capacity, you need a Gatekeeper.
A Note About Latency
Gatekeepers introduce latency on purpose. This is not a bug.
- Rate limiters smooth spikes
- Flow controllers enforce serialization
- Circuit breakers prevent retries from spiraling
In the BEAM, a few microseconds of routing latency is irrelevant compared to the stability gained.
Summary
Gatekeepers protect the system from excess:
- Flow Controllers serialize work
- Rate Limiters regulate volume
- Circuit Breakers isolate failure
Where Routers shape direction, Gatekeepers shape timing. They are the village’s quiet traffic controllers, ensuring the rest of the gnomes can do their work without panic.