
Gatekeepers: The Traffic Controllers of the Gnome Village
In every gnome village you eventually discover a small, serious-looking gnome standing on a bridge, holding a sign.
You Shall Not Pass!
He is not there to forward messages. He is not there to do work. He is there to limit how much chaos reaches the rest of the village.
That gnome is the Gatekeeper.
Gatekeepers control flow: how fast, how often, and under what conditions messages move into the next part of the system. They protect scarce resources, absorb bursts, and stop cascading failures from spreading downstream.
A Gatekeeper is not a Router.
A Gatekeeper is not a Worker.
A Gatekeeper cares about when something happens, not just where it goes.
This post defines the Gatekeeper archetype, shows the common subtypes, and explains how to use them without accidentally turning them into bottlenecks.
What Is a Gatekeeper?
A Gatekeeper enforces one rule:
“Not everything gets through. And not everything gets through at once.”
Where Workers do work and Routers forward messages, Gatekeepers regulate traffic:
- They serialize or sequence flows.
- They protect slow or fragile subsystems.
- They turn unbounded input into bounded throughput.
- They absorb bursts and provide backpressure.
- They cut off failing downstream systems before everything collapses.
A Gatekeeper owns coordination, not domain state.
What Gatekeepers Are Not
To keep this archetype clean, two clarifications:
Not a Router
Gatekeepers do not:
- decide destinations
- classify messages
- route between processes
A Router shapes direction. A Gatekeeper shapes rate and order.
Not a Resource Owner
Gatekeepers do not:
- store domain data
- own sessions or connections
- maintain business invariants
A Resource Owner holds state. A Gatekeeper holds flow control.
Gatekeeper Subtypes
Gatekeepers appear in three main forms. Every one of them exists to prevent a subsystem from being overloaded or corrupted by timing.
1. Flow Controller (also called Sequencer or Coordinator)
Ensures only one message at a time enters a subsystem. Useful when the target must process tasks in strict order or must not have concurrent invocations.
Sometimes “one at a time” is not enough. You also need in-order delivery, even if messages arrive out of order.
A Sequencer Gatekeeper works a bit like TCP:
- each message carries a sequence number,
- the Sequencer keeps track of the next expected number,
- out-of-order messages are buffered,
- only in-order messages are forwarded.
The Gatekeeper does not do the work. It only decides when a message is allowed to move on.
Here is a minimal, slightly contrived example:
%% sequencer.erl
-module(sequencer).
-behaviour(gen_server).

-export([start_link/1, send/3]).
-export([init/1, handle_call/3, handle_cast/2]).

-record(state, {
    next_seq = 0,   %% next expected sequence number
    pending = #{},  %% #{Seq => Msg}
    downstream      %% pid() of the real worker
}).

%% Public API

start_link(Downstream) ->
    gen_server:start_link(?MODULE, Downstream, []).

%% Send a sequenced message
send(Pid, Seq, Msg) ->
    gen_server:cast(Pid, {seq, Seq, Msg}).

%% Callbacks

init(Downstream) ->
    {ok, #state{downstream = Downstream}}.

handle_call(_Request, _From, S) ->
    {reply, ignored, S}.

handle_cast({seq, Seq, Msg}, #state{pending = Pending} = S) ->
    %% Store the message in the pending buffer
    Pending1 = maps:put(Seq, Msg, Pending),
    %% Try to flush any in-order messages
    {noreply, flush_in_order(S#state{pending = Pending1})}.

flush_in_order(#state{next_seq = Next,
                      pending = Pending,
                      downstream = Down} = S) ->
    case maps:take(Next, Pending) of
        {Msg, Pending1} ->
            %% We have the next message: forward it
            gen_server:cast(Down, {ordered, Next, Msg}),
            flush_in_order(S#state{next_seq = Next + 1,
                                   pending = Pending1});
        error ->
            %% Missing message; stop here and wait
            S
    end.
This Sequencer:
- accepts {Seq, Msg} in any order,
- forwards them to Down in strict sequence order,
- buffers anything that arrives early,
- never touches the domain logic of Msg.
In other words, the Sequencer is a tiny user-space TCP: it uses sequence numbers and a buffer to guarantee ordered delivery, without doing the work itself.
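To see the ordering in action, here is a hedged usage sketch (the demo/0 function and the throwaway printer process are mine, not part of the archetype). The printer matches gen_server's internal {'$gen_cast', ...} envelope only to keep the sketch self-contained; a real downstream would be a gen_server handling {ordered, Seq, Msg} in handle_cast/2. It assumes the sequencer:start_link/1 from the listing above.

%% Usage sketch: messages sent out of order, delivered in order.
demo() ->
    %% Throwaway "worker": prints whatever the Sequencer forwards
    Down = spawn(fun Loop() ->
                     receive
                         {'$gen_cast', {ordered, Seq, Msg}} ->
                             io:format("delivered ~p: ~p~n", [Seq, Msg]),
                             Loop()
                     end
                 end),
    {ok, Gate} = sequencer:start_link(Down),
    sequencer:send(Gate, 2, baz),  %% early: buffered, nothing forwarded
    sequencer:send(Gate, 0, foo),  %% in order: 0 goes out immediately
    sequencer:send(Gate, 1, bar),  %% 1 goes out, then the buffered 2
    ok.
%% Prints: delivered 0: foo, then 1: bar, then 2: baz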
2. Rate Limiter
Controls how often something happens.
Examples:
- “Max 100 requests per second to this PSP”
- “Only 5 ledger writes per second per user”
- “Batch up to N messages before flushing”
Rate limiters create backpressure:
- If upstream is too fast → they wait or reject.
- If downstream is slow → they smooth the load.
Typical pattern:
-record(state, {
    tokens,       %% tokens currently available
    max_tokens,   %% bucket capacity
    refill_ms     %% interval between refills
}).

%% Start by scheduling the first refill
%% (gen_server's init/1 takes one argument, so both values arrive as a tuple)
init({MaxTokens, RefillMs}) ->
    erlang:send_after(RefillMs, self(), refill),
    {ok, #state{tokens = MaxTokens,
                max_tokens = MaxTokens,
                refill_ms = RefillMs}}.

%% A simple token bucket
handle_call({request, Msg}, _From, #state{tokens = T} = S) when T > 0 ->
    Next = S#state{tokens = T - 1},
    forward(Msg),   %% forward/1 hands the message to the protected subsystem
    {reply, ok, Next};
handle_call({request, _Msg}, _From, S) ->
    {reply, {error, rate_limited}, S}.

%% Refill on each tick
handle_info(refill, #state{max_tokens = Max, refill_ms = Ms} = S) ->
    %% Reset the token count
    S1 = S#state{tokens = Max},
    %% Schedule the next refill
    erlang:send_after(Ms, self(), refill),
    {noreply, S1}.
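For a sense of how it behaves, here is a hedged usage sketch. It assumes the bucket above runs as a gen_server registered as limiter (a made-up name), was started with {MaxTokens, RefillMs} = {2, 1000}, and that forward/1 hands messages to the protected subsystem.

gen_server:call(limiter, {request, job_a}),  %% ok  (1 token left)
gen_server:call(limiter, {request, job_b}),  %% ok  (0 tokens left)
gen_server:call(limiter, {request, job_c}),  %% {error, rate_limited}
%% ~1 second later the refill tick resets the bucket and calls pass again.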
Gatekeepers don’t try to be “helpful.” They enforce limits.
3. Circuit Breaker
A Circuit Breaker is the gnome who says:
“We tried this four times and it exploded every time. Maybe let’s not.”
A Circuit Breaker stops calls to a subsystem that is failing and only retries after a cooldown.
About the Words “Open” and “Closed”
The term circuit breaker comes from electricity. Unfortunately the metaphor brings its vocabulary with it:
- open → no current flows → no calls allowed
- closed → current flows → calls allowed
My little brain finds this backwards for software. “Open” sounds like things should pass through. “Closed” sounds like nothing should.
To avoid confusing myself, I use passing (calls allowed) and blocked (calls rejected). If you prefer open and closed, that’s fine, just be careful. The metaphor is older and stronger than you think, and it tends to drag its meaning along with it.
Example:
-record(state, {
    mode = passing,       %% passing | blocked
    failures = 0,
    threshold = 3,        %% failures before blocking
    backoff_ms = 1000,    %% how long the next block lasts
    min_backoff_ms = 1000,
    max_backoff_ms = 30000,
    retry_at_ms = 0,      %% monotonic time in ms
    target,               %% pid() or {Mod, Fun} etc.
    call_timeout = 5000   %% ms
}).

%% Public API
%% A guarded call through the breaker
call(Pid, Request) ->
    gen_server:call(Pid, {call, Request}).

handle_call({call, Request}, _From, S0) ->
    Now = now_ms(),
    case S0#state.mode of
        passing ->
            do_checked_call(Request, S0, Now);
        blocked ->
            case Now >= S0#state.retry_at_ms of
                false ->
                    %% Still blocked, do not hit downstream at all
                    {reply, {error, breaker_blocked}, S0};
                true ->
                    %% Retry window has opened: allow one trial call
                    do_checked_call(Request, S0, Now)
            end
    end.

now_ms() ->
    erlang:monotonic_time(millisecond).

do_checked_call(Request, S0, Now) ->
    case safe_call(S0#state.target, Request, S0#state.call_timeout) of
        {ok, Result} ->
            %% Success: reset failures, mode, and backoff
            S1 = S0#state{
                mode = passing,
                failures = 0,
                backoff_ms = S0#state.min_backoff_ms,
                retry_at_ms = 0
            },
            {reply, {ok, Result}, S1};
        {error, Reason} ->
            handle_failure(Reason, S0, Now)
    end.

safe_call(Target, Request, Timeout) when is_pid(Target) ->
    try gen_server:call(Target, Request, Timeout) of
        Reply -> {ok, Reply}
    catch
        Class:Error ->
            {error, {Class, Error}}
    end;
safe_call({M, F}, Request, Timeout) ->
    %% or whatever abstraction you like
    try M:F(Request, Timeout) of
        Reply -> {ok, Reply}
    catch
        Class:Error ->
            {error, {Class, Error}}
    end.

handle_failure(Reason, S0, Now) ->
    Fail1 = S0#state.failures + 1,
    case Fail1 >= S0#state.threshold of
        false ->
            %% Below threshold: still passing, but report the failure
            S1 = S0#state{failures = Fail1},
            {reply, {error, {downstream_failed, Reason}}, S1};
        true ->
            %% Threshold reached: switch to blocked and back off.
            %% Block for the current backoff, then double it (capped) for the
            %% next time, so the waits grow: min, 2*min, 4*min, ...
            Backoff = S0#state.backoff_ms,
            Max = S0#state.max_backoff_ms,
            RetryAt = Now + Backoff,
            S1 = S0#state{
                mode = blocked,
                failures = Fail1,
                backoff_ms = min(Backoff * 2, Max),
                retry_at_ms = RetryAt
            },
            {reply, {error, {downstream_failed,
                             Reason, breaker_blocked}}, S1}
    end.
We use exponential backoff with a cap: start at min_backoff_ms, double each time, and never exceed max_backoff_ms. On the first success after failures, we reset the backoff to min_backoff_ms.
Behaviour in plain words
- While things work: mode = passing, all calls go through, and failures resets to 0 on success.
- When failures pile up (≥ threshold): the breaker enters mode = blocked, sets retry_at_ms = now + backoff_ms, and returns {error, breaker_blocked} without touching the target.
- After the backoff period: the first caller past retry_at_ms triggers a trial call. If it succeeds, the breaker resets to passing; if it fails, it stays blocked, increases the backoff, and sets a new retry_at_ms.
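From the caller's side it looks something like this. This is a hedged sketch: it assumes the breaker code above lives in a module named circuit_breaker and sits in front of a payment provider; the {charge, ...} request and the fallback atoms are made up for illustration.

charge(Breaker, UserId, Amount) ->
    case circuit_breaker:call(Breaker, {charge, UserId, Amount}) of
        {ok, Receipt} ->
            %% Downstream healthy (or a trial call just succeeded)
            {ok, Receipt};
        {error, breaker_blocked} ->
            %% Downstream is known-bad: fail fast, never touch it
            {error, psp_unavailable};
        {error, {downstream_failed, _Reason}} ->
            %% Failure counted, but the breaker is still passing
            {error, psp_error};
        {error, {downstream_failed, _Reason, breaker_blocked}} ->
            %% This failure tripped the breaker; later calls fail fast
            {error, psp_unavailable}
    end.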
This is what you want in front of a flaky PSP or slow third-party service: protect the rest of the system, limit damage, and probe occasionally for recovery.
Gatekeepers and Backpressure
Backpressure is the polite way of saying:
“Slow down. Or no.”
Gatekeepers apply backpressure by:
- delaying messages
- rejecting messages
- collapsing bursts
- pushing the problem upstream instead of downstream
Routers do not do this. Workers do not do this. Only Gatekeepers regulate flow.
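The "delaying" option deserves a sketch, because gen_server makes it almost free: accept the call and simply do not reply until capacity frees up. The caller blocks, and the pressure propagates upstream instead of piling up downstream. This is a minimal sketch under assumed names (acquire/release, a count of free slots, a queue of parked callers), not code from the subtypes above:

-record(state, {
    free,      %% free slots in the protected subsystem
    waiting    %% queue of callers parked for backpressure
}).

init(MaxInFlight) ->
    {ok, #state{free = MaxInFlight, waiting = queue:new()}}.

%% Admit a caller if a slot is free; otherwise park it without replying.
handle_call(acquire, From, #state{free = 0, waiting = W} = S) ->
    {noreply, S#state{waiting = queue:in(From, W)}};
handle_call(acquire, _From, #state{free = N} = S) ->
    {reply, ok, S#state{free = N - 1}}.

%% When a job finishes downstream, unblock one parked caller (or free a slot).
handle_cast(release, #state{free = N, waiting = W} = S) ->
    case queue:out(W) of
        {{value, From}, W1} ->
            gen_server:reply(From, ok),
            {noreply, S#state{waiting = W1}};
        {empty, _} ->
            {noreply, S#state{free = N + 1}}
    end.

A producer calling gen_server:call(Gate, acquire) simply waits its turn, so a fast upstream naturally slows to the pace of the consumer; the call timeout (5 seconds by default) bounds how long it waits before giving up.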
Gatekeepers and Isolation Boundaries
Gatekeepers often sit at domain boundaries:
- API → ledger
- ledger → PSP
- ingestion → analytics
- users → sessions
Anywhere unbounded input meets bounded capacity, you need a Gatekeeper.
A Note About Latency
Gatekeepers introduce latency on purpose. This is not a bug.
- Rate limiters smooth spikes
- Flow controllers enforce serialization
- Circuit breakers prevent retries from spiraling
In the BEAM, a few microseconds of routing latency is irrelevant compared to the stability gained.
Summary
Gatekeepers protect the system from excess:
- Flow Controllers serialize work
- Rate Limiters regulate volume
- Circuit Breakers isolate failure
Where Routers shape direction, Gatekeepers shape timing. They are the village’s quiet traffic controllers, ensuring the rest of the gnomes can do their work without panic.