Observers: The Watchful Gnomes of the Village

Some gnomes in the village don’t build bridges, bake bread, or carry mail. They stand on a hill, hold a lantern, and keep an eye on everyone else. When something goes wrong, they react. They don’t fix the problem themselves—they simply make sure the right thing happens next.

These are Observers. Their job is to keep the system healthy.

Observers come in two useful forms: Sentinels and Supervisors.

Sentinels

A Sentinel watches a process or a condition and acts when reality drifts away from expectations. Timeouts, deadlines, stalled workers, mailbox growth, missing responses—these are all things a Sentinel can detect.

Minimal example

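%% Assumes a record such as: -record(state, {pid, threshold, alert_target}).
handle_info(check, #state{pid = Pid, threshold = N} = S) ->
    %% process_info/2 returns undefined if Pid is no longer alive.
    case process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len > N ->
            S#state.alert_target ! {overload, Pid, Len};
        _ ->
            ok
    end,
    %% Schedule the next check in one second.
    erlang:send_after(1000, self(), check),
    {noreply, S}.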

A small loop, a single condition, and an alert. That’s a Sentinel.
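
Mailbox growth is only one of the conditions listed earlier. A missing response can be detected in much the same spirit with a plain receive and a deadline. The following is a minimal sketch, not a definitive implementation: the module name deadline_sentinel, the function await_reply/2, and the {reply, Ref, Value} message shape are all assumptions made for illustration.

```erlang
-module(deadline_sentinel).
-export([await_reply/2, demo/0]).

%% Wait up to TimeoutMs for a {reply, Ref, Value} message tagged with Ref.
%% Returns {ok, Value} or {error, timeout}. It only reports; it repairs nothing.
await_reply(Ref, TimeoutMs) ->
    receive
        {reply, Ref, Value} -> {ok, Value}
    after TimeoutMs ->
        {error, timeout}
    end.

demo() ->
    Self = self(),
    %% A responsive (hypothetical) worker answers right away.
    Ref1 = make_ref(),
    spawn(fun() -> Self ! {reply, Ref1, pong} end),
    Fast = await_reply(Ref1, 500),
    %% A stalled worker ignores the request, so the deadline fires.
    Ref2 = make_ref(),
    Stalled = spawn(fun() -> receive stop -> ok end end),
    Stalled ! {request, Self, Ref2},
    Slow = await_reply(Ref2, 50),
    %% Clean up the stalled process so the demo leaves nothing behind.
    exit(Stalled, kill),
    {Fast, Slow}.
```

Tagging each request with a fresh reference keeps a late reply to an old request from being mistaken for the answer to a new one.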

Supervisors

Supervisors are the managers of the gnome village. They don’t bake bread, carry letters, or fix bridges. They make sure the gnomes who do those things show up for work, behave themselves, and get replaced when they fall into the river.

A Supervisor enforces the structure of the system. It starts processes, restarts them when they crash, and escalates when failures repeat. It is the backbone of OTP fault tolerance.

Supervisors never do real work. They only make sure the workers are alive, in the right teams, and following the rules. If a Supervisor starts doing work, it has stopped being a manager and started being a liability.

Conceptual example

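init([]) ->
    {ok, {
        %% one_for_one: restart only the child that crashed.
        %% Give up if more than 5 restarts occur within 10 seconds.
        {one_for_one, 5, 10},
        [
            %% {Id, {Module, Function, Args}, Restart, Shutdown, Type, Modules}
            {worker1, {worker1, start_link, []}, permanent, 5000, worker, [worker1]}
        ]
    }}.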

One rule: workers may fail. Supervisors may not.

A supervisor can still terminate when its children fail too quickly and the restart rules require escalation. That is intentional. It pushes failure upward until it reaches a level that can handle it.
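
The escalation described above can be observed directly. The sketch below is illustrative only: the module escalation_demo, the function start_worker/0, and the child id flaky are hypothetical names, and the worker is deliberately written to crash shortly after it starts. It uses the map form of the supervisor flags and child spec, which expresses the same information as the tuple form.

```erlang
-module(escalation_demo).
-behaviour(supervisor).
-export([run/0, start_worker/0, init/1]).

%% Hypothetical worker: starts successfully, then crashes 10 ms later.
start_worker() ->
    Pid = spawn_link(fun() -> timer:sleep(10), exit(boom) end),
    {ok, Pid}.

init([]) ->
    %% Allow at most 2 restarts within 5 seconds; one more crash escalates.
    SupFlags = #{strategy => one_for_one, intensity => 2, period => 5},
    Child = #{id => flaky,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              shutdown => brutal_kill,
              type => worker},
    {ok, {SupFlags, [Child]}}.

run() ->
    %% Trap exits so the supervisor's termination arrives as a message
    %% instead of taking the calling process down with it.
    Old = process_flag(trap_exit, true),
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    Result =
        receive
            {'EXIT', Sup, Reason} -> {supervisor_terminated, Reason}
        after 2000 ->
            still_running
        end,
    process_flag(trap_exit, Old),
    Result.
```

Calling escalation_demo:run() should return {supervisor_terminated, shutdown} once the third crash exceeds the restart limit. The supervisor also logs an error report for each child failure along the way. In a real tree the caller would be a parent supervisor, which then applies its own restart rules.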

But a supervisor should never fail because of its own code. It should have no business logic, no parsing, no calculations, and nothing that can crash by accident. Its job is to start children, restart them, and apply the restart strategy.

This separation is why large BEAM systems stay stable even when individual processes fail repeatedly. Workers fail often. Supervisors fail only when the failure belongs at their level.

They keep the rest of the gnomes working without taking the whole village down. When things really go wrong, they escalate the alarm cleanly and predictably up the tree.

The Observer Mindset

Observers have a simple philosophy:

Let processes run freely.
Let them crash when they must.
Notice when they do.
Clean up, restart, or alert.
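
The four steps above can be sketched with a process monitor. This is a minimal illustration under stated assumptions: the module observer_mindset and the function watch/1 are hypothetical names, and the observer here only reports how the watched process ended, leaving the clean up, restart, or alert decision to the caller.

```erlang
-module(observer_mindset).
-export([watch/1, demo/0]).

%% Run Fun in its own process and report how it ended.
watch(Fun) ->
    %% Let it run freely: spawn the process and monitor it, without linking.
    {Pid, MonRef} = spawn_monitor(Fun),
    %% Notice when it stops: the monitor delivers a 'DOWN' message.
    receive
        {'DOWN', MonRef, process, Pid, normal} -> finished;
        {'DOWN', MonRef, process, Pid, Reason} -> {crashed, Reason}
    end.

demo() ->
    %% One gnome finishes its task, the other falls into the river.
    {watch(fun() -> ok end),
     watch(fun() -> exit(fell_in_river) end)}.
```

A monitor is used instead of a link so that a crash in the watched process is delivered as a message and never propagates to the observer.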

Observers are the safety rails of the system.

Summary

Observers keep the system alive:

Sentinels watch and raise alerts.
Supervisors restart and maintain structure.

Hacker's Handbook