Building an Idempotent Ledger in Go

Backend developer. Expert in Rust and GoLang.

The failure scenario

At 14:03:42, Alice's app sends a £100 transfer to Bob. The server processes it, Alice's balance drops, Bob's rises. Then the network drops. The response never arrives. Alice's app retries at 14:03:45. The server processes it again. Alice has lost £200. Bob has £200. The ledger balances. No error was logged. Nobody knows.

This is not a theoretical edge case. It is the default behaviour of any transfer endpoint that does not explicitly defend against it. Every mobile network hiccup, every load balancer timeout, every client retry library is a trigger. The question is not whether your users will retry -- they will. The question is whether your system is ready for it.

What makes this particularly dangerous in a financial ledger is that the damage is silent. No exception is raised. No constraint is violated. The numbers add up. The bug only surfaces when Alice checks her statement, disputes the charge, and your support team starts manually reconciling entries.


Idempotency -- the principle

An operation is idempotent if applying it multiple times produces the same result as applying it once.

Formally: f(f(x)) = f(x)

Intuitively: pressing a lift button twice does not make the lift arrive twice.

You already rely on idempotency everywhere:

  • HTTP GET is idempotent. Refreshing a page does not create a new resource.

  • Setting a value is idempotent. balance = 100 twice leaves balance = 100.

  • DELETE by ID is idempotent. Deleting something already gone is fine.

  • HTTP POST is not idempotent by default. Submitting a form twice creates two records.
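The difference between the idempotent and non-idempotent cases above can be seen in a few lines. This is a minimal sketch; setBalance and addToBalance are illustrative names, not part of the ledger code.

```go
package main

import "fmt"

// setBalance is idempotent: applying it twice gives the same state as once.
func setBalance(_ int64, value int64) int64 { return value }

// addToBalance is not idempotent: every application changes the state again.
func addToBalance(balance int64, delta int64) int64 { return balance + delta }

func main() {
	once := setBalance(0, 100)
	twice := setBalance(once, 100)
	fmt.Println(once == twice) // true

	once = addToBalance(0, 100)
	twice = addToBalance(once, 100)
	fmt.Println(once == twice) // false
}
```

A transfer is an "add", not a "set", which is why a retried transfer needs extra machinery to behave like a single one.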

The key insight: idempotency is not a property you add after the fact. It is a property you design in from the start. The mechanism is an idempotency key -- a unique identifier the client generates and sends with every request. The server uses this key to detect replays and return the original result instead of re-executing.


What a ledger is -- and why it raises the stakes

A ledger is an append-only record of every financial movement in a system. Every transfer produces two entries: a debit on the sender's account and a credit on the receiver's. The sum of all entries must always equal zero.

Three invariants must hold at all times:

  1. Balance integrity -- no account balance ever goes negative

  2. Ledger balance -- the sum of all entries across all accounts equals zero

  3. Transfer atomicity -- a transfer either fully completes or has zero effect

Violating any of these means corrupt data and manual reconciliation. In a payment system it means regulatory exposure. This is why the ledger is the hardest place to get idempotency wrong, and the most instructive place to get it right.
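The double-entry rule and the zero-sum invariant can be illustrated without a database. This is an in-memory sketch only; Entry, transferEntries and ledgerSum are hypothetical names chosen for the example.

```go
package main

import "fmt"

// Entry is one line in the ledger; Amount is in minor units (cents).
type Entry struct {
	Account string
	Amount  int64
}

// transferEntries returns the debit/credit pair for one transfer.
func transferEntries(from, to string, amount int64) []Entry {
	return []Entry{
		{Account: from, Amount: -amount},
		{Account: to, Amount: amount},
	}
}

// ledgerSum adds up every entry; invariant 2 says it must be zero.
func ledgerSum(entries []Entry) int64 {
	var total int64
	for _, e := range entries {
		total += e.Amount
	}
	return total
}

func main() {
	var ledger []Entry
	ledger = append(ledger, transferEntries("alice", "bob", 10_000)...)
	ledger = append(ledger, transferEntries("bob", "carol", 2_500)...)
	fmt.Println(len(ledger), ledgerSum(ledger)) // 4 0
}
```

Because every transfer appends a pair that cancels out, the sum stays at zero however many transfers are recorded. A replayed transfer would not break this sum, which is exactly why the duplicate in the opening scenario goes unnoticed.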


The design

Schema

CREATE TABLE accounts (
    id         UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    owner      TEXT        NOT NULL,
    balance    BIGINT      NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    CONSTRAINT balance_non_negative CHECK (balance >= 0)
);

CREATE TABLE ledger_entries (
    id          UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    account_id  UUID        NOT NULL REFERENCES accounts(id),
    amount      BIGINT      NOT NULL,
    transfer_id UUID        NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE idempotency_keys (
    key        TEXT        PRIMARY KEY,
    response   JSONB       NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Every decision here is deliberate:

  • BIGINT for money -- never FLOAT or DECIMAL for storage. Floats introduce rounding errors. BIGINT stores the amount in minor units (pence here, although the code calls them cents), so £1.00 is stored as 100 and arithmetic is exact.

  • CHECK (balance >= 0) -- the database enforces the balance invariant even if the application layer has a bug. This is the last line of defence.

  • idempotency_keys.key TEXT PRIMARY KEY -- the PRIMARY KEY constraint means a duplicate insert fails at the database level. Two concurrent requests with the same key cannot both succeed.

  • response JSONB -- the original response is stored verbatim. A replay returns exactly what the first request returned, not a recomputed result.

  • ledger_entries is append-only -- no UPDATE, no DELETE. Every financial movement is a permanent record.
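The rounding problem behind the BIGINT decision is easy to reproduce. The sketch below adds ten payments of 0.10 as float64 and as integer cents; sumFloat and sumCents are illustrative helpers.

```go
package main

import "fmt"

// sumFloat adds the same float64 amount n times.
func sumFloat(amount float64, n int) float64 {
	var total float64
	for i := 0; i < n; i++ {
		total += amount
	}
	return total
}

// sumCents adds the same integer amount n times.
func sumCents(cents int64, n int) int64 {
	var total int64
	for i := 0; i < n; i++ {
		total += cents
	}
	return total
}

func main() {
	fmt.Println(sumFloat(0.10, 10) == 1.00) // false
	fmt.Println(sumCents(10, 10) == 100)    // true
}
```

The float total ends up just below 1.00 because 0.10 has no exact binary representation. The integer total is exact.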

Failure modes

Failure | What the system does
Network drops before response arrives | Client retries -> idempotency key found -> original response returned
Server crashes mid-transaction | Postgres rolls back -> retry re-runs cleanly from scratch
Two concurrent identical requests | PRIMARY KEY constraint -> one insert succeeds, one silently ignored
Transfer exceeds balance | CHECK (balance >= 0) fires -> error returned, both balances unchanged
Wrong account ID | Foreign key constraint -> error returned before any money moves

The implementation -- five phases

The system is built in five phases. Each phase adds exactly one guarantee and leaves the previous phases unchanged.

Phase 1 -- Types + schema    ->  invalid money is unrepresentable
Phase 2 -- DB layer          ->  balance constraint enforced at the database level
Phase 3 -- Core transfer     ->  balance never goes negative
Phase 4 -- Idempotency       ->  exactly one transfer per key
Phase 5 -- HTTP + tests      ->  proof under concurrent load

Phase 1 -- Types and schema: making invalid states hard to construct

Before a single query is written, the type system does what it can.

type Money struct {
    cents int64
}

func FromCents(cents int64) (Money, error) {
    if cents <= 0 {
        return Money{}, fmt.Errorf("amount must be a positive number of cents")
    }
    return Money{cents: cents}, nil
}

func (m Money) Cents() int64 { return m.cents }

Money is a struct with a private cents field. From outside the package, the only way to construct a non-zero Money is through FromCents, which rejects zero and negative amounts at the boundary. (The zero value Money{} can still be written, so treat it as "no amount".) A caller cannot pass a float64 value: the type mismatch is caught at compile time. A caller cannot pass -50: FromCents rejects it before the value reaches the database.

type AccountID  uuid.UUID
type TransferID uuid.UUID

AccountID and TransferID are distinct named types. Passing one where the other is expected is a compile error. They are both backed by uuid.UUID, but the compiler treats them as separate types. This is not as strong as Rust's newtype pattern: an explicit conversion such as AccountID(someTransferID) still compiles.
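The limit of named types shows up in a small, dependency-free sketch. Here rawID is a stand-in for uuid.UUID and describe is a hypothetical function; neither is part of the ledger code.

```go
package main

import "fmt"

// rawID stands in for uuid.UUID so the sketch has no dependencies.
type rawID [16]byte

type AccountID rawID
type TransferID rawID

// describe only accepts an AccountID.
func describe(id AccountID) string {
	return fmt.Sprintf("account %x", id[:2])
}

func main() {
	transfer := TransferID{0xab, 0xcd}

	// describe(transfer) // does not compile: TransferID is not AccountID

	// An explicit conversion compiles, because both types share an
	// underlying type. Nothing stops a caller from writing this.
	fmt.Println(describe(AccountID(transfer))) // account abcd
}
```

The named types catch accidental mix-ups, not deliberate conversions.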

IdempotencyKey validates length at construction:

type IdempotencyKey struct {
    value string
}

func NewIdempotencyKey(s string) (IdempotencyKey, error) {
    if len(s) == 0 || len(s) > 255 {
        return IdempotencyKey{}, fmt.Errorf("idempotency key must be 1 to 255 characters")
    }
    return IdempotencyKey{value: s}, nil
}

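An empty or oversized key cannot be constructed through NewIdempotencyKey. Note that len counts bytes, so the limit is 255 bytes rather than 255 characters. The HTTP handler calls NewIdempotencyKey before any database interaction, so the validation happens at the boundary.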
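One detail of the length check is worth seeing in isolation: Go's len on a string returns bytes, which differs from the character count as soon as a key contains non-ASCII text. The helper keyLengths below is illustrative only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// keyLengths returns the length of s in bytes and in runes.
func keyLengths(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := keyLengths("überweisung-42")
	fmt.Println(b, r) // 15 14
}
```

For keys that are UUIDs or hex strings the two counts are identical, so the distinction only matters if clients are allowed to send arbitrary text.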

The error type:

type LedgerError struct {
    kind    ledgerErrorKind
    message string
}

type ledgerErrorKind int

const (
    kindInsufficientFunds ledgerErrorKind = iota
    kindDuplicateKey
    kindInvalidAmount
    kindInvalidIdempotencyKey
    kindAccountNotFound
    kindTransferNotFound
    kindDatabase
    kindSerialization
)

Callers use Is* helpers to distinguish variants without a type switch:

func IsInsufficientFunds(err error) bool { return isKind(err, kindInsufficientFunds) }
func IsAccountNotFound(err error) bool   { return isKind(err, kindAccountNotFound) }

And HTTPStatus maps each variant to the correct status code in one place:

func HTTPStatus(err error) int {
    var le *LedgerError
    if !errors.As(err, &le) {
        return http.StatusInternalServerError
    }
    switch le.kind {
    case kindInsufficientFunds:
        return http.StatusUnprocessableEntity
    case kindInvalidAmount, kindInvalidIdempotencyKey:
        return http.StatusBadRequest
    case kindAccountNotFound, kindTransferNotFound:
        return http.StatusNotFound
    default:
        return http.StatusInternalServerError
    }
}

No handler decides its own status code. The error carries the information; HTTPStatus reads it.


Phase 2 -- Database layer: queries and the balance invariant

LockAccounts acquires row locks on both accounts inside the current transaction. Accounts are always locked in ascending UUID order:

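func LockAccounts(ctx context.Context, tx pgx.Tx, a, b uuid.UUID) error {
    first, second := a, b
    if a.String() > b.String() {
        first, second = b, a
    }

    // ORDER BY id fixes the lock order; FOR UPDATE holds both rows until
    // the transaction ends.
    rows, err := tx.Query(ctx,
        "SELECT id FROM accounts WHERE id = ANY($1) ORDER BY id FOR UPDATE",
        []uuid.UUID{first, second},
    )
    if err != nil {
        return ledgererrors.Database(err)
    }
    rows.Close()
    // pgx can report a query failure only once the rows are closed.
    if err := rows.Err(); err != nil {
        return ledgererrors.Database(err)
    }
    return nil
}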

Both rows are locked in a single query, ordered by UUID. Any concurrent transfer involving the same accounts waits at the lock rather than deadlocking.

ApplyEntry updates the balance and inserts a ledger record atomically within the transaction:

func ApplyEntry(ctx context.Context, tx pgx.Tx, accountID, transferID uuid.UUID, amount int64) error {
    _, err := tx.Exec(ctx,
        "UPDATE accounts SET balance = balance + $1 WHERE id = $2",
        amount, accountID,
    )
    if err != nil {
        if isCheckViolation(err) {
            return ledgererrors.InsufficientFunds(accountID)
        }
        return ledgererrors.Database(err)
    }

    _, err = tx.Exec(ctx,
        "INSERT INTO ledger_entries (account_id, amount, transfer_id) VALUES ($1, $2, $3)",
        accountID, amount, transferID,
    )
    if err != nil {
        return ledgererrors.Database(err)
    }
    return nil
}

The application does not check the balance before attempting the update. It attempts the update and maps the constraint violation to a domain error. This eliminates a check-then-act race where the balance could change between the check and the update.

Error code 23514 is a Postgres check constraint violation:

func isCheckViolation(err error) bool {
    var pgErr *pgconn.PgError
    return errors.As(err, &pgErr) && pgErr.Code == "23514"
}

Phase 3 -- Core transfer: atomicity and the balance invariant

The transfer does four things inside a single transaction:

1. Lock both accounts in UUID order       (prevents deadlocks)
2. Insert debit entry  (amount negative)  (sender loses money)
3. Insert credit entry (amount positive)  (receiver gains money)
4. Commit                                 (all or nothing)

func (s *LedgerService) Transfer(ctx context.Context, req types.TransferRequest) (types.TransferResult, error) {
    tx, err := s.db.BeginTx(ctx)
    if err != nil {
        return types.TransferResult{}, err
    }
    defer tx.Rollback(ctx)

    if err := db.LockAccounts(ctx, tx, uuid.UUID(req.FromAccount), uuid.UUID(req.ToAccount)); err != nil {
        return types.TransferResult{}, err
    }

    if err := db.ApplyEntry(ctx, tx, uuid.UUID(req.FromAccount), uuid.UUID(req.TransferID), -req.Amount.Cents()); err != nil {
        return types.TransferResult{}, err
    }

    if err := db.ApplyEntry(ctx, tx, uuid.UUID(req.ToAccount), uuid.UUID(req.TransferID), req.Amount.Cents()); err != nil {
        return types.TransferResult{}, err
    }

    if err := tx.Commit(ctx); err != nil {
        return types.TransferResult{}, err
    }

    result := types.TransferResult{ /* ... */ }

    return result, nil
}

defer tx.Rollback(ctx) is the transaction safety net. If any error causes an early return before tx.Commit, the deferred rollback fires. Once Commit succeeds, the subsequent Rollback is a no-op; pgx handles this correctly.


Phase 4 -- Idempotency: exactly one transfer per key

The naive implementation has a race condition:

Goroutine A: check key -> not found
Goroutine B: check key -> not found     <- both pass the guard
Goroutine A: execute transfer
Goroutine B: execute transfer          <- duplicate
Goroutine A: store key
Goroutine B: store key

Moving the key check inside the transaction, after locking, closes the race:

func (s *LedgerService) Transfer(ctx context.Context, req types.TransferRequest) (types.TransferResult, error) {
    tx, err := s.db.BeginTx(ctx)
    if err != nil {
        return types.TransferResult{}, err
    }
    defer tx.Rollback(ctx)

    // lock accounts FIRST -- concurrent requests on the same accounts wait here
    if err := db.LockAccounts(ctx, tx, uuid.UUID(req.FromAccount), uuid.UUID(req.ToAccount)); err != nil {
        return types.TransferResult{}, err
    }

    // check idempotency key INSIDE the transaction, AFTER locking
    cached, err := db.GetCachedResult(ctx, tx, req.IdempotencyKey)
    if err != nil {
        return types.TransferResult{}, err
    }
    if cached != nil {
        // replay -- return cached result without re-executing
        if err := tx.Commit(ctx); err != nil {
            return types.TransferResult{}, err
        }
        return *cached, nil
    }

    // new request -- execute and store atomically
    if err := db.ApplyEntry(ctx, tx, uuid.UUID(req.FromAccount), uuid.UUID(req.TransferID), -req.Amount.Cents()); err != nil {
        return types.TransferResult{}, err
    }
    if err := db.ApplyEntry(ctx, tx, uuid.UUID(req.ToAccount), uuid.UUID(req.TransferID), req.Amount.Cents()); err != nil {
        return types.TransferResult{}, err
    }

    result := types.TransferResult{ /* ... */ }

    if err := db.CacheResult(ctx, tx, req.IdempotencyKey, result); err != nil {
        return types.TransferResult{}, err
    }

    if err := tx.Commit(ctx); err != nil {
        return types.TransferResult{}, err
    }

    return result, nil
}

Why does this work under READ COMMITTED (Postgres default)?

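Goroutine B blocks on the FOR UPDATE lock until goroutine A commits. Under READ COMMITTED each statement takes its own snapshot, so the GetCachedResult query that B runs after acquiring the lock sees A's committed idempotency key and returns the cached result. No duplicate transfer. This relies on both requests locking the same account rows, which holds whenever a retry carries the same payload.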

CacheResult uses ON CONFLICT DO NOTHING as a secondary safety net:

_, err = tx.Exec(ctx,
    `INSERT INTO idempotency_keys (key, response)
     VALUES ($1, $2)
     ON CONFLICT (key) DO NOTHING`,
    key.String(), responseJSON,
)

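If two requests somehow both reach the insert (which the lock prevents, but this is defence in depth), only one row is stored. The PRIMARY KEY constraint is the final word. Note that DO NOTHING only skips the second insert; a stricter version would check the affected row count and roll back when it is zero.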


Phase 5 -- HTTP layer and tests

The HTTP handler decodes the request, validates inputs, and delegates to the service:

func (h *Handler) HandleTransfer(w http.ResponseWriter, r *http.Request) {
    var body struct {
        IdempotencyKey string    `json:"idempotency_key"`
        FromAccount    uuid.UUID `json:"from_account"`
        ToAccount      uuid.UUID `json:"to_account"`
        Amount         int64     `json:"amount"`
    }
    if err := json.NewDecoder(r.Body).Decode(&body); err != nil {
        http.Error(w, "invalid request body", http.StatusBadRequest)
        return
    }

    key, err := types.NewIdempotencyKey(body.IdempotencyKey)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    amount, err := types.FromCents(body.Amount)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    result, err := h.svc.Transfer(r.Context(), types.TransferRequest{
        IdempotencyKey: key,
        FromAccount:    types.AccountID(body.FromAccount),
        ToAccount:      types.AccountID(body.ToAccount),
        Amount:         amount,
        TransferID:     types.NewTransferID(),
    })
    if err != nil {
        http.Error(w, err.Error(), ledgererrors.HTTPStatus(err))
        return
    }

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(result)
}

All validation happens before Transfer is called. Transfer never receives invalid inputs. HTTPStatus(err) maps domain errors to status codes without any if err == ErrX chains in the handler.


Proving it works -- three adversarial tests

The tests do not test the happy path. They test the invariants.

Test 1 -- Concurrent same key produces exactly one transfer

func TestSameKeyConcurrentProducesOneTransfer(t *testing.T) {
    svc, alice, bob := setup(t)
    ctx := context.Background()

    key := mustKey(t, fmt.Sprintf("test-%s", uuid.New()))
    amount := mustMoney(t, 1000)

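    // result pairs each goroutine's response with its error.
    type result struct {
        res types.TransferResult
        err error
    }
    results := make([]result, 10)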
    var wg sync.WaitGroup
    for i := range 10 {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            req := types.TransferRequest{
                IdempotencyKey: key,
                FromAccount:    alice,
                ToAccount:      bob,
                Amount:         amount,
                TransferID:     types.NewTransferID(),
            }
            res, err := svc.Transfer(ctx, req)
            results[i] = result{res, err}
        }(i)
    }
    wg.Wait()

    ids := make(map[types.TransferID]struct{})
    for _, r := range results {
        if r.err == nil {
            ids[r.res.TransferID] = struct{}{}
        }
    }
    if len(ids) != 1 {
        t.Errorf("expected exactly 1 unique transfer_id, got %d", len(ids))
    }

    balance, _ := svc.GetBalance(ctx, alice)
    if balance != 9000 {
        t.Errorf("alice balance = %d, want 9000", balance)
    }
}

Ten concurrent goroutines fire the same idempotency key at the same time. The map collapsing to a single element is the proof: every response carries the same TransferID. The balance assertion is the second proof: exactly one debit occurred.

If the idempotency check were outside the transaction, all ten would pass the guard simultaneously and all ten would debit Alice. The balance would be 0. The map would contain ten distinct IDs. The test would fail loudly.

Test 2 -- Insufficient funds leaves balances unchanged

func TestTransferExceedingBalanceIsRejected(t *testing.T) {
    svc, alice, _ := setup(t)
    ctx := context.Background()

    _, err := svc.Transfer(ctx, types.TransferRequest{
        IdempotencyKey: mustKey(t, fmt.Sprintf("test-%s", uuid.New())),
        FromAccount:    alice,
        ToAccount:      types.AccountID(uuid.New()),
        Amount:         mustMoney(t, 99_999), // more than 10 000
        TransferID:     types.NewTransferID(),
    })

    if !ledgererrors.IsInsufficientFunds(err) {
        t.Fatalf("expected InsufficientFunds, got %v", err)
    }

    balance, _ := svc.GetBalance(ctx, alice)
    if balance != 10_000 {
        t.Errorf("alice balance = %d, want 10000 (unchanged)", balance)
    }
}

This test proves atomicity. If the debit entry were written before the constraint check fired, even partially, Alice's balance would change despite the error. It does not.

Test 3 -- Sequential transfers maintain the balance invariant

func TestSequentialTransfersMaintainBalanceInvariant(t *testing.T) {
    svc, alice, bob := setup(t)
    ctx := context.Background()

    for i := range 5 {
        _, err := svc.Transfer(ctx, types.TransferRequest{
            IdempotencyKey: mustKey(t, fmt.Sprintf("seq-%d-%s", i, uuid.New())),
            FromAccount:    alice,
            ToAccount:      bob,
            Amount:         mustMoney(t, 1000),
            TransferID:     types.NewTransferID(),
        })
        if err != nil {
            t.Fatalf("transfer %d: %v", i, err)
        }
    }

    aliceBal, _ := svc.GetBalance(ctx, alice)
    bobBal, _ := svc.GetBalance(ctx, bob)

    if aliceBal != 5_000 {
        t.Errorf("alice = %d, want 5000", aliceBal)
    }
    if bobBal != 5_000 {
        t.Errorf("bob = %d, want 5000", bobBal)
    }
    if aliceBal+bobBal != 10_000 {
        t.Errorf("total balance = %d, want 10000", aliceBal+bobBal)
    }
}

Money is conserved. 10_000 cents entered the system. After five transfers, 10_000 cents remain, distributed differently, but not created or destroyed.

Running the tests

docker compose up -d postgres
DATABASE_URL=postgres://postgres:password@localhost:5432/ledger \
  go test ./tests/... -v

--- PASS: TestTransferExceedingBalanceIsRejected (0.09s)
--- PASS: TestSequentialTransfersMaintainBalanceInvariant (0.11s)
--- PASS: TestSameKeyConcurrentProducesOneTransfer (0.14s)
PASS
ok      github.com/lethuzulu/idempotent-ledger-go/tests

The Go vs Rust comparison

Both languages can implement this system correctly. They reach correctness differently.

Guarantee | Go mechanism | Rust mechanism
Money type safety | Money struct, private cents field | Money(i64) newtype, private field
Invalid amount rejected | FromCents returns error -- caller can discard it with _ | from_cents returns Result -- compiler enforces handling
Wrong ID type | AccountID vs TransferID -- distinct named types, cast required | AccountId vs TransferId -- distinct types, compile error, no cast
Error handling | error return value -- if err != nil can be omitted | Result<T, E> with ? -- cannot be silently ignored
Exhaustive errors | switch le.kind -- new kindX constant silently falls through default | enum LedgerError -- unhandled variant = compile error
Transaction rollback | defer tx.Rollback(ctx) -- must be written, easy to forget | tx dropped on error -- automatic rollback, nothing to forget
SQL correctness | Runtime errors from pgx | sqlx::query! -- compile-time verification

What Go gets right here

Go's explicit if err != nil is verbose, but it makes error paths visible at every call site. The flow of a Transfer function in Go reads top-to-bottom -- you can see exactly where each error comes from. Rust's ? operator is terser but requires understanding the early-return semantics.

Go's defer tx.Rollback(ctx) pattern is a convention that must be learned and consistently applied. Once learned, it is mechanical: every function that opens a transaction adds the same line immediately after. The risk is a new team member who skips it.

Go compiles in seconds. Rust takes minutes on a cold build.

What the type system cannot enforce

In Go, a caller can write:

// this compiles -- the cast is explicit but not prohibited
db.ApplyEntry(ctx, tx, uuid.UUID(req.ToAccount), uuid.UUID(req.TransferID), ...)

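Because ApplyEntry takes two plain uuid.UUID arguments, passing the transfer ID where the account ID belongs compiles, as does swapping ToAccount and FromAccount. In Rust, AccountId and TransferId are genuinely different types; no cast syntax exists to confuse them, so the first mistake is a compile error. The test suite catches this class of bug in Go; the compiler catches it in Rust.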

Similarly, if err != nil can be omitted. FromCents returns an error. A caller can write money, _ := types.FromCents(amount) and silently discard the validation. Go provides no mechanism to make error handling mandatory.


What I learned building this

The most surprising thing was not the concurrency problem in general; I expected that to be the hard part. The hard part was a single ordering decision: the placement of the idempotency check relative to the FOR UPDATE lock.

The first implementation checked for the idempotency key before opening the transaction, then stored it after. It looked correct. It passed every sequential test. It failed the concurrent test: ten goroutines with the same key would all pass the guard simultaneously, all execute the 1,000-cent transfer, and Alice would lose £100 instead of £10. The map in the test collected ten distinct transfer IDs. The fix was one change: move the GetCachedResult call to after LockAccounts, inside the transaction.

The lesson: when a constraint must hold across concurrent operations, the database is the right place to enforce it. Application-level checks are subject to race conditions. The lock and the constraint together are not.

defer tx.Rollback(ctx) is the second lesson. The pattern is simple: open transaction, immediately defer rollback, proceed. If Commit is never reached because of an early return, the rollback fires. Once Commit succeeds, the deferred rollback is a no-op. Writing it wrong (forgetting the defer, or placing it after other calls) is a class of bug that Rust's ownership model eliminates entirely.


Where to go next


Part of a series on building backend systems in Go and Rust.
