Skip to main content

Command Palette

Search for a command to run...

Building a Token Bucket Rate Limiter in Go with Chi and sync primitives

Updated
9 min read
Building a Token Bucket Rate Limiter in Go with Chi and sync primitives
L
Backend developer. Expert in Rust and GoLang.

Rate limiting is a problem every HTTP service eventually faces. In this post I will walk through building an in-memory token bucket rate limiter in Go, using chi for routing and nothing but the standard library's sync package for concurrency. No Redis, no external state. just goroutines, sync.RWMutex, and sync.Mutex used deliberately at two different levels.

The companion Rust implementation lives at rate-limiter-rs and makes the same design decisions using Axum and std::sync::RwLock/Mutex. The locking structure in both projects is intentionally identical, which makes the comparison meaningful.

The token bucket algorithm

Each client gets a bucket that holds up to Capacity tokens. Every request consumes one token. A background goroutine refills tokens at a fixed rate over time. When the bucket is empty the request is rejected with a 429 and a Retry-After header.

Time Event Tokens remaining Response
t=0s Bucket created 10
t=0s 3 requests arrive 7 200
t=1s Refill fires (+1) 8
t=1s 9 requests arrive 0 (last gets rejected) 200 × 8, 429 × 1
t=2s Refill fires (+1) 1
t=2s Next request 0 200

The critical constraint: the check ("do I have a token?") and the decrement ("consume one") must be atomic. Two concurrent requests for the same client, with one token remaining, must not both succeed.

Configuration

type Config struct {
    Capacity        float64
    RefillRate      float64
    RefillInterval  time.Duration
    CleanupTTL      time.Duration
    CleanupInterval time.Duration
}

func DefaultConfig() Config {
    return Config{
        Capacity:        10,
        RefillRate:      1,
        RefillInterval:  100 * time.Millisecond,
        CleanupTTL:      5 * time.Minute,
        CleanupInterval: 60 * time.Second,
    }
}

Tokens is float64 because refills are time-proportional and produce fractional values. We only consume whole tokens, so a client with 0.8 tokens cannot make a request.

The store: two levels of locking

This is where the interesting concurrency work happens.

type Bucket struct {
    Mu         sync.Mutex
    Tokens     float64
    Capacity   float64
    LastRefill time.Time
}

type Store struct {
    mu      sync.RWMutex
    buckets map[string]*Bucket
}

There are two distinct locks:

  • Store.mu is a sync.RWMutex that protects the map itself. Many goroutines can hold the read lock simultaneously to look up existing clients. Only inserting a new client requires the write lock.

  • Bucket.Mu is a sync.Mutex per bucket that protects the token count and the refill timestamp. This lock is held for the duration of a token check-and-decrement.

This two-level approach means contention is per-client, not global. Two requests for different clients can proceed in parallel even while one is holding its bucket lock.

Double-checked insert

func (s *Store) GetOrInsert(clientID string, capacity float64) *Bucket {
    s.mu.RLock()
    b, ok := s.buckets[clientID]
    s.mu.RUnlock()
    if ok {
        return b
    }

    s.mu.Lock()
    defer s.mu.Unlock()
    if b, ok = s.buckets[clientID]; ok {
        return b
    }
    b = &Bucket{
        Tokens:     capacity,
        Capacity:   capacity,
        LastRefill: time.Now(),
    }
    s.buckets[clientID] = b
    return b
}

The first RLock is the optimistic fast path: most clients will already have a bucket and we can return immediately without ever taking a write lock. When a new client is seen, we upgrade to a write lock and check again, the second check is necessary because another goroutine may have inserted a bucket between the two locks.

Refill iteration

func (s *Store) Each(f func(*Bucket)) {
    s.mu.RLock()
    defer s.mu.RUnlock()
    for _, b := range s.buckets {
        f(b)
    }
}

Each holds a read lock on the map while iterating. The function f receives a pointer to each bucket and is expected to lock bucket.Mu itself before touching tokens. This lets refills for different clients happen concurrently if the caller chooses, though in practice the refill loop is single-goroutine, so it locks and unlocks each bucket sequentially.

The rate limiter

func (rl *RateLimiter) Allow(clientID string) error {
    bucket := rl.store.GetOrInsert(clientID, rl.config.Capacity)

    bucket.Mu.Lock()
    defer bucket.Mu.Unlock()

    if bucket.Tokens >= 1 {
        bucket.Tokens--
        return nil
    }

    retryAfter := int(math.Ceil(1 / rl.config.RefillRate))
    return &limiterrors.RateLimitExceeded{RetryAfterSecs: retryAfter}
}

GetOrInsert returns a pointer to the bucket. We then lock it, check the token count, decrement if possible, and return. The lock is released by the deferred Unlock. If two goroutines arrive simultaneously for the same client with one token remaining, only one will find Tokens >= 1 while holding the lock. The other will see 0 and get a 429.

Time-proportional refill

func (rl *RateLimiter) RefillAll() {
    rl.store.Each(func(b *store.Bucket) {
        b.Mu.Lock()
        defer b.Mu.Unlock()

        elapsed := time.Since(b.LastRefill).Seconds()
        tokensToAdd := elapsed * rl.config.RefillRate
        b.Tokens = math.Min(b.Tokens+tokensToAdd, b.Capacity)
        b.LastRefill = time.Now()
    })
}

Rather than adding a fixed number of tokens on every tick, we compute how many seconds have passed since the last refill and multiply by the refill rate. This means the refill is correct regardless of how often the task fires or whether a tick was delayed. A bucket untouched for 3 seconds gets 3 * RefillRate tokens added in one pass.

Background goroutines

func StartRefillTask(rl *limiter.RateLimiter, stop <-chan struct{}) {
    interval := rl.Config().RefillInterval
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case <-ticker.C:
                rl.RefillAll()
            }
        }
    }()
}

func StartCleanupTask(rl *limiter.RateLimiter, stop <-chan struct{}) {
    interval := rl.Config().CleanupInterval
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case <-ticker.C:
                before := rl.Store().Len()
                rl.Cleanup()
                after := rl.Store().Len()
                if removed := before - after; removed > 0 {
                    slog.Info("cleaned up inactive buckets",
                        "removed", removed,
                        "remaining", after,
                    )
                }
            }
        }
    }()
}

Both tasks receive a stop <-chan struct{} channel. When main receives SIGINT or SIGTERM it calls close(stop), which unblocks both select cases and lets the goroutines return cleanly before the server shuts down.

The cleanup task prevents unbounded memory growth. Without it every unique client IP stays in the map forever. The RemoveInactive method holds the write lock on the map while going over it and deleting stale entries.

Chi middleware

func RateLimit(rl *limiter.RateLimiter) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            clientID := extractClientID(r)

            if err := rl.Allow(clientID); err != nil {
                if rle, ok := err.(*limiterrors.RateLimitExceeded); ok {
                    rle.Write(w)
                    return
                }
                ie := &limiterrors.InternalError{Message: err.Error()}
                ie.Write(w)
                return
            }

            next.ServeHTTP(w, r)
        })
    }
}

func extractClientID(r *http.Request) string {
    if key := r.Header.Get("X-API-Key"); key != "" {
        return key
    }
    if forwarded := r.Header.Get("X-Forwarded-For"); forwarded != "" {
        if ip, _, found := strings.Cut(forwarded, ","); found {
            return strings.TrimSpace(ip)
        }
        return strings.TrimSpace(forwarded)
    }
    addr := r.RemoteAddr
    if i := strings.LastIndex(addr, ":"); i != -1 {
        return addr[:i]
    }
    return addr
}

The middleware is a closure that captures *RateLimiter. It type-asserts the error to *RateLimitExceeded to write the 429 with the correct Retry-After header. Any other error falls through to the internal error handler.

Client ID resolution prefers an explicit X-API-Key, falls back to the first IP in X-Forwarded-For, and finally strips the port from RemoteAddr.

Graceful shutdown

quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

go func() {
    slog.Info("listening", "addr", addr)
    if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
        slog.Error("server error", "err", err)
        os.Exit(1)
    }
}()

<-quit
slog.Info("shutdown signal received -- draining requests")
close(stop)

ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
srv.Shutdown(ctx)

close(stop) signals both background goroutines to exit. srv.Shutdown(ctx) stops accepting new connections and waits for in-flight requests to complete, with a 10-second timeout.

Testing

func TestConcurrentRequestsSameClientOnlyCapacitySucceed(t *testing.T) {
    rl := makeLimiter()
    total := 20
    capacity := int(testConfig().Capacity)

    var mu sync.Mutex
    var ok, fail int
    var wg sync.WaitGroup

    for range total {
        wg.Add(1)
        go func() {
            defer wg.Done()
            err := rl.Allow("192.168.1.3")
            mu.Lock()
            if err == nil {
                ok++
            } else {
                fail++
            }
            mu.Unlock()
        }()
    }
    wg.Wait()

    if ok != capacity {
        t.Errorf("expected %d allowed, got %d", capacity, ok)
    }
    if fail != total-capacity {
        t.Errorf("expected %d rejected, got %d", total-capacity, fail)
    }
}

Twenty goroutines fire simultaneously for the same client. Exactly Capacity (5 in the test config) must succeed. This test directly validates that bucket.Mu.Lock() in Allow prevents any two goroutines from both passing the Tokens >= 1 check when only one token remains.

The full test suite:

Test Scenario
TestClientExhaustsBucketAndIsRejected 5 requests succeed, 6th is 429
TestTokensRefillAndRequestsSucceedAgain Sleep 1.1s, call RefillAll, next request succeeds
TestConcurrentRequestsSameClientOnlyCapacitySucceed 20 goroutines, exactly capacity succeed
TestDifferentClientsHaveIndependentBuckets Client A exhausted does not affect client B
TestInactiveBucketsAreCleanedUp TTL=0 cleanup removes all buckets

Go vs Rust: the same locking structure, different enforcement

Both implementations use the same two-level locking pattern:

Go Rust
Map guard sync.RWMutex + map std::sync::RwLock<HashMap>
Per-bucket guard sync.Mutex per bucket std::sync::Mutex<Bucket>

The algorithm is identical. The difference is how the language enforces the contract.

In Go, bucket.Mu.Lock() and defer bucket.Mu.Unlock() are explicit calls. defer makes the release automatic on function exit, but nothing in the type system prevents you from accessing bucket.Tokens without holding the lock. The correctness is a discipline enforced by the programmer, not the compiler.

In Rust, mutex.lock().unwrap() returns a MutexGuard. You cannot access the data inside the Mutex without holding the guard, because the data is only accessible through the guard's Deref impl. The guard releases the lock when it goes out of scope --- enforced by the borrow checker. You cannot forget to release it and you cannot access the protected data without it.

Running it

go run ./cmd/server
# GET http://localhost:3000/ping
for i in $(seq 1 11); do
  curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/ping
done
go test ./tests/... -v

Project structure

cmd/server/main.go
internal/
  types/types.go
  store/store.go
  limiter/limiter.go
  tasks/tasks.go
  middleware/middleware.go
  errors/errors.go
tests/ratelimit_test.go

Source code: github.com/lethuzulu/rate-limiter-go

The Rust version using Axum and DashMap: github.com/lethuzulu/rate-limiter-rs